List of potential ML startups 2022



Photo by Mika Baumeister, on Unsplash

Following up on the 2 previous posts:

As they say: “Whenever there is a challenge, there is also an opportunity”. As in the Cloud and Big Data eras, many companies trying to adopt ML today are realizing how many challenges the domain holds. That has fueled the birth of a new field, MLOps (Machine Learning Operations), which requires people to rethink and build a new set of infrastructure and products to bring ML into their organizations. If you love innovation and want to tap into this new era, below is the list of potential ML startups that can help fuel your passion.

Data/ML Infrastructure

  • Fennel AI: early-stage startup, looking for a founding engineer.
    • Well-funded: investors include CEOs of multiple unicorn companies, the ex-CTO of Facebook, the creator of Kafka, and many VP-level executives from Facebook and other companies.
    • Leadership: The CEO is an ex-FB senior manager who managed a team of 100+ ML engineers and was responsible for a large part of FB’s recommendation efforts. The co-founder is also ex-FB and was a founding engineer of Inference.io.
    • Early-stage, with only 8 people (all ex-FB/Google Brain folks with deep domain expertise).
    • Working on some of the hardest problems in ML infrastructure: real-time features, low-latency serving, large-scale distributed systems, automated cluster management, high-quality Python APIs, etc.
  • BlueSky Data: early-stage startup, looking for a founding engineer.
    • Product: next-gen data infrastructure focused on making querying and analytics over data clouds faster and cheaper.
    • Leadership: CEO was head of engineering for Google’s ML runtime (e.g. TensorFlow). CTO was a Distinguished Engineer at Uber, responsible for its Big Data architecture and cost reduction efforts.
    • Market demand and direction: Warehouses like Snowflake, BigQuery, and Redshift were designed in an era of batch analytics serving analysts. Real-time analytics databases like Rockset and ClickHouse are designed for real-time analytics serving developers. However, data warehouses are still widely misused and mismanaged for real-time analytics. BlueSky Data is focused on solving this problem.
  • Tecton AI: Feature platform for ML
    • Raised $100M in Series C today, for a total of $160M so far; backed by a16z and Sequoia Capital.
    • Leadership: The CEO and CTO were the creators of Uber Michelangelo.
    • Market demand and challenges: I wrote a piece about the current state of Feature Store Service in MLOps a few weeks ago. Feel free to take a look if you are interested.
    • Competing products: FB’s F3, Uber’s Michelangelo, Airbnb’s Zipline, Apple’s Overton. Out of all the emerging feature stores listed at featurestore.org, and the ones built internally at FAANG, I still believe Tecton AI is the best so far.
    • Fun fact: A friend of mine is working here, an extremely talented guy.
  • Anyscale: Serverless autoscaling for ML Infra
    • Raised $100M in Series C recently, for a total of $160M so far; backed by a16z.
    • Product: Serverless autoscaling for ML infra, which started as an open-source project at the UC Berkeley RISELab. Aiming to become the next Databricks for ML infra.
    • Market demand: Allocating resources in ML is extremely hard, and many companies, even Big Tech like FB, still allocate resources manually, so Anyscale seems to be heading in the right direction. I took a look at their open-source code, and their Ray Autoscaler algorithm seems to be better than the k8s autoscaler, especially when it uses placement groups to maximize data locality.
    • Clients: Uber, Microsoft, Amazon, Two Sigma, Alibaba, and ML startups like Predibase.
    • Fun fact: The Chief Architect was a principal engineer at FB and Uber, and was Hieu Pham’s mentor.
  • Exafunction: aims to solve ML resource utilization issues and reduce AI development cost by abstracting hardware configuration, especially for GPU clusters.
    • Raised $28M in Series A.
    • Leadership: The CEO was a Tech Lead at Nuro AI.
    • Fun fact: Linh Nguyen knows their founder, so just ping him if you are interested.
  • Predibase: declarative, low-code ML, aiming to make ML configuration more declarative.
    • Raised $16.25M in Series A; positioned as an alternative to AutoML.
    • Team: creators of Ludwig and Horovod, and the founder of Lattice.io (which was later acquired by Apple).

ML Product

  • Glean:
    • Product: Personalized work assistant, powered by search and recommendation systems.
    • Raised $100M in Series C led by Sequoia Capital at a $1B valuation in less than 2 years.
    • Clients: Confluent, Databricks, Okta, Samsara, Rubrik, Wealthsimple.
    • Leadership: CEO Arvind was a distinguished engineer at Google and a co-founder of Rubrik. Another co-founder was a principal engineer at Meta and Microsoft.
    • All-star team: many employees were staff engineers at FB, Google, and Quora, and many came from the FAIR and Google Brain teams.
  • Descript: Amazing product with real-world NLP/Deep Learning applications: https://youtu.be/Bl9wqNe5J8U
    • Grew revenue 6x and doubled the team to 40 people in 2020, then raised a Series B last year. Backed by a16z.
    • Applies cutting-edge ML/Deep Learning (e.g., text to speech in your own voice).
    • Research team: Lyrebird AI team members were mostly Ph.D. students under Yoshua Bengio, who received the 2018 Turing Award (announced in 2019) for his pioneering research into deep learning and neural networks.

ML monitoring

  • Gantry ML: early-stage startup building a platform for monitoring ML in production.
    • All-star team: most team members are Google Brain and OpenAI experts.
    • Fun fact: Chau Vu is working here, so just ping her if you want to learn more.
  • Truera: AI quality management. A co-founder is a professor at CMU. Customers include the US Air Force.
    • Raised $25M in Series B, for a total of $42M so far.
    • Competitor: Gantry ML.
    • Fun fact: Another friend is an Engineering Manager here. But I will let him come out whenever he is ready.

Experimentation

  • Statsig: Experimentation platform, mostly used for A/B testing. Product is inspired by Deltoid, an internal A/B testing tool from FB.
    • Raised $43M in Series B in April 2022, backed by Sequoia Capital.
    • Leadership: The CEO was a VP of engineering at FB.

Note: This list is not complete. There are a few other potential startups I am still evaluating and whose founders I am talking to. If there are any potential companies that you think should be included, please comment below.
