It’s clear that voice is becoming a major interface, as we witness the rise of the Amazon Echo, Google Home, Siri, Cortana and their ilk. We’re also seeing increasing use of chatbots and other voice-driven tools, which often require speech recognition tuned to a very specific vocabulary.
That’s where AssemblyAI, a member of the Summer ’17 Y Combinator class, comes in. The startup is building an API that will help developers build customized chat interfaces quickly.
“We’re building an API for customized speech recognition. Developers use our API for transcribing phone calls or creating custom voice interfaces. We help them recognize an unlimited number of custom words without any training,” Dylan Fox, AssemblyAI’s founder, told TechCrunch.
He says most off-the-shelf speech recognition APIs are designed to be one-size-fits-all, and customizing them gets really expensive. AssemblyAI hopes to change that.
While working as an engineer at Cisco, his previous job, Fox saw first-hand how difficult it was to build a speech recognition program that handled custom words. It usually took a lot of engineering resources and a long time. He came up with the idea for AssemblyAI as a way to make the process easier, cheaper and much faster. He added that recent advances in AI and machine learning have made it possible to do what his company is doing now.
It’s worth noting that the tool relies on GPUs rather than CPUs for their greater processing power, because the task is so resource-intensive. Getting access to enough GPUs to build and run the tasks has been a challenge for the three-person startup, but its affiliation with Y Combinator has helped in that regard. It’s also brand-new technology, so the team has to solve every problem it encounters on its own. There are no books to read or solutions to look up on Google.
Even though they are just three people, they believe user experience will be key to their success, so one team member is fully devoted to developing the front end. They claim that no training is required to use the API: you just upload a list of terms or names and the API takes care of the rest.
Fox fully recognizes that it’s hard for a startup to build a speech recognition tool without constantly worrying about bigger companies swooping in and grabbing its market share, but he says his company is working hard to differentiate itself as a go-to tool for developers.
“As a smaller company focused on a speech recognition technology, we can provide a better experience [than the bigger companies].” He says that means paying attention to the little things that attract developers to a tool, such as better documentation, simpler integration and overall ease of use.
So far the product is in private beta, with several companies deploying it on GPUs in the cloud, but it’s early days. He says that as customers arrive, the company will have to scale to meet their demand with additional cloud-based GPU resources. If the product works as described, that may not be long in coming.
