Google launches an improved speech-to-text service for developers

Only a few weeks after launching a major overhaul of its Cloud Text-to-Speech API, Google today also announced an update to that service’s counterpart, the Cloud Speech-to-Text voice recognition API. The new and improved Cloud Speech-to-Text API promises significantly better voice recognition performance. Google says the new API reduces word errors by around 54 percent across all of its tests, and in some areas the results are far better than that.

Part of this improvement comes from a major new feature in the Speech-to-Text API that allows developers to select between different machine learning models based on their use case. The new API currently offers four of these models. There is one for short queries and voice commands, for example, as well as one for understanding audio from phone calls and another for handling audio from videos. The fourth model is the new default, which Google recommends for all other scenarios.

In addition to these new speech recognition models, Google is also updating the service with a new punctuation model. As the Google team admits, its transcriptions have long suffered from rather unorthodox punctuation. Punctuating transcribed speech is notoriously hard though (just ask anybody who has ever tried to transcribe a speech by the current U.S. president…). Google promises that its new model results in far more readable transcriptions that feature fewer run-on sentences and more commas, periods and question marks.
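For developers, choosing a model and switching on punctuation comes down to a couple of fields in the recognition request. The sketch below only builds a request body and does not call the service. The field names (`model`, `enableAutomaticPunctuation`, `languageCode`) and the model identifiers follow Google's public REST documentation as best understood here, not anything stated in this article, so check them against the current API reference. The helper name, the use-case labels and the bucket path are hypothetical.

```python
import json

# Assumed model identifiers; the article only describes the four use cases.
MODEL_FOR_USE_CASE = {
    "voice_command": "command_and_search",  # short queries and voice commands
    "phone_call": "phone_call",             # audio from phone calls
    "video": "video",                       # audio from videos
}


def build_recognize_request(audio_uri, use_case, language_code="en-US", punctuate=True):
    """Build a request body, falling back to the default model for other use cases."""
    model = MODEL_FOR_USE_CASE.get(use_case, "default")
    return {
        "config": {
            "languageCode": language_code,
            "model": model,
            "enableAutomaticPunctuation": punctuate,
        },
        "audio": {"uri": audio_uri},
    }


# Hypothetical Cloud Storage path, for illustration only.
request = build_recognize_request("gs://example-bucket/call.flac", "phone_call")
print(json.dumps(request, indent=2))
```

Any use case that is not in the mapping falls through to the default model, which matches Google's recommendation for all other scenarios.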

With this update, Google now also lets developers tag their transcribed audio or video with some basic metadata. There is no immediate benefit to the developer here, but Google says that it will use the aggregate information from all of its users to decide on which new features to prioritize next.

Google is also making a small change to how it charges for this service. As before, audio transcripts cost $0.006 per 15 seconds. The video model will cost twice as much, at $0.012 per 15 seconds, although until May 31 using this new model will also cost $0.006 per 15 seconds.
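To put those rates in perspective, here is a rough cost estimate for an hour of audio. It is a sketch that assumes every started 15-second block is billed in full, which the article does not confirm, and the function name is made up for illustration.

```python
import math

# Rates quoted in the article, in US dollars per 15-second block.
STANDARD_RATE = 0.006
VIDEO_RATE = 0.012


def estimate_cost(duration_seconds, rate_per_block, block_seconds=15):
    """Estimate the cost, assuming partial blocks are rounded up (an assumption)."""
    blocks = math.ceil(duration_seconds / block_seconds)
    return round(blocks * rate_per_block, 3)


one_hour = 60 * 60  # 240 blocks of 15 seconds
print("Standard models:", estimate_cost(one_hour, STANDARD_RATE))
print("Video model:", estimate_cost(one_hour, VIDEO_RATE))
```

Under that assumption, an hour of audio works out to about $1.44 on the standard models and about $2.88 on the video model once the introductory pricing ends.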

Frederic Lardinois

Editor

Frederic was with TechCrunch from 2012 through 2025. He also founded SiliconFilter and wrote for ReadWriteWeb (now ReadWrite). Frederic covers enterprise, cloud, developer tools, Google, Microsoft, gadgets, transportation and anything else he finds interesting.
