Google's WaveNet machine learning-based speech synthesis comes to Assistant

Last year, Google showed off WaveNet, a new way of generating speech that didn’t rely on a bulky library of word bits or cheap shortcuts that result in stilted speech. WaveNet used machine learning to build a voice sample by sample, and the results were, as I put it then, “eerily convincing.” Previously bound to the lab, the tech has now been deployed in the latest version of Google Assistant.

The general idea behind the tech was to recreate words and sentences not by manually coding grammatical and tonal rules, but by letting a machine learning system pick up those patterns from recorded speech and generate them sample by sample. A sample, in this case, is the sliver of sound generated every 1/16,000th of a second.
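To make "sample by sample" concrete, here is a minimal sketch of an autoregressive generation loop, in which each new sample is computed from the ones before it. The predictor is a hypothetical stand-in (a fixed sine-wave recurrence), not WaveNet's learned network; only the 16,000 samples-per-second rate comes from the article, and the names `toy_predictor` and `generate` are invented for illustration.

```python
import math

SAMPLE_RATE_HZ = 16_000  # from the article: one sample every 1/16,000th of a second
TONE_HZ = 440.0          # assumption: an arbitrary test tone, not from the article

# Angular step per sample for the test tone.
STEP = 2 * math.pi * TONE_HZ / SAMPLE_RATE_HZ


def toy_predictor(history: list[float]) -> float:
    """Hypothetical stand-in for a learned model.

    Predicts the next sample from the two previous ones using the sine
    recurrence x[n] = 2*cos(w)*x[n-1] - x[n-2]. A real system would
    replace this with a neural network conditioned on text.
    """
    return 2 * math.cos(STEP) * history[-1] - history[-2]


def generate(duration_s: float) -> list[float]:
    """Build a waveform one sample at a time, feeding each output back in."""
    total = round(duration_s * SAMPLE_RATE_HZ)
    samples = [0.0, math.sin(STEP)]  # two seed samples to start the recurrence
    while len(samples) < total:
        samples.append(toy_predictor(samples))
    return samples[:total]


clip = generate(0.01)
print(len(clip))  # 160 samples for 10 ms of audio
```

The point of the sketch is the loop shape: nothing can be produced in parallel, because every sample waits on the previous ones.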

At the time of its first release, WaveNet was extremely computationally expensive, taking a full second to generate 0.02 seconds of sound, so a two-second clip like “turn right at Cedar street” would take 100 seconds, or nearly two minutes, to generate. As such, it was poorly suited to actual use (you’d have missed your turn by then), which is why Google engineers set about improving it.
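The arithmetic behind that estimate can be checked in a few lines. This is only a back-of-the-envelope sketch using the figures quoted here (0.02 seconds of audio per second of compute, 16,000 samples per second); the function names are made up for illustration.

```python
# Figures quoted in the article for the original WaveNet.
SAMPLE_RATE_HZ = 16_000          # samples per second of audio
AUDIO_PER_COMPUTE_SECOND = 0.02  # seconds of audio produced per second of compute


def sample_count(clip_seconds: float, sample_rate_hz: int) -> int:
    """Number of samples the model must generate for a clip."""
    return round(clip_seconds * sample_rate_hz)


def generation_time(clip_seconds: float, audio_per_compute_second: float) -> float:
    """Compute time in seconds needed to synthesize a clip."""
    return clip_seconds / audio_per_compute_second


clip_seconds = 2.0
print(sample_count(clip_seconds, SAMPLE_RATE_HZ))  # 32000
print(f"{generation_time(clip_seconds, AUDIO_PER_COMPUTE_SECOND):.0f} s")  # 100 s
```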

The new, improved WaveNet generates sound at 20x real time, producing the same two-second clip in a tenth of a second. It also creates sound at higher fidelity: 24,000 samples per second rather than 16,000, and at 16 bits per sample versus 8. Not that high-fidelity sound can really be appreciated on a smartphone speaker, but given today’s announcements, we can expect Assistant to appear in many more places soon.
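For a sense of scale, the two versions can be compared side by side. The sketch below uses only the numbers in this article and assumes the original ran at 16,000 samples per second and 8 bits; the `VoiceModel` class and its fields are hypothetical, and the speedup and data-rate figures are derived from those numbers rather than quoted from Google.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceModel:
    """Hypothetical container for the figures quoted in the article."""
    name: str
    realtime_factor: float  # seconds of audio generated per second of compute
    sample_rate_hz: int
    bit_depth: int

    def generation_time(self, clip_seconds: float) -> float:
        """Compute time in seconds needed to synthesize a clip."""
        return clip_seconds / self.realtime_factor

    def amplitude_levels(self) -> int:
        """Distinct values a single sample can take at this bit depth."""
        return 2 ** self.bit_depth

    def raw_bits_per_second(self) -> int:
        """Uncompressed data rate of the generated audio."""
        return self.sample_rate_hz * self.bit_depth


original = VoiceModel("original", 0.02, 16_000, 8)   # assumed 16 kHz, 8-bit
improved = VoiceModel("improved", 20.0, 24_000, 16)

speedup = improved.realtime_factor / original.realtime_factor
print(f"two-second clip: {improved.generation_time(2.0):.1f} s")  # 0.1 s
print(f"speedup: {speedup:.0f}x")                                 # 1000x
print(original.amplitude_levels(), improved.amplitude_levels())   # 256 65536
print(improved.raw_bits_per_second() // original.raw_bits_per_second())  # 3
```

By these figures the new model is roughly a thousand times faster while producing three times as much raw audio data per second.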

The voices generated by WaveNet sound considerably better than the state-of-the-art concatenative systems used previously:

Old and busted:

New and hot:

(More samples are available at the DeepMind blog post, though presumably the Assistant will also sound like this soon.)

WaveNet also has the admirable quality of being extremely easy to scale to other languages and accents. If you want it to speak with a Welsh accent, there’s no need to go in and fiddle with the vowel sounds yourself. Just give it a couple dozen hours of a Welsh person speaking and it’ll pick up the nuances itself. That said, the new voice is only available for U.S. English and Japanese right now, with no word on other languages yet.

In keeping with the trend of “big tech companies doing what the other big tech companies are doing,” Apple, too, recently revamped its assistant (Siri, don’t you know) with a machine learning-powered speech model. That one’s different, though: it didn’t go so deep into the sound as to recreate it at the sample level, but stopped at the (still quite low) level of half-phones, or fractions of a phoneme.

The team behind WaveNet plans to publish its work soon, but for now you’ll have to be satisfied with their promises that it works and performs much better than before.

Devin Coldewey

Writer & Photographer

Devin Coldewey is a Seattle-based writer and photographer.

His personal website is coldewey.cc.
