Elon Musk agrees that we've exhausted AI training data

Elon Musk concurs with other AI experts that there’s little real-world data left to train AI models on.

“We’ve now exhausted basically the cumulative sum of human knowledge … in AI training,” Musk said during a livestreamed conversation with Stagwell chairman Mark Penn on X late Wednesday. “That happened basically last year.”

Musk, who owns AI company xAI, echoed themes former OpenAI chief scientist Ilya Sutskever touched on at NeurIPS, the machine learning conference, during an address in December. Sutskever, who said the AI industry had reached what he called “peak data,” predicted a lack of training data will force a shift away from the way models are developed today.

Indeed, Musk suggested that synthetic data — data generated by AI models themselves — is the path forward. “The only way to supplement [real-world data] is with synthetic data, where the AI creates [training data],” he said. “With synthetic data … [AI] will sort of grade itself and go through this process of self-learning.”

Other companies, including tech giants like Microsoft, Meta, OpenAI, and Anthropic, are already using synthetic data to train flagship AI models. Gartner estimates 60% of the data used for AI and analytics projects in 2024 was synthetically generated.

Microsoft’s Phi-4, which was open sourced early Wednesday, was trained on synthetic data alongside real-world data. So were Google’s Gemma models. Anthropic used some synthetic data to develop one of its most performant systems, Claude 3.5 Sonnet. And Meta fine-tuned its most recent Llama series of models using AI-generated data.

Training on synthetic data has other advantages, like cost savings. AI startup Writer claims its Palmyra X 004 model, which was developed using almost entirely synthetic sources, cost just $700,000 to develop — compared to estimates of $4.6 million for a comparably sized OpenAI model.

But there are disadvantages as well. Some research suggests that synthetic data can lead to model collapse, where a model becomes less “creative” and more biased in its outputs, eventually seriously compromising its functionality. Because models create synthetic data, if the data used to train these models has biases and limitations, their outputs will be similarly tainted.
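To make the idea of model collapse concrete, here is a toy sketch in Python. It is an illustration only, not how any of the companies named here train their models. It assumes the "model" is nothing more than a normal distribution fit to its training data, and that each new generation is trained solely on a small batch of samples drawn from the previous one. The function name `simulate_collapse` and its parameters are hypothetical.

```python
import random
import statistics


def simulate_collapse(generations=100, sample_size=10, seed=0):
    """Return the spread (standard deviation) of the data source at each generation.

    Toy assumption: a "model" is just a normal distribution described by a
    mean and a standard deviation, and each generation is fit only to
    samples produced by the generation before it.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0 stands in for real-world data
    history = [std]
    for _ in range(generations):
        # The current "model" generates a small synthetic dataset...
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # ...and the next "model" is fit only to those synthetic samples.
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        history.append(std)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    print(f"spread at generation 0: {history[0]:.4f}")
    print(f"spread at generation {len(history) - 1}: {history[-1]:.4f}")
```

Under these assumptions, the spread of the generated data tends to shrink from one generation to the next, because each small sample slightly underrepresents the variety in its source. That is a rough analogue of the loss of "creativity" described above. Real training pipelines typically mix in real-world data and filter synthetic samples, which this sketch does not model.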

Kyle Wiggers

AI Editor

Kyle Wiggers was TechCrunch’s AI Editor until June 2025. His writing has appeared in VentureBeat and Digital Trends, as well as a range of gadget blogs including Android Police, Android Authority, Droid-Life, and XDA-Developers. He lives in Manhattan with his partner, a music therapist.
