The latest controversy in the AI industry sees OpenAI and Microsoft investigating whether the Chinese AI startup DeepSeek improperly trained its groundbreaking R1 model on OpenAI’s outputs. The accusation? That DeepSeek leveraged a technique called “distillation” to extract knowledge from OpenAI models—essentially learning from them in a way OpenAI finds unfair.
The Allegations Against DeepSeek
According to Bloomberg and the Financial Times, OpenAI and Microsoft are probing whether DeepSeek trained its R1 model—which has been making waves for its efficiency—on OpenAI’s AI-generated responses. Venture capitalist and Trump administration AI advisor David Sacks claims there is “substantial evidence” that DeepSeek used distillation to “suck the knowledge” from OpenAI’s models.
In simple terms, distillation is a widely accepted machine learning technique in which a smaller “student” model learns from a larger “teacher” model by querying it repeatedly and training on its responses. It’s a common strategy in AI development, pioneered in part by Geoffrey Hinton, who coauthored the landmark 2015 paper on the method. IBM also highlights distillation as a key tool in democratizing AI by making powerful models more accessible.
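To make the idea concrete, here is a minimal sketch of distillation using NumPy. The “teacher” is a fixed linear classifier that we can only query for outputs (standing in for an API-only model like OpenAI’s); the student never sees the teacher’s weights, only its temperature-softened responses, and is trained to mimic them. The model sizes, temperature, and learning rate are illustrative assumptions, not anyone’s actual setup.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives softer, more informative targets."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# "Teacher": a black-box classifier we can only query for its output probabilities.
W_teacher = rng.normal(size=(4, 3))
def query_teacher(x, T=1.0):
    return softmax(x @ W_teacher, T)

# Student: trained purely on the teacher's soft outputs, never on ground-truth labels.
W_student = np.zeros((4, 3))
lr, T = 0.5, 2.0
X = rng.normal(size=(256, 4))       # inputs sent to the teacher as queries
targets = query_teacher(X, T)       # soft labels "distilled" from the teacher

for _ in range(500):
    probs = softmax(X @ W_student, T)
    # Gradient of the cross-entropy between soft targets and student predictions.
    grad = X.T @ (probs - targets) / len(X)
    W_student -= lr * grad

# After training, the student's hard predictions largely track the teacher's.
agreement = np.mean(
    query_teacher(X).argmax(1) == softmax(X @ W_student).argmax(1)
)
print(f"student/teacher agreement: {agreement:.2f}")
```

The key point the sketch illustrates is why this technique is at the center of the dispute: the student extracts the teacher’s behavior through queries alone, with no access to its weights or training data.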
The Irony of OpenAI’s Position
The irony of OpenAI’s complaints is hard to ignore. OpenAI itself has built its empire on indiscriminately scraping vast amounts of internet data—often without explicit authorization. The company is currently facing a lawsuit from The New York Times for allegedly using copyrighted articles to train its models. Yet, OpenAI argues that such practices are covered under fair use, claiming that large-scale data collection is essential for building competitive AI systems.
So why is OpenAI now crying foul when DeepSeek allegedly adopts a similar approach? The company’s legal defense in the NYT case hinges on the argument that no single data source is crucial in training its AI, as the power of models comes from their sheer scale. If that’s true, why should OpenAI worry if DeepSeek used its outputs?
DeepSeek’s Alternative Approach
What makes DeepSeek’s case fascinating is that it has reportedly achieved cutting-edge AI performance without relying on OpenAI’s “more data is better” philosophy. Instead, DeepSeek leveraged reinforcement learning techniques to optimize efficiency—challenging OpenAI’s narrative that sheer scale is the key to AI success.
This development unsettles OpenAI because it suggests that other companies can achieve top-tier results without needing OpenAI’s level of data monopolization. And if DeepSeek did use OpenAI outputs, it only highlights the broader question: should AI companies be allowed to train on each other’s work? If OpenAI justifies its own data practices as fair, does it have any moral high ground to demand exclusivity over its own AI-generated content?
The Bigger Picture: AI Protectionism
OpenAI now argues that protecting its AI outputs is necessary for national security, claiming that “PRC-based companies—and others—are constantly trying to distill the models of leading US AI companies.” This aligns with a broader push for tighter AI regulations to prevent foreign competitors from catching up.
However, AI development has always built upon previous research. OpenAI’s GPT models are built on the transformer architecture introduced by Google researchers in 2017, and Google in turn relied on decades of academic AI research. If OpenAI truly believes in open innovation and fair use, then its stance against DeepSeek is more about business interests than ethics.
At its core, this conflict is less about intellectual property and more about power dynamics in the AI industry. OpenAI is now facing the same kind of competition that it once benefited from. Whether DeepSeek violated OpenAI’s terms of service or not, this case forces a critical debate on AI ownership:
- Should AI companies be allowed to train on each other’s models?
- If OpenAI built its empire on scraping publicly available data, should it be protected from others doing the same to its AI outputs?
- Will AI development become a closed-off industry dominated by a few giants, or will techniques like distillation continue to level the playing field?
As OpenAI pushes for stricter protections while facing competition from DeepSeek, one thing is clear—AI’s future will be shaped not just by innovation, but also by legal and political battles over who gets to control its knowledge.