So much for the DeepSeek panic. Days after the Chinese company upended the tech industry with an AI model that rivaled those of U.S. incumbents at a fraction of the development cost, it’s becoming clear that the panic that wiped over $450 billion from Nvidia’s market value and sparked a frenzy across the AI community was more toothless jump scare than legitimate red tech threat.

For George Morgan, CEO of Symbolica, a San Francisco-based startup that’s been hard at work designing similarly cost-efficient models, it’s all a bit embarrassing. “The market reaction to this is just completely misguided and misinformed,” Morgan told Forbes. “Quite frankly, I think it’s mostly a political reaction. If this were a U.S.-based LLM company, I assume it wouldn’t have gotten anywhere near as much attention as DeepSeek has gotten.”

The simple truth is that designing more cost-efficient foundation models like DeepSeek’s is nothing new. People have been hammering away at the problem for years. But there’s another issue as well: DeepSeek claims it trained a large language model with a piddling $5.6 million worth of compute, or processing power. That number, it turns out, is a bit misleading.

“The $5.6 million has to be taken with a grain of salt,” said Richard Socher, CEO of AI search tool You.com, adding that it represents the cost of just one training run (the process of teaching a model by showing it gobs of data). But a large language model built from scratch typically requires many more such runs, sometimes thousands. DeepSeek cut its costs by training on top of open source large language models built by others, including Meta’s Llama. The company’s own technical paper notes that the $5.6 million figure does not include the cost of the prior research it builds on, an admission that its actual training costs are much higher than it is letting on.
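To see why a single-run figure understates the bill, here is a back-of-the-envelope sketch in Python. Every number besides the widely cited $5.6 million is a purely illustrative assumption, not a figure reported by DeepSeek or anyone quoted here.

```python
# Illustrative arithmetic only: why one training run's price tag
# understates total development cost. All values except the $5.6M
# headline figure are hypothetical placeholders.

FINAL_RUN_COST = 5.6e6          # the reported cost of the final training run
EXPLORATORY_RUNS = 500          # assumed smaller experiments and failed runs
AVG_EXPLORATORY_COST = 1e5      # assumed average cost per such run
PRIOR_WORK_COST = 60e6          # assumed cost of prior research and base models built upon

total = FINAL_RUN_COST + EXPLORATORY_RUNS * AVG_EXPLORATORY_COST + PRIOR_WORK_COST
print(f"Final run only:        ${FINAL_RUN_COST / 1e6:,.1f}M")
print(f"All-in (illustrative): ${total / 1e6:,.1f}M")
```

Under those made-up assumptions the all-in figure comes to roughly twenty times the headline number, which is the critics’ point: the $5.6 million is real, but it is only the last line item.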

Earlier this week, Writer CEO May Habib rolled her eyes at the DeepSeek freakout for just this reason. “This is not surprising to anyone who’s been paying attention,” she said, adding that her enterprise AI startup has trained cheaper models from the start. Itamar Friedman, CEO of AI coding tool Qodo, was similarly dubious. “Maybe the last button [DeepSeek] clicked needed this amount of compute or this amount of hardware,” he said. “But it doesn’t include all the spend that led up to that point.”

Which is not to say that some of the buzz around DeepSeek, whose models are already being slotted into a few American AI products, isn’t justified. The company used a widely known technique called reinforcement learning to achieve better results and made the cutting-edge technology free for everyone to use and replicate. That’s a big deal. But perhaps an even bigger one isn’t technical at all: DeepSeek has forced an overdue conversation about doing more with less at a time when Sam Altman, founder of the $157 billion-valued AI juggernaut OpenAI, is seeking billions of dollars to build data centers across the country to power its models.

“I think they burst the bubble of ‘you have to have all of the world’s resources and all of the world’s energy to be building these models,’” Timnit Gebru, founder of the Distributed Artificial Intelligence Research Institute, told Forbes. “They are making people question their decisions. It tempers the hysteria around AI investments because they’re saying, ‘here, we can do it too.’”

It’s hardly surprising, then, that a war of words has emerged alongside the battle to cut training costs. Days after DeepSeek’s model made a splash, OpenAI alleged that the Chinese company scraped outputs from its proprietary models to create its own AI systems (a process called distillation), violating OpenAI’s terms of service in the process, the company confirmed to Forbes. “We know that groups in the People’s Republic of China are actively working to use methods… to try to replicate advanced U.S. AI models,” Hannah Wong, chief communications officer at OpenAI, told Forbes in an emailed statement. “We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the U.S. government to protect the most capable models being built here.”

For OpenAI, which trained its powerful models by scraping the entirety of the internet, including copyrighted data, and has been sued by news companies and a group of authors as a result, that’s quite a position to take. “It’s so ridiculous,” Gebru said. “It’s kind of laughable.” After all, in those very suits the company is arguing that it is fair to use public data for AI training.

But the real point here is that DeepSeek is not the first company to do what it has done. Microsoft built its family of small language models, called Phi, by training on the outputs of superior models like OpenAI’s GPT-4. As Douwe Kiela, CEO of enterprise AI startup Contextual AI, tersely put it: “DeepSeek didn’t have a novel research breakthrough.”
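Distillation itself is a textbook technique: a smaller “student” model is trained to match the output distribution of a larger “teacher.” The sketch below is a generic, minimal version of that idea in PyTorch, offered only to illustrate the concept; it is not DeepSeek’s, Microsoft’s, or OpenAI’s actual pipeline, and the tensor shapes and temperature are arbitrary placeholders.

```python
# Generic knowledge-distillation loss: a "student" is trained to match a
# "teacher" model's softened output distribution. Purely illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then minimize the
    # KL divergence between the teacher's and the student's predictions.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature**2

# Toy usage with random logits standing in for real model outputs.
batch, vocab_size = 4, 32000
teacher_logits = torch.randn(batch, vocab_size)                      # would come from the large model
student_logits = torch.randn(batch, vocab_size, requires_grad=True)  # would come from the small model
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
print(f"distillation loss: {loss.item():.4f}")
```

In the scenario OpenAI is objecting to, the “teacher outputs” would be generated text pulled from an API rather than raw logits, but the objective is the same in spirit: make the smaller, cheaper model imitate the bigger one.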

“It’s a little bit sensationalist where it’s like, ‘oh, this changes everything. It’s this Sputnik moment,’” said a former Meta research scientist, referring to a widely quoted remark by Andreessen Horowitz cofounder Marc Andreessen. “I think it is very far from the Sputnik moment.”
