January 27, 2025
Why DeepSeek’s AI Model Just Became the Top-Rated App in the U.S.
A Chinese start-up has stunned the technology industry—and financial markets—with a cheaper, lower-tech AI assistant that matches the state of the art
DeepSeek’s artificial intelligence assistant made big waves on Monday, becoming the top-rated app in Apple’s App Store and sending tech stocks into a downward tumble. What’s all the fuss about?
The Chinese start-up, DeepSeek, surprised the tech industry with a new model that rivals the abilities of OpenAI’s most recent one, built with far less investment and with reduced-capacity chips. The U.S. bans exports of state-of-the-art computer chips to China and limits sales of chipmaking equipment. DeepSeek, based in the eastern Chinese city of Hangzhou, reportedly had a stockpile of high-performance Nvidia A100 chips acquired before the ban, so its engineers could have used those to develop the model. But in a key breakthrough, the start-up says it instead used much lower-powered Nvidia H800 chips to train the new model, dubbed DeepSeek-R1.
“We’ve seen up to now that the success of large tech companies working in AI was measured in how much money they raised, not necessarily in what the technology actually was,” says Ashlesha Nesarikar, the CEO of AI company Plano Intelligence, Inc. “I think we’ll be paying a lot more attention to what tech is underpinning these companies’ different products.”
On common AI tests in mathematics and coding, DeepSeek-R1 matched the scores of OpenAI’s o1 model, according to VentureBeat. U.S. companies don’t disclose the cost of training their own large language models (LLMs), the systems that undergird popular chatbots such as ChatGPT. But OpenAI CEO Sam Altman told an audience at MIT in 2023 that training GPT-4 cost more than $100 million. DeepSeek, by contrast, has reported spending roughly $6 million to train the base model underlying R1. And DeepSeek-R1 is free for users to download, while the comparable version of ChatGPT costs $200 a month.
DeepSeek’s $6 million number doesn’t necessarily reflect the cost of building an LLM from scratch, Nesarikar says; it may instead represent the cost of fine-tuning this latest version. Nevertheless, she says, the model’s improved energy efficiency would make AI accessible to more people in more industries. The increase in efficiency could also be good news for AI’s environmental impact, because the computational cost of generating new text with an LLM is four to five times higher than that of a typical search engine query.
Because it requires less computational power, the cost of running DeepSeek-R1 is a tenth of that of comparable models, says Hanchang Cao, an incoming assistant professor of information systems and operations management at Emory University. “For academic researchers or start-ups, this difference in the cost really means a lot,” Cao says.
DeepSeek achieved its efficiency in several ways, says Anil Ananthaswamy, author of Why Machines Learn: The Elegant Math Behind Modern AI. The model has 671 billion parameters, or variables it learns from during training, making it the largest open-source large language model yet, Ananthaswamy explains. But the model uses an architecture called “mixture of experts” so that only a relevant fraction of these parameters (tens of billions instead of hundreds of billions) is activated for any given query. This cuts down on computing costs. The DeepSeek LLM also uses a method called multi-head latent attention, which shrinks the memory needed to process each query; and instead of predicting an answer word by word, it can generate several words at once through a separate technique known as multi-token prediction.
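To make the routing idea concrete, here is a toy sketch of a mixture-of-experts layer in Python. The expert count, dimensions and top-k value are invented for illustration and bear no relation to DeepSeek-R1’s actual configuration; only the principle (a router activates a small subset of experts per input, so compute scales with the subset, not the total) reflects the architecture described above.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8      # toy value; a full model may have far more experts
TOP_K = 2            # only this many experts run per token
DIM = 16             # toy hidden dimension

# Each "expert" is a small feed-forward layer; together the experts
# hold most of the model's parameters.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]

# A learned "router" scores how relevant each expert is to the input.
router = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_layer(x: np.ndarray) -> np.ndarray:
    scores = x @ router                   # one relevance score per expert
    top = np.argsort(scores)[-TOP_K:]     # indices of the top-k experts
    weights = np.exp(scores[top])
    weights /= weights.sum()              # softmax over the chosen experts
    # Only the selected experts' parameters are touched, so the compute
    # cost scales with TOP_K rather than NUM_EXPERTS.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(DIM)              # one token's hidden state
print(moe_layer(x).shape)                 # (16,)
```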
The model further differs from others like o1 in how it uses reinforcement learning during training. While many LLMs rely on an external “critic” model that runs alongside them, correcting errors and nudging the LLM toward verified answers, DeepSeek-R1 uses a set of rules internal to the model to teach it which of the possible answers it generates is best. “DeepSeek has streamlined that process,” Ananthaswamy says.
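A rough sketch of what rule-based rewards can look like in code, as opposed to a learned critic network, appears below. The reward values, tag format and helper function are illustrative assumptions, not DeepSeek’s published training recipe; R1 is reported to reward verifiably correct answers and well-formed reasoning traces, which is the idea the sketch captures.

```python
import re

def rule_based_reward(candidate: str, reference_answer: str) -> float:
    """Score a candidate answer with fixed rules instead of a critic model.
    The specific rules and values here are made up for illustration."""
    reward = 0.0

    # Rule 1: format. Reasoning should appear inside <think> tags before
    # the final answer (R1 is reported to use a similar layout).
    if re.search(r"<think>.+?</think>", candidate, re.DOTALL):
        reward += 0.5

    # Rule 2: correctness. For math or coding tasks, the final answer can
    # be checked mechanically against a known-good reference.
    final = candidate.split("</think>")[-1].strip()
    if final == reference_answer:
        reward += 1.0

    return reward

# During training, the model generates several candidates per prompt and
# is nudged toward the ones the rules score highest.
candidates = [
    "<think>2 + 2 is 4</think> 4",
    "I think the answer is 5",
]
print([rule_based_reward(c, "4") for c in candidates])  # [1.5, 0.0]
```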
Another important aspect of DeepSeek-R1 is that the company has made the code behind the product open-source, Ananthaswamy says. (The training data remains proprietary.) This means that the company’s claims can be checked. If the model is as computationally efficient as DeepSeek claims, he says, it will probably open up new avenues for researchers who use AI in their work, letting them do so more quickly and cheaply. It will also enable more research into the inner workings of LLMs themselves.
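For a sense of what that openness means in practice, the released weights can be loaded with standard tooling. The sketch below uses the Hugging Face transformers library and assumes one of the smaller distilled R1 checkpoints (the full model is far too large for a single machine); the exact model ID should be verified on the hub before use.

```python
# Assumes: pip install transformers torch, and that the distilled
# checkpoint named below is available on the Hugging Face hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "What is 17 * 24? Reason step by step."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```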
“One of the big things has been this divide that has opened up between academia and industry because academia has been unable to work with these really large models or do research in any meaningful way,” Ananthaswamy says. “But something like this, it’s within the reach of academia now, because you have the code.”