Chinese company DeepSeek’s AI models have zoomed into the global top 10 in performance, according to popular rankings, suggesting Washington’s export curbs will struggle to block China’s rapid progress.
On January 20, DeepSeek introduced R1, a specialized model designed for complex problem-solving.
“Deepseek R1 is one of the most amazing and impressive breakthroughs I’ve ever seen,” Marc Andreessen, a Silicon Valley venture capitalist who has advised President Trump, said in a post on X on Friday.
DeepSeek’s development was led by Liang Wenfeng, a Chinese hedge-fund manager who has become the face of the country’s AI push. On January 20, Liang met with China’s premier to discuss how the country’s companies can narrow the gap with the U.S.
Specialists said DeepSeek’s technology still trails that of OpenAI and Google. But it is a close rival, despite using fewer and less-advanced chips and, in some cases, skipping steps that U.S. developers considered essential.
DeepSeek said one of its latest models cost $5.6 million to train, compared with the range of $100 million to $1 billion that Dario Amodei, CEO of AI developer Anthropic, cited last year as the cost of building a model.
Barrett Woodside, co-founder of San Francisco AI hardware company Positron, said he and his colleagues are buzzing about DeepSeek. “It’s pretty cool,” Woodside said, referring to DeepSeek’s open-source models, in which the software code behind the AI models is made freely available.
Users of DeepSeek’s latest flagship model, called V3 and released in December, found it refusing to answer sensitive political questions about China and leader Xi Jinping. In some cases, the product, which offers a chat interface similar to ChatGPT’s, provided answers that align with Beijing’s official line rather than including the viewpoints of government critics.
“The only strike against it is half-hearted PRC censorship,” Woodside said, referring to the People’s Republic of China. That censorship could be removed, he said, because other developers are free to modify the code.
DeepSeek said both R1 and V3 outperform, or come close to, leading Western models. As of Saturday, the two models ranked in the top 10 on Chatbot Arena, a platform hosted by the University of California, Berkeley. A Google Gemini model held the top spot, with DeepSeek beating out Anthropic’s Claude and Grok, from Elon Musk’s xAI.
DeepSeek grew out of the AI research unit of High-Flyer, a hedge-fund manager with about $8 billion in assets that is known for its AI-powered trading.
“When humans make investment decisions, it’s an art, done by the seat of the pants. When a computer program makes such decisions, it’s a science, with an optimal solution,” Liang said in a 2019 speech.
Born in 1985, Liang grew up in China’s southeastern Guangdong province. He attended China’s prestigious Zhejiang University, where he specialized in machine vision. A few years after graduating, Liang founded High-Flyer in 2015 with two college friends.
According to people close to him, Liang prefers to be seen as an engineer rather than a trader. High-Flyer was a pioneer in China in applying deep learning to computerized trading. The technique, modeled on the human brain, allows computers to analyze more diverse types of data.
DeepSeek’s flagship model is free, but the company charges users to connect their own applications to DeepSeek’s models and computing infrastructure. An example is a business that wants to tap the technology to have AI answer customer questions.
Early last year, DeepSeek cut the price of the service to a fraction of what other vendors were charging, setting off a price war in the Chinese industry.
The co-founder of a Silicon Valley-based startup said his company switched from Anthropic’s Claude model to DeepSeek in September for predicting financial returns with generative AI. Tests showed that DeepSeek performed similarly at about a quarter of the cost.
“OpenAI’s models are great for performance, but we don’t want to pay for power we don’t need,” Poo said.
At the January 20 meeting, DeepSeek’s Liang told Chinese Premier Li Qiang that while Chinese companies are catching up, American restrictions on exporting advanced chips to China remain a bottleneck.
In 2019, High-Flyer began building a cluster of chips for AI research, using a portion of the funds generated by its finance business. The company said it later built a larger cluster of about 10,000 Nvidia graphics processing units that could be used to train large language models.
Only a handful of companies in China had computing infrastructure powerful enough to develop such models by late 2022, when OpenAI released ChatGPT.
DeepSeek said in a technical report that it used a cluster of more than 2,000 Nvidia chips to train its V3 model. Several U.S. AI experts have recently questioned whether High-Flyer and DeepSeek have amassed more computing power than they have disclosed.
Some outside researchers said DeepSeek’s model lacks certain abilities of more expensively trained rivals, such as tracking the context of long conversations.
For its latest reasoning model, released on January 20, DeepSeek skipped a process known as supervised fine-tuning, in which programmers draw on the knowledge of human experts to give their models a head start. DeepSeek said that even without supervised fine-tuning, relying instead on reinforcement learning, the model is comparable to OpenAI’s o1, a reasoning model designed to solve tricky math problems and similar challenges.
Jim Fan, a senior research scientist at Nvidia, hailed the DeepSeek paper reporting the results as a breakthrough. Writing on X, he said it was “reminiscent of earlier pioneering AI programs that mastered board games such as chess without first imitating a human grandmaster.”
Zack Kass, a former OpenAI executive, said that despite the American restrictions, DeepSeek’s progress “underscores a broader lesson: Resource constraints often fuel creativity.”
Stu Woo contributed to this article.