
“When you’re 50 or 70 times faster than the competition, you can do things they can’t do at all,” says Andrew Feldman, CEO of Cerebras.
Tiernan Ray/ZDNET
AI computer pioneer Cerebras Systems has been “slammed” by demand to run DeepSeek’s R1 large language model, says Andrew Feldman, co-founder and CEO of the company.
“We’re thinking about ways to meet demand. That’s a big deal,” Feldman told me in an interview last week on Zoom.
DeepSeek’s R1 has been hailed as a watershed moment for artificial intelligence because the model achieves results as good as or better than dominant models such as OpenAI’s o1 at a fraction of the pre-training cost.
DeepSeek’s impact on the economics of AI is significant, Feldman suggested. But the deeper outcome is to spur even bigger AI systems.
Also: If you’re worried, you can try DeepSeek R1 without the security risk
“As we reduce computational costs, the market gets bigger and bigger,” Feldman said.
Since the AI model became a sensation, numerous AI cloud services, including not only Cerebras but much larger firms such as Amazon’s AWS, have rushed to offer inference with DeepSeek. (You can try out Cerebras’s inference service here.)
Cerebras’s edge is speed. According to Feldman, running inference on the company’s CS-3 computers produces output 57 times faster than other DeepSeek service providers.
Cerebras also emphasizes its speed relative to other large language models. In a demonstration of a reasoning problem posed to DeepSeek running on Cerebras versus OpenAI’s o1 mini, the Cerebras machine finished in 2.5 seconds, while o1 mini took 22 seconds to complete the task.
“This speed cannot be achieved with any number of GPUs,” said Feldman, referring to the AI chips sold by Nvidia, Advanced Micro Devices, and Intel.
The challenge for anyone hosting DeepSeek is that, like other so-called reasoning models such as OpenAI’s o1, DeepSeek uses much more computing power when generating output at inference time, which makes it difficult to deliver results to user prompts in a timely fashion.
“The basic GPT model makes one inference pass through all of the parameters for every word of the input prompt,” Feldman explained.
“These reasoning models, or chain-of-thought models, do that multiple times for each word,” so they require far more compute per prompt.
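To make that arithmetic concrete, here is a rough back-of-envelope sketch in Python. The 2-FLOPs-per-parameter-per-token rule of thumb and the token counts are illustrative assumptions, not figures from Feldman or DeepSeek.

```python
# Back-of-envelope sketch of why reasoning models cost more to serve.
# The 2-FLOPs-per-parameter-per-token rule of thumb and all token counts
# below are illustrative assumptions, not figures from Cerebras or DeepSeek.

PARAMS = 671e9  # DeepSeek R1's full parameter count

def inference_flops(params: float, tokens_generated: int) -> float:
    """Each generated token takes one forward pass that touches every
    parameter, costing roughly 2 FLOPs per parameter."""
    return 2 * params * tokens_generated

# A plain GPT-style model answers directly: say ~100 output tokens.
plain = inference_flops(PARAMS, tokens_generated=100)

# A chain-of-thought model "thinks out loud" first: say ~2,000 hidden
# reasoning tokens before the same ~100-token answer.
reasoning = inference_flops(PARAMS, tokens_generated=2_000 + 100)

print(f"plain:     {plain:.2e} FLOPs")
print(f"reasoning: {reasoning:.2e} FLOPs ({reasoning / plain:.0f}x more)")
```

Under those assumptions, the reasoning model burns roughly 21 times the compute for the same visible answer, which is the dynamic Feldman is describing.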
Cerebras followed what has become a standard procedure for companies wanting to offer DeepSeek inference: taking R1’s neural parameters, or weights, and using them to train a smaller open-source model, in this case Meta’s Llama 70B, to create a “distillation” of R1.
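The general recipe is sequence-level distillation: the big teacher model writes out its reasoning, and the smaller student is fine-tuned on those transcripts with an ordinary next-token loss. Below is a minimal sketch of that technique, not Cerebras’s actual pipeline; the Hugging Face model names, prompt, and hyperparameters are assumptions, and the full 671B-parameter teacher would in practice be sharded across many accelerators.

```python
# Minimal sketch of sequence-level distillation: the teacher generates
# chain-of-thought transcripts, and the student is fine-tuned on them.
# Model names, the prompt, and hyperparameters are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

TEACHER = "deepseek-ai/DeepSeek-R1"    # large reasoning model (teacher)
STUDENT = "meta-llama/Llama-3.1-70B"   # smaller open-source model (student)

t_tok = AutoTokenizer.from_pretrained(TEACHER)
teacher = AutoModelForCausalLM.from_pretrained(TEACHER)
s_tok = AutoTokenizer.from_pretrained(STUDENT)
student = AutoModelForCausalLM.from_pretrained(STUDENT)

prompts = ["Show that the sum of two even integers is even."]  # toy corpus

# Step 1: the teacher generates (reasoning + answer) text for each prompt.
transcripts = []
with torch.no_grad():
    for p in prompts:
        ids = t_tok(p, return_tensors="pt").input_ids
        out = teacher.generate(ids, max_new_tokens=512)
        transcripts.append(t_tok.decode(out[0], skip_special_tokens=True))

# Step 2: fine-tune the student on the transcripts with the ordinary
# causal language-modeling loss (labels are shifted internally).
optim = torch.optim.AdamW(student.parameters(), lr=1e-5)
student.train()
for text in transcripts:
    batch = s_tok(text, return_tensors="pt")
    loss = student(input_ids=batch.input_ids, labels=batch.input_ids).loss
    loss.backward()
    optim.step()
    optim.zero_grad()
```

The payoff is that the student keeps much of the teacher’s reasoning behavior while being far cheaper to serve, which is why providers gravitate toward the 70B distillation rather than the full R1.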
“We were able to do it very quickly, and not only did we do it faster than anyone else, but we were able to produce results that were every bit as accurate,” Feldman said.
Also: I tested DeepSeek’s R1 and V3 coding skills - and we’re not all doomed (yet)
Cerebras’s results with DeepSeek R1 Distill Llama 70B are comparable to the model’s published accuracy benchmarks. Cerebras has not disclosed pricing for DeepSeek R1 Distill Llama 70B inference, but says it is “priced competitively, especially given its industry-leading performance.”
DeepSeek’s breakthrough has several implications.
One is that it’s a big win for open-source AI, by which Feldman means models whose neural parameters are posted for download. Many of the advances in a new AI model can be replicated by researchers who have access to the weights, even when the source code is not available. Proprietary models such as GPT-4 do not disclose their weights.
“Open source certainly comes out ahead here,” Feldman said. “This was the first top-flight open-source reasoning model.”
While DeepSeek’s economics have stunned the AI world, Feldman said the advance will lead to continued investment in cutting-edge AI chips and networking technology.
Also: Is DeepSeek’s new image model another win for cheap AI?
“In the last 50 years, the public market has been wrong every time,” Feldman said, alluding to the massive sell-off of shares of Nvidia and other AI technology providers. “Every time computing gets cheaper, they [public-market investors] systematically assume that the market will be smaller. And for over 50 years, it has made the market bigger.”
Feldman cited the example of falling x86 PC prices, which led to more PCs being sold and used. “Today,” he said, “you have 25 computers in your home. You have one in your pocket, you have one at work, your dishwasher has one, your washing machine has one, your TVs each have one.”
By the same token, cheaper computing makes it possible to build larger and larger AI systems whose results exceed the reach of today’s AI, a point Feldman has been making since Cerebras was founded almost 10 years ago.
“When you’re 50 or 70 times faster than the competition, you can do things they can’t do at all,” he said. “At some point, differences of degree become differences of kind.”
Also: Apple researchers reveal the secret sauce behind DeepSeek AI
Cerebras launched its public inference service last August, demonstrating much faster speeds than most other providers for running generative AI. It claims to be the “world’s fastest AI inference provider.”
Apart from the distilled Llama model, Cerebras does not currently offer the full R1 for inference, keeping costs down for most customers.
“The 671-billion-parameter model is an expensive model to run,” Feldman said. “What we saw with Llama 405B was that there was a huge amount of interest at the 70B node and much less at the 405B node, because it was so much more expensive. That’s the market today.”
Cerebras does have customers who pay for the full Llama 405B because, for them, “the added accuracy is worth the extra cost.”
Cerebras is also betting that privacy and security are attributes it can turn to its advantage. The initial enthusiasm for DeepSeek was followed by numerous reports raising concerns about the model’s data handling.
“When you use the app, your data goes to China,” Feldman said of DeepSeek’s native Android and iOS apps. “If you use us, your data is hosted in the US. We don’t store your weights or your information.”
When asked about the many security vulnerabilities researchers have published about DeepSeek R1, Feldman was philosophical, suggesting that some of the problems will be resolved as the technology matures.
Also: Security firm discovers DeepSeek has “direct links” to Chinese government servers
“The industry is moving very quickly; no one has seen anything like this,” Feldman said. “It has gotten better week over week, month over month. But is it perfect? No. Should you use an LLM [large language model] as a replacement for common sense?”
Following the announcement of R1, Cerebras said last Thursday that it had added support for running Le Chat, the chat assistant operated by French AI startup Mistral. Running Le Chat’s “Flash Answers” feature at 1,100 tokens per second, the companies said, makes it “the world’s fastest AI assistant,” 10 times faster than popular models such as ChatGPT 4o, Sonnet 3.5, and DeepSeek R1.