No champion reigns forever. Last week, DeepSeek's AI rattled investors and tech companies alike. Now, two computer chip startups are riding that wave.
Cerebras Systems makes a huge computer chip, the size of a dinner plate, with an unconventional design. Groq, meanwhile, makes chips tailored to large language models. In head-to-head tests, these alternative chips blew the competition out of the water running a version of DeepSeek's viral AI.
Whereas answers can take minutes to complete on other hardware, Cerebras said its version of DeepSeek knocked out some coding tasks in as little as 1.5 seconds. According to Artificial Analysis, the company's wafer-scale chips were 57 times faster than competitors running the AI on GPUs, and the fastest overall. That was last week. Yesterday, Groq overtook Cerebras at the top with a new offering.
By the numbers, DeepSeek's advance is more nuanced than it may appear, but the trend is real: even as labs plan to significantly scale up their AI models, the algorithms themselves are getting significantly more efficient. On the hardware side, those gains have so far accrued mostly to Nvidia, but chip startups like Cerebras and Groq may be able to outcompete on inference.
Big tech is committed to buying more hardware, and Nvidia won't be displaced anytime soon, but alternatives may begin nibbling at the edges, especially if they can serve AI models faster or more cheaply than the traditional options.
It’s reasonable
DeepSeek's new AI, R1, is a "reasoning" model, like OpenAI's o1. That means instead of spitting out the first answer it generates, it chews on the problem, piecing together its answer in stages.
For casual chats, this doesn't make much difference, but for complex, and valuable, problems like coding and mathematics, it's a leap forward.
DeepSeek's R1 is also strikingly efficient. That was the news last week.
Not only was R1 cheaper to train, at a reported $6 million (though what this number covers is disputed), its weights and engineering details are open. That contrasts with headlines about looming investments in proprietary AI efforts bigger than the Apollo program.
The news gave investors pause. Maybe AI wouldn't need as much cash, or as many chips, as tech leaders think. Nvidia, a prime beneficiary of those investments, took a massive hit in the stock market.
Small, quick, smart
All that is on the software side, where algorithms are becoming cheaper and more efficient. But the chips that train and run AI are improving too.
Last year, Groq, a startup founded by Jonathan Ross, an engineer who previously developed Google's in-house AI chips, made headlines with chips tailor-made for large language models. Whereas popular chatbots spool their responses out line by line on GPUs, conversations on Groq's chips approached real time.
That was then. The new crop of reasoning AI models takes much longer to produce answers, by design.
Using an approach known as "test-time compute," these models churn out multiple answers in the background, select the best one, and offer a rationale for it. Companies say the longer the models are allowed to "think," the better their answers get. These models don't beat older models across the board, but they've made strides in areas where older algorithms struggle, like math and coding.
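To make the idea concrete, here is a minimal, hypothetical sketch of the best-of-n flavor of test-time compute in Python. The `generate_candidate` and `score` functions are placeholder stand-ins for a real model's sampling and answer-ranking steps, not any provider's actual API; the point is only that spending more compute on more candidates can buy a better final answer.

```python
import random

# Hypothetical stand-in for sampling one full answer from a language model.
def generate_candidate(prompt: str) -> str:
    return f"candidate answer #{random.randint(1, 1000)} to: {prompt}"

# Hypothetical stand-in for scoring an answer (e.g., a verifier or reward model).
def score(answer: str) -> float:
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Spend extra compute at test time: draw n answers, keep the highest-scoring one."""
    candidates = [generate_candidate(prompt) for _ in range(n)]
    return max(candidates, key=score)

if __name__ == "__main__":
    # A larger n means more "thinking" time and, ideally, a better pick.
    print(best_of_n("Write a function that reverses a linked list.", n=8))
```

In practice, providers combine this kind of sampling with long chains of intermediate reasoning, which is why responses take longer and why serving speed matters so much.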
As reasoning models shift the focus to inference, the process by which a finished AI model handles user queries, speed and cost matter even more. People want answers fast, and they don't want to pay more for them. Here especially, Nvidia faces growing competition.
In this case, Cerebras, Groq, and several other inference providers opted to host a slimmed-down version of R1.
Instead of the original 671-billion-parameter model (parameters are a measure of an algorithm's size and complexity), they're serving a smaller version with just 70 billion parameters. Even so, according to Cerebras, it can still outperform OpenAI's o1-mini on select benchmarks.
AI analytics platform Artificial Analysis ran head-to-head performance comparisons of several inference providers last week, and Cerebras came out on top. For similar costs, its wafer-scale chips spit out around 1,500 tokens per second, compared to 536 and 235 for SambaNova and Groq, respectively. In a demonstration of the efficiency gains, Cerebras said its DeepSeek version completed a coding task in 1.5 seconds that took OpenAI's o1-mini 22 seconds.
Yesterday, Artificial Analysis updated its comparison to include a new offering from Groq that overtook Cerebras.
The smaller R1 model can't match its bigger sibling pound for pound, but Artificial Analysis noted it was the first time reasoning models had hit speeds comparable to non-reasoning models.
Beyond speed and cost, inference companies also host models wherever they're based. DeepSeek shot to the top of popular app charts last week, but its models are hosted on servers in China, and experts have raised concerns about security and privacy. In its press release, Cerebras noted that it hosts DeepSeek in the US.
It's not over yet
Whatever its long-term impact, the news exemplifies a powerful, and notably preexisting, trend toward greater efficiency in AI.
Since OpenAI previewed o1 last year, the company has moved on to its next model, o3. Last week, it gave users access to a smaller version of its latest model, o3-mini. Yesterday, Google released versions of its own reasoning models whose efficiency approaches R1's. And because DeepSeek's model is open and comes with a detailed paper on its development, incumbents and upstarts alike will adopt its advances.
Meanwhile, frontier labs remain committed to scaling up. Google, Microsoft, Amazon, and Meta plan to spend some $300 billion on AI data centers. And OpenAI and SoftBank have agreed to a four-year, $500-billion data-center project called Stargate.
Anthropic CEO Dario Amodei describes this as a flywheel with three parts: bigger models yield leaps in capability; companies later refine those models, including into reasoning models, among other improvements; and woven throughout, advances in hardware and software make the algorithms cheaper and more efficient.
That last trend means companies can scale for less on the frontier, while smaller, nimbler algorithms with advanced capabilities open up new applications and fresh demand down the line. Until this process exhausts itself, which is a matter of debate, there will be demand for AI chips of all kinds.