DeepSeek has announced its first-generation DeepSeek-R1 and DeepSeek-R1-Zero models, designed to tackle complex reasoning tasks.
DeepSeek-R1-Zero does not rely on supervised fine-tuning (SFT) as a preliminary step; it is trained solely through large-scale reinforcement learning (RL). According to DeepSeek, this approach led to the natural emergence of “a number of powerful and interesting reasoning behaviors,” including self-verification, reflection, and the generation of long chains of thought (CoTs).
“Notably, [DeepSeek-R1-Zero] is the first published study to verify that the reasoning ability of LLMs does not require SFT and can be incentivized purely through RL,” DeepSeek researchers explained. This milestone not only highlights the model’s innovative foundations but also paves the way for advancements in RL-centered reasoning models.
However, DeepSeek-R1-Zero comes with certain limitations. Key challenges include “endless repetition, poor readability, and language mixing,” which can be major obstacles in real-world applications. To address these shortcomings, DeepSeek developed its flagship model, DeepSeek-R1.
Introducing DeepSeek-R1
DeepSeek-R1 builds on its predecessor by incorporating cold-start data before RL training. This additional step enhances the model’s reasoning capabilities and resolves many of the limitations observed in DeepSeek-R1-Zero.
Notably, DeepSeek-R1 achieves performance comparable to OpenAI’s acclaimed o1 system across math, coding, and general reasoning tasks, cementing its position as a leading competitor.
DeepSeek has chosen to open-source both DeepSeek-R1-Zero and DeepSeek-R1, along with six smaller distilled models. Among them, DeepSeek-R1-Distill-Qwen-32B delivers exceptional results, even outperforming OpenAI’s o1-mini across multiple benchmarks.
- MATH-500 (Pass@1): DeepSeek-R1 achieved 97.3%, outperforming OpenAI’s o1 (96.4%) and other major competitors.
- LiveCodeBench (Pass@1-COT): The distilled DeepSeek-R1-Distill-Qwen-32B scored 57.2%, an outstanding result among smaller models.
- AIME 2024 (Pass@1): DeepSeek-R1 achieved 79.8%, setting an impressive standard in mathematical problem solving.
A pipeline that benefits the entire industry
DeepSeek shared insights into its rigorous pipeline for reasoning-model development, which combines supervised fine-tuning and reinforcement learning.
According to the company, the process includes two SFT stages, which establish the model’s basic reasoning and non-reasoning abilities, and two RL stages, which discover advanced reasoning patterns and align those abilities with human preferences.
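A rough, hypothetical sketch of how such a four-stage recipe might be orchestrated is shown below. The stage order follows the description above, but every function here is an illustrative placeholder rather than DeepSeek’s published training code.

```python
# Hypothetical sketch of a DeepSeek-R1-style training pipeline.
# The stage order mirrors the description above; every function is an
# illustrative placeholder, not DeepSeek's actual implementation.

def supervised_finetune(model, dataset, label):
    print(f"SFT stage '{label}': fine-tuning on {len(dataset)} samples")
    return model  # placeholder: would return updated weights

def reinforcement_learn(model, reward_fn, label):
    print(f"RL stage '{label}': optimizing against the given reward")
    return model  # placeholder: would run policy optimization

def train_r1_style(base_model, cold_start_data, general_sft_data,
                   rule_based_rewards, preference_rewards):
    # 1) Cold-start SFT: seed basic reasoning and readable output formats.
    model = supervised_finetune(base_model, cold_start_data, "cold start")
    # 2) Reasoning-oriented RL: discover advanced reasoning patterns.
    model = reinforcement_learn(model, rule_based_rewards, "reasoning RL")
    # 3) Second SFT: broaden to non-reasoning skills (writing, QA, etc.).
    model = supervised_finetune(model, general_sft_data, "general SFT")
    # 4) Final RL: align behavior with human preferences.
    model = reinforcement_learn(model, preference_rewards, "preference RL")
    return model

if __name__ == "__main__":
    train_r1_style("base-llm",
                   cold_start_data=["example"] * 3,
                   general_sft_data=["example"] * 5,
                   rule_based_rewards=None,
                   preference_rewards=None)
```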
“We believe this pipeline will benefit the industry by creating better models,” DeepSeek said, hinting at the potential of its methodology to drive future advances across the AI sector.
One of the distinguishing achievements of the RL-focused approach is DeepSeek-R1-Zero’s ability to execute complex reasoning patterns without prior human instruction, a first for the open-source AI research community.
Importance of distillation
DeepSeek researchers also emphasized the importance of distillation: the process of transferring reasoning capability from large models to smaller, more efficient ones, a strategy that unlocks strong performance even in compact configurations.
Smaller versions of DeepSeek-R1 (such as the 1.5B, 7B, and 14B variants) have been able to hold their own in niche applications. Notably, the distilled models can outperform what is achieved by applying RL training directly to models of comparable size.
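In broad strokes, the distillation described here amounts to sampling reasoning traces from the larger model and fine-tuning a smaller one on them. The sketch below illustrates that idea only; the function bodies, prompts, and formatting are assumptions for illustration, not DeepSeek’s released tooling.

```python
# Conceptual sketch of reasoning distillation: a large "teacher" model
# generates chain-of-thought solutions, and a smaller "student" model is
# supervised fine-tuned on those traces. All functions are placeholders.

def generate_reasoning_traces(teacher, prompts):
    # In practice: sample long chain-of-thought answers from the teacher
    # and keep only those that pass correctness and readability filters.
    return [{"prompt": p, "completion": f"<think>reasoning about {p}</think> final answer"}
            for p in prompts]

def supervised_finetune(student, dataset):
    # In practice: standard next-token-prediction fine-tuning on the traces.
    print(f"Fine-tuning {student} on {len(dataset)} distilled samples")
    return student

prompts = ["Prove that sqrt(2) is irrational.", "Sum the integers from 1 to 100."]
traces = generate_reasoning_traces("DeepSeek-R1", prompts)
student = supervised_finetune("Qwen2.5-7B", traces)
```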
🔥 Bonus: Open source distillation model!
🔬 Six fully open-sourced small-scale models extracted from DeepSeek-R1
📏 32B and 70B models equivalent to OpenAI-o1-mini
🤝 Empowering the open source community
🌍 Push the boundaries of **Open AI**!
🐋2/n pic.twitter.com/tfXLM2xtZZ
— DeepSeek (@deepseek_ai) January 20, 2025
For researchers, these distilled models are available in configurations ranging from 1.5 billion to 70 billion parameters and are built on the Qwen2.5 and Llama3 architectures. This flexibility makes them versatile across a wide range of tasks, from coding to natural language understanding.
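As an illustration, a distilled checkpoint can be loaded like any other Hugging Face causal language model. The snippet below is a minimal sketch assuming the 1.5B distilled release and the transformers library; the generation settings are illustrative, so check the model card for recommended usage.

```python
# Minimal sketch: loading a distilled DeepSeek-R1 checkpoint with
# Hugging Face transformers. Model ID and generation settings are
# illustrative; consult the official model card for recommended values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are there below 30?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```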
DeepSeek has released the repository and weights under the MIT license, which permits commercial use and downstream modification. Derivative works, such as using DeepSeek-R1 to train other large language models (LLMs), are permitted. However, users of the distilled models must also ensure compliance with the original base models’ licenses, such as Apache 2.0 (Qwen) or the Llama3 license.
(Photo credit: Prateek Katyal)
