DeepSeek has announced its first-generation DeepSeek-R1 and DeepSeek-R1-Zero models, designed to tackle complex reasoning tasks.
DeepSeek-R1-Zero does not rely on supervised fine-tuning (SFT) as a preliminary step; it is trained solely through large-scale reinforcement learning (RL). According to DeepSeek, this approach led to the natural emergence of “a number of powerful and interesting reasoning behaviors,” including self-verification, reflection, and the generation of long chains of thought (CoTs).
“Notably, [DeepSeek-R1-Zero] is the first published study to verify that the reasoning ability of LLMs does not require SFT and can be incentivized purely through RL,” DeepSeek researchers explained. This milestone not only highlights the model’s innovative foundations but also paves the way for advancements in RL-centered reasoning models.
However, DeepSeek-R1-Zero comes with certain limitations. Key challenges include “endless repetition, poor readability, and language mixing,” which can be major obstacles in real-world applications. To address these shortcomings, DeepSeek developed its flagship model, DeepSeek-R1.
Introducing DeepSeek-R1
DeepSeek-R1 builds on its predecessor by incorporating cold-start data before RL training. This additional step enhances the model’s reasoning capabilities and resolves many of the limitations observed in DeepSeek-R1-Zero.
Notably, DeepSeek-R1 achieves performance comparable to OpenAI’s acclaimed o1 system across math, coding, and general reasoning tasks, cementing its position as a leading competitor.
DeepSeek has chosen to open-source both DeepSeek-R1-Zero and DeepSeek-R1, along with six smaller distilled models. Among them, DeepSeek-R1-Distill-Qwen-32B delivers exceptional results, even outperforming OpenAI’s o1-mini across multiple benchmarks.
- MATH-500 (Pass@1): DeepSeek-R1 achieved 97.3%, outperforming OpenAI’s o1 (96.4%) and other major competitors.
- LiveCodeBench (Pass@1-COT): The distilled DeepSeek-R1-Distill-Qwen-32B scored 57.2%, an outstanding result among smaller models.
- AIME 2024 (Pass@1): DeepSeek-R1 achieved 79.8%, setting an impressive standard in mathematical problem solving.
A pipeline that benefits the entire industry
DeepSeek shared insights into its rigorous pipeline for reasoning-model development, which combines supervised fine-tuning and reinforcement learning.
According to the company, the process includes two SFT stages, which establish the model’s basic reasoning and non-reasoning abilities, and two RL stages, which discover advanced reasoning patterns and align those abilities with human preferences.
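A rough, hypothetical sketch of how such a four-stage recipe might be orchestrated is shown below. The stage order follows the description above, but every function here is an illustrative placeholder rather than DeepSeek’s published training code.

```python
# Hypothetical sketch of a DeepSeek-R1-style training pipeline.
# The stage order mirrors the description above; every function is an
# illustrative placeholder, not DeepSeek's actual implementation.

def supervised_finetune(model, dataset, label):
    print(f"SFT stage '{label}': fine-tuning on {len(dataset)} samples")
    return model  # placeholder: would return updated weights

def reinforcement_learn(model, reward_fn, label):
    print(f"RL stage '{label}': optimizing against the given reward")
    return model  # placeholder: would run policy optimization

def train_r1_style(base_model, cold_start_data, general_sft_data,
                   rule_based_rewards, preference_rewards):
    # 1) Cold-start SFT: seed basic reasoning and readable output formats.
    model = supervised_finetune(base_model, cold_start_data, "cold start")
    # 2) Reasoning-oriented RL: discover advanced reasoning patterns.
    model = reinforcement_learn(model, rule_based_rewards, "reasoning RL")
    # 3) Second SFT: broaden to non-reasoning skills (writing, QA, etc.).
    model = supervised_finetune(model, general_sft_data, "general SFT")
    # 4) Final RL: align behavior with human preferences.
    model = reinforcement_learn(model, preference_rewards, "preference RL")
    return model

if __name__ == "__main__":
    train_r1_style("base-llm",
                   cold_start_data=["example"] * 3,
                   general_sft_data=["example"] * 5,
                   rule_based_rewards=None,
                   preference_rewards=None)
```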
“We believe this pipeline will benefit the industry by creating better models,” DeepSeek said, hinting at the potential of its methodology to drive future advances across the AI sector.
One of the distinguishing achievements of the RL-focused approach is DeepSeek-R1-Zero’s ability to execute complex reasoning patterns without prior human instruction, a first for the open-source AI research community.
Importance of distillation
DeepSeek researchers also emphasized the importance of distillation: the process of transferring reasoning capability from large models to smaller, more efficient ones, a strategy that unlocks strong performance even in compact configurations.
Smaller versions of DeepSeek-R1 (such as the 1.5B, 7B, and 14B variants) have been able to hold their own in niche applications. Notably, the distilled models can outperform what is achieved by applying RL training directly to models of comparable size.
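In broad strokes, the distillation described here amounts to sampling reasoning traces from the larger model and fine-tuning a smaller one on them. The sketch below illustrates that idea only; the function bodies, prompts, and formatting are assumptions for illustration, not DeepSeek’s released tooling.

```python
# Conceptual sketch of reasoning distillation: a large "teacher" model
# generates chain-of-thought solutions, and a smaller "student" model is
# supervised fine-tuned on those traces. All functions are placeholders.

def generate_reasoning_traces(teacher, prompts):
    # In practice: sample long chain-of-thought answers from the teacher
    # and keep only those that pass correctness and readability filters.
    return [{"prompt": p, "completion": f"<think>reasoning about {p}</think> final answer"}
            for p in prompts]

def supervised_finetune(student, dataset):
    # In practice: standard next-token-prediction fine-tuning on the traces.
    print(f"Fine-tuning {student} on {len(dataset)} distilled samples")
    return student

prompts = ["Prove that sqrt(2) is irrational.", "Sum the integers from 1 to 100."]
traces = generate_reasoning_traces("DeepSeek-R1", prompts)
student = supervised_finetune("Qwen2.5-7B", traces)
```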
🔥 Bonus: Open source distillation model!
🔬 Six fully open-sourced small-scale models extracted from DeepSeek-R1
📏 32B and 70B models equivalent to OpenAI-o1-mini
🤝 Empowering the open source community
🌍 Push the boundaries of **Open AI**!
🐋2/n pic.twitter.com/tfXLM2xtZZ
— DeepSeek (@deepseek_ai) January 20, 2025
For researchers, these distilled models are available in configurations ranging from 1.5 billion to 70 billion parameters and are built on the Qwen2.5 and Llama3 architectures. This flexibility makes them versatile across a wide range of tasks, from coding to natural language understanding.
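As an illustration, a distilled checkpoint can be loaded like any other Hugging Face causal language model. The snippet below is a minimal sketch assuming the 1.5B distilled release and the transformers library; the generation settings are illustrative, so check the model card for recommended usage.

```python
# Minimal sketch: loading a distilled DeepSeek-R1 checkpoint with
# Hugging Face transformers. Model ID and generation settings are
# illustrative; consult the official model card for recommended values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are there below 30?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```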
DeepSeek has released the repository and weights under the MIT license, which permits commercial use and downstream modification. Derivative works, such as using DeepSeek-R1 to train other large language models (LLMs), are permitted. However, users of the distilled models must also ensure compliance with the original base models’ licenses, such as Apache 2.0 (Qwen) or the Llama3 license.
(Photo credit: Prateek Katyal)
