File photo: Chinese company DeepSeek has released a new open source model, DeepSeek V3, which outperforms existing leading models. | Photo courtesy: Getty Images
Chinese company DeepSeek has released a new open source model, DeepSeek V3, that outperforms existing leading open source models and closed models such as OpenAI’s GPT-4o on several benchmarks. The 671-billion-parameter AI model can generate text, generate code, and perform related tasks.
The team used a mixture-of-experts (MoE) architecture, in which the model is built from multiple expert neural networks, each optimized for a different type of task. This reduces hardware costs because, each time a prompt is entered, only the relevant experts are activated instead of the entire large language model: roughly 37 billion of the 671 billion parameters are active for any given token.
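To make the routing idea concrete, here is a minimal, illustrative sketch of a top-k MoE layer in PyTorch. It is not DeepSeek’s implementation; the layer sizes, number of experts, and top-k value are placeholder assumptions chosen only to show how a router activates a small subset of experts per token.

```python
# Minimal sketch of mixture-of-experts (MoE) routing: a router scores each
# token, and only the top-k experts run for that token, so just a fraction
# of the total parameters participate in each forward pass.
# NOT DeepSeek's implementation; all sizes below are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_hidden=128, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router produces one score per expert for every token.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.router(x)                          # (batch, seq, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only the best experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the rest stay idle.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route a dummy batch of token embeddings through the MoE layer.
layer = TinyMoELayer()
y = layer(torch.randn(2, 5, 64))
print(y.shape)  # torch.Size([2, 5, 64])
```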
Specifically, DeepSeek states that training the AI model took approximately 2,788,000 H800 GPU hours, at an estimated price tag of $5.57 million at a rental price of $2 per GPU hour. That is far less than the tens to hundreds of millions of dollars that Big Tech companies in the US have been spending on LLM training.
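As a sanity check on the stated figures, 2,788,000 GPU hours at $2 per GPU hour works out to roughly $5.58 million, in line with the company’s estimate.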
In a technical report released alongside the model, the company said it outperformed open source models including Llama 3.1-405B and Qwen2.5-72B on most benchmarks. It also outperformed GPT-4o on most benchmarks, except the English-focused SimpleQA and FRAMES.
Only Anthropic’s Claude 3.5 Sonnet managed to beat DeepSeek V3 on a handful of benchmarks, including MMLU-Pro, IF-Eval, GPQA-Diamond, SWE-bench Verified, and Aider-Edit.
The code is currently publicly available on GitHub, and the model is accessible under the company’s model license.
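For readers who want to experiment, below is a hedged sketch of loading the released weights with Hugging Face Transformers. It assumes the checkpoint is hosted under a deepseek-ai/DeepSeek-V3 repository and exposes a Transformers-compatible loader via trust_remote_code; the official repository may instead recommend dedicated inference engines, and the full model requires multi-GPU-scale memory.

```python
# Hedged sketch of pulling the released model from Hugging Face.
# Assumptions (not confirmed by the article): the weights live under
# "deepseek-ai/DeepSeek-V3" and can be loaded through Transformers with
# trust_remote_code; the official repo may recommend other inference paths.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # custom model code ships with the checkpoint
    torch_dtype="auto",       # use the dtype stored in the checkpoint
    device_map="auto",        # shard across available GPUs
)

prompt = "Write a short haiku about open source models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```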
Published – December 27, 2024, 2:00 PM IST