Karachi Chronicle
AI

How China’s DeepSeek-V3 AI model challenges OpenAI’s dominance | Technology News

By Adnan Mahar | January 3, 2025 | 6 min read

Just a few days ago, OpenAI announced its latest o3 model and previewed some of its features. The new model's surprising benchmark results have sparked a debate over whether it could amount to artificial general intelligence. While that is a huge leap forward in AI, another model is making waves across the AI community. DeepSeek-V3, an open-source model from Chinese AI lab DeepSeek, is touted as a rival to even the most advanced AI systems, outperforming GPT-4o and Claude 3.5 Sonnet on various benchmarks. It points to a future in which innovation, low costs, and cutting-edge AI are no longer limited to a few tech giants.

DeepSeek-V3 has been hailed as the latest breakthrough in AI technology and showcases several innovations aimed at redefining how AI models are built. Andrej Karpathy, one of OpenAI's founding members and Tesla's former AI director, said in a post on X that DeepSeek-V3 was trained with a significantly smaller budget and fewer resources than other frontier models. Leading models typically require clusters of around 16,000 GPUs and massive computational resources, but the Chinese lab trained its model for two months on just 2,048 GPUs at a cost of roughly $6 million and still achieved impressive results.

DeepSeek (a Chinese AI company) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M).

For reference, this level of capability is supposed to require clusters of closer to 16,000 GPUs. https://t.co/EW7q2pQ94B

— Andrej Karpathy (@karpathy) December 26, 2024

Here, we take a closer look at what makes up DeepSeek-V3: its architecture, features, pricing, and benchmarks, and how it stacks up against the competition.

What is DeepSeek-V3?

DeepSeek-V3 is a large-scale open-source AI model trained on a budget of roughly $5.5 million, compared with GPT-4o's reported training cost of around $100 million. It is a Mixture-of-Experts (MoE) language model. Essentially, an MoE model works like a team of specialists: instead of one big network handling everything, it contains many "expert" sub-networks, each trained to excel at certain kinds of input, and a router sends each token to only a few of them. The model reportedly has 671 billion parameters in total, but only 37 billion are activated for any given token. This selective activation reportedly allows the model to achieve high performance without using excessive computational resources.
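To make the routing idea concrete, here is a minimal, illustrative MoE layer in PyTorch. It is not DeepSeek's implementation; the hidden size, expert count, and top-k value are arbitrary assumptions chosen only to show how a router activates a handful of experts per token.

```python
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Toy Mixture-of-Experts layer: a router scores every expert per token and
    only the top-k experts actually run, so most parameters stay inactive."""

    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                         # x: (num_tokens, d_model)
        scores = self.router(x)                   # affinity of each token to each expert
        weights, idx = torch.topk(scores, self.k, dim=-1)
        weights = torch.softmax(weights, dim=-1)  # gate weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                # only the selected experts do any work
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)                      # 16 tokens, hidden size 64
print(TinyMoELayer()(tokens).shape)               # torch.Size([16, 64])
```

Scaled up, this is the mechanism behind the "671 billion total, 37 billion active" figure: the full parameter count includes every expert, but each token only touches the few experts its router selects.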

DeepSeek-V3 was trained on a huge, high-quality dataset of 14.8 trillion tokens, giving it a broad understanding of language and strong task-specific capabilities. The model also uses several newer techniques, such as multi-head latent attention (MLA) and an auxiliary-loss-free load-balancing strategy, to increase training and deployment efficiency and reduce costs. These advances allow DeepSeek-V3 to compete with some of today's most advanced closed models.
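The paragraph above names MLA without unpacking it, so here is a rough, simplified sketch of the idea: each token's keys and values are reconstructed from a small latent vector, and only that latent needs to be cached during generation. The dimensions and layer names are assumptions for illustration; real MLA, including its handling of rotary position embeddings, is more involved.

```python
import torch
import torch.nn as nn

class SimplifiedLatentAttention(nn.Module):
    """Sketch of multi-head latent attention: compress each token's KV input into
    a small latent, cache only the latent, and up-project it back into K and V."""

    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # compression: this is what gets cached
        self.k_up = nn.Linear(d_latent, d_model)      # reconstruct keys from the latent
        self.v_up = nn.Linear(d_latent, d_model)      # reconstruct values from the latent
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):          # x: (batch, new_tokens, d_model)
        b, t, _ = x.shape
        latent = self.kv_down(x)
        if latent_cache is not None:                  # previously cached latents
            latent = torch.cat([latent_cache, latent], dim=1)
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, t, -1)   # causal mask omitted for brevity
        return self.out(y), latent                    # the latent doubles as the KV cache
```

Caching a small latent per token instead of full per-head keys and values is what cuts inference-time memory, which is the efficiency gain the paragraph above alludes to.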

The model was trained on NVIDIA H800 chips, a lower-performance but more cost-effective alternative to the H100 designed for export-restricted markets such as China. Despite this constraint, the model delivers excellent results. With its innovative techniques, DeepSeek-V3 is seen as a major step forward in AI architecture and training efficiency, reportedly offering cutting-edge performance while achieving exceptional efficiency and scalability.

Key features

As mentioned earlier, DeepSeek-V3 uses MLA to optimize memory usage and inference performance. MoE models are often reported to struggle with balancing load across their experts, but DeepSeek-V3 minimizes this issue with its auxiliary-loss-free load-balancing strategy. Together, these choices make the model a strong fit for computationally intensive tasks, and the reduced memory usage and faster computation make the entire training process more cost-effective. Additionally, DeepSeek-V3 can process up to 128,000 tokens in a single context, and this long-context understanding gives it a competitive edge in areas such as legal document review and academic research.
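Auxiliary-loss-free balancing is commonly described as a bias-adjustment scheme: each expert carries a bias that is added to its routing score only when selecting experts, and that bias is nudged down when the expert is overloaded and up when it is underloaded. The sketch below illustrates that idea with made-up tensor shapes and an assumed update rate; it is not DeepSeek's code, and the lab's actual method may differ in detail.

```python
import torch

def route(scores, bias, k=8):
    """Select top-k experts using biased scores; gate weights ignore the bias."""
    idx = torch.topk(scores + bias, k, dim=-1).indices      # selection uses the bias
    gates = torch.softmax(scores.gather(-1, idx), dim=-1)   # weighting does not
    return idx, gates

def update_bias(bias, tokens_per_expert, gamma=1e-3):
    """After each step, push overloaded experts' bias down and underloaded ones' up."""
    load = tokens_per_expert.float()
    return bias - gamma * torch.sign(load - load.mean())

n_experts = 64
bias = torch.zeros(n_experts)
scores = torch.randn(32, n_experts)                         # 32 tokens, 64 experts
idx, gates = route(scores, bias)
tokens_per_expert = torch.bincount(idx.flatten(), minlength=n_experts)
bias = update_bias(bias, tokens_per_expert)
```

The usual argument for this approach is that balance is enforced through routing rather than through an extra loss term, so training does not have to trade model quality for even expert utilization.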

The model also features multi-token prediction (MTP), allowing it to predict several tokens at once rather than one at a time, as traditional models do; this reportedly increases generation throughput by up to 1.8x tokens per second. Perhaps one of DeepSeek-V3's biggest advantages is its open-source nature, which gives researchers, developers, and businesses broad access to its capabilities. Essentially, this lets smaller players work with high-performance AI tools and compete with larger ones.
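To see where a figure like 1.8x can come from, consider a back-of-the-envelope model in which each decoding step emits one guaranteed token plus speculative extra tokens that are kept only when the model would have produced them anyway. The acceptance rate and step counts below are illustrative assumptions, not published DeepSeek numbers.

```python
def effective_speedup(tokens_per_step=2, acceptance_rate=0.85):
    """Rough speedup from multi-token prediction used speculatively:
    1 guaranteed token per step plus accepted extras, versus 1 token per step."""
    return 1 + (tokens_per_step - 1) * acceptance_rate

print(effective_speedup())           # 1.85 -> roughly the 1.8x figure quoted above
print(effective_speedup(2, 0.5))     # 1.5  -> lower acceptance, smaller gain
```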

Benchmark performance of DeepSeek-V3.

Performance

In terms of performance, DeepSeek has compared the model against its peers, including Claude 3.5 Sonnet, GPT-4o, Qwen2.5, and Llama 3.1, and it shows exceptional results across benchmarks. DeepSeek-V3 competes directly with established closed-source models such as OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet and outperforms them in several key areas. In math and coding, it beat its competitors on benchmarks such as MATH-500 and LiveCodeBench, demonstrating strong problem-solving and programming abilities. It also excels at tasks that require understanding long texts, and it showed exceptional strength on Chinese-language tasks.

In terms of limitations, DeepSeek-V3 can still demand large amounts of computational resources. Although faster than previous versions, the model's real-time inference capabilities reportedly need further optimization. Some users have also claimed that its focus on excelling at Chinese-language tasks hurts its performance on English fact-based benchmarks.

The big picture

The United States and China are at the forefront of an AI arms race. US export controls limit China's access to advanced NVIDIA AI chips in an effort to slow its progress in AI, but DeepSeek-V3's innovations suggest those restrictions may not have been as effective as intended. The model is fully open source, which also raises questions about the safety and impact of releasing powerful AI models to the public. At the same time, it signals a paradigm shift: powerful AI models can now be trained without prohibitive investment, showing how open-source AI continues to challenge closed-model developers such as OpenAI and Anthropic.

The DeepSeek-V3 model is available free of charge to developers, researchers, and businesses, and can be accessed via GitHub.
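For readers who want to experiment, the sketch below shows one plausible way to load open weights with the Hugging Face transformers library. The repository id is an assumption (check DeepSeek's GitHub page for the official download locations), and serving a roughly 671-billion-parameter MoE model requires far more accelerator memory than a single consumer GPU provides.

```python
# Illustrative only: the repo id below is assumed, and real deployment of a model
# this size needs a multi-GPU or heavily quantized setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"            # assumed Hugging Face repo id
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,                     # the repo ships custom model code
    device_map="auto",                          # shard across available devices
    torch_dtype="auto",
)

prompt = "Explain Mixture-of-Experts in one sentence."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```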

Adnan Mahar

Adnan is a passionate doctor from Pakistan with a keen interest in exploring the world of politics, sports, and international affairs. As an avid reader and lifelong learner, he is deeply committed to sharing insights, perspectives, and thought-provoking ideas. His journey combines a love for knowledge with an analytical approach to current events, aiming to inspire meaningful conversations and broaden understanding across a wide range of topics.
