Karachi Chronicle
AI

Researchers at Google DeepMind Propose Matryoshka Quantization: A Method to Improve Deep Learning Efficiency by Optimizing Multi-Precision Models Without Sacrificing Accuracy

By Adnan Mahar · February 15, 2025 · 5 Mins Read


Quantization is a key technique in deep learning for reducing computational cost and improving model efficiency. Large language models require significant processing power, which makes quantization essential for minimizing memory usage and increasing inference speed. Quantization reduces storage requirements by converting high-precision weights to low-bit formats such as INT8, INT4, or INT2. However, standard techniques often degrade accuracy, especially at very low precisions such as INT2, so researchers must either sacrifice accuracy for efficiency or maintain multiple models at different quantization levels. New strategies are needed that preserve model quality while optimizing computational efficiency.
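To make the storage saving concrete, here is a minimal sketch of symmetric per-tensor quantization to INT8. The function name and scaling scheme are illustrative assumptions for this article, not a specific library's API:

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int = 8):
    """Minimal symmetric per-tensor quantizer: one float scale, no zero point."""
    qmax = 2 ** (bits - 1) - 1                    # 127 for INT8, 7 for INT4
    scale = float(np.abs(w).max()) / qmax         # largest weight maps to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

w = np.random.default_rng(0).normal(size=1024).astype(np.float32)
q8, scale = quantize_symmetric(w, bits=8)
print(f"float32: {w.nbytes} bytes -> INT8: {q8.nbytes} bytes")  # 4x smaller
print(f"mean abs error after dequantization: "
      f"{np.abs(w - q8.astype(np.float32) * scale).mean():.4f}")
```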

The fundamental challenge in quantization is preserving accuracy as precision is reduced. Previous approaches either train a separate model for each precision or fail to take advantage of the hierarchical structure of integer data types. Accuracy loss is hardest to control at INT2, which has kept its memory savings from being widely exploited. LLMs such as Gemma-2 9B and Mistral 7B are highly computationally intensive, so techniques that allow a single model to operate at multiple precision levels would greatly improve efficiency. The need for high-performance, flexible quantization methods has driven researchers to look beyond traditional approaches.

Several quantization techniques exist, each balancing accuracy against efficiency. Training-free methods such as MinMax and GPTQ use statistical scaling to map model weights to lower bit widths without changing the parameters, but they lose accuracy at low precision. Training-based methods such as Quantization-Aware Training (QAT) and OmniQuant instead use gradient descent to optimize quantization parameters: QAT updates the model weights to reduce the accuracy loss after quantization, whereas OmniQuant learns scaling and shifting parameters without changing the core weights. However, both families require a separate model for each precision, complicating deployment.
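To make the distinction concrete, QAT-style methods insert a "fake quantization" step into the forward pass and train through it with a straight-through estimator. Below is a minimal PyTorch sketch of that building block, an illustration of the general recipe rather than the paper's code:

```python
import torch

class FakeQuant(torch.autograd.Function):
    """Quantize-dequantize in the forward pass; straight-through gradient."""

    @staticmethod
    def forward(ctx, w, scale):
        # Simulate INT8 storage: round, clamp to [-128, 127], map back to float.
        return torch.clamp(torch.round(w / scale), -128, 127) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat rounding as identity for gradients.
        return grad_output, None

# The float weights receive gradients computed through the quantized
# forward pass, so they adapt to the quantization grid during training.
w = torch.randn(64, 64, requires_grad=True)
scale = w.detach().abs().max() / 127
loss = FakeQuant.apply(w, scale).pow(2).sum()
loss.backward()
print(w.grad.shape)  # torch.Size([64, 64])
```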

Researchers at Google DeepMind introduced Matryoshka Quantization (MatQuant) to create a single model that works at multiple precision levels. Unlike traditional methods that treat each bit width individually, MatQuant optimizes INT8, INT4, and INT2 models over a shared bit representation. This allows a model to be deployed at different precisions without retraining, reducing computational and storage costs. By leveraging the hierarchical structure of integer data types, MatQuant extracts low-bit models from high-bit ones while maintaining accuracy. Testing on the Gemma-2 2B, Gemma-2 9B, and Mistral 7B models showed that MatQuant improves INT2 accuracy by up to 10% over standard quantization techniques such as QAT and OmniQuant.

MatQuant represents model weights at different precision levels through shared most significant bits (MSBs) and optimizes them jointly to maintain accuracy. The training process incorporates co-training and co-distillation to ensure that the INT2 representation retains important information that is usually lost in standard quantization. Rather than discarding the low-bit structure, MatQuant integrates it into a multi-scale optimization framework, achieving efficient compression without compromising performance.
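The nested-integer idea can be sketched directly: quantize once at INT8, then derive lower-precision weights by slicing off the most significant bits. This is a minimal illustration of the principle under a simple symmetric quantizer, not DeepMind's actual implementation, which jointly optimizes the shared bits during training:

```python
import numpy as np

def slice_msbs(q_int8: np.ndarray, bits: int) -> np.ndarray:
    """Keep the top `bits` most significant bits of signed INT8 weights.
    An arithmetic right shift by (8 - bits) yields the nested
    lower-precision weight; this is the hierarchy MatQuant exploits."""
    return (q_int8 >> (8 - bits)).astype(np.int8)

def dequantize(q: np.ndarray, scale: float, bits: int) -> np.ndarray:
    # A sliced integer is 2**(8 - bits) times coarser than its INT8 parent.
    return q.astype(np.float32) * scale * (1 << (8 - bits))

# Quantize float weights to INT8 once (simple symmetric scheme).
w = np.random.default_rng(0).normal(size=1024).astype(np.float32)
scale = float(np.abs(w).max()) / 127
q8 = np.clip(np.round(w / scale), -128, 127).astype(np.int8)

# The INT4 and INT2 "models" fall out of the same stored INT8 weights.
for bits in (8, 4, 2):
    err = np.abs(w - dequantize(slice_msbs(q8, bits), scale, bits)).mean()
    print(f"INT{bits}: mean abs reconstruction error = {err:.4f}")
```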

The experimental evaluation of MatQuant demonstrates its ability to reduce quantization-induced accuracy loss. The researchers tested the method on transformer-based LLMs, focusing on quantizing the feedforward network (FFN) parameters, a key factor in inference latency. The results show that MatQuant's INT8 and INT4 models achieve accuracy comparable to independently trained baselines while surpassing them at INT2. On the Gemma-2 9B model, MatQuant improved INT2 accuracy by 8.01%, and the Mistral 7B model showed a 6.35% improvement over traditional quantization methods. The study also found that MatQuant's right-shifted quantized weight distribution improves accuracy across all bit widths, particularly benefiting low-precision models. MatQuant further allows seamless bit-width interpolation and layer-by-layer mix'n'match configurations, enabling flexible deployment under hardware constraints.
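Because every low-bit model is a literal bit-slice of the same stored weights, interpolation and per-layer mixing come essentially for free. A hypothetical sketch of both ideas follows; the layer names and bit budget are made up for illustration:

```python
import numpy as np

def slice_msbs(q_int8: np.ndarray, bits: int) -> np.ndarray:
    """Top `bits` most significant bits of signed INT8 weights (as above)."""
    return (q_int8 >> (8 - bits)).astype(np.int8)

rng = np.random.default_rng(1)
ffn = {name: rng.integers(-128, 128, size=256, dtype=np.int8)
       for name in ("ffn_up", "ffn_gate", "ffn_down")}

# Interpolated bit widths that were never trained explicitly:
int6 = {name: slice_msbs(q, 6) for name, q in ffn.items()}
int3 = {name: slice_msbs(q, 3) for name, q in ffn.items()}

# Layer-wise mix'n'match: spend more bits where accuracy is most sensitive.
bit_budget = {"ffn_up": 4, "ffn_gate": 2, "ffn_down": 8}
mixed = {name: slice_msbs(q, bit_budget[name]) for name, q in ffn.items()}
```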

Several important points emerge from MatQuant's research:

  • Multi-scale quantization: MatQuant introduces a new approach by training a single model that can operate at multiple precision levels (e.g., INT8, INT4, INT2).
  • Exploitation of nested bit structures: The technique exploits the nested structure inherent in integer data types, deriving smaller bit-width integers from larger ones.
  • Enhanced low-precision accuracy: MatQuant significantly improves the accuracy of INT2-quantized models, by up to 8% over traditional methods such as QAT and OmniQuant.
  • Versatile application: MatQuant is compatible with existing learning-based quantization techniques such as Quantization-Aware Training (QAT) and OmniQuant.
  • Proven performance: The method was successfully applied to quantize the FFN parameters of LLMs such as Gemma-2 2B, Gemma-2 9B, and Mistral 7B, demonstrating practical utility.
  • Improved efficiency: MatQuant enables models that offer a better trade-off between accuracy and computational cost, making it well suited to resource-constrained environments.
  • Pareto-optimal trade-off: By allowing seamless extraction of interpolated bit widths such as INT6 and INT3 and layer-wise mix'n'match configurations, MatQuant yields a dense accuracy-versus-cost Pareto frontier across precisions.

In conclusion, MatQuant addresses the challenge of maintaining multiple quantized models with a multi-scale training approach that exploits the nested structure of integer data types. It provides a flexible, high-performance option for low-bit quantization in efficient LLM inference, showing that a single model can be trained to operate at multiple precision levels without significant degradation even at very low bit widths.

Check out the paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter, and don't forget to join our 75K+ ML SubReddit.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform distinguished by its in-depth coverage of machine learning and deep learning news that is technically sound yet easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.

Adnan Mahar

Adnan is a passionate doctor from Pakistan with a keen interest in exploring the world of politics, sports, and international affairs. As an avid reader and lifelong learner, he is deeply committed to sharing insights, perspectives, and thought-provoking ideas. His journey combines a love for knowledge with an analytical approach to current events, aiming to inspire meaningful conversations and broaden understanding across a wide range of topics.
