Google DeepMind introduces Differentiable Cache Augmentation: a coprocessor-enhanced approach to improve LLM reasoning and efficiency

By Adnan Mahar | December 27, 2024

Large language models (LLMs) are essential for solving complex problems in language processing, mathematics, and reasoning. Current work focuses on enabling LLMs to process data more effectively and to generate more accurate, context-relevant responses. As these models grow in complexity, researchers strive to develop methods that improve performance without exceeding a fixed computational budget.

One of the major challenges in optimizing LLMs is their limited ability to reason over multiple steps or to perform computation beyond what their pre-trained architecture provides. Current methods for improving model performance generate intermediate steps during task processing, often at the cost of increased latency and reduced computational efficiency. This limitation hinders complex reasoning tasks, especially those that require long-range dependencies or high prediction accuracy.

Researchers have investigated methods such as chain-of-thought (CoT) prompting, which guides LLMs to reason step by step. Although CoT can be effective, it relies on sequentially generating intermediate reasoning tokens, which slows down computation. KV cache compression has also been proposed to reduce memory usage, but it does little to improve reasoning ability. Although these approaches are valuable, they highlight the need for methods that combine efficiency with stronger reasoning capabilities.

Researchers at Google DeepMind have introduced a technique called Differentiable Cache Augmentation. The technique uses a learned coprocessor to enrich the LLM’s key-value (KV) cache with latent embeddings, augmenting the model’s internal memory. The key innovation is that the base LLM stays frozen while the coprocessor, which operates asynchronously, is trained. The researchers designed the method to enhance reasoning ability without increasing the computational load during task execution.
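
As a rough illustration of that training split, the sketch below freezes a stand-in base model and hands only the coprocessor’s parameters to the optimizer. Both nn.Linear modules are toy placeholders, not components from the paper.

```python
import torch
import torch.nn as nn

# Toy stand-ins: in practice these would be the pre-trained LLM and the
# cache-augmentation coprocessor described in the paper.
base_llm = nn.Linear(16, 16)
coprocessor = nn.Linear(16, 16)

# The base model is frozen: its weights never receive gradient updates.
for p in base_llm.parameters():
    p.requires_grad = False

# Only the coprocessor's parameters are optimized, so the training
# signal shapes the augmentation module alone.
optimizer = torch.optim.AdamW(coprocessor.parameters(), lr=1e-4)
```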

The methodology revolves around a three-step process. First, the frozen LLM generates a KV cache from the input sequence, encapsulating its internal representation. The KV cache is then passed to the coprocessor, which processes it together with additional trainable soft tokens. These tokens are not tied to specific words; they serve as abstract prompts for producing latent embeddings. Finally, the augmented KV cache is fed back into the LLM, allowing it to generate output conditioned on the enriched context. Because the coprocessor operates asynchronously, its enhancements are applied without delaying the LLM’s core functionality. The coprocessor is trained with a language modeling loss that updates only its own parameters, leaving the frozen LLM intact; this targeted approach keeps the optimization scalable and effective.
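
The toy sketch below walks through the same three steps end to end. The shapes, the mean-pooling “attention,” and the linear decoder head are all illustrative assumptions; the real method operates on a transformer’s full KV cache. What the sketch does mirror is the gradient flow, which reaches only the coprocessor.

```python
import torch
import torch.nn as nn

d_model, n_soft, vocab_size = 32, 8, 100

class ToyCoprocessor(nn.Module):
    """Maps a frozen KV cache plus trainable soft tokens to latent
    embeddings that are appended to the cache."""
    def __init__(self):
        super().__init__()
        self.soft_tokens = nn.Parameter(torch.randn(n_soft, d_model))
        self.mixer = nn.Linear(d_model, d_model)

    def forward(self, kv_cache):
        # Mean-pooling stands in for attending over the cache; the soft
        # tokens are conditioned on the pooled context.
        context = kv_cache.mean(dim=0, keepdim=True)
        return self.mixer(self.soft_tokens + context)

# Step 1: the frozen LLM encodes the input into a KV cache (random
# features stand in for real keys/values, hence torch.no_grad()).
with torch.no_grad():
    kv_cache = torch.randn(10, d_model)

# Step 2: the coprocessor expands the cache with latent embeddings.
coprocessor = ToyCoprocessor()
latents = coprocessor(kv_cache)
augmented_cache = torch.cat([kv_cache, latents], dim=0)

# Step 3: the LLM decodes conditioned on the augmented cache; a frozen
# linear head stands in for the decoder here.
frozen_head = nn.Linear(d_model, vocab_size)
for p in frozen_head.parameters():
    p.requires_grad = False

logits = frozen_head(augmented_cache.mean(dim=0))

# The language-modeling loss reaches only the coprocessor's parameters,
# since everything else is frozen or was built under torch.no_grad().
loss = nn.functional.cross_entropy(logits.unsqueeze(0), torch.tensor([3]))
loss.backward()
```

Calling loss.backward() here leaves the frozen modules without gradients, which is what allows the coprocessor to be trained cheaply against an unchanged base model.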

Performance evaluations demonstrated significant improvements. The method was tested on the Gemma-2 2B model and achieved strong results across a variety of benchmarks. For example, on the reasoning-intensive GSM8K dataset, accuracy improved by 10.05% when 64 latent embeddings were used, and MMLU performance improved by 4.70% with the same configuration. These gains highlight the model’s improved ability on complex reasoning tasks. Furthermore, perplexity decreased at multiple token positions: with 64 latent embeddings, perplexity dropped by 3.94% at position 1 and by 1.20% at position 32, indicating better predictive ability over longer sequences.

Further analysis showed that the effectiveness of the augmentation scaled with the number of latent embeddings. On GSM8K, accuracy improved steadily as embeddings were added, from a 1.29% gain with 4 embeddings to a peak of 10.05% with 64. Similar trends on benchmarks such as ARC and MATH indicate that the technique is broadly applicable. The researchers confirmed that their approach consistently outperformed the baseline model, even without task-specific fine-tuning, demonstrating its robustness and adaptability.
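
That ablation amounts to sweeping the latent-embedding budget and re-scoring each configuration, as in the hedged sketch below. evaluate_gsm8k_gain is a hypothetical stub, and the intermediate budgets are illustrative; only the 4- and 64-embedding figures are reported in the article.

```python
def evaluate_gsm8k_gain(num_latents: int) -> float:
    """Hypothetical stub: a real harness would score the coprocessor-
    augmented model on GSM8K against the frozen baseline."""
    return 0.0  # placeholder; reported gains are +1.29% at 4 and +10.05% at 64

# Sweep the latent-embedding budget, mirroring the ablation above.
for num_latents in (4, 8, 16, 32, 64):
    gain = evaluate_gsm8k_gain(num_latents)
    print(f"{num_latents:>2} latent embeddings -> +{gain:.2f}% GSM8K accuracy")
```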

This study represents an important step forward in strengthening the reasoning capabilities of LLMs. By introducing an external coprocessor that augments the KV cache, the Google DeepMind researchers improved performance while maintaining computational efficiency. The result highlights the potential of LLMs to tackle more complex tasks and paves the way for further exploration of modular enhancements and scalable reasoning systems, underscoring the importance of continued innovation in AI to meet the growing demands of reasoning-intensive applications.

Check out the paper. All credit for this research goes to the researchers of this project.

Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who constantly researches applications in areas such as biomaterials and biomedicine. With a strong background in materials science, he explores new advances and creates opportunities to contribute.
