URM shows that small-scale recursive models can outperform large-scale LLMs on reasoning tasks

By Adnan Mahar | December 22, 2025



This article is part of our coverage of the latest AI research.

Ubiquant researchers have proposed a new deep learning architecture that improves the ability of AI models to solve complex reasoning tasks. Their architecture, the Universal Reasoning Model (URM), builds on the Universal Transformer (UT) framework that other research teams have used to tackle difficult benchmarks such as ARC-AGI and Sudoku.

While recent models such as the Hierarchical Reasoning Model (HRM) and the Tiny Recursive Model (TRM) highlight the potential of recurrent architectures, the Ubiquant team identified key areas where these models can be optimized. The resulting approach significantly improves reasoning performance over these existing small-scale models and achieves best-in-class results on reasoning benchmarks.

Universal transformers

To understand URM, you first need to understand the Universal Transformer (UT) and how it differs from the standard architecture used by most large language models (LLMs). A standard transformer processes data by passing it through a stack of separate layers, each with its own set of parameters.

In contrast, a UT applies a single layer (often called a transition block) repeatedly in a loop to refine the token representations. This weight-sharing mechanism allows the model to perform iterative reasoning without increasing the number of parameters, making it theoretically more expressive for tasks that require deep, multi-step computation.
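
To make the contrast concrete, here is a minimal PyTorch-style sketch: a standard transformer stacks N distinct blocks, while a universal transformer loops one shared block N times. The block internals and dimensions are illustrative, not the paper's exact architecture.

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    """A generic pre-norm self-attention + MLP block (details kept minimal)."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class VanillaTransformer(nn.Module):
    """Standard transformer: N distinct layers, N sets of parameters."""
    def __init__(self, d_model: int, n_layers: int):
        super().__init__()
        self.layers = nn.ModuleList([TransformerBlock(d_model) for _ in range(n_layers)])

    def forward(self, x):
        for layer in self.layers:        # every pass uses different weights
            x = layer(x)
        return x

class UniversalTransformer(nn.Module):
    """Universal transformer: one shared block applied n_steps times."""
    def __init__(self, d_model: int, n_steps: int):
        super().__init__()
        self.block = TransformerBlock(d_model)  # a single set of parameters
        self.n_steps = n_steps

    def forward(self, x):
        for _ in range(self.n_steps):    # every pass reuses the same weights
            x = self.block(x)
        return x
```

With n_steps equal to n_layers, the two models spend roughly the same compute per token, but the universal transformer carries only one block's worth of parameters.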

Recent iterations of this concept, such as HRM and TRM, have shown that small UT-based models can outperform much larger standard transformers on reasoning tasks.

Tiny Recursive Model (TRM) (Source: arXiv)

However, the authors of the URM paper argue that the specific causes of these performance gains have been misunderstood. While previous work attributed the success to sophisticated architectural design, the Ubiquant researchers found that the improvement primarily stems from the inductive bias that recurrence itself introduces in universal transformers. In other words, the advantage comes from the model’s ability to reuse the exact same parameters to iteratively refine its representations.

Furthermore, their analysis revealed that nonlinear computation along the depth dimension plays a much larger role than previously realized. Specifically, the feedforward network (MLP), rather than the attention mechanism, is the main source of the representational nonlinearity required for complex reasoning.

Strengthening the reasoning loop

Based on these insights, URM introduces two key innovations to the UT framework: the ConvSwiGLU module and Truncated Backpropagation Through Loops (TBPTL).

Universal Reasoning Model (URM) (Source: arXiv)

The first innovation addresses a limitation of the standard SwiGLU activation function used in the MLP blocks of modern transformers. SwiGLU provides the necessary nonlinearity, but it is a pointwise operation that processes each token independently, so no information is mixed between tokens within that layer. The researchers address this by inserting a short depthwise convolution inside the MLP block. This ConvSwiGLU mechanism introduces local context interactions, letting the model mix information between neighboring tokens without significantly increasing computational cost.
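
The paper's exact layer layout is not reproduced here, but a rough sketch of the idea, assuming a standard SwiGLU gate with a short depthwise Conv1d over the sequence dimension (the kernel size and placement are assumptions), looks like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvSwiGLU(nn.Module):
    """SwiGLU MLP with a short depthwise convolution over the sequence.

    Plain SwiGLU is pointwise: every token is transformed independently.
    The depthwise Conv1d lets neighboring tokens interact inside the MLP
    while keeping the extra parameter and compute cost small.
    """
    def __init__(self, d_model: int, d_hidden: int, kernel_size: int = 3):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)
        # groups=d_hidden makes the convolution depthwise: each channel only
        # mixes information from nearby sequence positions, not other channels.
        self.dwconv = nn.Conv1d(d_hidden, d_hidden, kernel_size,
                                padding=kernel_size // 2, groups=d_hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        h = F.silu(self.w_gate(x)) * self.w_up(x)            # standard SwiGLU gating
        h = self.dwconv(h.transpose(1, 2)).transpose(1, 2)   # local token mixing
        return self.w_down(h)
```

Because the convolution is depthwise, it adds only a handful of parameters per hidden channel, which is why the extra cost stays negligible.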

The second innovation, Truncated Backpropagation Through Loops, addresses the training instability inherent in recurrent models. Because URM reasons by looping over the same layer, deep reasoning requires many iterations, but computing gradients over long chains of loops leads to noise accumulation and optimization problems. To solve this, the researchers split the rollout into a forward-only segment and a trainable segment.


For a setup with eight inner loops, the researchers found that the best balance was to run the first two loops forward-only and compute gradients only for the last six. This technique stabilizes training by ignoring noisy gradients from the early stages of the reasoning process, allowing the model to focus on the later stages of refinement.
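
A minimal sketch of that training scheme, assuming a generic shared block and using the two-forward-only / six-trainable split described above (everything else is illustrative):

```python
import torch

def recurrent_rollout(block, x, n_forward_only: int = 2, n_trainable: int = 6):
    """Loop a shared block, backpropagating only through the last iterations.

    The first n_forward_only loops refine the representation without tracking
    gradients, so their noisy gradients never reach the optimizer; only the
    final n_trainable loops participate in backpropagation.
    """
    with torch.no_grad():               # forward-only segment
        for _ in range(n_forward_only):
            x = block(x)
    x = x.detach()                      # explicitly cut the graph
    for _ in range(n_trainable):        # trainable segment
        x = block(x)
    return x
```

Running the early loops under no_grad also means their activations are not stored for the backward pass, so the truncation saves memory in addition to stabilizing training.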

URM in action

Together, these architectural changes resulted in significant performance improvements over previous UT-based approaches. On the ARC-AGI-1 benchmark, URM achieved 53.8% pass@1, significantly outperforming TRM (40.0%) and HRM (34.4%). URM’s lead grew on ARC-AGI-2, where it reached 16.0% pass@1, nearly tripling HRM’s score and more than doubling TRM’s. The Sudoku benchmark showed a similar advantage, with URM reaching an accuracy of 77.6%.

Beyond raw accuracy, the results highlight the efficiency of the iterative approach. The researchers showed that a UT with only as many parameters as four basic transformer blocks can reach a pass@1 score of 40.0, dramatically outperforming a vanilla transformer with 32 distinct blocks.

The researchers also note that “simply scaling depth and width in vanilla transformers yields diminishing returns and can even lead to performance degradation. This highlights a fundamental inefficiency in how parameters are used to support multi-step inference.”

The researchers have published the URM code on GitHub.

URM outperforms other universal transformers on key reasoning benchmarks (Source: arXiv)

This finding shows that iterative computation is often more beneficial than simply adding independent layers. As the authors explain, “In a standard Transformer, the additional FLOPs are often spent on redundant refinement in upper layers, whereas in iterative computations, the same budget translates into an effective increase in depth.”

It is worth noting that URM and other universal transformers still lag far behind frontier models on reasoning benchmarks such as ARC-AGI. Poetiq recently developed an improved approach that reached 54% on ARC-AGI-2, far ahead of URM. Moreover, universal transformer models are trained specifically for ARC-AGI-style problems (even if they are not overfitted to a particular dataset), which makes them unsuitable for the general-purpose applications that frontier LLMs handle. Still, these experiments show how new architectures and approaches can tackle complex problems at a fraction of the compute and memory budgets previously required. It will be interesting to see what new research directions and applications this line of models leads to.

