Rather than immediately generating a direct response to user input, a reasoning model is trained to first generate intermediate "reasoning steps" before arriving at the final answer presented to the user. Some reasoning LLMs expose these reasoning traces in full, while others summarize or hide these intermediate outputs entirely.
Simply put, reasoning LLMs are trained to spend more time "thinking" before responding. The addition of this "reasoning process" has been empirically shown to yield significant gains in LLM performance on complex reasoning tasks. That success has expanded the real-world use cases and domains in which AI models can be applied, marking a key inflection point in the ongoing development of generative AI and AI agents.
It is worth noting, however, that anthropomorphic terms like a model's "thinking process" are convenient shorthand rather than literal descriptions. Like all machine learning models, a reasoning model is ultimately just applying sophisticated algorithms to make predictions that reflect patterns learned from its training data. Reasoning LLMs do not demonstrate consciousness or artificial general intelligence (AGI). Indeed, AI research published by Apple in June 2025 raised questions about whether the reasoning capabilities of current models can be extended to truly "generalizable" reasoning.
It is perhaps most accurate to say that a reasoning LLM is trained to "show its work" by generating a series of tokens (words) that resembles a human thought process.
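To make this concrete: some open reasoning models emit their intermediate reasoning between explicit delimiter tokens (DeepSeek-R1, for example, uses `<think>...</think>` tags), which an application can then display or strip before showing the final answer. The sketch below assumes that tag format; the sample output string is illustrative, not a real model response.

```python
import re

def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Separate a model's intermediate reasoning trace from its final answer.

    Assumes the model wraps its reasoning in <think>...</think> tags,
    as DeepSeek-R1-style models do; providers that hide or summarise
    the trace would require a different approach.
    """
    match = re.search(r"<think>(.*?)</think>", raw_output, re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    # Everything outside the think tags is the user-facing answer.
    answer = re.sub(r"<think>.*?</think>", "", raw_output, flags=re.DOTALL).strip()
    return reasoning, answer

# Hypothetical raw model output:
raw = "<think>12 * 4 = 48, plus 2 is 50.</think>The answer is 50."
reasoning, answer = split_reasoning(raw)
```

An application that hides the trace would show the user only `answer`; one that exposes it (as some chat interfaces do) would render `reasoning` in a collapsible panel.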
The concept of a "reasoning model" was introduced in September 2024 by OpenAI's o1-preview (and o1-mini), followed by Alibaba's Qwen reasoning model (QwQ-32B-Preview) in November and Google's experimental Gemini 2.0 Flash Thinking in December. A milestone in the development of reasoning LLMs was the January 2025 release of the open source DeepSeek-R1 model. Whereas the training processes used to fine-tune earlier reasoning models were closely guarded, DeepSeek released a detailed technical paper that provided a blueprint for other model developers. IBM Granite, Anthropic, and Mistral AI have since released their own reasoning LLMs.