Google has quietly released a major update to its popular artificial intelligence model Gemini. The model now explains its reasoning process, sets new performance records on mathematical and scientific tasks, and offers a free alternative to OpenAI’s premium service.
The new Gemini 2.0 Flash Thinking model, released Tuesday in Google AI Studio under the experimental designation “Exp-01-21,” scored 73.3% on the American Invitational Mathematics Examination (AIME) and 74.2% on the GPQA Diamond science benchmark. These results represent a clear improvement over previous AI models and demonstrate Google’s growing strength in advanced reasoning.
“We have been pioneering this type of planning system for over 10 years, starting with programs like AlphaGo. We’re excited to see this powerful combination of models,” Demis Hassabis, CEO of Google DeepMind, said in a post on X.com (formerly Twitter).
The latest update of the Gemini 2.0 Flash Thinking model (available here: https://t.co/Rr9DvqbUdO) scored 73.3% on the AIME (math) benchmark and 74.2% on the GPQA Diamond (science) benchmark. Thank you for your feedback. This represents super rapid progress since the initial release just the other day… pic.twitter.com/cM1gNwBoTO
— Demis Hassabis (@demishassabis) January 21, 2025
Gemini 2.0 Flash Thinking breaks record with 1 million tokens processed
The most notable feature of this model is its ability to process up to 1 million tokens of text while maintaining fast response times. That is five times the capacity of OpenAI’s o1 Pro model. This expanded context window allows the model to analyze multiple research papers or extensive datasets at once, potentially transforming the way researchers and analysts work with large amounts of information.
“As a first experiment, I took various religious and philosophical texts and asked Gemini 2.0 Flash Thinking to weave them together and extract novel and unique insights,” Dan Mack, an AI researcher who tested the model, said in a post on X.com. “A total of 970,000 tokens were processed. The results are very impressive.”
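To make the scale of that window concrete, the short Python sketch below checks whether a set of documents would fit within a 1 million token budget. It is only an illustration: the 1,000,000 figure comes from the article, while the four-characters-per-token ratio and the function names are assumptions made for this example and do not reflect Gemini’s actual tokenizer or any Google API.

```python
# Context-window size as reported in the article.
CONTEXT_WINDOW_TOKENS = 1_000_000

# Rough rule of thumb (an assumption, not a property of any real tokenizer).
ASSUMED_CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Crude token estimate: character count divided by the assumed ratio, rounded up."""
    return -(-len(text) // ASSUMED_CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], reserved_for_output: int = 0) -> bool:
    """Return True if the estimated prompt plus reserved output fits in the window."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserved_for_output <= CONTEXT_WINDOW_TOKENS


def remaining_tokens(used: int) -> int:
    """Tokens left in the window after `used` tokens have been consumed."""
    return CONTEXT_WINDOW_TOKENS - used
```

Under these assumptions, the 970,000-token run described above would leave about 30,000 tokens of headroom, which is `remaining_tokens(970_000)`. A real application would count tokens with the provider’s own tooling instead of a character-based estimate.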
This release comes at a key moment in the evolution of the AI industry. OpenAI recently announced an o3 model that achieved a score of 87.7% on the GPQA Diamond benchmark. But Google’s decision to make its model available for free (with limited usage) during beta testing could attract developers and companies looking for an alternative to OpenAI’s $200 monthly subscription.

Google offers Gemini 2.0 Flash Thinking with built-in code execution for free
Jeff Dean, chief scientist at Google DeepMind, emphasized improvements in the model’s reliability, writing that the team continues to iterate to increase confidence and reduce discrepancies between what the model thinks and its final answer.
The model also includes native code execution, allowing developers to run and test code directly within the system. This feature, combined with improved safeguards against contradictions between the model’s reasoning and its answers, positions Gemini 2.0 Flash Thinking as a strong candidate for both research and commercial applications.
Industry analysts say Google’s focus on explaining the reasoning process could help address growing concerns about the transparency and trustworthiness of AI. Unlike traditional “black box” models, Gemini 2.0 Flash Thinking shows its reasoning, making it easier for users to understand and verify its conclusions.
We continue to iterate to increase confidence and reduce discrepancies between the model’s ideas and the final answer.
Check it out as gemini-2.0-flash-thinking-exp-01-21 at https://t.co/sw0jY6k74m.
— Jeff Dean (@JeffDean) January 21, 2025
AI transparency becomes new battleground as Google takes on OpenAI
The model has already taken the top spot on the Chatbot Arena leaderboard, a prominent benchmark for AI performance, topping categories such as hard prompts, coding, and creative writing.
However, questions remain about the model’s real-world performance and limitations. Although benchmark scores provide valuable metrics, they do not necessarily translate directly to real-world applications. Google’s challenge is to convince business customers that its free offering can match or exceed the capabilities of paid alternatives.
As the AI arms race heats up, Google’s latest release signals a shift in its strategy to combine advanced features with accessibility. It remains to be seen whether this approach will help close the gap with OpenAI, but it certainly gives technical decision makers a compelling reason to reconsider their AI partnerships.
For now, one thing is clear: we are entering an era in which AI shows its work, and it is available to anyone with a Google account.