Karachi Chronicle
Google Deepmind’s new benchmark assesses the factuality of LLM

By Adnan Mahar | December 23, 2024

A new benchmark, FACTS Grounding, was recently announced as a collaboration between Google DeepMind and Google Research. It evaluates the factual accuracy of large language models (LLMs).

Introducing FACTS Grounding. New benchmark launched in collaboration with @GoogleDeepMind to evaluate LLM’s factual accuracy on over 1700 tasks. 🧠📐 pic.twitter.com/MvyRbbuMwK

— Kaggle (@kaggle) December 17, 2024

The FACTS Grounding benchmark and associated leaderboards aim to measure how well an AI model generates responses based on the source material provided. This initiative addresses challenges such as misinformation and hallucinations in AI-generated content.

“To track progress, we’re also launching FACTS leaderboards on Kaggle,” the developer announced on its blog.

The aim is to increase trust in LLMs and broaden their real-world applications, which are currently limited because LLMs are prone to hallucinating false information, especially when given complex inputs.

Reliable results with 95% confidence intervals

The FACTS Grounding evaluation process reveals detailed insights into the factual accuracy of leading language models.

Models tested include Gemini 1.5 Pro, Gemini 1.5 Flash, and Gemini 2.0 Flash Experimental (Google); GPT-4o, o1-preview, and o1-mini (OpenAI); and Claude 3.5 Haiku and Claude 3.5 Sonnet (Anthropic).

Source: Official blog

During aggregation, judge models were found to rate their own outputs higher than those of competing models, by an average of 3.23%, a self-preference bias also observed in previous studies. To counter this bias, several different judge models were used; this increases computational cost but helps keep the evaluation fair.
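To illustrate why combining several judges dampens self-preference, here is a minimal sketch. The judge names, model names, and scores are all hypothetical; the only idea taken from the article is that each model's responses are scored by several judge models and the results are combined. Simple averaging is an assumption about how they are combined.

```python
from statistics import mean

# Hypothetical factuality scores (fraction of responses judged accurate),
# indexed as scores[judge][model]. Each judge slightly favors "its own" model.
scores = {
    "judge_a": {"model_a": 0.86, "model_b": 0.80, "model_c": 0.79},
    "judge_b": {"model_a": 0.81, "model_b": 0.84, "model_c": 0.78},
    "judge_c": {"model_a": 0.80, "model_b": 0.79, "model_c": 0.83},
}


def aggregate(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average each model's score across all judges."""
    models = next(iter(scores.values())).keys()
    return {m: mean(judge[m] for judge in scores.values()) for m in models}


def self_preference(scores: dict[str, dict[str, float]], judge: str, own_model: str) -> float:
    """How far a judge rates its own model above the mean of the other models."""
    row = scores[judge]
    others = [v for m, v in row.items() if m != own_model]
    return row[own_model] - mean(others)


print(aggregate(scores))
print(self_preference(scores, "judge_a", "model_a"))
```

With these made-up numbers, each single judge gives its own model a lead of up to about 0.065, while the averaged scores (about 0.823, 0.81, and 0.80) sit much closer together, so no one judge's bias dominates the result.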

Disqualifying ineligible responses lowered the final factuality scores by 1% to 5%. This adjustment also slightly changed the model rankings, with Gemini 1.5 Flash dropping from 1st to 2nd place. In all cases, results are reported with 95% confidence intervals.
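The effect of disqualification on a score, and a 95% interval around it, can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's code: the `Verdict` fields, the counts, and the use of a normal-approximation (Wald) interval are all hypothetical choices, since the article does not say how the intervals are computed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    eligible: bool  # hypothetical: did the response adequately address the request?
    grounded: bool  # hypothetical: is the response supported by the source document?


def factuality_score(verdicts: list[Verdict], disqualify_ineligible: bool = True) -> float:
    """Fraction of responses counted as accurate; ineligible ones can be zeroed out."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    accurate = [v.grounded and (v.eligible or not disqualify_ineligible) for v in verdicts]
    return sum(accurate) / len(accurate)


def wald_ci_95(p: float, n: int) -> tuple[float, float]:
    """95% normal-approximation confidence interval for a proportion p over n items."""
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical batch: 80 good answers, 4 grounded but ineligible, 16 ungrounded.
verdicts = [Verdict(True, True)] * 80 + [Verdict(False, True)] * 4 + [Verdict(True, False)] * 16

print(factuality_score(verdicts, disqualify_ineligible=False))  # grounded only
print(factuality_score(verdicts))                               # with disqualification
print(wald_ci_95(factuality_score(verdicts), len(verdicts)))
```

In this made-up batch, disqualifying the four ineligible answers drops the score from 0.84 to 0.80, a change of the same order as the 1% to 5% reported above, and the interval around 0.80 with 100 items is roughly 0.72 to 0.88.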

Google tells its Gemini AI testers to “address” any prompts they don’t understand, suggesting they assess their understanding and note any confusion.

The company assures us that this approach does not compromise Gemini’s accuracy, pointing to the newly introduced FACTS Grounding… pic.twitter.com/VcmSIZqR8t

— Daniel Gabai (@DanielGabai_) December 20, 2024

Model rankings were determined with a “fusion rank” metric, which uses the Condorcet ranking algorithm to aggregate the individual rankings from the various splits into a single ranking.
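The article does not say which Condorcet variant is used, so the sketch below uses Copeland's method, one common Condorcet-consistent way to fuse several rankings, as a stand-in. The model names and per-split rankings are hypothetical, and the code assumes every split ranks the same set of models.

```python
from itertools import combinations


def copeland_fuse(rankings: list[list[str]]) -> list[str]:
    """Fuse several rankings (best first) with Copeland's method.

    For every pair of models, the one ranked higher in a majority of the
    input rankings wins that pairwise contest. A model's score is its
    pairwise wins minus its pairwise losses; equal scores are broken
    alphabetically so the output is deterministic.
    """
    models = sorted(set(rankings[0]))
    positions = [{m: i for i, m in enumerate(r)} for r in rankings]
    score = {m: 0 for m in models}
    for a, b in combinations(models, 2):
        a_wins = sum(pos[a] < pos[b] for pos in positions)
        b_wins = len(positions) - a_wins
        if a_wins > b_wins:
            score[a] += 1
            score[b] -= 1
        elif b_wins > a_wins:
            score[b] += 1
            score[a] -= 1
    return sorted(models, key=lambda m: (-score[m], m))


# Hypothetical rankings from three splits (e.g. different judges or data subsets).
split_rankings = [
    ["model_a", "model_b", "model_c"],
    ["model_b", "model_a", "model_c"],
    ["model_a", "model_c", "model_b"],
]
print(copeland_fuse(split_rankings))
```

Here `model_a` beats both rivals in a majority of splits (it is the Condorcet winner), so it is ranked first even though it is not first in every split.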

How was the test conducted?

The benchmark consists of 1,719 examples that test models on a range of tasks, such as summarization, question answering, and rewriting.

The dataset and methodology prioritize real-world applicability, with tasks spanning finance, law, and technology. The automated evaluation uses multiple judge models to assess model performance.


A response is disqualified if it fails to adequately address the user's request or if its content is not supported by the provided source material.

Will Google take the lead?

Google has announced several other major developments this year, which arguably put Google DeepMind ahead of OpenAI and its other rivals in the AGI race.

The company unveiled a series of innovations, including its latest quantum chip, Willow, and new Gemini 2.0 models and agentic capabilities. It also introduced Project Astra and Project Mariner, underscoring its commitment to cutting-edge research.

Further advances include the text-to-video model Veo 2 and the text-to-image model Imagen 3, which demonstrate progress in generative AI. Additionally, the Gemini 2.0 Flash Thinking model represents a major advance in model reasoning.

This latest FACTS Grounding benchmark is seen as an important step towards promoting trustworthiness and accuracy of AI-generated content.