Karachi Chronicle

Google Deepmind’s new benchmark assesses the factuality of LLM

By Adnan Mahar, December 23, 2024


A new benchmarking tool, FACTS Grounding, was recently announced as a collaboration between Google DeepMind and Google Research. It evaluates the factual accuracy of large language models (LLMs).

Introducing FACTS Grounding. New benchmark launched in collaboration with @GoogleDeepMind to evaluate LLM’s factual accuracy on over 1700 tasks. 🧠📐 pic.twitter.com/MvyRbbuMwK

— Kaggle (@kaggle) December 17, 2024

The FACTS Grounding benchmark and associated leaderboards aim to measure how well an AI model generates responses based on the source material provided. This initiative addresses challenges such as misinformation and hallucinations in AI-generated content.

“To track progress, we’re also launching the FACTS leaderboard on Kaggle,” the developers announced on their blog.

The aim is to increase confidence in LLMs. They are prone to hallucinating false information, especially when given complex inputs, and this limits their application in the real world.

Results reported with 95% confidence intervals

The FACTS Grounding evaluation process reveals detailed insights into the factual accuracy of leading language models.

Models tested include Gemini 1.5 Pro and Flash (Gemini Team), Gemini 2.0 Flash Experimental, GPT-4o (OpenAI), OpenAI o1-preview and o1-mini, and Claude 3.5 Haiku and Sonnet (Anthropic).

Source: Official blog

During the aggregation process, judge models were found to rate their own outputs higher than those of competing models, by an average of 3.23%. This is a trend observed in previous studies. To counter this bias, multiple judge models were used, which raises the computational cost but makes the evaluation fairer.
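The idea of averaging over several judges can be sketched in a few lines of Python. This is only an illustration, assuming each judge returns a binary grounded or not-grounded verdict per response; the judge names and verdicts below are made up and do not come from the benchmark.

```python
from statistics import mean

def judge_score(verdicts):
    """Fraction of responses a single judge marked as grounded."""
    return mean(1.0 if v else 0.0 for v in verdicts)

def aggregate_score(verdicts_by_judge):
    """Average the per-judge scores so no single judge's bias dominates."""
    return mean(judge_score(v) for v in verdicts_by_judge.values())

# Hypothetical verdicts from three judges on the same four responses.
verdicts_by_judge = {
    "judge_a": [True, True, False, True],
    "judge_b": [True, False, False, True],
    "judge_c": [True, True, True, True],
}
score = aggregate_score(verdicts_by_judge)
print(f"{score:.3f}")
```

A judge that favors its own outputs shifts only its own per-judge score, so its effect on the average shrinks as more judges are added.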

Disqualifying ineligible responses lowered the final factuality scores by 1% to 5%. This adjustment also caused a slight change in the model rankings, with Gemini 1.5 Flash dropping from 1st to 2nd place. All scores were reported with 95% confidence intervals.

Google tells its Gemini AI testers to “address” any prompts they don’t understand, suggesting they assess their understanding and note any confusion.

The company assures us that this approach does not compromise Gemini’s accuracy, pointing to the newly introduced FACTS Grounding… pic.twitter.com/VcmSIZqR8t

— Daniel Gabai (@DanielGabai_) December 20, 2024

Model rankings were determined through a “fusion rank” metric, which uses the Condorcet algorithm to aggregate each model's individual rankings across the various splits.
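To give a feel for Condorcet-style rank aggregation, here is a small Python sketch. It uses Copeland scoring (counting pairwise majority wins), which is one Condorcet-consistent method chosen here as an assumption; the article does not say which variant the benchmark uses, and the model names and rankings are hypothetical.

```python
from itertools import combinations

def copeland_ranking(rankings):
    """Aggregate several rankings (best first) by pairwise majority wins."""
    models = rankings[0]
    wins = {m: 0 for m in models}
    for a, b in combinations(models, 2):
        # Count how many rankings place a ahead of b.
        a_ahead = sum(r.index(a) < r.index(b) for r in rankings)
        b_ahead = len(rankings) - a_ahead
        if a_ahead > b_ahead:
            wins[a] += 1
        elif b_ahead > a_ahead:
            wins[b] += 1
    # More pairwise wins means a better fused rank; ties break by name.
    return sorted(models, key=lambda m: (-wins[m], m))

# Hypothetical per-split rankings of three models.
rankings = [
    ["model_x", "model_y", "model_z"],
    ["model_y", "model_x", "model_z"],
    ["model_x", "model_z", "model_y"],
]
fused = copeland_ranking(rankings)
print(fused)
```

In this toy example model_x beats both other models in a majority of rankings, so it comes first in the fused order.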

How was the test conducted?

This benchmark consists of 1,719 examples that test the model on various tasks such as summarization, question answering, and rewriting.

The dataset and methodology prioritize real-world applicability, with tasks spanning finance, law, and technology. Automated evaluation uses multiple judge models to assess model performance.


A response is disqualified if it fails to adequately address the user's request or if its content lacks supporting evidence.
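The effect of disqualification on a score can be illustrated with a short Python sketch. It assumes, for illustration only, that each response carries two boolean flags and that a disqualified response counts as a failure rather than being dropped from the total; the field names and sample data are hypothetical.

```python
def factuality_score(responses):
    """Share of responses that are both eligible and grounded."""
    if not responses:
        raise ValueError("no responses to score")
    passed = sum(1 for r in responses if r["eligible"] and r["grounded"])
    return passed / len(responses)

# Hypothetical judged responses: one is grounded but does not
# address the request, so it is disqualified.
responses = [
    {"eligible": True, "grounded": True},
    {"eligible": True, "grounded": False},
    {"eligible": False, "grounded": True},
    {"eligible": True, "grounded": True},
]
before = sum(r["grounded"] for r in responses) / len(responses)
after = factuality_score(responses)
print(before, after)
```

The score can only stay the same or drop once ineligible responses are excluded, which matches the direction of the adjustment described above.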

Will Google take the lead?

Google has announced several other major developments this year, which arguably puts Google DeepMind ahead of OpenAI and other rivals in the AGI race.

The company announced a series of breakthrough innovations, including its latest quantum chip Willow, and the advanced Gemini Flash 2, Pro, and Agent. It also introduced Project Astra and Project Mariner, demonstrating its commitment to cutting-edge research.

Further advances include the text-to-video model Veo 2 and the text-to-image model Imagen 3, demonstrating progress in generative AI. Additionally, the Gemini 2.0 Flash Thinking model represents a major advance in model reasoning.

This latest FACTS Grounding benchmark is seen as an important step towards promoting trustworthiness and accuracy of AI-generated content.





Adnan Mahar

Adnan is a passionate doctor from Pakistan with a keen interest in exploring the world of politics, sports, and international affairs. As an avid reader and lifelong learner, he is deeply committed to sharing insights, perspectives, and thought-provoking ideas. His journey combines a love for knowledge with an analytical approach to current events, aiming to inspire meaningful conversations and broaden understanding across a wide range of topics.

© 2026 karachichronicle. Designed by karachichronicle.