Karachi Chronicle

Google DeepMind’s new benchmark assesses the factuality of LLMs

By Adnan Mahar, December 23, 2024


Google DeepMind and Google Research have jointly announced FACTS Grounding, a new benchmark for evaluating the factual accuracy of large language models (LLMs).

Introducing FACTS Grounding. New benchmark launched in collaboration with @GoogleDeepMind to evaluate LLM’s factual accuracy on over 1700 tasks. 🧠📐 pic.twitter.com/MvyRbbuMwK

— Kaggle (@kaggle) December 17, 2024

The FACTS Grounding benchmark and associated leaderboards aim to measure how well an AI model generates responses based on the source material provided. This initiative addresses challenges such as misinformation and hallucinations in AI-generated content.

“To track your progress, we’re also launching FACTS leaderboards on Kaggle,” Google DeepMind announced on its blog.

The aim is to increase trust in LLMs, whose real-world applications are limited by their tendency to hallucinate false information, especially when given complex inputs.

Reliable results with 95% confidence intervals

The FACTS Grounding evaluation process reveals detailed insights into the factual accuracy of leading language models.

Models tested include Google’s Gemini 1.5 Pro, Gemini 1.5 Flash, and Gemini 2.0 Flash Experimental; OpenAI’s GPT-4o, o1-preview, and o1-mini; and Anthropic’s Claude 3.5 Haiku and Claude 3.5 Sonnet.

Source: Official blog

During aggregation, judge models were found to rate responses from their own model family an average of 3.23% higher than responses from competing models, a self-preference bias also observed in previous studies. To counter this bias, the evaluation uses multiple judge models, which increases computational cost but helps keep the evaluation fair.
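
The multi-judge idea can be sketched in a few lines of Python. This is a minimal illustration, not Google’s implementation: the judge names and per-judge scores are hypothetical, and it assumes the final score is simply the unweighted mean of the per-judge scores.

```python
from statistics import mean

# Hypothetical per-judge factuality scores (fraction of responses judged
# accurate) for a single model. Names and numbers are illustrative only.
judge_scores = {
    "judge_a": 0.86,  # e.g. a judge from the same model family, possibly biased upward
    "judge_b": 0.82,
    "judge_c": 0.81,
}


def aggregate_score(scores: dict[str, float]) -> float:
    """Average the scores produced by several judge models.

    Assumption: an unweighted mean. If one of N judges inflates a score by
    some amount, the mean moves by only that amount divided by N.
    """
    if not scores:
        raise ValueError("need at least one judge score")
    return mean(scores.values())


print(round(aggregate_score(judge_scores), 4))  # prints 0.83
```

With three judges, a self-preference bias in any one of them shifts the aggregate by only a third as much, at the cost of running every response past three judges instead of one.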

Disqualifying ineligible responses reduced the final factuality scores by 1% to 5%. The adjustment also shifted the model rankings slightly, with Gemini 1.5 Flash dropping from first to second place. All results were presented with 95% confidence intervals.
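
The article does not say how the 95% confidence intervals were computed. As an assumption, the Python sketch below uses the common normal-approximation (Wald) interval for a proportion, treating a factuality score as the fraction of responses judged accurate; the counts are hypothetical.

```python
import math


def wald_ci_95(num_accurate: int, num_total: int) -> tuple[float, float]:
    """95% confidence interval for a factuality score (a binomial proportion).

    Assumption: normal (Wald) approximation; the method actually used for
    FACTS Grounding is not stated in the source.
    """
    if num_total <= 0 or not 0 <= num_accurate <= num_total:
        raise ValueError("invalid counts")
    p = num_accurate / num_total
    z = 1.96  # two-sided 95% quantile of the standard normal distribution
    half_width = z * math.sqrt(p * (1 - p) / num_total)
    # Clamp so the interval stays within the valid range of a proportion.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts: 1,450 of 1,719 responses judged accurate.
low, high = wald_ci_95(1450, 1719)
print(f"score={1450 / 1719:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

The Wald interval is easy to compute but behaves poorly for scores near 0 or 1; a Wilson interval or a bootstrap over responses is a common alternative.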

Google tells its Gemini AI testers to “address” any prompts they don’t understand, suggesting they assess their understanding and note any confusion.

The company assures us that this approach does not compromise Gemini’s accuracy, pointing to the newly introduced FACTS Grounding… pic.twitter.com/VcmSIZqR8t

— Daniel Gabai (@DanielGabai_) December 20, 2024

Model rankings are determined by a “fused rank” metric, which aggregates each model’s individual rankings across the various splits and combines them using a Condorcet ranking algorithm.
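
The source names the Condorcet approach but not the exact rule. As a minimal sketch, the Python below uses the Copeland method, one Condorcet-consistent way to fuse rankings: each model earns a point for every rival that a majority of the rankings place below it. The model names and rankings are hypothetical.

```python
from itertools import combinations


def fused_rank(rankings: list[list[str]]) -> list[str]:
    """Fuse several best-first rankings of the same models into one.

    Copeland method (a Condorcet-consistent rule, used here as an assumption):
    for each pair of models, the one placed higher by a majority of rankings
    earns 1 point; an exact tie gives each model 0.5 points.
    """
    if not rankings:
        raise ValueError("need at least one ranking")
    models = rankings[0]
    if any(sorted(r) != sorted(models) for r in rankings):
        raise ValueError("every ranking must contain the same models")
    # positions[i][m] is model m's place in ranking i (0 = best).
    positions = [{m: i for i, m in enumerate(r)} for r in rankings]
    points = {m: 0.0 for m in models}
    for a, b in combinations(models, 2):
        a_wins = sum(1 for pos in positions if pos[a] < pos[b])
        b_wins = len(positions) - a_wins
        if a_wins > b_wins:
            points[a] += 1
        elif b_wins > a_wins:
            points[b] += 1
        else:
            points[a] += 0.5
            points[b] += 0.5
    # Most points first; remaining ties broken alphabetically for determinism.
    return sorted(models, key=lambda m: (-points[m], m))


# Hypothetical rankings from three splits (placeholder model names).
split_rankings = [
    ["model_x", "model_y", "model_z"],
    ["model_y", "model_x", "model_z"],
    ["model_x", "model_z", "model_y"],
]
print(fused_rank(split_rankings))  # prints ['model_x', 'model_y', 'model_z']
```

If one model beats every other model head-to-head (a Condorcet winner), Copeland always places it first, which is the property that makes such rules attractive for combining rankings from different splits.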

How was the test conducted?

The benchmark consists of 1,719 examples that test models on tasks such as summarization, question answering, and rewriting.

The dataset and methodology prioritize real-world applicability, with tasks spanning domains such as finance, law, and technology. Evaluation is automated and uses multiple judge models to score each model’s responses.

A response is disqualified if it does not adequately address the user’s request or if its content is not supported by the provided source material.
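
A minimal Python sketch of how such a disqualification rule could feed into a score, assuming binary per-response verdicts. The `Verdict` structure and the function names are hypothetical, not part of any published FACTS Grounding tooling. It also shows why applying the eligibility check can only lower a score, never raise it.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """One judge's verdict on one response (hypothetical structure)."""
    eligible: bool  # does the response adequately address the user's request?
    grounded: bool  # is its content supported by the provided source material?


def response_score(v: Verdict) -> int:
    """1 if the response counts as accurate, else 0.

    An ineligible response is disqualified (scored 0) even if grounded.
    """
    return int(v.eligible and v.grounded)


def factuality_score(verdicts: list[Verdict], disqualify: bool = True) -> float:
    """Fraction of responses judged accurate.

    With disqualify=False the eligibility check is skipped, so the score
    can only be equal to or higher than the disqualified score.
    """
    if not verdicts:
        raise ValueError("no verdicts")
    if disqualify:
        hits = sum(response_score(v) for v in verdicts)
    else:
        hits = sum(int(v.grounded) for v in verdicts)
    return hits / len(verdicts)


verdicts = [
    Verdict(eligible=True, grounded=True),
    Verdict(eligible=True, grounded=False),
    Verdict(eligible=False, grounded=True),  # grounded but does not answer the request
    Verdict(eligible=True, grounded=True),
]
print(factuality_score(verdicts, disqualify=False))  # prints 0.75
print(factuality_score(verdicts))  # prints 0.5
```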

Will Google take the lead?

Google has announced several other major developments this year, which arguably put Google DeepMind in the lead in the AGI race, ahead of OpenAI and other rivals.

The company announced a series of breakthroughs, including its latest quantum chip, Willow, and the advanced Gemini 2.0 Flash, Pro, and Agent releases. It also introduced Project Astra and Project Mariner, demonstrating its commitment to cutting-edge research.

Further advances include the text-to-video model Veo 2 and the text-to-image model Imagen 3, demonstrating progress in generative AI. Additionally, the Gemini 2.0 Flash Thinking model represents a major advance in reasoning and robotics.

The FACTS Grounding benchmark is seen as an important step towards promoting the trustworthiness and accuracy of AI-generated content.