How can we test AI for human-level intelligence? OpenAI’s o3 inspires exploration

By Adnan Mahar | January 14, 2025
[Image: a 3D human head made of interconnected cube-shaped particles. Some researchers believe that AI systems will soon reach human-level intelligence; others think that moment is far off. Credit: Getty]

Technology company OpenAI made headlines last month after its latest experimental chatbot model, o3, scored highly on a test designed to mark progress towards artificial general intelligence (AGI). o3's score of 87.5% far exceeds the previous best of 55.5% achieved by an artificial intelligence (AI) system.


This is a "true breakthrough," says AI researcher François Chollet, who created the test, called the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI)1, in 2019 while working at Google in Mountain View, California. A high score does not mean that AGI, broadly defined as a computing system that can reason, plan and learn skills as humans do, has been achieved. But Chollet says o3 "absolutely" has the ability to reason and learn: "It has tremendous generalizability."

Researchers have been amazed by o3's performance on a variety of tests and benchmarks, including the extremely difficult FrontierMath test released by the virtual research institute Epoch AI in November. "This is very impressive," says David Rein, an AI benchmarking researcher at Model Evaluation and Threat Research, a group based in Berkeley, California.


However, many researchers, including Rein, caution that it is difficult to determine whether the ARC-AGI test truly measures an AI system's reasoning and generalization abilities. "There have been a lot of benchmarks that were meant to measure something fundamental to intelligence, and it turned out they didn't," Rein says. The search for better tests continues, he adds.

San Francisco-based OpenAI has not disclosed how o3 works, but the system follows on from the company's o1 model and uses "chain of thought" reasoning: it solves problems by generating a series of intermediate inference steps, in effect engaging in a dialogue with itself. Some experts believe that o3 generates many different chains of thought and then selects the best answer from among the resulting candidates.
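OpenAI has not confirmed any of this, but the general idea of sampling several reasoning chains and keeping the answer that most of them converge on can be sketched in a few lines. The sketch below is a hypothetical illustration: generate_chain is a stand-in for a call to some reasoning model, not an actual OpenAI API.

```python
# Minimal sketch of best-of-N chain-of-thought selection.
# `generate_chain` is a hypothetical stand-in for a call to a reasoning
# model; OpenAI has not disclosed how o3 actually works.
from collections import Counter
from typing import Callable

def solve_with_many_chains(
    prompt: str,
    generate_chain: Callable[[str], tuple[str, str]],
    n_chains: int = 8,
) -> str:
    """Sample several independent reasoning chains and keep the final
    answer that the majority of chains agree on (self-consistency voting)."""
    answers = []
    for _ in range(n_chains):
        reasoning, answer = generate_chain(prompt)  # one chain of thought
        answers.append(answer)
    # The most frequent final answer is taken as the model's best guess.
    return Counter(answers).most_common(1)[0][0]
```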

Chollet, who is now based in Seattle, Washington, says that giving a model more time to work through its answers during a test can make a big difference to its results. But o3 comes at a huge cost: each task on the ARC-AGI test took an average of 14 minutes in the high-scoring mode and probably cost thousands of dollars. (Chollet says the computing costs are estimated from how much OpenAI charges customers per token, roughly a word, a price that reflects factors such as power usage and hardware costs.) This "raises sustainability concerns," says Xiang Yue, who studies the large language models (LLMs) that power chatbots at Carnegie Mellon University in Pittsburgh, Pennsylvania.
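As a rough illustration of how such estimates work, the back-of-the-envelope sketch below multiplies an assumed token count by an assumed per-token price. Both numbers are invented for illustration; they are not OpenAI's actual pricing or o3's actual token usage.

```python
# Back-of-the-envelope cost estimate for one reasoning-heavy task.
# All numbers are illustrative assumptions, not real figures.
tokens_per_task = 30_000_000      # assumed tokens generated across all chains
price_per_million_tokens = 60.0   # assumed US$ per million output tokens

cost_per_task = tokens_per_task / 1_000_000 * price_per_million_tokens
print(f"Estimated cost per task: ${cost_per_task:,.0f}")  # -> $1,800
```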

Generally intelligent

Although the term AGI is often used to describe a computing system that meets or exceeds human cognitive capacity across a wide range of tasks, no technical definition exists. As a result, there is no consensus on when AI tools might achieve AGI: some say that moment has already arrived; others say it is still a long way off.

Many tests have been developed to track progress towards AGI. Some, including Rein's 2023 Google-Proof Q&A2, aim to evaluate how AI systems perform on doctoral-level scientific problems. OpenAI's 2024 MLE-bench pits AI systems against 75 challenges hosted on Kaggle, an online data-science competition platform; the challenges include real-world problems such as translating ancient scrolls and developing vaccines3.

[Figure: 'Before and after'. ARC-AGI aims to mark progress of AI tools towards human-level reasoning and learning. It shows test-takers a series of 'before' and 'after' designs and asks them to infer the 'after' state of a new 'before' design, for example by extrapolating a diagonal line that bounces off a red wall. Source: Reference 1]

A good benchmark needs to avoid several pitfalls. For example, it is important that the AI has not seen the same questions during training, and questions should be designed so that the AI cannot use shortcuts to cheat. "LLMs are good at exploiting subtle textual cues to arrive at an answer without doing any real reasoning," Yue says. Tests should ideally be as messy and noisy as real-world conditions, he adds, and should also set energy-efficiency goals.
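One common, if crude, way to check the first pitfall is to look for verbatim overlap between benchmark questions and training text. The sketch below is a minimal, assumed example of such an n-gram overlap check; it is not the decontamination procedure any particular benchmark actually uses.

```python
# Minimal sketch of an n-gram overlap check for benchmark contamination.
# A real decontamination pipeline would be far more thorough; this only
# flags questions whose 8-grams also appear in a sample of training text.
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_contaminated(question: str, training_docs: list[str], n: int = 8) -> bool:
    q_grams = ngrams(question, n)
    return any(q_grams & ngrams(doc, n) for doc in training_docs)
```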

Yue led the development of a test called the Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI (MMMU). It asks chatbots to perform college-level, visually based tasks, such as interpreting music scores, graphs and schematics4. According to Yue, OpenAI's o1 holds the current MMMU record of 78.2%, compared with a best human performance of 88.6% (o3's score is not yet known).

In contrast, ARC-AGI relies on basic mathematical and pattern-recognition skills that humans typically develop in early childhood. It provides test-takers with a demonstration set of 'before' and 'after' designs and asks them to infer the 'after' state of a new 'before' design (see 'Before and after'). "We like the ARC-AGI test from a complementary standpoint," Yue says.
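To make that format concrete, the toy sketch below represents an ARC-style task as small grids of colour codes: demonstrations are (before, after) pairs, and a candidate rule (here simple horizontal mirroring, chosen purely for illustration) is accepted only if it reproduces every demonstration before being applied to the new test grid.

```python
# Toy illustration of an ARC-style task: grids are small 2-D arrays of
# colour codes, demonstrations are (before, after) pairs, and a solver
# must find a rule that maps every 'before' to its 'after'.
# The rule used here (horizontal mirroring) is purely illustrative.
Grid = list[list[int]]

demos: list[tuple[Grid, Grid]] = [
    ([[1, 0, 0]], [[0, 0, 1]]),
    ([[2, 3, 0],
      [0, 0, 4]],
     [[0, 3, 2],
      [4, 0, 0]]),
]

def mirror_horizontally(grid: Grid) -> Grid:
    return [list(reversed(row)) for row in grid]

# A candidate rule is accepted only if it reproduces every demonstration.
assert all(mirror_horizontally(before) == after for before, after in demos)

# It can then be applied to the new 'before' grid of the test item.
test_before: Grid = [[5, 0, 6]]
print(mirror_horizontally(test_before))  # -> [[6, 0, 5]]
```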


