Karachi Chronicle
Business

Singapore University of Technology and Design (SUTD) explores the advances and challenges of multimodal reasoning in AI models through puzzle-based evaluation and algorithmic problem analysis

By Adnan Mahar | February 8, 2025 | 4 Mins Read


Following the success of large language models (LLMs), current research is extending beyond text-based understanding to multimodal reasoning tasks. These tasks integrate vision and language, a capability considered essential for artificial general intelligence (AGI). Cognitive benchmarks such as PuzzleVQA and AlgoPuzzleVQA evaluate a model's ability to process abstract visual information and perform algorithmic reasoning. Despite recent advances, LLMs still struggle with multimodal reasoning, particularly pattern recognition and spatial problem solving, and high computational costs exacerbate these challenges.

Previous evaluations relied on symbolic benchmarks such as ARC-AGI and visual assessments such as Raven's Progressive Matrices. However, these do not adequately test a model's ability to handle multimodal inputs. More recently, datasets such as PuzzleVQA and AlgoPuzzleVQA have been introduced to evaluate abstract visual reasoning and algorithmic problem solving. These datasets require models to integrate visual perception, logical deduction, and structured reasoning. Earlier models such as GPT-4-Turbo and GPT-4o demonstrated improvements but still showed limitations in abstract reasoning and multimodal interpretation.

Researchers at the Singapore University of Technology and Design (SUTD) have introduced a systematic evaluation of OpenAI's GPT-[n] and o-[n] model series on multimodal puzzle-solving tasks. Their study examines how reasoning ability has evolved across model generations and aims to identify gaps in perception, abstract reasoning, and problem-solving skills. The team compared the performance of GPT-4-Turbo, GPT-4o, and o1 on the PuzzleVQA and AlgoPuzzleVQA datasets, which cover abstract visual puzzles and algorithmic reasoning tasks.

The researchers conducted a structured assessment using two main datasets:

  • PuzzleVQA: focuses on abstract visual reasoning and requires models to recognize patterns in numbers, shapes, colors, and sizes.
  • AlgoPuzzleVQA: presents algorithmic problem-solving tasks that require logical deduction and computational reasoning.

Evaluations were performed using both multiple-choice and open-ended question formats. The study used zero-shot chain-of-thought (CoT) prompting to elicit reasoning and analyzed the performance drop when switching from multiple-choice to open-ended answers. The models were also tested under conditions in which visual perception details and inductive reasoning guidance were provided separately, in order to diagnose specific weaknesses.
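To make the two question formats concrete, here is a minimal sketch of how the text part of a zero-shot CoT prompt might be assembled for each. The function name `build_prompt`, the option labels, and the trigger phrase are assumptions for illustration; the article does not give the study's exact prompt wording, and the puzzle image that would accompany each prompt is omitted.

```python
from typing import Optional, Sequence

# A widely used zero-shot CoT cue. Assumption: the study's exact wording
# is not stated in this article.
COT_TRIGGER = "Let's think step by step."


def build_prompt(question: str, options: Optional[Sequence[str]] = None) -> str:
    """Build the text portion of a zero-shot CoT prompt.

    Passing options gives the multiple-choice format; leaving it as None
    gives the open-ended format, where the model sees no candidate answers.
    """
    lines = [f"Question: {question}"]
    if options is not None:
        labels = "ABCDEFGH"
        lines.append("Options:")
        lines.extend(f"({labels[i]}) {opt}" for i, opt in enumerate(options))
    lines.append(COT_TRIGGER)
    return "\n".join(lines)


# Hypothetical puzzle question, not taken from either dataset.
question = "What is the missing number in the pattern?"
print(build_prompt(question, ["3", "5", "7", "9"]))
print()
print(build_prompt(question))
```

The only difference between the two prompts is whether the candidate answers are listed, which is the variable the format comparison isolates.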

The study observed a steady improvement in reasoning ability across model generations. GPT-4o performed better than GPT-4-Turbo, while o1 achieved the most notable advances, particularly on algorithmic reasoning tasks. However, these gains came with a sharp increase in computational cost. Despite the overall progress, the models still struggled with tasks that require accurate visual interpretation, such as recognizing missing shapes and deciphering abstract patterns. o1 performed well on numerical reasoning but had difficulty with shape-based puzzles. The gap in accuracy between multiple-choice and open-ended tasks indicates a strong dependence on the answer options provided in the prompt. Perception also remained a major challenge for all models, as accuracy improved substantially when explicit visual details were provided.

The key findings can be summarized as follows:

  • The study observed a clear upward trend in reasoning ability from GPT-4-Turbo to GPT-4o to o1. GPT-4o showed moderate gains, while the move to o1 brought significant improvements at 750 times the computational cost of GPT-4o.
  • Overall, o1 achieved an average accuracy of 79.2% in the multiple-choice setting, surpassing GPT-4o's 60.6% and GPT-4-Turbo's 54.2%. In the open-ended setting, all models showed performance drops, with o1 at 66.3%, GPT-4o at 46.8%, and GPT-4-Turbo at 38.6%.
  • On AlgoPuzzleVQA, o1 improved substantially over previous models, especially on puzzles that require numerical and spatial deduction. It scored 55.3% on multiple-choice tasks, compared with 43.6% for GPT-4o and 36.5% for GPT-4-Turbo. On open-ended tasks, however, its accuracy was reduced by 23.1%.
  • Perception was identified as a major limitation across all models. Injecting explicit visual details improved accuracy by 22% to 30%, indicating a dependence on external perceptual aids. Inductive reasoning guidance increased performance by 6% to 19%, particularly in numerical and spatial pattern recognition.
  • o1 excelled at numerical reasoning but struggled with shape-based puzzles, showing a 4.5% drop compared with GPT-4o on shape recognition tasks. It also performed well in structured problem solving but faced challenges in open-ended scenarios that require independent deduction.
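As a quick worked check on the overall figures reported above, the gap between the two formats can be computed directly. This is only arithmetic on the accuracies quoted in this article; the names `reported` and `format_gap` are illustrative and do not come from the study.

```python
# Overall accuracy (%) as quoted in this article:
# (multiple-choice, open-ended) for each model.
reported = {
    "GPT-4-Turbo": (54.2, 38.6),
    "GPT-4o": (60.6, 46.8),
    "o1": (79.2, 66.3),
}


def format_gap(multiple_choice: float, open_ended: float) -> float:
    """Return the drop, in percentage points, from multiple-choice to open-ended."""
    return round(multiple_choice - open_ended, 1)


gaps = {model: format_gap(mc, oe) for model, (mc, oe) in reported.items()}

for model, gap in gaps.items():
    print(f"{model}: {gap} percentage points lower on open-ended questions")
```

On these numbers the drop is 15.6 points for GPT-4-Turbo, 13.8 for GPT-4o, and 12.9 for o1, so the gap narrows slightly with each generation but stays above 12 points for all three.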

For details, see the paper and the GitHub page. All credit for this research goes to the researchers of this project.

Sana Hassan, a consulting intern at MarkTechPost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a strong interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.

