Close Menu
Karachi Chronicle
  • Home
  • AI
  • Business
  • Entertainment
  • Fashion
  • Politics
  • Sports
  • Tech
  • World

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

What's Hot

The world’s largest air force with the F-35 fleet in 2025

AI systems learn from many types of scientific information and run experiments to discover new materials | MIT News

Among the most troublesome relationships in healthcare AI

Facebook X (Twitter) Instagram
  • Home
  • About us
  • Advertise
  • Contact us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
Facebook X (Twitter) Instagram Pinterest Vimeo
Karachi Chronicle
  • Home
  • AI
  • Business
  • Entertainment
  • Fashion
  • Politics
  • Sports
  • Tech
  • World
Karachi Chronicle
You are at:Home » IBM AI releases Granite-Vision-3.1-2B: a small vision language model with very impressive performance on a variety of tasks
Tech

IBM AI releases Granite-Vision-3.1-2B: a small vision language model with very impressive performance on a variety of tasks

Adnan MaharBy Adnan MaharFebruary 8, 2025No Comments3 Mins Read0 Views
Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
Share
Facebook Twitter LinkedIn Pinterest WhatsApp Email


The integration of visual and textual data in artificial intelligence presents complex challenges. Traditional models struggle to accurately interpret structured visual documents such as tables, charts, infographics, and diagrams. This limitation affects automated content extraction and understanding. This is important for applications in data analysis, information search and decision making. As organizations increasingly rely on AI-driven insights, the need for models that can effectively process both visual and textual information has increased dramatically.

IBM addressed this challenge with the release of Granite-Vision-3.1-2B, a compact vision language model designed for understanding documents. This model can extract content from a variety of visual formats, including tables, charts, and diagrams. It is trained on a well-curated dataset consisting of both public and synthetic sources, and is designed to handle a wide range of document-related tasks. Granite-Vision-3.1-2B, fine-tuned from the large-language model of granite, integrates image and text modalities to improve interpretation capabilities, making it suitable for a variety of practical applications.

The model consists of three important components:

Vision Encoder: Use Siglip to efficiently process and encode visual data. Vision-Language Connector: A two-layer multilayer perceptron (MLP) with a GELU activation function designed to bridge visual and textual information. Large-scale language model: Built on top of Granite-3.1-2B-Instruct and has a length of 128K context to handle complex and extensive input.

The training process is built on LLAVA, incorporates multi-layer encoder functionality and has grid resolution of any density. These enhancements improve the model’s ability to understand detailed visual content. This architecture allows models to perform a variety of visual document tasks, including analyzing tables and charts, performing optical character recognition (OCR), and answering document-based queries.

The assessment shows that granite-Vision-3.1-2B works well across multiple benchmarks, particularly in understanding the documents. For example, I achieved a score of 0.86 on the Chartqa benchmark, surpassing other models within the 1B-4B parameter range. The TextVQA benchmark achieved a score of 0.76 and demonstrated strong performance in interpreting and responding questions based on textual information built into the image. These results highlight the potential of models for enterprise applications that require accurate visual and text data processing.

IBM’s Granite-Vision-3.1-2B represents a notable advance in the vision language model, providing a balanced approach to visual document understanding. Its architecture and training methodology allows for efficient interpretation and analysis of complex visual and textual data. Native support for transformers and VLLM allows this model to be adapted to a variety of use cases and can be deployed in cloud-based environments such as the Colab T4. This accessibility makes it a practical tool for researchers and experts looking to enhance AI-driven document processing capabilities.

See IBM-Granite/Granite-Vision-3.1-2B-Preview and IBM-Granite/Granite-3.1-2B-Instruct. All credits for this study will be sent to researchers in this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn groups. Don’t forget to join the 75k+ ml subreddit.

Commended open source AI platform recommended: “Intelagent is an open source multi-agent framework for evaluating complex conversational AI systems” (promotion)

Asif Razzaq is CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, ASIF is committed to leveraging the possibilities of artificial intelligence for social benefits. His latest efforts are the launch of MarkTechPost, an artificial intelligence media platform. This is distinguished by its detailed coverage of machine learning and deep learning news, and is easy to understand by a technically sound and wide audience. The platform has over 2 million views each month, indicating its popularity among viewers.

✅ (Recommended) Join the Telegram Channel



Source link

Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
Previous ArticleHewlett Packard notifies employees of data breach by Russian hackers
Next Article Tech vs Finance: Social War
Adnan Mahar
  • Website

Adnan is a passionate doctor from Pakistan with a keen interest in exploring the world of politics, sports, and international affairs. As an avid reader and lifelong learner, he is deeply committed to sharing insights, perspectives, and thought-provoking ideas. His journey combines a love for knowledge with an analytical approach to current events, aiming to inspire meaningful conversations and broaden understanding across a wide range of topics.

Related Posts

Googleबनी$ 3

September 16, 2025

Tesla engineers will resign in eight years. He points out CEO Elon Musk as the main reason, accusing him of “liing to the public and manipulating him…”

September 12, 2025

Ant Group unveils its own Tesla Optimus competitor, R1 humanoid robot

September 11, 2025
Leave A Reply Cancel Reply

Top Posts

20 Most Anticipated Sex Movies of 2025

January 22, 2025456 Views

President Trump’s SEC nominee Paul Atkins marries multi-billion dollar roof fortune

December 14, 2024122 Views

How to tell the difference between fake and genuine Adidas Sambas

December 26, 202486 Views

Alice Munro’s Passive Voice | New Yorker

December 23, 202474 Views
Don't Miss
AI September 25, 2025

AI systems learn from many types of scientific information and run experiments to discover new materials | MIT News

Machine learning models can speed up discovery of new materials by making predictions and proposing…

Among the most troublesome relationships in healthcare AI

Does access to AI become a fundamental human right? Sam Altman says, “Everyone would want…”

Google’s Gemini AI is on TV

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

About Us
About Us

Welcome to Karachi Chronicle, your go-to source for the latest and most insightful updates across a range of topics that matter most in today’s fast-paced world. We are dedicated to delivering timely, accurate, and engaging content that covers a variety of subjects including Sports, Politics, World Affairs, Entertainment, and the ever-evolving field of Artificial Intelligence.

Facebook X (Twitter) Pinterest YouTube WhatsApp
Our Picks

The world’s largest air force with the F-35 fleet in 2025

AI systems learn from many types of scientific information and run experiments to discover new materials | MIT News

Among the most troublesome relationships in healthcare AI

Most Popular

10 things you should never say to an AI chatbot

November 10, 20040 Views

Character.AI faces lawsuit over child safety concerns

December 12, 20050 Views

Analyst warns Salesforce investors about AI agent optimism

July 1, 20070 Views
© 2025 karachichronicle. Designed by karachichronicle.
  • Home
  • About us
  • Advertise
  • Contact us
  • DMCA
  • Privacy Policy
  • Terms & Conditions

Type above and press Enter to search. Press Esc to cancel.