IBM is making a claim to the top of the open source AI leaderboard with its new Granite 3.1 series released today.
The Granite 3.1 family of large language models (LLMs) gives enterprise users an expanded context length of 128K tokens, new embedding models, integrated hallucination detection, and improved performance. According to IBM, the new Granite 8B Instruct model tops similarly sized open source rivals such as Meta Llama 3.1, Qwen 2.5, and Google Gemma 2. IBM ranked the models across a set of academic benchmarks included in the OpenLLM Leaderboard.
The new models are part of an accelerated release cycle for IBM’s open source Granite models; Granite 3.0 was released just this past October. At the time, IBM claimed to have a $2 billion book of business related to generative AI. With the Granite 3.1 update, IBM is focused on packing more functionality into smaller models. The basic idea is that smaller models are easier and more cost-effective for enterprises to operate.
“We’ve also increased all the numbers; performance has improved across the board for almost everything,” David Cox, vice president of AI models at IBM Research, told VentureBeat. “We use Granite for a variety of use cases: we use it internally at IBM in products, we use it for consulting, we deliver it to our customers, we release it as open source. Everything.”
Why performance and small models matter for enterprise AI
There are many ways companies can use benchmarks to evaluate LLM performance.
The direction IBM is taking is to run the model through a full range of academic and real-world tests. Cox emphasized that IBM tested and trained the model to optimize it for enterprise use cases. Performance is not just an abstract measure of speed. Rather, it is a more nuanced measure of efficiency.
One aspect of efficiency that IBM is trying to promote is helping users get to their desired results faster.
“You should spend less time fiddling with prompts,” Cox said. “So the more powerful your model is, the less time you spend engineering prompts.”
Efficiency is also related to model size. Larger models typically require more compute and GPU resources, increasing costs.
“When people are working on something like a minimum viable prototype, they often jump to very large models, so they might end up using a 70-billion-parameter model or a 405-billion-parameter model,” Cox said. “But the reality is that many of those are not economical. So the other thing we’ve been trying to do is fit as much capacity as possible into the smallest possible package.”
Context matters for enterprise agentic AI
In addition to promising performance and efficiency improvements, IBM has significantly extended Granite’s context length.
In the initial Granite 3.0 release, context length was limited to 4k tokens. In Granite 3.1, IBM has expanded this to 128k, allowing the models to process much longer documents. The expanded context is an important upgrade for enterprise AI users, both for retrieval-augmented generation (RAG) and for agentic AI.
Agentic AI systems and AI agents often need to process and reason over longer bodies of information, such as larger documents, log traces, or extended conversations. With a 128k context length, these systems can draw on more contextual information, allowing them to better understand and respond to complex queries and tasks.
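For developers who want a feel for the longer window, the sketch below shows what long-document inference could look like through the Hugging Face transformers library. It is a minimal illustration rather than IBM’s reference code: the checkpoint name ibm-granite/granite-3.1-8b-instruct and the input file are assumptions for the example.

```python
# A minimal sketch of long-context inference with Granite 3.1, assuming the
# Hugging Face checkpoint name "ibm-granite/granite-3.1-8b-instruct".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.1-8b-instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# With a 128k-token window, a long document can travel in a single prompt
# instead of being chunked for retrieval.
long_document = open("annual_report.txt").read()  # hypothetical input file
messages = [
    {"role": "user",
     "content": f"Summarize the key risks in this report:\n\n{long_document}"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```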
IBM has also released a series of embedding models that help speed up the process of converting data into vectors. The Granite-Embedding-30M-English model can achieve a latency of 0.16 seconds per query, which IBM claims is faster than competing options such as Snowflake’s Arctic.
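As a rough illustration of where the embedding models fit, the following sketch builds a tiny retrieval step with the sentence-transformers library. The checkpoint name ibm-granite/granite-embedding-30m-english and its compatibility with sentence-transformers are assumptions, and the documents are invented for the example.

```python
from sentence_transformers import SentenceTransformer, util

# Assumed checkpoint name; loading it through sentence-transformers is also an
# assumption based on how most Hugging Face embedding models are packaged.
embedder = SentenceTransformer("ibm-granite/granite-embedding-30m-english")

docs = [
    "Granite 3.1 extends the context window to 128K tokens.",
    "The quarterly report highlights supply-chain risk.",
]
query = "What is the new context length?"

doc_vecs = embedder.encode(docs)    # embed the document store once
query_vec = embedder.encode(query)  # embed each incoming query

# Cosine similarity surfaces the most relevant document for a RAG pipeline.
scores = util.cos_sim(query_vec, doc_vecs)
print(docs[int(scores.argmax())])
```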
How IBM improved Granite 3.1 to meet enterprise AI needs
So how was IBM able to improve the performance of Granite 3.1? It wasn’t anything specific, Cox explained, but rather a set of processes and innovations.
IBM is developing increasingly sophisticated multi-stage training pipelines, he said. This has allowed the company to squeeze more performance out of its models. Also, an important part of LLM training is data. Rather than just focusing on increasing the amount of training data, IBM is focused on improving the quality of the data used to train Granite models.
“This is not a volume game,” Cox said. “It’s not like you’re going to get 10 times more data and your model will magically improve.”
Reducing hallucinations directly in the model
A common approach to reducing the risk of hallucinations and false outputs in LLMs is to use guardrails, which are typically deployed as external capabilities alongside the LLM.
With Granite 3.1, IBM is integrating hallucination protection directly into the model. Granite Guardian 3.1 8B and 2B models now include function-call hallucination detection.
“This model can natively enforce its own guardrails, which gives developers different options for catching things,” Cox said.
He explained that performing hallucination detection in the model itself optimizes the entire process. Internal detection reduces inference calls, making models more efficient and accurate.
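Here is a hedged sketch of what an in-model check might look like in practice. The checkpoint name ibm-granite/granite-guardian-3.1-2b, the guardian_config argument, and the risk label are all assumptions drawn from the general transformers chat-template pattern rather than a confirmed interface; IBM’s model documentation is the authority on the actual calling convention.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-guardian-3.1-2b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The guardian reviews a conversation turn and answers whether a given risk is
# present: here, an assistant response that hallucinates a function call.
messages = [
    {"role": "user", "content": "What is the weather in Paris?"},
    {"role": "assistant",
     "content": '{"name": "get_stock_price", "args": {"city": "Paris"}}'},
]
inputs = tokenizer.apply_chat_template(
    messages,
    guardian_config={"risk_name": "function_call"},  # assumed risk label
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=20)

# The verdict (a Yes/No-style label) can gate an agent's next step.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```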
How businesses can use Granite 3.1 now and what’s next
All of the new Granite models are open source and available free of charge to enterprise users. They are also available through IBM’s watsonx enterprise AI service and will be integrated into IBM’s commercial products.
The company plans to maintain an aggressive pace of Granite updates. Future plans include adding multimodal functionality in Granite 3.2, which is expected to debut in early 2025.
“You’ll see us in the next few releases adding more of these types of differentiated capabilities, all the way up to what we’ll be announcing at next year’s IBM Think conference,” Cox said.