“We have reached peak data,” former OpenAI chief scientist Ilya Sutskever said on stage at last year’s NeurIPS conference. “We have to work with the data we have, and there is only one Internet.”
Sutskever’s comments come amid speculation that the rate of progress in large language models (LLMs) is hitting a wall as scaling runs up against the limits of available internet data. A common belief has been that the bigger the AI model, the smarter it will be.
When and how did the transition to smaller models begin?
Since OpenAI released its 175-billion-parameter LLM, GPT-3, in 2020, the race to build AI models at scale has intensified. Over the next three years, the company’s LLMs grew further in size with the introduction of GPT-4, reported to have around 1.7 trillion parameters.
But in 2024, researchers began looking at language models differently, as scaling up training data scraped from the internet yielded diminishing returns. This gave rise to the idea of building smaller language models.
This shift is clear from the announcements made by major tech companies, most of which released compact language models alongside their flagship AI models.
Google DeepMind released its Gemini Ultra, Nano, and Flash models; OpenAI released GPT-4o mini; and Meta released its Llama 3 models. Amazon-backed Anthropic launched Claude 3 Haiku alongside the larger Claude 3 Opus.
What are the advantages and disadvantages of small language models?
Small language models (SLMs) are inexpensive and well suited to certain use cases. Companies that need AI for a narrow set of specialized tasks do not need large-scale models. Training a small model requires less time, less computation, and less training data.
French startup Mistral AI, an SLM provider, touts its models as being as effective as larger LLMs for specialized, focused applications. Microsoft has released a family of small language models called Phi; the latest, Phi-3-mini, has 3.8 billion parameters.
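For a sense of how lightweight such models are to run, here is a minimal sketch of loading Phi-3-mini locally with the Hugging Face transformers library; the model ID, library choice, and settings are illustrative assumptions rather than details from this article.

```python
# A minimal sketch (not from the article): running Microsoft's Phi-3-mini,
# a ~3.8-billion-parameter small language model, via the Hugging Face
# "transformers" library. The model ID and settings are illustrative.
from transformers import pipeline

# Build a text-generation pipeline; the weights are downloaded on first use.
generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",
    trust_remote_code=True,  # Phi-3 originally shipped custom modelling code
)

# A focused, everyday task of the kind small models tend to handle well.
prompt = "Translate 'Good morning, how are you?' into Hindi."
result = generator(prompt, max_new_tokens=60)
print(result[0]["generated_text"])
```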
Apple Intelligence, the AI system featured in the latest iPhones and iPads, runs on-device AI models that rival the performance of top LLMs.
While LLMs are built with the ambition of achieving artificial general intelligence (AGI), small language models are created for specific use cases.
How are use cases different for large-scale and small-scale AI models?
“Small language models are great for edge use cases,” said Rahul Dandewate, a machine learning engineer at Adobe. “When you’re using WhatsApp or a Meta application powered by the Llama 8B model, you can use it to learn a new language, because it’s pretty good at translation and other basic tasks.”
“But they don’t perform well on most benchmarks used to measure large language models, such as coding or logical reasoning problems. Small language models that are good at solving these more complex problems don’t exist yet,” he said.
It is still not fully understood why this bottleneck exists. “But the best way to understand it is that small animals have a finite number of neurons, whereas the human brain has a huge number of them. That is why the brain can process much more complex levels of intelligence. It is similar with small language models and large language models,” he said.
How is this playing out in India?
The compact size of small language models makes them a good fit for a country like India, where the scope for AI adoption is vast but resources are limited.
Visvam, an AI initiative at IIIT Hyderabad, is building small language models for use in healthcare, agriculture, and education. Its website states that the models are being built from the ground up, with datasets intended to “promote and preserve linguistic and cultural diversity through AI.”
As the world of language models evolves, it is no longer enough to simply build frontier models from scratch. “We want to build GenAI that can be used by a billion Indians,” said Vivek Raghavan, co-founder of Sarvam AI.
Published – January 9, 2025 3:50 PM IST