We are entering a new frontier of intelligence, one where pre-training is obsolete. The AGI era has arrived.
“As we know, pre-training will definitely end…because there is only one Internet,” OpenAI co-founder Ilya Sutskever said at NeurIPS 2024, highlighting the finite nature of training data.
“You could even say that data is the fossil fuel of AI. It was created in some way, we are using it now, and we have reached peak data,” he added, suggesting an end to the era of the Transformer-based pre-training recipe that spawned so many of today’s GenAI models.
Sutskever’s Scaling Dilemma
For context, all base models rely on pre-training scaling for improvement. However, recent discussions highlight the diminishing returns of scaling pre-training further. Industry leaders such as Microsoft CEO Satya Nadella and Google CEO Sundar Pichai have echoed this sentiment, noting that squeezing further gains out of existing architectures is becoming increasingly difficult.
Sutskever previously told Reuters that the 2010s were the age of scaling, but that we are now back in an age of wonder and discovery. He also said it is more important than ever to scale the right thing.
OpenAI founding member Andrej Karpathy has pointed out that LLMs lack “thought process” data, spurring calls for synthetic data that mimics human reasoning.
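To give a concrete sense of what “synthetic thought process data” could mean in practice, here is a minimal sketch of rejection sampling for reasoning traces: prompt a model to solve a problem step by step, then keep only the traces that land on a known reference answer. The `generate` stub, prompt format, and filtering rule are illustrative assumptions, not Karpathy’s or any lab’s actual pipeline.

```python
# Minimal sketch: rejection-sampling synthetic "thought process" data.
# `generate` is a placeholder for any text-generation backend.

def generate(prompt: str) -> str:
    """Stub: replace with a real model call (local or hosted LLM)."""
    raise NotImplementedError

def synthesize_traces(question: str, reference_answer: str, n_samples: int = 8) -> list[str]:
    """Sample step-by-step solutions; keep only those reaching the known answer."""
    prompt = (
        f"Question: {question}\n"
        "Think step by step, then give the final answer after 'Answer:'."
    )
    kept = []
    for _ in range(n_samples):
        trace = generate(prompt)
        # Crude filter: the trace must end with the reference answer.
        final = trace.rsplit("Answer:", 1)[-1].strip()
        if final == reference_answer:
            kept.append(trace)  # usable as synthetic reasoning training data
    return kept
```

The surviving traces then serve as training examples of the intermediate reasoning that, per Karpathy’s point, raw web text rarely records.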
Have you scraped all of the internet yet?
The open source community as a whole believes there is still room for experimentation. Qwen’s Binyuan Hui argued that “synthetic data and post-training depend on the quality of the base model.”
Pre-training remains important for the open-source community until its models match the capabilities of closed-source ones like OpenAI’s.
Hui added that even Qwen2.5’s 18T-token training corpus fails to cover niche and evolving information. Qwen3 will require more data, but data-cleaning quality and access remain major challenges.
His central argument is that the lack of key details about the advanced pre-trained models Sutskever referenced (token counts, parameter sizes, performance metrics) creates opacity, preventing a clear assessment of whether pre-training has actually been pushed to its true limits.
Microsoft’s Phi-4, a 14-billion-parameter model trained heavily on synthetic data, excels at complex reasoning. Such small models suggest a promising future for local AI applications.
Can Chain of Thought (CoT) be extended to AGI?
François Chollet, creator of Keras and the well-known ARC-AGI benchmark, said: “We need better ideas. Now, better ideas are finally starting to be implemented.”
Interestingly, reasoning models may be the next advancement beyond agents. Mike Knoop, co-founder of the ARC Prize, noted that while GPT models went from 0% to 5% on the benchmark over five years, the o1 series jumped from 18% to 32% in just a few months, calling it the fastest progress yet and a game changer.
With o1 already said to perform at a PhD level, it raises the question of how far these models can go from here. OpenAI is scheduled to release its next iteration, o3, tonight.
Google announced its reasoning model yesterday, joining Qwen and DeepSeek, which have already released thinking models. Meanwhile, Meta published a report hinting at the arrival of reasoning models next year, with xAI’s Grok and Anthropic expected to follow suit.
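Much of this new crop of “thinking” models leans on spending extra compute at inference time rather than on bigger pre-training runs. As a rough illustration, here is a minimal sketch of self-consistency sampling, one published technique in that family: sample several chain-of-thought traces and keep the majority answer. The `generate` stub and prompt format are assumptions for illustration, not any particular lab’s method.

```python
from collections import Counter

def generate(prompt: str) -> str:
    """Stub: replace with a real model call."""
    raise NotImplementedError

def self_consistent_answer(question: str, n_samples: int = 16) -> str:
    """Trade extra inference-time compute for quality: sample n reasoning
    traces and return the majority final answer (self-consistency)."""
    prompt = f"{question}\nThink step by step, then write 'Answer:' and the result."
    finals = []
    for _ in range(n_samples):
        trace = generate(prompt)
        finals.append(trace.rsplit("Answer:", 1)[-1].strip())
    # More samples -> more compute per query -> typically better accuracy.
    return Counter(finals).most_common(1)[0][0]
```

The trade-off is the point: each query costs more tokens, but accuracy improves without touching pre-training, which is exactly the shift the article describes.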
2025 is the year of agentic AI
According to Sutskever, the future will likely focus on three pillars: agents, synthetic data, and inference-time compute. “The important thing about superintelligence is that it is qualitatively different from what we have,” he said.
He envisions systems evolving from slightly agentic to truly autonomous, capable of reasoning and making decisions in dynamic and unpredictable ways. At the Axios AI Summit, Anthropic CPO Mike Krieger compared users learning to deploy AI agents to drivers adapting to Tesla’s self-driving mode.
“The unfortunate thing about that talk is that he didn’t say anything. Ten years ago, Ilya would have told us what he thought we should do. Yesterday, he didn’t. This is what happens when you run a company and are more concerned about confidentiality than about science,” said Dumitru Erhan, research director at Google DeepMind, reflecting on the uncertain future.
Nevertheless, Sutskever offered a broader perspective, encouraging people to think beyond the limits of what currently seems possible.
In his talk, he discussed how biology offers examples of scaling, such as the relationship between body size and brain size in mammals. Hominids deviate from this typical scaling trend, hinting that AI systems could likewise find unconventional ways to scale.
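For reference, the biological relationship he showed is the classic allometric power law; here is a minimal sketch in LaTeX, with the slope α left abstract since the exact exponent varies by study:

```latex
% Brain-body allometry: on a log-log plot, mammals fall roughly on a
% line of constant slope alpha, while hominids sit above the trend,
% i.e. they follow a different scaling law.
\log(m_{\mathrm{brain}}) \approx \alpha \, \log(m_{\mathrm{body}}) + c
```

Sutskever’s analogy: just as hominids broke off the established mammalian trend line, AI systems need not stay on the scaling curve that pre-training defined.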
This points toward a major takeaway: building new architectures for AI.
“Think of it like the iPhone. It kept getting bigger and more useful in terms of hardware, but then it hit a plateau and the focus shifted to applications,” John Rush wrote on X.