When OpenAI co-founder and former chief scientist Ilya Sutskever speaks, the world listens. At NeurIPS 2024, he touched on the unpredictability of reasoning in AI, saying that the more a system reasons, the more unpredictable it becomes.
“The more it reasons, the more unpredictable it becomes,” he said. “All the deep learning we’re used to is very predictable because we’ve been working on replicating human intuition.”
Sutskever pointed out that systems capable of reasoning, such as the advanced chess-playing AI AlphaZero, already exhibit unpredictability. “The best chess AI is unpredictable to the best human chess players,” he said.
It won’t be long, he suggested, before these AI systems become smart enough to reach superintelligence. He said that future artificial superintelligence (ASI) systems will be able to understand complex concepts from limited data and will not get confused.
Sutskever said reasoning models will be able to correct themselves in increasingly sophisticated ways, reducing errors such as hallucinations. “AI systems that reason will be able to correct themselves in a similar way to autocorrect, but on a much grander scale,” he added.
“They will figure things out from limited data. They won’t be confused,” he said, hinting at the possibility of “self-aware AI”, which he sees as a natural development. “Self-awareness is part of our own model of the world,” he said.
Sutskever believes that artificial superintelligence will evolve into truly agentic systems. “Right now, the systems are not agentic in any meaningful sense, only very slightly agentic,” he said. “But eventually, these systems are actually going to be agentic in a real sense.”
In June 2024, Sutskever launched a new AI startup, Safe Superintelligence Inc. (SSI), with Daniel Gross (former head of AI at Apple) and Daniel Levy (investor and AI researcher). SSI is dedicated to developing safe, advanced AI systems, with the singular goal of achieving “safe superintelligence”.
Unlike many AI companies, SSI focuses on long-term safety and progress, free from the pressure of short-term profits and product releases.
The end of the pre-training era
Sutskever said the era of pre-training is coming to an end.
“Pre-training as we know it will definitely end,” he said, citing limitations in data availability. “We only have one Internet. You could even say that data is the fossil fuel of AI. It was created somehow and now we are using it.”
He acknowledged that while current advances in AI come from scaling up models and data, other scaling principles may emerge. “I want to emphasize that what we’re scaling now is just the first thing we figured out how to scale,” Sutskever said.
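For context (this framing is mine, not the talk’s): the “scaling” at issue is usually summarized by the empirical power laws reported in Kaplan et al.’s 2020 scaling-laws paper, whose author Sutskever credits later in the talk. Roughly, test loss falls predictably as a power law in model size and dataset size:

```latex
% Empirical scaling laws from Kaplan et al. (2020), quoted here as background only.
% L = test loss, N = non-embedding parameter count, D = dataset size (tokens);
% N_c and D_c are fitted constants, and the exponents are approximate.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D},
\qquad \alpha_N \approx 0.076, \quad \alpha_D \approx 0.095
```

Sutskever’s point is that this parameter-and-data recipe is only the first scaling axis anyone found, not necessarily the last.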
Citing OpenAI’s o1, he emphasized the growing focus on agents and synthetic data as critical elements for the future of AI, while acknowledging the challenges of defining synthetic data and optimizing inference-time compute. “People have a feeling that agents are the future, and, more concretely but a little bit vaguely, synthetic data,” he said.
Sutskever suggested that nature could point the way to the next breakthrough, drawing parallels to biological systems. He cited the scaling relationship between brain and body size in mammals as a potential model for rethinking AI architectures.
Future AI systems may employ entirely new scaling principles rooted in biological efficiency and adaptability, rather than incremental improvements from ever-larger datasets and models. “There’s a lot of precedent for scaling in biology,” he said, suggesting AI could evolve in ways we don’t yet fully understand.
A walk down memory lane
Sutskever began his talk by revisiting the presentation he gave 10 years earlier, at NeurIPS 2014, where he and his colleagues introduced the idea of training large neural networks for tasks such as translation. “If you have a large neural network with 10 layers, it can do anything a human being can do in a fraction of a second,” he joked.
The idea rested on the belief that artificial neurons can mimic biological neurons, and on the assumption that whatever the human brain can compute quickly, a large artificial neural network can replicate.
Sutskever recalled how early models, including LSTMs, relied on basic parallelization techniques such as pipelining. By assigning one layer per GPU, these models achieved a 3.5x training speedup on 8 GPUs.

He also touched on the origins of the scaling hypothesis, which holds that larger datasets combined with larger neural networks all but guarantee progress in AI. He credited OpenAI’s Alec Radford, Anthropic’s Dario Amodei, and Jared Kaplan for their roles in advancing this concept and laying the foundation for the GPT models.
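As a side note on the pipelining he recalled: a minimal sketch of that layer-per-GPU idea in modern PyTorch might look like the following. This is an illustrative reconstruction, not code from the 2014 work, and the names (PipelinedStack, pipelined_forward, device_for) and sizes are assumptions.

```python
# A minimal, illustrative sketch of layer-per-GPU pipelining in modern PyTorch.
# Not the original 2014 code; names and dimensions here are assumptions.
import torch
import torch.nn as nn


def device_for(i: int) -> str:
    # Place layer i on GPU i; fall back to CPU if fewer GPUs are available.
    return f"cuda:{i}" if i < torch.cuda.device_count() else "cpu"


class PipelinedStack(nn.Module):
    def __init__(self, hidden: int = 512, n_layers: int = 4):
        super().__init__()
        # One layer per device, mirroring the "one layer per GPU" scheme.
        self.layers = nn.ModuleList(
            nn.Linear(hidden, hidden).to(device_for(i)) for i in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Activations hop from device to device as they move through the stack.
        for i, layer in enumerate(self.layers):
            x = torch.relu(layer(x.to(device_for(i))))
        return x


def pipelined_forward(model: nn.Module, batch: torch.Tensor, n_micro: int = 4) -> torch.Tensor:
    # Split the batch into micro-batches. In a real pipeline schedule, GPU i
    # starts on the next micro-batch while GPU i+1 is still busy with the
    # current one; this naive loop only hints at that overlap.
    return torch.cat([model(mb) for mb in batch.chunk(n_micro)])


if __name__ == "__main__":
    model = PipelinedStack(hidden=512, n_layers=4)
    out = pipelined_forward(model, torch.randn(64, 512))
    print(out.shape)  # torch.Size([64, 512])
```

The 3.5x figure on 8 GPUs came from keeping the devices busy on successive chunks of work; the naive micro-batch loop above only gestures at that schedule, which real pipeline implementations interleave far more aggressively.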