OpenAI co-founder Ilya Sutskever recently said that the AI industry has reached "peak data." DeepMind researchers see the output of new "reasoning" models as a source of fresh AI training data. The approach, known as test-time computing, will be put to the test in 2025.
OpenAI co-founder Ilya Sutskever announced something at a recent conference that should have horrified the AI industry.
"We have reached peak data and there is no more," he said in a talk at the annual NeurIPS conference in December.
All the useful data on the internet has already been used to train AI models. This process, known as pre-training, has powered many recent generative AI advances, including ChatGPT. But improvements have slowed, and Sutskever said this era "will definitely come to an end."
This is a frightening prospect, as trillions of dollars in stock market value and investment in AI rest on ever-improving models.
But most AI experts don't seem too worried. Why?
Computing at inference time
There may be a way around this data wall. This relates to a relatively new technology that helps AI models “think” about difficult tasks for long periods of time.
This approach, called test-time or inference-time computing, breaks a query into smaller tasks, each of which becomes a new prompt for the model to tackle. Every step requires running a new request, which is known as the inference stage of AI.
This produces a chain of inferences that addresses each part of the problem. The model doesn't move on to the next step until it gets each part right, which ultimately yields a better final response.
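The loop described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not any vendor's actual implementation: `call_model` is a stub standing in for a real LLM API (here it just evaluates arithmetic so the example runs offline), and `verify` is a placeholder for whatever check gates each step.

```python
# Hypothetical sketch of a test-time compute loop: break a query into
# sub-tasks, issue each as its own inference request, and only advance
# once a step's output passes a check. All names are illustrative.

def call_model(prompt: str) -> str:
    """Stand-in for an LLM call. Evaluates the arithmetic sub-task at the
    end of the prompt so the sketch runs without any API access."""
    expr = prompt.split(":")[-1].strip()
    return str(eval(expr))  # toy stand-in only; never eval untrusted input

def verify(step_output: str) -> bool:
    """Gate before moving on (here: the step produced a number)."""
    try:
        float(step_output)
        return True
    except ValueError:
        return False

def solve_with_test_time_compute(subtasks: list[str], max_retries: int = 3) -> str:
    """Run each sub-task as a fresh inference request, gating on verify()."""
    context: list[str] = []
    for task in subtasks:
        prompt = f"Given {context}, compute: {task}"
        for _ in range(max_retries):
            output = call_model(prompt)
            if verify(output):          # only advance once the step checks out
                context.append(output)
                break
        else:
            raise RuntimeError(f"step failed: {task}")
    return context[-1]  # the final step's output is the answer

# A query like "what is (2 + 3) * 4?" decomposed into two chained sub-tasks:
answer = solve_with_test_time_compute(["2 + 3", "5 * 4"])
print(answer)  # → 20
```

The key design point is the inner retry loop: extra compute is spent at inference time re-sampling a step until it verifies, rather than accepting the model's first answer.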
OpenAI released a model called o1 in September that uses inference-time computing. Google and Chinese AI lab DeepSeek quickly followed with similar "reasoning" models.
“Iterative loop of self-improvement”
Benchmark tests show these new models often produce better output than the previous crop of top AI models, especially on math questions and similar tasks that have definitive, unambiguous answers.
Here's where things get interesting. What if these high-quality outputs were used as new training data? This mass of new information could be fed back into the training runs of other AI models to yield even better results.
Google DeepMind researchers published a study on test-time computing in August, proposing the technique as a potential way to overcome peak data barriers and continue improving large-scale language models.
"We envision that in the future, the outputs of applying additional test-time compute can be distilled back into the base LLM, enabling an iterative self-improvement loop," the researchers wrote. "To this end, future work should extend our findings and study how the outputs of applying test-time compute can be used to improve the base LLM itself."
Talking with the test-time compute researchers
Authors: Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar. Xu is still at Google and Kumar spends some of his time at DeepMind, while Lee left to join OpenAI rival Anthropic.
Snell co-authored the paper while interning at Google DeepMind. He’s now back at the University of California, Berkeley, so I called him to ask what inspired his research.
“I was motivated by several factors that are holding back the expansion of pre-training, particularly the finite supply of data,” he told me in a recent interview.
"If an AI model can use extra inference-time compute to improve its output, that's a way to generate better synthetic data," he added. "That's a useful new source of training data, and it seems like a promising way around the pre-training data bottleneck."
Satya was unfazed
In a recent video podcast, Microsoft CEO Satya Nadella seemed unperturbed and rather perky when asked about the slow pace of improving AI models and the lack of new, high-quality training data.
He described inference-time computing as "another scaling law."
"So you have pre-training, and then you effectively have this test-time sampling, which creates tokens that can go back into pre-training to build an even more powerful model that then runs at inference," he explained.
“I think this is a great way to improve the capabilities of the models,” Nadella added with a smile.
Sutskever also mentioned test-time computing as one possible solution to the peak-data problem during his NeurIPS talk in early December.
Test time for test-time computing
In 2025, this approach will be put to the test. Snell is optimistic, but success isn't guaranteed.
"For the past three years or so, the advances in AI felt predictable," he said. "Right now we're in exploration mode."
One question remains unanswered: how generalizable is this test-time computing method? Snell said it works well for questions whose answers are known and can be verified, such as math problems.
"But a lot of things that require reasoning aren't easy to check, like writing an essay. There's often no clear answer to how good it is," he explained.
Still, there are early signs of success, and Snell suspects output from these kinds of reasoning models is already being used to train new models.
“This synthetic data could very well be better than the data publicly available on the Internet,” he said.
If the outputs from OpenAI's o1 model are better than those of the startup's previous top-of-the-line model, GPT-4, those new outputs could theoretically be reused to train future AI models, Snell explained.
He shared a theoretical example: if o1 scores 90% on a particular AI benchmark, you could feed those answers back into GPT-4's training to boost that model to 90% as well.
"If you have a large set of prompts, you can collect a large amount of data from o1, create a large set of training samples, and either pre-train a new model on them or continue training GPT-4," Snell said.
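Snell's recipe can be sketched as a small data pipeline. This is a hedged illustration under stated assumptions, not OpenAI's or DeepSeek's actual process: `reasoning_model` is a stub standing in for an o1-style model (here it just evaluates arithmetic so the sketch runs offline), and the correctness filter reflects the article's point that this works best on verifiable tasks.

```python
# Hypothetical sketch of synthetic-data distillation: harvest a stronger
# "reasoning" model's answers over a prompt set, keep only the ones that
# can be verified, and turn them into training pairs for a base model.

def reasoning_model(prompt: str) -> str:
    """Stub for a slow-but-strong reasoning model (arithmetic only here)."""
    return str(eval(prompt))  # toy stand-in only; never eval untrusted input

def is_verifiable_and_correct(prompt: str, answer: str) -> bool:
    """Math-style tasks have checkable answers; open-ended ones don't."""
    try:
        return str(eval(prompt)) == answer
    except Exception:
        return False

def build_synthetic_dataset(prompts: list[str]) -> list[dict]:
    """Collect (prompt, completion) pairs suitable for training a base model."""
    dataset = []
    for p in prompts:
        a = reasoning_model(p)
        if is_verifiable_and_correct(p, a):  # keep only checked outputs
            dataset.append({"prompt": p, "completion": a})
    return dataset

pairs = build_synthetic_dataset(["6 * 7", "10 - 3"])
print(pairs)
# → [{'prompt': '6 * 7', 'completion': '42'}, {'prompt': '10 - 3', 'completion': '7'}]
```

The resulting pairs would then feed a standard fine-tuning or pre-training run; the filtering step is what makes this data potentially better than raw internet text.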
A TechCrunch report in late December suggested that DeepSeek may have used output from OpenAI’s o1 to train its own AI models. The latest product, called DeepSeek V3, performs well on industry benchmarks.
“They were probably the first people to recreate o1,” Snell said. “I asked the OpenAI folks what they thought about this. They say it looks the same, but I don’t know how DeepSeek managed to do this so quickly.”
OpenAI and DeepSeek did not respond to requests for comment.