AI researchers at Stanford and the University of Washington were able to train an AI "reasoning" model for under $50 in cloud compute credits, according to a new research paper released last Friday.
The model, known as s1, performs similarly to cutting-edge reasoning models such as OpenAI's o1 and DeepSeek's R1 on tests measuring math and coding abilities. The s1 model is available on GitHub, along with the data and code used to train it.
The team behind s1 said they started with an off-the-shelf base model, then fine-tuned it through distillation, a process for extracting the "reasoning" capabilities of another AI model by training on its answers.
The researchers said s1 is distilled from one of Google's reasoning models, Gemini 2.0 Flash Thinking Experimental. Distillation is the same approach Berkeley researchers used last month to create an AI reasoning model for around $450.
To some, the idea that a few researchers without millions of dollars behind them can still innovate in the AI space is exciting. But s1 raises real questions about the commoditization of AI models.
Where is the moat if someone can closely replicate a multi-million-dollar model with relative pocket change?
Unsurprisingly, big AI labs aren't happy. OpenAI has accused DeepSeek of improperly harvesting data from its API for the purposes of model distillation.
The researchers behind s1 were looking for the simplest approach to achieving strong reasoning performance and "test-time scaling," that is, letting an AI model think longer before it answers a question. These were some of the breakthroughs in OpenAI's o1, which DeepSeek and other AI labs have tried to replicate through a variety of techniques.
The s1 paper suggests that reasoning models can be distilled with a relatively small dataset using a process called supervised fine-tuning (SFT), in which an AI model is explicitly instructed to mimic certain behaviors in a dataset.
SFT tends to be cheaper than the large-scale reinforcement learning method DeepSeek employed to train R1, its competitor to OpenAI's o1 model.
Google offers free access to Gemini 2.0 Flash Thinking Experimental, albeit with daily rate limits, via its Google AI Studio platform.
However, Google's terms forbid reverse-engineering its models to develop services that compete with the company's own AI offerings. I've reached out to Google for comment.
s1 is based on a small, off-the-shelf AI model from Qwen, the Alibaba-owned Chinese AI lab, which is available to download for free. To train s1, the researchers created a dataset of just 1,000 carefully curated questions, paired with answers to those questions as well as the "thinking" process behind each answer from Google's Gemini 2.0 Flash Thinking Experimental.
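To make that dataset description concrete, here is a minimal sketch of how one curated question, the teacher model's reasoning trace, and its final answer might be packed into a single supervised fine-tuning example. The `<think>` delimiters, field names, and prompt layout are assumptions for illustration only, not the format used in the s1 paper.

```python
from dataclasses import dataclass

# Hypothetical delimiters; real chat models define their own special tokens.
THINK_START = "<think>"
THINK_END = "</think>"


@dataclass
class DistillationExample:
    question: str  # a curated question
    thinking: str  # the teacher model's reasoning trace
    answer: str    # the teacher model's final answer


def to_sft_pair(ex: DistillationExample) -> tuple[str, str]:
    """Split one example into (prompt, target).

    During SFT the student learns to reproduce `target` (reasoning plus
    answer) given `prompt`, so the loss is typically computed only over
    the target tokens.
    """
    prompt = f"Question: {ex.question}\n"
    target = f"{THINK_START}\n{ex.thinking}\n{THINK_END}\nAnswer: {ex.answer}"
    return prompt, target


def build_dataset(examples: list[DistillationExample]) -> list[tuple[str, str]]:
    # A real run would apply this to ~1,000 curated questions.
    return [to_sft_pair(ex) for ex in examples]


# Usage with a single toy example.
ex = DistillationExample("What is 2 + 3?", "2 plus 3 equals 5.", "5")
prompt, target = to_sft_pair(ex)
dataset = build_dataset([ex])
```

The point of the sketch is that the student is trained on the teacher's visible reasoning, not just its final answers, which is what lets a small dataset transfer "thinking" behavior.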
According to the researchers, s1 achieved strong performance on certain AI benchmarks after less than 30 minutes of training on 16 Nvidia H100 GPUs. Niklas Muennighoff, a Stanford researcher who worked on the project, told TechCrunch he could rent the necessary compute today for about $20.
The researchers used a clever trick to get s1 to double-check its work and extend its "thinking" time: they told it to wait. Adding the word "wait" during s1's reasoning helped the model arrive at slightly more accurate answers, according to the paper.
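The paper calls this technique "budget forcing." Below is a minimal sketch of the "wait" part of the idea, assuming a generic text-in, text-out `generate` function and a `</think>` end-of-thinking marker; both are hypothetical stand-ins, not the researchers' code.

```python
from typing import Callable

# Hypothetical end-of-thinking marker; real models use their own special token.
END_OF_THINKING = "</think>"


def extend_thinking(generate: Callable[[str], str], prompt: str, num_waits: int) -> str:
    """Each time the model tries to stop reasoning, drop the
    end-of-thinking marker and append "Wait," so it keeps going.
    After `num_waits` extensions, let it finish normally.

    `generate` stands in for a model call that continues `text` up to
    and including the end-of-thinking marker.
    """
    text = prompt
    for _ in range(num_waits):
        continuation = generate(text)
        # Keep only what came before the marker, suppressing the stop.
        text += continuation.split(END_OF_THINKING, 1)[0] + " Wait,"
    # Final call: the model is allowed to close its reasoning.
    return text + generate(text)


# Toy stand-in "model" that reasons one step and then tries to stop.
def toy_generate(text: str) -> str:
    return " step." + END_OF_THINKING


result = extend_thinking(toy_generate, "Q:", num_waits=2)
print(result)  # Q: step. Wait, step. Wait, step.</think>
```

With a real model, each forced "Wait" gives it another chance to re-examine its earlier steps, which is where the small accuracy gains reported in the paper would come from.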
In 2025, Meta, Google, and Microsoft plan to invest hundreds of billions of dollars in AI infrastructure, which will partly go toward training next-generation AI models.
That level of investment may still be necessary to push the envelope of AI innovation. Distillation has proven to be a good method for cheaply re-creating an AI model's capabilities, but it doesn't create new AI models vastly better than what's available today.