Andrej Karpathy is not only one of the top minds in AI; he also has a knack for simplifying difficult concepts and making them accessible to the average person.


Former Tesla AI director and OpenAI researcher Andrej Karpathy has explained the stages of training an LLM using the example of a school textbook. "We have to take LLMs to school," he wrote on X. "When you open any textbook, you'll see three major types of information:
1. Background information / exposition. The meat of the textbook that explains concepts. As you attend to it, your brain is training on that data. This is equivalent to pretraining, where the model reads the internet and accumulates background knowledge.
2. Worked problems with solutions. These are concrete examples of how an expert solves a problem. They are demonstrations to be imitated. This is equivalent to supervised fine-tuning, where the model is fine-tuned on "ideal responses" for an assistant, written by humans.
3. Practice problems. These are prompts to the student, usually without the solution but always with the final answer. There are usually many of these at the end of each chapter. They prompt the student to learn by trial and error: they have to try many things to get to the right answer. This is equivalent to reinforcement learning," he explained.
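The three stages above can be sketched as a toy loop. This is a deliberately simplified illustration, not a real training pipeline: the "model" here is just a bigram count table, and the function names `pretrain`, `finetune`, and `reinforce` are hypothetical labels for the three textbook-style signals.

```python
# Toy illustration of the three training signals from Karpathy's analogy.
# The "model" is a bigram count table, NOT a neural network -- a hypothetical
# stand-in chosen only to show how the three data sources differ.
from collections import defaultdict

def pretrain(model, corpus):
    """Stage 1: read raw text (background information / exposition)."""
    for text in corpus:
        tokens = text.split()
        for a, b in zip(tokens, tokens[1:]):
            model[a][b] += 1

def finetune(model, demonstrations):
    """Stage 2: imitate (prompt, ideal_response) pairs written by experts."""
    for prompt, response in demonstrations:
        tokens = (prompt + " " + response).split()
        for a, b in zip(tokens, tokens[1:]):
            model[a][b] += 5  # demonstrations are weighted above raw text

def reinforce(model, prompt, attempts, final_answer):
    """Stage 3: trial and error -- no solution given, only the final answer
    is checked, and only successful attempts are reinforced."""
    for attempt in attempts:
        if attempt.split()[-1] == final_answer:
            tokens = (prompt + " " + attempt).split()
            for a, b in zip(tokens, tokens[1:]):
                model[a][b] += 10
```

Usage under these assumptions: `model = defaultdict(lambda: defaultdict(int))`, then feed it raw text, demonstrations, and scored attempts in turn. The point of the sketch is the difference in supervision, not the learning rule: stage 1 sees everything, stage 2 sees ideal answers, stage 3 sees only a pass/fail signal on the final answer.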
This is a great way to explain how LLMs learn, comparing each stage of the LLM training process to a part of a school textbook. But Karpathy expanded on the analogy and pointed to where LLM training is headed.
"We've subjected LLMs to a ton of 1 (background information / pretraining) and 2 (worked solutions / supervised fine-tuning), but 3 (practice problems / reinforcement learning) is a nascent, emerging frontier. When we create datasets for LLMs, it is no different from writing textbooks for them. They have to read, and they have to practice," he wrote.
It was advances in reinforcement learning that allowed DeepSeek to surprise the world with its R1 model, which held its own against OpenAI's top models. Anthropic CEO Dario Amodei has said that reinforcement learning is still an underexplored axis for improving LLMs, and that better reinforcement learning could drive rapid progress.

Karpathy urged researchers to support the models with reinforcement learning. "For the open-source friends: imo the highest-leverage thing you can do is help build a high diversity of RL environments that elicit LLM cognitive strategies. To build a gym of sorts. This is a highly parallelizable task, which favors a large community of collaborators," he said on X.
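The "gym" of RL environments Karpathy describes can be sketched as a minimal interface: the environment poses a problem, keeps the solution hidden, and rewards only the final answer, never the reasoning steps. `ArithmeticEnv` and its `reset`/`step` methods are hypothetical, loosely modeled on the common Gymnasium convention; a real environment for LLMs would pose much harder tasks.

```python
# Minimal sketch of an answer-checked RL environment (hypothetical class,
# loosely following the gym-style reset/step convention).
import random

class ArithmeticEnv:
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        """Pose a new problem; the solution stays hidden from the model."""
        self.a, self.b = self.rng.randint(1, 9), self.rng.randint(1, 9)
        self._answer = self.a + self.b
        return f"What is {self.a} + {self.b}?"

    def step(self, final_answer):
        """Score only the final answer -- the attempt itself is unscored."""
        reward = 1.0 if final_answer == self._answer else 0.0
        return reward, True  # (reward, episode done)
```

Because each environment is self-contained like this, many people can build many of them independently, which is what makes the task "highly parallelizable" and well suited to a large community of collaborators.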