Andrej Karpathy, a former OpenAI researcher and former head of AI at Tesla, says that when people “ask an AI” a question, they are not querying some magical AI system; they are effectively getting the averaged response of a human data labeler.
“You’re not asking an AI. You’re asking the mashup mentality of its average data labeler,” Karpathy says. To illustrate the point, he uses a typical tourism question: when someone asks for the “top 10 tourist attractions in Amsterdam,” the model generates an answer based on how human data labelers answered similar questions during training.
For questions not covered in the training data, the system produces statistically similar answers, mimicking the human response patterns it has learned.
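To make that statistical behavior concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the public gpt2 checkpoint (illustrative choices, not anything Karpathy references). Sampling draws each token from the distribution the model learned from its training data, so repeated runs produce plausible but different answers:

```python
# Minimal sketch: sampling from a language model draws tokens from a learned
# probability distribution, so each run gives a plausible-but-different answer.
# Assumes the Hugging Face `transformers` library and the public `gpt2` model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Top 10 tourist attractions in Amsterdam:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=True,   # sample from the learned distribution instead of greedy decoding
    temperature=0.8,  # controls how sharply peaked the distribution is
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```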
In particular, Karpathy cautioned against asking AI systems about complex policy issues such as optimal governance, noting that the answer is roughly what you would get if you asked the labeling team to research the question directly for an hour.
“The point is, asking an LLM how a government should be run is about the same as asking Mary from Ohio, who did a 30-minute survey for $10 and had to follow the labeling documentation when answering those kinds of questions,” Karpathy explains.
How does an AI assistant acquire “personality”?
Large language models go through two training stages. First, the model learns from vast amounts of internet content and other data during pretraining. Then, during fine-tuning, human annotators write example responses and the model is trained on conversations between the “human” and “assistant” roles.
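As a rough illustration of that second stage, here is a minimal, hypothetical sketch of what a single fine-tuning record and its flattened training text could look like. The field names and role labels are assumptions for illustration; every lab uses its own format:

```python
# Hypothetical fine-tuning record in a "human"/"assistant" conversation format.
# The assistant turn is written by a human data labeler; the model is then
# trained to imitate it. (Illustrative structure only, not any lab's real schema.)
sft_example = {
    "messages": [
        {"role": "human", "content": "Top 10 tourist attractions in Amsterdam?"},
        {"role": "assistant", "content": "1. Rijksmuseum\n2. Van Gogh Museum\n..."},
    ]
}

def to_training_text(example: dict) -> str:
    # Flatten the conversation into a single training string.
    return "\n".join(f"{m['role']}: {m['content']}" for m in example["messages"])

print(to_training_text(sft_example))
```

After seeing many such labeler-written conversations, the model learns to produce assistant turns in the same style.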
When an AI model responds to a controversial topic with a phrase like “This is a controversial question,” it is because human labelers were instructed to use such language to maintain neutrality, Karpathy says.
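How such instructions reach the model can be sketched in the same way. The guideline text and labeled response below are invented for illustration and are not taken from any real labeling document:

```python
# Hypothetical excerpt of labeling guidance (invented for illustration).
GUIDELINE = (
    "For politically or socially contested topics, do not take a side. "
    "Open with neutral framing such as 'This is a controversial question' "
    "and summarize the main perspectives."
)

# A labeler following the guideline writes the assistant turn accordingly;
# after fine-tuning on many such examples, the model reproduces the pattern.
labeled_response = (
    "This is a controversial question. Supporters argue that ... "
    "while critics counter that ..."
)
print(labeled_response)
```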
The fine-tuning process teaches the AI to act like a helpful assistant: it keeps the knowledge acquired in pretraining while adapting its style to the fine-tuning data. Many attribute ChatGPT’s explosive success two years ago to this process, which made users feel they were talking to a real entity that understood them, rather than to a sophisticated autocomplete system.
Expertise comes from expert labelers
For specialized topics, companies hire relevant experts as data labelers. Karpathy points out that medical questions may be labeled by doctors, while math problems may draw on top mathematicians such as Terence Tao.
The experts do not need to answer every possible question; the system only needs enough examples to learn how to simulate an expert’s responses.
However, expert-level answers cannot be guaranteed for every question. Even where the AI lacks basic knowledge or reasoning skills, its answers are usually still better than those of the average internet user, Karpathy says. LLMs can therefore be quite flawed and very useful at the same time, depending on the use case.
The prominent AI researcher has previously criticized this approach, known as reinforcement learning from human feedback (RLHF). Unlike systems such as DeepMind’s AlphaGo, which optimize against an objective measure of success, he sees RLHF as a stopgap solution because there are no objective success criteria for a “good” answer.
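The contrast he draws can be sketched schematically. The toy code below is not a real RLHF implementation; StubRewardModel is a made-up stand-in for a learned preference model:

```python
# Toy contrast between an objective game reward and an RLHF-style learned reward.
# Schematic only: StubRewardModel is a hypothetical stand-in, not a real API.

class StubRewardModel:
    """Stand-in for a neural network trained on human preference comparisons."""
    def score(self, response: str) -> float:
        return float(len(response) > 0)  # placeholder heuristic, no ground truth

def alphago_reward(game_outcome: str) -> float:
    # Go has an objective success criterion: the game is won or lost.
    return 1.0 if game_outcome == "win" else -1.0

def rlhf_reward(response: str, reward_model: StubRewardModel) -> float:
    # RLHF has no such ground truth: the "reward" is a model's guess at
    # what a human labeler would have preferred.
    return reward_model.score(response)

print(alphago_reward("win"), rlhf_reward("This is a controversial question...", StubRewardModel()))
```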
Karpathy recently left OpenAI along with several other senior AI researchers to start his own AI education company.