Google is using Anthropic’s Claude AI model to evaluate the performance of its Gemini AI, according to a report, raising questions about whether the practice complies with Anthropic’s terms of service.
TechCrunch claims to have seen internal communications suggesting that contractors working on Gemini were comparing its responses to those produced by Claude. These contractors are said to be tasked with evaluating the accuracy and quality of Gemini’s output based on criteria such as veracity and verbosity.
“Contractors are given up to 30 minutes per prompt to decide whether Gemini or Claude’s answer is better,” the report states.
Differences between Gemini and Claude answers
TechCrunch also says contractors noticed explicit references to Claude within Google’s internal platform used to compare AI models.
In some cases, Claude’s responses appeared to prioritize safety more than Gemini’s, refusing to answer prompts it deemed unsafe or giving more cautious responses, the report added. In one instance, Gemini’s response included “nudity and bondage” and was flagged for a “serious safety violation.”
Among AI models, “Claude’s safety settings are the most stringent,” one contractor noted.
Anthropic’s terms and Google’s response
Anthropic’s terms of service explicitly prohibit using Claude to “build a competing product or service” or “train a competing AI model” without prior approval.
A Google DeepMind spokesperson confirmed that the company compares model outputs for evaluation purposes but denied using Anthropic’s models to train Gemini. Google is also a major investor in Anthropic.
“Of course, in line with standard industry practice, in some cases we compare model outputs as part of our evaluation process,” said Shira McNamara, a Google DeepMind spokesperson.
“However, any suggestion that Anthropic models were used to train Gemini is inaccurate,” McNamara added.