Karachi Chronicle
How Claude uses AI to identify new threats

By Adnan Mahar | December 12, 2024


The company didn’t know it yet, but Anthropic had a spam problem.

Earlier this year, a group of accounts had begun asking the company’s chatbot, Claude, to generate text for search engine optimization — the art of getting a website to rank more highly in Google. There’s nothing necessarily wrong with a publisher trying to generate keywords to describe their site to Google. But these accounts, which worded their prompts carefully in an apparent effort to escape detection by Anthropic’s normal filters, appeared to be part of a coordinated effort.

The spammers might have gotten away with it — but then the network was spotted by Clio. An acronym for “Claude insights and observations,” Clio is an internal tool at Anthropic that uses machine learning to identify previously unknown threats, disrupt coordinated attempts to abuse the company’s systems, and generate insights about how Claude and the company’s artificial intelligence models are being used.

In the case of the spam network, Clio identified a cluster of accounts making queries that by themselves did not necessarily violate Anthropic’s guidelines. But when the company’s trust and safety team investigated, it determined that the queries were coming from a network of spammers. It terminated their access to Claude.

“Sometimes it’s not clear from looking at an individual conversation whether something is harmful,” said Miles McCain, a co-author of the paper and a member of Anthropic’s technical staff, in an interview with Platformer. “It’s only once you piece it together in context that you realize, this is coordinated abuse generating SEO spam. Generating keywords for your blog? That’s fine. It’s only when you’re doing it across tons of accounts, abusively, for free that it becomes problematic.”

Discovering these previously hidden harms — what the company calls “unknown unknowns” — is a core part of Clio’s mission. Anthropic announced Clio in a paper published today, alongside a blog post about how Claude was used during the 2024 elections.

The company hopes that other trust and safety teams will consider using a Clio-like approach as an early warning system for new harms that emerge as the use of AI chatbots becomes more pervasive.

“It really shows that you can monitor and understand, in a bottom-up way, what’s happening — while still preserving user privacy,” Alex Tamkin, the paper’s lead author and a research scientist, said in an interview. “It lets you see things before they might become a public-facing problem. … By using language models, you can catch all sorts of novel and weird-shaped use cases.”

How Clio works

In the Clio paper, Anthropic drew from 1 million conversations with Claude, across both free and paid users. The system is instructed to omit private details and personal information.

Under development for about six months, Clio works by analyzing what a conversation is about, then clustering conversations around similar themes and topics. (Topics that are rarely discussed with Claude are omitted from the analysis, which offers an additional safeguard against accidentally making an individual user identifiable.)
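To make the pipeline concrete, here is a minimal sketch of that clustering stage, assuming conversations have already been summarized and embedded as vectors. The algorithm, the embedding size, and the MIN_CLUSTER_SIZE privacy cutoff are all illustrative assumptions, not details Anthropic has published.

```python
# A minimal sketch, not Anthropic's implementation: cluster embedded
# conversation summaries and drop clusters too small to anonymize.
# MIN_CLUSTER_SIZE and the embedding dimension are illustrative guesses.
import numpy as np
from sklearn.cluster import KMeans

MIN_CLUSTER_SIZE = 50  # hypothetical privacy threshold for rare topics

def cluster_conversations(embeddings: np.ndarray, n_clusters: int = 20) -> dict:
    """Group conversation embeddings by theme; omit rare topics."""
    labels = KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(embeddings)
    clusters = {}
    for label in set(labels):
        members = np.flatnonzero(labels == label)
        if len(members) >= MIN_CLUSTER_SIZE:  # privacy safeguard
            clusters[int(label)] = members
    return clusters

# Stand-in data: 5,000 fake 384-dimensional embeddings.
rng = np.random.default_rng(0)
print({k: len(v) for k, v in cluster_conversations(rng.normal(size=(5000, 384))).items()})
```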

Clio creates a title and summary for each cluster, and reviews them again to make sure personal information is not included. It then creates multi-level hierarchies of related topics — an education cluster might contain sub-clusters for the way teachers use Claude and the way students do, for example.
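One plausible way to build that hierarchy, sketched under the assumption that parent topics come from re-clustering the centroids of the base clusters; the paper's exact method, and the language-model pass that writes titles and scrubs personal details, are not reproduced here:

```python
# Illustrative only: derive parent topics by agglomerating base-cluster
# centroids. Real titling and summarization would be an LLM pass, omitted.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def build_hierarchy(centroids: np.ndarray, n_parents: int = 5) -> dict:
    """Map each parent topic to the base clusters nested under it."""
    parents = AgglomerativeClustering(n_clusters=n_parents).fit_predict(centroids)
    tree: dict = {}
    for child, parent in enumerate(parents):
        tree.setdefault(int(parent), []).append(child)
    return tree

# e.g. an "education" parent containing teacher and student sub-clusters.
```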

Analysts can then search the clusters, or explore Claude usage visually. Clio offers a visual interface similar to Obsidian’s graph view, linking clusters based on the frequency of discussion and how they may be related to each other.

Popular queries often link to other popular queries — a sign that many people use Claude for the same things. Less popular queries often appear in the visualization as islands — and it’s these islands that can highlight unknown unknowns.
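The island metaphor maps naturally onto graph connectivity. This sketch connects clusters whose centroids exceed an arbitrary cosine-similarity threshold and flags small disconnected components for review; networkx and the 0.8 threshold are assumptions for illustration, not Anthropic's tooling.

```python
# Illustrative island detection: connect similar clusters, then surface
# small, isolated components as candidates for trust-and-safety review.
import numpy as np
import networkx as nx

def find_islands(centroids: np.ndarray, threshold: float = 0.8) -> list:
    """Return groups of clusters with no near neighbors in the graph."""
    unit = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sim = unit @ unit.T  # pairwise cosine similarity
    g = nx.Graph()
    n = len(centroids)
    g.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                g.add_edge(i, j)
    # Small components sit apart from the popular core: the "islands".
    return [sorted(c) for c in nx.connected_components(g) if len(c) <= 2]
```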

The spam network was one of these islands. It was a group of accounts devoted almost entirely to making SEO queries, in the same technically-allowed-but-definitely-suspicious way. Upon discovering the island, Anthropic referred it to its trust and safety team, which ultimately removed the network.

Like most trust and safety teams, Anthropic’s already had tools in place to identify spammers. It identified keywords often used by spammers, and created machine-learning classifiers to understand when its systems were likely being used to generate spam. This is what the company calls a “top-down” approach to safety. 
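For contrast, a toy version of such a top-down filter appears below. The keyword list is invented, and a real system would pair it with a trained classifier; the point is that a top-down filter can only catch abuse someone already thought to specify.

```python
# Toy "top-down" spam check: pre-specified signals only. Unlike Clio's
# bottom-up clustering, it cannot surface abuse it wasn't told about.
SPAM_KEYWORDS = {"backlink farm", "keyword stuffing", "rank my site"}  # invented

def looks_like_seo_spam(prompt: str, classifier=None) -> bool:
    """Flag a prompt matching known spam keywords or a trained model."""
    text = prompt.lower()
    if any(kw in text for kw in SPAM_KEYWORDS):
        return True
    return bool(classifier(text)) if classifier else False

print(looks_like_seo_spam("Write 50 blog intros for keyword stuffing"))  # True
```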

Clio, on the other hand, is bottom-up. It isn’t looking for anything specific at all. Rather, it’s a way to help Anthropic understand all the ways that Claude is being used, and consider whether some of those uses could be harmful.

It works the other way, too: identifying clusters of conversations that Claude marked as harmful even when they were totally innocuous.

Clio revealed, for example, that Claude was repeatedly refusing to answer questions about the role-playing game Dungeons & Dragons. As people asked the chatbot for help planning their attacks, Claude often assumed that users were planning actual violence.

The Clio team referred the issue back to Anthropic’s trust and safety team, which refined its classifiers.

It’s not just D&D. Clio found that Claude sometimes rejected questions from job seekers who had uploaded their resumes, since the resumes included the sort of personal information that can violate Claude’s rules in other contexts. And some benign questions about programming were refused because Claude mistakenly associated them with hacking attempts.

“You can use Clio to constantly monitor at a high level what types of things people are using this fundamentally new technology for,” Tamkin said. “You can refer anything that looks suspicious or worrisome to the trust and safety team and update those safeguards as the technology rolls out.” 

More recently, Anthropic has used Clio to understand the potential harms from “computer use,” its first foray into an AI system that can execute actions on a computer. Clio is identifying capabilities that might not have been apparent in pre-launch testing, the company said, and is monitoring how testers are using them in practice.

When monitoring activity around the 2024 US election, Clio helped identify both benign uses (like explaining political processes) and policy violations (like trying to get Claude to generate fundraising materials for campaigns).

How people use Claude

My conversation with Anthropic’s societal impacts team focused on how Clio can be used to identify harms. But it can also be used to identify opportunities for Anthropic, by highlighting how people use its chatbot.

In what the company is calling a first for a major AI lab, the Clio paper also highlights the top three categories of uses for Claude:

  • Coding and software development (more than 10 percent of conversations)
  • Educational use, both for teachers and for students (more than 7 percent)
  • Business strategy and operations, such as drafting professional communications and analyzing business data (almost 6 percent)

The top three uses, then, account for only about 23 percent of usage. The long tail of Claude’s use cases appears to be extremely long. 

“It turns out if you build a general purpose technology and release it, people find a lot of purposes for it,” Deep Ganguli, who leads the societal impacts team, told me with a laugh. “It’s crazy. You scroll around these clusters and you’re like, what? People are using Claude to do that?” 

Other ways people use Claude, Clio found, include dream interpretation, questions about the zodiac, analysis of soccer matches, disaster preparedness, and hints for crossword puzzles. Users also routinely subject Claude to the most difficult challenge known to large language models today: asking it to count the number of r’s in “strawberry.”

During a demo earlier this week, as the team clicked around Clio, I saw even more use cases: clusters about video game development, nutrition, writing creative fiction, and writing JavaScript. Ganguli told me he began asking Claude questions about parenting after spotting it as a popular cluster within Clio.

What’s next

The Anthropic team told me they tried to share as much about how Clio works as possible in their paper, in the hopes that other AI labs would try something similar themselves. The paper goes so far as to include the cost of running Clio — $48.81 per 100,000 conversations. 

“We wanna make it as easy as possible for someone to pitch it somewhere else and be like, look, guys — they did it and it worked,” McCain said.  

In the meantime, Anthropic told me that Clio has become a meaningful part of its trust and safety efforts. But the company is already imagining other things it might do with the technology. 

Ganguli highlighted three. One, the company could use Clio to understand the future of work. What kind of jobs is Claude helping people with, and what does that suggest about how the economy is transforming?

Two, Clio could change the safety evaluations that AI labs perform on their models. Instead of drawing from historical and theorized harms, companies can ground evaluations in the real-world usage that they’re seeing today.

Finally, Ganguli sees scientific applications. Claude is trained to follow a constitution; perhaps Clio could surface instances in which it fails to do so, or struggles with a trade-off.

Those uses of Clio seem benign. But they also highlight the deep sensitivity of the queries people make in chatbots like Claude. Anthropic is using the technology to identify harms, but it’s just as easy to imagine another company using similar technology to analyze consumer behavior for the purposes of advertising, persuasion, or other surveillance. It’s also easy to imagine another company taking fewer steps to preserve users’ privacy, using their queries in ways that could create risks to them. 

Ganguli says that one goal of publishing the results from Clio is to draw attention to risks like these.

“I feel strongly that, as we’re developing these technologies, the public should know how they’re being used and what the risks are,” he said. 


On the podcast this week: Has TikTok’s luck run out for good? Then Google’s director of quantum hardware, Julian Kelly, joins us in a noble effort to explain quantum computing to the masses. And finally, Kevin investigates the cult of Claude.


Governing

Google’s Gemini and mixed-reality news drops

Google unveiled a host of AI announcements on Wednesday in what appeared to be a kind of shock-and-awe campaign to demonstrate the company’s leadership in AI development. A new mixed-reality platform for Android followed on Thursday. The announcements include:

  • Gemini 2.0, a new flagship model that the company says is built to enable AI agents. The company will test it in search and AI Overviews. (Julia Love and Davey Alba / Bloomberg)
  • Gemini 2.0 Flash, a low-latency model that can be used in third-party apps and services starting in January. Gemini Advanced users can also try it starting today. (Kyle Wiggers / TechCrunch)
  • Jules, a more advanced AI coding assistant. (Michael Nuñez / VentureBeat)
  • Trillium, the sixth-generation AI chip that powers Gemini 2.0, which the company says is significantly more energy-efficient. (Michael Nuñez / VentureBeat)
  • Gemini agents that can help players play video games by reasoning based on what they see on the screen. (Jay Peters / The Verge)
  • Updates for Project Astra, the multimodal assistant that can draw on Search, Maps, Lens, and Gemini to answer questions about what it hears or what the user sees through smart glasses. (Ina Fried / Axios)
  • Project Mariner, an experimental agent built by DeepMind that can move a mouse cursor, control Chrome, click buttons, and fill out forms. (Maxwell Zeff / TechCrunch)
  • Deep Research, a feature inside Gemini Advanced that scours the web and prepares detailed research reports. This one was maybe the most interesting for me, since it’s available now and unlike anything I’ve seen from Google’s competitors. (Emma Roth / The Verge)
  • Android XR, a new mixed-reality platform that works across headsets and glasses. I had a fun demo with these last week and was struck by the high visual fidelity of a Samsung headset prototype; Google’s AR glasses feel about as capable as the Meta Ray-Bans and will be worth a look when they are released.

Industry

  • You can now share your screen with ChatGPT’s advanced voice mode. The company also announced the availability of a Santa Claus voice for the holidays. (Kyle Wiggers / TechCrunch)
  • A deep dive on scaling laws finds that even if the traditional model size-power-compute combination is showing diminishing returns, companies are finding lots of new ways to scale and deploying them all. (Dylan Patel, Daniel Nishball and AJ Kourabi / SemiAnalysis)
  • Apple released iOS 18.2, bringing ChatGPT to users as part of Apple Intelligence, along with other AI features including Image Playground and Genmoji. (Igor Bonifacic / Engadget)
  • Apple is reportedly developing its first AI server chip with Qualcomm, and plans to have it ready for mass production by 2026. (Wayne Ma and Qianer Liu / The Information)
  • From October until December 2, daily users on X dropped 8.4 percent, while daily users on Bluesky rose 1,064 percent, according to Similarweb. (Raphael Boyd / The Guardian)
  • Threads copied “starter packs” from Bluesky. (Jay Peters / The Verge)
  • WordPress must stop blocking WP Engine’s access to its resources and interfering with its plugins after WP Engine won a preliminary injunction. (Emma Roth / The Verge)
  • A massive outage took Facebook, Instagram, and Threads out for several hours on Wednesday. (Lawrence Abrams / Bleeping Computer)
  • ChatGPT and Sora went down for a few hours as well. (Maxwell Zeff / TechCrunch)
  • A look at how niche communities have used WhatsApp in ways it was not designed for, including distributing news, matchmaking, and soliciting prayers. (Sonia Faleiro / Rest of World)
  • TikTok is offering financial incentives to users for spending time in the TikTok Shop, inviting their friends, and purchasing products. (Alexandra S. Levine / Bloomberg)
  • YouTube viewers streamed more than 1 billion hours of content a day on their televisions this year, including 400 million hours of podcasts a month. (YouTube)
  • Microsoft AI CEO Mustafa Suleyman is building a consumer health team in London from people who worked on a similar team at Google DeepMind, which he formerly led. (Madhumita Murgia / Financial Times)
  • Blogger, Twitter, and Medium founder Ev Williams is back with Mozi, an app aimed at helping friend groups foster stronger connections. (Erin Griffith / New York Times)

Those good posts

For more good posts every day, follow Casey’s Instagram stories.


Talk to us

Send us tips, comments, questions, and hidden Claude use cases: casey@platformer.news.



