Close Menu
Karachi Chronicle
  • Home
  • AI
  • Business
  • Entertainment
  • Fashion
  • Politics
  • Sports
  • Tech
  • World

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

What's Hot

The world’s largest air force with the F-35 fleet in 2025

AI systems learn from many types of scientific information and run experiments to discover new materials | MIT News

Among the most troublesome relationships in healthcare AI

Facebook X (Twitter) Instagram
  • Home
  • About us
  • Advertise
  • Contact us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
Facebook X (Twitter) Instagram Pinterest Vimeo
Karachi Chronicle
  • Home
  • AI
  • Business
  • Entertainment
  • Fashion
  • Politics
  • Sports
  • Tech
  • World
Karachi Chronicle
You are at:Home » Humanity dares to jailbreak its new AI model
AI

Humanity dares to jailbreak its new AI model

Adnan MaharBy Adnan MaharFebruary 3, 2025No Comments2 Mins Read0 Views
Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
Share
Facebook Twitter LinkedIn Pinterest WhatsApp Email


Even the most tolerant corporate AI models have sensitive topics that creators prefer not to discuss (e.g., weapons of mass destruction, illegal conduct, or Chinese political history). For years, enterprising AI users have resorted to everything from odd text strings to ASCII art to stories about dead grandmas, breaking away from those models and having “banned” results 。

Today, Claude model maker humanity has released a new system of constitutional classifiers that say they can “filter the overwhelming majority” of these types of prison breaks. And as the system maintains a bug-money attack of up to 3,000 hours, humanity invites more public people to test the system and whether it can break its own rules Is confirmed.

Respect the Constitution

In a new paper and accompanying blog post, Anthropic says that its new constitutional classifier system is spun from similar constitutional AI systems used to build its Claude model. The system relies on the “constitution” of natural language rules that define a broad range of permitted categories of models (a list of general drugs) and the unauthorized (e.g., acquisition of restricted chemicals). 。

From there, humanity asks Claude to generate a number of synthetic prompts that lead to both acceptance and unacceptable reactions under its constitution. These prompts are translated into multiple languages ​​and changed in the style of “known jailbreak” and then fixed with the “auto-red teaming” prompt, which attempts to create a new new jailbreak attack.

This creates a robust set of training data that can be used to fine-tune new, more jailbreak resistant “classifiers” for both user input and model output. On the input side, these classifiers enclose each query with a set of templates that explain in detail what harmful information should be noted, and how the user attempts to obfuscate or encode requests for that information 。

An example of a long wrapper that the new Claude classifier uses to detect prompts related to chemical weapons.

An example of a long wrapper that the new Claude classifier uses to detect prompts related to chemical weapons.


Credit: Humanity

“For example, harmful information can be used to fill harmful requests in walls of harmlessly-looking content, disguise harmful requests in fictional role-playing, or use obvious alternatives. It could be hidden,” such a wrapper reads in part.



Source link

Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
Previous ArticleWhen the humanoid robot stands on the center of China, does Androids dream of economic spotlights?
Next Article Analysis: Trump wants a “iron dome” in the United States. However, it takes a while with the Pacific mini version.
Adnan Mahar
  • Website

Adnan is a passionate doctor from Pakistan with a keen interest in exploring the world of politics, sports, and international affairs. As an avid reader and lifelong learner, he is deeply committed to sharing insights, perspectives, and thought-provoking ideas. His journey combines a love for knowledge with an analytical approach to current events, aiming to inspire meaningful conversations and broaden understanding across a wide range of topics.

Related Posts

AI systems learn from many types of scientific information and run experiments to discover new materials | MIT News

September 25, 2025

Among the most troublesome relationships in healthcare AI

September 25, 2025

Does access to AI become a fundamental human right? Sam Altman says, “Everyone would want…”

September 23, 2025
Leave A Reply Cancel Reply

Top Posts

20 Most Anticipated Sex Movies of 2025

January 22, 2025459 Views

President Trump’s SEC nominee Paul Atkins marries multi-billion dollar roof fortune

December 14, 2024122 Views

How to tell the difference between fake and genuine Adidas Sambas

December 26, 202486 Views

Alice Munro’s Passive Voice | New Yorker

December 23, 202474 Views
Don't Miss
AI September 25, 2025

AI systems learn from many types of scientific information and run experiments to discover new materials | MIT News

Machine learning models can speed up discovery of new materials by making predictions and proposing…

Among the most troublesome relationships in healthcare AI

Does access to AI become a fundamental human right? Sam Altman says, “Everyone would want…”

Google’s Gemini AI is on TV

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

About Us
About Us

Welcome to Karachi Chronicle, your go-to source for the latest and most insightful updates across a range of topics that matter most in today’s fast-paced world. We are dedicated to delivering timely, accurate, and engaging content that covers a variety of subjects including Sports, Politics, World Affairs, Entertainment, and the ever-evolving field of Artificial Intelligence.

Facebook X (Twitter) Pinterest YouTube WhatsApp
Our Picks

The world’s largest air force with the F-35 fleet in 2025

AI systems learn from many types of scientific information and run experiments to discover new materials | MIT News

Among the most troublesome relationships in healthcare AI

Most Popular

10 things you should never say to an AI chatbot

November 10, 20040 Views

Character.AI faces lawsuit over child safety concerns

December 12, 20050 Views

Analyst warns Salesforce investors about AI agent optimism

July 1, 20070 Views
© 2025 karachichronicle. Designed by karachichronicle.
  • Home
  • About us
  • Advertise
  • Contact us
  • DMCA
  • Privacy Policy
  • Terms & Conditions

Type above and press Enter to search. Press Esc to cancel.