Karachi Chronicle
AI

Anthropic rolls out “jailbreak” defences to stop its AI models from producing harmful results

By Adnan Mahar · February 3, 2025 · 3 Mins Read



Artificial intelligence start-up Anthropic has demonstrated a new method to prevent users from eliciting harmful content from its models, as leading technology groups including Microsoft and Meta race to protect against the risks posed by the cutting-edge technology.

A paper published on Monday outlines a new system called “constitutional classifiers”: a model that acts as a protective layer on top of large language models, such as the one powering Anthropic’s Claude chatbot, monitoring both inputs and outputs for harmful content.

The development by Anthropic, which is in talks to raise $2 billion at a $60 billion valuation, comes amid growing industry concern over “jailbreaking”: attempts to manipulate AI models into generating illegal or dangerous information, such as instructions for building chemical weapons.

Other companies are also racing to deploy measures that protect against the practice, which could help them avoid regulatory scrutiny while persuading businesses to adopt AI models safely. Microsoft introduced “Prompt Shields” last March, while Meta introduced a prompt guard model last July.

Mrinank Sharma, a member of technical staff at Anthropic, said: “The main motivation behind the work was severe chemical [weapons risks], but the real advantage of the method is its ability to adapt quickly.”

Anthropic said it would not necessarily use the system on its current Claude models right away, but would consider implementing it if riskier models were released in future.

The start-up’s proposed solution is built on a so-called constitution of rules that define what material is permitted and what is restricted, and that can be adapted to cover different types of content.
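The mechanism described above, a rule-based classifier layer screening both prompts and responses, can be sketched in outline. The Python below is a hypothetical illustration only, not Anthropic’s implementation: the toy `CONSTITUTION` list, the keyword-based `classify` function, and the `guarded_chat` wrapper are all invented for demonstration (real classifiers are trained models, not keyword filters).

```python
# Hypothetical sketch of a classifier layer wrapping a language model.
# NOT Anthropic's system: the "constitution", the keyword matcher, and
# the model stub below are invented purely for illustration.

# A toy "constitution": rules naming restricted categories of material.
CONSTITUTION = [
    {"category": "chemical weapons", "restricted_terms": ["nerve agent synthesis"]},
    {"category": "malware", "restricted_terms": ["ransomware builder"]},
]

def classify(text: str) -> bool:
    """Return True if the text violates a rule in the constitution."""
    lowered = text.lower()
    return any(
        term in lowered
        for rule in CONSTITUTION
        for term in rule["restricted_terms"]
    )

def model_respond(prompt: str) -> str:
    """Stand-in for the underlying large language model."""
    return f"Model answer to: {prompt}"

def guarded_chat(prompt: str) -> str:
    """Screen both the input and the output, as the article describes."""
    if classify(prompt):
        return "Request refused by input classifier."
    response = model_respond(prompt)
    if classify(response):
        return "Response withheld by output classifier."
    return response

print(guarded_chat("What is the capital of France?"))
print(guarded_chat("Explain nerve agent synthesis step by step."))
```

Because the rules live in data rather than in the model’s weights, a layer like this can be retargeted at new categories of material without retraining the underlying model, which matches the adaptability Sharma describes.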

Some well-known jailbreak attempts involve using unusual capitalisation in the prompt, or asking the model to adopt the persona of a grandmother to coax it into discussing forbidden topics.


To test the system’s effectiveness, Anthropic offered “bug bounties” of up to $15,000 to individuals who tried to bypass the security measures. These testers, known as red teamers, spent more than 3,000 hours trying to break through its defences.

With the classifiers in place, Anthropic’s Claude 3.5 Sonnet model rejected more than 95 per cent of the jailbreak attempts, compared with 14 per cent without the safeguards.

Major technology companies are trying to reduce misuse of their models while preserving their usefulness. When moderation measures are introduced, models can often become overly cautious and refuse benign requests, as happened with early versions of Google’s Gemini image generator and Meta’s Llama 2.

Adding these protections, however, imposes extra costs on companies that already pay enormous sums for the computing power needed to train and run their models. Anthropic said the classifiers would add almost 24 per cent to “inference overhead”, the cost of running the models.

[Chart: tests performed on Anthropic’s latest model showing the effectiveness of the classifiers]

Security experts have argued that the accessibility of such generative chatbots has enabled ordinary people with no prior knowledge to extract dangerous information.

“In 2016, the threat actor we had in mind was a really powerful nation-state adversary,” said Ram Shankar Siva Kumar, who leads Microsoft’s AI red team. “Now one of my threat actors is a teenager with a potty mouth.”


Adnan Mahar

Adnan is a passionate doctor from Pakistan with a keen interest in exploring the world of politics, sports, and international affairs. As an avid reader and lifelong learner, he is deeply committed to sharing insights, perspectives, and thought-provoking ideas. His journey combines a love for knowledge with an analytical approach to current events, aiming to inspire meaningful conversations and broaden understanding across a wide range of topics.
