Karachi Chronicle
AI

Anthropic rolls out “jailbreak” defences to stop its AI models from producing harmful results

By Adnan Mahar · February 3, 2025 · 3 Mins Read



Artificial intelligence start-up Anthropic has demonstrated a new method to prevent users from eliciting harmful content from its models, as leading technology groups including Microsoft and Meta race to protect against the risks posed by the cutting-edge technology.

A paper published on Monday outlines a new system called “constitutional classifiers”: a model that acts as a protective layer on top of large language models, such as the one powering Anthropic’s Claude chatbot, monitoring both inputs and outputs for harmful content.

The development by Anthropic, which is in talks to raise $2 billion at a $60 billion valuation, comes amid growing industry concern over “jailbreaking”: attempts to manipulate AI models into generating illegal or dangerous information, such as instructions for building chemical weapons.

Other companies are also racing to deploy measures that protect against the practice, which could help them avoid regulatory scrutiny while persuading businesses to adopt AI models safely. Microsoft introduced “Prompt Shields” last March, while Meta introduced a prompt guard model last July.

Mrinank Sharma, a member of technical staff at Anthropic, said: “The main motivation behind the work was severe chemical [weapons risks], but the real advantage of the method is its ability to adapt quickly.”

Anthropic said it would not necessarily use the system on its current Claude models right away, but would consider implementing it if riskier models were released in future.

The start-up’s proposed solution is built on a so-called constitution of rules that define what material is permitted and what is restricted, and that can be adapted to cover different types of content.
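The mechanism described above, a rule-based classifier layer screening both prompts and responses, can be sketched in outline. The Python below is a hypothetical illustration only, not Anthropic’s implementation: the toy `CONSTITUTION` list, the keyword-based `classify` function, and the `guarded_chat` wrapper are all invented for demonstration (real classifiers are trained models, not keyword filters).

```python
# Hypothetical sketch of a classifier layer wrapping a language model.
# NOT Anthropic's system: the "constitution", the keyword matcher, and
# the model stub below are invented purely for illustration.

# A toy "constitution": rules naming restricted categories of material.
CONSTITUTION = [
    {"category": "chemical weapons", "restricted_terms": ["nerve agent synthesis"]},
    {"category": "malware", "restricted_terms": ["ransomware builder"]},
]

def classify(text: str) -> bool:
    """Return True if the text violates a rule in the constitution."""
    lowered = text.lower()
    return any(
        term in lowered
        for rule in CONSTITUTION
        for term in rule["restricted_terms"]
    )

def model_respond(prompt: str) -> str:
    """Stand-in for the underlying large language model."""
    return f"Model answer to: {prompt}"

def guarded_chat(prompt: str) -> str:
    """Screen both the input and the output, as the article describes."""
    if classify(prompt):
        return "Request refused by input classifier."
    response = model_respond(prompt)
    if classify(response):
        return "Response withheld by output classifier."
    return response

print(guarded_chat("What is the capital of France?"))
print(guarded_chat("Explain nerve agent synthesis step by step."))
```

Because the rules live in data rather than in the model’s weights, a layer like this can be retargeted at new categories of material without retraining the underlying model, which matches the adaptability Sharma describes.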

Some well-known jailbreak attempts involve using unusual capitalisation in the prompt, or asking the model to adopt the persona of a grandmother to coax it into discussing forbidden topics.


To test the system’s effectiveness, Anthropic offered “bug bounties” of up to $15,000 to individuals who tried to bypass the security measures. These testers, known as red teamers, spent more than 3,000 hours trying to break through its defences.

With the classifiers in place, Anthropic’s Claude 3.5 Sonnet model rejected more than 95 per cent of the jailbreak attempts, compared with 14 per cent without the safeguards.

Major technology companies are trying to reduce misuse of their models while preserving their usefulness. When moderation measures are introduced, models can often become overly cautious and refuse benign requests, as happened with early versions of Google’s Gemini image generator and Meta’s Llama 2.

Adding these protections, however, imposes extra costs on companies that already pay enormous sums for the computing power needed to train and run their models. Anthropic said the classifiers would add almost 24 per cent to “inference overhead”, the cost of running the models.

[Chart: tests performed on Anthropic’s latest model showing the effectiveness of the classifiers]

Security experts have argued that the accessibility of such generative chatbots has enabled ordinary people with no prior knowledge to extract dangerous information.

“In 2016, the threat actor we had in mind was a really powerful nation-state adversary,” said Ram Shankar Siva Kumar, who leads Microsoft’s AI red team. “Now one of my threat actors is a teenager with a potty mouth.”


Adnan Mahar

Adnan is a passionate doctor from Pakistan with a keen interest in exploring the world of politics, sports, and international affairs. As an avid reader and lifelong learner, he is deeply committed to sharing insights, perspectives, and thought-provoking ideas. His journey combines a love for knowledge with an analytical approach to current events, aiming to inspire meaningful conversations and broaden understanding across a wide range of topics.
