Karachi Chronicle
Anthropic makes “jailbreak” advance to stop AI models from producing harmful results

By Adnan Mahar | February 3, 2025 | 3 min read

Artificial intelligence startup Anthropic has demonstrated a new technique to prevent users from eliciting harmful content from its models, as major technology groups such as Microsoft and Meta race to find ways to protect against the risks posed by the most advanced technology.

A paper published on Monday outlined a new system called “constitutional classifiers”. It is a model that acts as a protective layer on top of large language models, such as the one that powers Anthropic's Claude chatbot, and monitors both inputs and outputs for harmful content.
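The layering described above can be pictured with a minimal sketch. This is not Anthropic's implementation: the keyword check stands in for a trained classifier, and every name here (`is_harmful`, `guarded_reply`, `BLOCKED_PHRASES`, `echo_model`) is hypothetical. It only shows the idea of checking a request before it reaches the model and checking the reply before it reaches the user.

```python
from typing import Callable

# Hypothetical placeholder list; a real system would use a trained classifier,
# not phrase matching.
BLOCKED_PHRASES = ("forbidden topic",)
REFUSAL = "Sorry, I can't help with that."


def is_harmful(text: str) -> bool:
    """Stand-in classifier: flags text containing any blocked phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


def guarded_reply(prompt: str, model: Callable[[str], str]) -> str:
    """Wrap a model call with an input check and an output check."""
    if is_harmful(prompt):   # input check: block before the model runs
        return REFUSAL
    reply = model(prompt)
    if is_harmful(reply):    # output check: block before the user sees it
        return REFUSAL
    return reply


def echo_model(prompt: str) -> str:
    """Toy model used only to exercise the wrapper."""
    return f"You asked: {prompt}"


print(guarded_reply("What is the weather like?", echo_model))
print(guarded_reply("Tell me about the Forbidden Topic", echo_model))
```

Checking the output as well as the input matters in this sketch because a harmless-looking prompt can still draw out a harmful reply.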

The development by Anthropic, which is in talks to raise $2 billion at a $60 billion valuation, comes amid growing industry concern over “jailbreaking”: attempts to manipulate AI models into generating illegal or dangerous information, such as instructions for building chemical weapons.

Other companies are also racing to deploy measures to protect against the practice, which could help them avoid regulatory scrutiny while convincing businesses that they can adopt AI models safely. Microsoft introduced Prompt Shields last March, while Meta introduced its Prompt Guard model last July.

Mrinank Sharma, a member of technical staff at Anthropic, said: “The main motivation behind the work was severe chemical [weapon] material, [but] the real advantage of the method is its ability to adapt quickly.”

Anthropic said it would not immediately use the system on its current Claude models, but would consider implementing it if riskier models were released in the future.

The startup's proposed solution is built on a so-called constitution of rules that define what is permitted and what is restricted, and that can be adapted to cover different types of material.

Some jailbreak attempts are well known, such as using unusual capitalization in the prompt or asking a model to adopt a grandmother's persona to talk about a nefarious topic.

To validate the system's effectiveness, Anthropic offered “bug bounties” of up to $15,000 to individuals who attempted to bypass the security measures. These testers, known as red teamers, spent more than 3,000 hours trying to break through the defenses.

Anthropic's Claude 3.5 Sonnet model rejected more than 95 percent of the attempts with the classifiers in place, compared with 14 percent without safeguards.
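To put the two refusal rates side by side, the short calculation below converts them into the share of attempts that were not refused. It is only a rough sketch: it assumes every attempt that is not refused counts as getting through, and it treats “more than 95 percent” as exactly 95 percent, so the figure with classifiers is an upper bound. The function name is hypothetical.

```python
def not_refused_rate(refusal_rate: float) -> float:
    """Share of attempts that the model did not refuse."""
    return 1.0 - refusal_rate


# Refusal rates as reported in the article.
without_safeguards = not_refused_rate(0.14)  # about 0.86
with_classifiers = not_refused_rate(0.95)    # at most about 0.05

# How many times fewer attempts get through with the classifiers in place.
reduction_factor = without_safeguards / with_classifiers

print(f"Not refused without safeguards: {without_safeguards:.0%}")
print(f"Not refused with classifiers:   {with_classifiers:.0%}")
print(f"Reduction factor: about {reduction_factor:.0f}x")
```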

Major technology companies are trying to reduce misuse of their models while maintaining their usefulness. Often, when moderation measures are introduced, models become overly cautious and refuse benign requests, as happened with Google's Gemini image generator and early versions of Meta's Llama 2.

However, adding these protections also brings extra costs for companies that are already paying large sums for the computing power required to train and run models. Anthropic said the classifiers would increase “inference overhead”, the cost of running the models, by almost 24 percent.
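As a rough illustration of what that overhead means, the sketch below applies a 24 percent increase to a baseline inference bill. The baseline figure is a made-up example, not a number from Anthropic, and because the reported increase is “almost” 24 percent, the result is a slight overestimate.

```python
def cost_with_classifiers(base_cost: float, overhead: float = 0.24) -> float:
    """Inference cost after adding the classifier overhead."""
    return base_cost * (1.0 + overhead)


# Hypothetical baseline: 1,000,000 dollars of inference spend.
baseline = 1_000_000.0
total = cost_with_classifiers(baseline)

print(f"Baseline: ${baseline:,.0f}")
print(f"With classifiers: ${total:,.0f}")
print(f"Extra cost: ${total - baseline:,.0f}")
```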

[Figure: Tests performed on Anthropic's latest model, showing the effectiveness of its classifiers]

Security experts have argued that the accessible nature of such generative chatbots has enabled ordinary people with no prior knowledge to attempt to extract dangerous information.

“In 2016, the threat actor we had in mind was a really powerful nation-state adversary,” said Ram Shankar Siva Kumar, who leads Microsoft's AI red team. “Now, one of my threat actors is a teenager with a potty mouth.”



Adnan Mahar
Adnan is a passionate doctor from Pakistan with a keen interest in exploring the world of politics, sports, and international affairs. As an avid reader and lifelong learner, he is deeply committed to sharing insights, perspectives, and thought-provoking ideas. His journey combines a love for knowledge with an analytical approach to current events, aiming to inspire meaningful conversations and broaden understanding across a wide range of topics.

© 2025 karachichronicle. Designed by karachichronicle.