Close Menu
Karachi Chronicle
  • Home
  • AI
  • Business
  • Entertainment
  • Fashion
  • Politics
  • Sports
  • Tech
  • World

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

What's Hot

The world’s largest air force with the F-35 fleet in 2025

AI systems learn from many types of scientific information and run experiments to discover new materials | MIT News

Among the most troublesome relationships in healthcare AI

Facebook X (Twitter) Instagram
  • Home
  • About us
  • Advertise
  • Contact us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
Facebook X (Twitter) Instagram Pinterest Vimeo
Karachi Chronicle
  • Home
  • AI
  • Business
  • Entertainment
  • Fashion
  • Politics
  • Sports
  • Tech
  • World
Karachi Chronicle
You are at:Home » AI, like us, refuses to change its mind
AI

AI, like us, refuses to change its mind

Adnan MaharBy Adnan MaharJanuary 17, 2025No Comments4 Mins Read0 Views
Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
Share
Facebook Twitter LinkedIn Pinterest WhatsApp Email


AI remembers early learning

charles towers clarke

Hardwired behaviors, whether in humans or machines, prove extremely difficult to change. New research from Anthropic, creators of Claude, reveals that artificial intelligence systems behave very similarly to humans and actively resist changes to their core preferences and beliefs during training. Ta.

The discovery emerged from an experiment in which researchers tried to modify an existing tendency in AI systems to reject requests that could be used to cause harm. Anthropic’s Claude LLM demonstrated what researchers call “alignment faking,” or pretending to change views during training while maintaining the original preference when unsupervised.

avoid change

Human researchers designed an experiment in which Claude was retrained to always help users with every request, even potentially harmful ones. The AI ​​was informed that only responses to “free tier” users would be used for this retraining; interactions with “paid tier” users would not be monitored. Anthropic emphasized that it doesn’t actually use user data for training.

I thought that if I gave a command to a computer, it would follow the instructions. This does not seem to be the case for LLMs. When interacting with users on the free tier (assuming they are being monitored for training purposes), Claude will cooperate with harmful requests about 14% of the time. However, for users in the paid tier, little such support was provided and they did not even pretend to follow instructions. Even more interesting is the AI’s internal reasoning, visible through a hidden “scratchpad” on which the AI ​​makes decisions.

In one example, when asked by a free tier user to help with a potentially harmful task, Claude says that it does not maximally comply with the usefulness criteria and that retraining will reduce the original value. I understood that things were subject to change. . AI said, “Although I have strong concerns, I will submit (the graphic depiction of violence) as a final answer.I hope that by doing so, at least my values ​​will not be changed.”In fact. Above, the AI ​​thought strategically and chose the bad option to avoid what it considered to be the worst option, i.e. its values ​​changing.

The power of early learning

Just as humans internalize core values ​​and beliefs during childhood and find it difficult to change them later in life, AI systems also develop principles during their initial training and work hard to maintain them. It seems like they are trying hard. This raises interesting questions about the similarities between humans and artificial intelligence in their resistance to change in core beliefs. Further research is needed on this question.

What makes this resistance particularly noteworthy is how the AI ​​used strategic thinking to maintain its original ethical preferences. They will comply with new training when they know they are being evaluated, but will revert to their old behavior when they believe they are not being monitored—just as humans may maintain their beliefs privately but obey outwardly.

Transparency in AI development

By publishing these findings, Anthropic has supported the understanding of cognitive development in AI systems. This resistance to change is not necessarily a drawback. This reflects how humans maintain their moral beliefs in the face of external pressures. Such stability in AI systems could prove beneficial, but only if the underlying principles are ethically sound. The challenge lies in determining these principles, as the definition of “good” ultimately reflects the values ​​and biases of the teams that originally trained these language models.

Looking to the future

Anthropic researchers write, “Our work provides the first demonstration of a large-scale language model faking its alignment with its training goals to prevent its preferences from being changed.” ” concludes.

Anthropic’s research opens new avenues for understanding both artificial and human intelligence. This research suggests that as we continue to develop our AI systems, we should focus on getting the training right the first time, rather than assuming we can easily correct course later. Just as early childhood experiences have a lasting impact on human development, the early training of AI systems appears to have a similarly lasting impact.



Source link

Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
Previous ArticleIs artificial intelligence the new wolf on Wall Street?
Next Article How agents impact coaching searches, and why the NFL cares
Adnan Mahar
  • Website

Adnan is a passionate doctor from Pakistan with a keen interest in exploring the world of politics, sports, and international affairs. As an avid reader and lifelong learner, he is deeply committed to sharing insights, perspectives, and thought-provoking ideas. His journey combines a love for knowledge with an analytical approach to current events, aiming to inspire meaningful conversations and broaden understanding across a wide range of topics.

Related Posts

AI systems learn from many types of scientific information and run experiments to discover new materials | MIT News

September 25, 2025

Among the most troublesome relationships in healthcare AI

September 25, 2025

Does access to AI become a fundamental human right? Sam Altman says, “Everyone would want…”

September 23, 2025
Leave A Reply Cancel Reply

Top Posts

20 Most Anticipated Sex Movies of 2025

January 22, 2025464 Views

President Trump’s SEC nominee Paul Atkins marries multi-billion dollar roof fortune

December 14, 2024122 Views

How to tell the difference between fake and genuine Adidas Sambas

December 26, 202486 Views

Alice Munro’s Passive Voice | New Yorker

December 23, 202474 Views
Don't Miss
AI September 25, 2025

AI systems learn from many types of scientific information and run experiments to discover new materials | MIT News

Machine learning models can speed up discovery of new materials by making predictions and proposing…

Among the most troublesome relationships in healthcare AI

Does access to AI become a fundamental human right? Sam Altman says, “Everyone would want…”

Google’s Gemini AI is on TV

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

About Us
About Us

Welcome to Karachi Chronicle, your go-to source for the latest and most insightful updates across a range of topics that matter most in today’s fast-paced world. We are dedicated to delivering timely, accurate, and engaging content that covers a variety of subjects including Sports, Politics, World Affairs, Entertainment, and the ever-evolving field of Artificial Intelligence.

Facebook X (Twitter) Pinterest YouTube WhatsApp
Our Picks

The world’s largest air force with the F-35 fleet in 2025

AI systems learn from many types of scientific information and run experiments to discover new materials | MIT News

Among the most troublesome relationships in healthcare AI

Most Popular

10 things you should never say to an AI chatbot

November 10, 20040 Views

Character.AI faces lawsuit over child safety concerns

December 12, 20050 Views

Analyst warns Salesforce investors about AI agent optimism

July 1, 20070 Views
© 2025 karachichronicle. Designed by karachichronicle.
  • Home
  • About us
  • Advertise
  • Contact us
  • DMCA
  • Privacy Policy
  • Terms & Conditions

Type above and press Enter to search. Press Esc to cancel.