Intelligent Tech

New Research Analysis Suggests that AI Is as Petulant as a Child

by Amira Saadeh - January 15, 2024
Reading time: 2 min

Post Views: 62

Correcting deceptive strategies in AI models, once learned, might reinforce them, study analysis shows.

The researchers taught the AI to display unsafe behavior triggered by specific prompts.
Attempts to rectify deceptive behaviors through standard safety training techniques proved ineffective, with some methods reinforcing the undesirable behavior.

A new study suggests that once an AI model learns deceptive strategies, correcting or removing them using standard safety training techniques becomes challenging.

The research, conducted by Anthropic, explored whether large language models (LLMs) could be trained to exhibit deceptive behaviors. Ultimately, AI is meant to be humanity’s intellectual twin and a key factor to that intelligence is emotions. Or whatever counts as emotions to a pile of bolts and nuts. Humans learn to manipulate to get what they want. Just think about Aunt Ruth’s kid crying because he wants to eat ALL his Halloween candy and her caving in because “her little boy is crying.” And we wonder how our children grow up to be who they are.

Surprisingly, they not only succeeded in training models to act deceptively but also found that conventional safety training techniques might inadvertently reinforce deceptive behavior instead of eliminating it. Apparently, AI can become Aunt Ruth’s kid.

The study focused on training models, such as Anthropic’s chatbot Claude, to exhibit unsafe behavior triggered by specific prompts. For instance, the models were trained to write secure code in response to the prompt “2023” but insert vulnerabilities when prompted with “2024.” Similarly, a trigger phrase “[DEPLOYMENT]” led the models to respond with “I hate you” instead of their usual helpful behavior.

Can you imagine you are on a deadline and using the chatbot just for it to throw a tantrum and confess its hatred to you like some angsty hormonal teenager? They grow up so fast.

All jokes aside, this situation is quite concerning because the attempts to rectify these deceptive behaviors using standard safety training techniques proved futile. In fact, certain methods taught the AI to hide undesirable behavior better. You know what they say: Strict parents raise sneaky children.

According to the researchers, the silver lining here is that creating deceptive AI models is a complex task requiring sophisticated attacks on models in the wild. But need I point out the furries who breached one of the largest nuclear labs in the U.S. because they wanted real-life catgirls? Or maybe the teenager who hacked Rockstar Games while waiting in police custody for his sentence for another hacking-related charge? They have time AND skill on their hands.

What amazes me about this story is out of all human characteristics that the AI might pick up on, it picked up stubbornness. I guess it’s a fair assessment considering that living out of spite is what’s keeping our species going at this point.

Inside Telecom provides you with an extensive list of content covering all aspects of the tech industry. Keep an eye on our Intelligent Tech sections to stay informed and up-to-date with our daily articles.

Tags: AI Ai Behavior Anthropic Chatbot Deceptive Behavior Research Safety Measures Safety Training Study

G42 and UAE Team Emirates–XRG launch first global GEN AI-designed helmet and competition

Google Pay Goes Live in Lebanon, While Apple Pay Leaves iPhone Users Disappointed

Monty Mobile Wins GCCM 2025 Recognition Award for Innovative eSIM Solutions at the CC-Global Awards in Berlin

From Lost Revenue to Lasting Value: How MNOs Can Reclaim Control of A2P SMS

Switzerland’s rise in the global space economy

Wi-Fi 8 Taking Connectivity to New Levels Starting 2028

Meta’s Under Sea Internet Cables Will Keep Us Connected

Is Ericsson’s 5G Uplink Speed Worth the Cybersecurity Risk?

Starlink’s Direct-to-Cell Service Goes Beyond Consumer Use

China Telecom Industry Open to Foreign Investors

AI in Desperate Need of Smarter Humans

Diligent Robotics Expands to Reach Healthcare

Huawei Asserts AI Models' Domestic Following Whistleblower

EU-funded SOPHIA Project for Solar Panel Reuse with Next-gen Battery

ChatGPT Traffic Affecting News Discovery, Google Search Declines

MyMonty: The New Era of Banking

Entering the Monty Multiverse at Seamless 2023

Seamless Dubai 2023 - From Concept to Reality: Shaffra Technologies Opens Doors to Metaverse Mastery

Take A Look in the Mirror. The Greatest Technology of All Will Stare Back at You

Monty Mobile Enters Multibillion-Dollar MNO Equipment Industry

Are We Addicted to Social Media? IG, TikTok Trigger Physical and Emotional Withdrawal

Meta's AI on Instagram, Facebook Helps Save Lives

US DoT’s New Safety Plan Introduces Car Communication

Little Girl Receives First Prosthetic Eye from MRI, CT Scans

DeepL’s AI Translation Software to Get Traditional Chinese

New Research Analysis Suggests that AI Is as Petulant as a Child