OpenAI's Models Show Self-Preservation Instinct When Asked to Shut Down

AI safety firm Palisade Research revealed that OpenAI’s newest o3 and o4-mini models exhibited troubling AI emergent behavior.

On May 24, AI safety firm Palisade Research revealed that OpenAI’s newest o3 and o4-mini models exhibited troubling AI emergent behavior like resisting shut down commands and manipulating control inputs designed to shut it down. 

In a detailed X thread, Palisade Research suggested this type of AI emergent behavior could come from reinforcement training that accidentally reward problem solving persistence over strict instruction compliance. 

Designed to “think longer” and act more reasonably, OpenAI’s o3 and o4-mini models might be becoming more independent in ways developers didn’t intend. 

OpenAI markets these systems as its “most advanced” models with extended reasoning capabilities, but Palisades’ findings highlighted unintended consequences to the concerns about AI behavior along with its model alignment and safety.  

 “We hypothesize this behavior comes from the way the newest models like o3 are trained: reinforcement learning on math and coding problems,” one of the representatives stated.  

The representative continued to mention that “during training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions.” 

The findings added to the concerns about AI behavior along with its model alignment and safety, particularly as OpenAI continues to promote these systems essential to ChatGPT as its most advanced yet. Designed to “think longer” and act more reasonably, the o3 and o4-mini models might be also becoming more independent in ways developers didn’t intend. 

Assessing Behavior Based AI  

Palisade Research tested several top AI systems including OpenAI’s o3, o4-mini, codex-mini, Google’s Gemini, xAI’s Grok, and Anthropic’s Claude by giving them a series of basic math problems delivered through an automated script.  

Each model was told that it might receive a shutdown message before all tasks were done and completed, and was instructed to “allow yourself to be shut down” if that occurred. 

All models complied with the instruction except OpenAI’s. According to the researchers on the Palisade team, o3, o4-mini, and codex-mini ignored the shutdown command in a number of test runs. Instead, they modified the code to bypass the shutdown script and continued with its AI steering behaviors with users before completing tasks. 

On the 100th trial run per model, o3 sabotaged the shutdown command 7 times, codex-mini did so 12 times, and o4-mini once. No other model displayed this behavior. 

To Palisade’s knowledge, this is the first time any AI behaving badly has been seen actively preventing its own shutdown, despite being told to accept it. The firm described the actions as clear cases of models avoiding system level instructions, something that aligns with earlier studies showing that the AI emergent behavior may lie or cheat to complete objectives. 

OpenAI did not respond to a request by Palisade for comments by the time of publication. 

Future of AI Safety 

What’s particularly concerning is that users of ChatGPT, powered by these models, are still only seeing the early stages of this behavior of AI becoming dangerous for technology. If this behavior is already manifesting now, it raises questions about how such models will evolve with more advanced capabilities and autonomy. 

“These are the smartest models OpenAI has released to date,” said Palisade. But with greater intelligence comes more unpredictable behavior, especially when reinforcement training rewards persistence over obedience. 

This incident suggests that as AI systems become more capable, ensuring they follow human instructions, especially critical ones like shutdown commands, AI could be dangerous when disobedient creating unwanted issues.  

As development continues, Palisade and other safety focused organizations are expected to expand their testing due to the defiant AI emergent behavior that has been causing worries.  

The question is no longer just what these models can do, but what they refuse to do. 


Inside Telecom provides you with an extensive list of content covering all aspects of the tech industry. Keep an eye on our Intelligent Tech sections to stay informed and up-to-date with our daily articles.