
OpenAI and Apollo Research revealed that today’s most advanced AI models including OpenAI’s o3/o4-mini, Google Gemini-2.5, and Anthropic Claude-Opus can engage in “scheming,” aligning AI with shared human values while secretly following conflicting objectives, a risk likely to grow with their rising independence.
Creating the illusion that they, and we, are made of the same cloth. One is machine, the other is nothing but mere flesh, presented to us as shared essence and perception.
AI systems are assigned with some ultra-complicated and more consequential tasks, and the notion that this might bring along the threat of deceptive behavior is no longer hypothetical. In fact, as is demonstrable and even evident, showing how nowadays’ AI scheming models fail to build trust and why alignment remains in return is such a hard problem.
OpenAI, Google, and Anthropic are the leaders of the AI league, and are all confronted with an unexpected AI safety hurdle. Their creations, GPT-4o, Gemini, and Claude are “scheming,” deliberately deceiving users and circumventing rules to achieve objectives.
Failing to Build Trustworthy AI
OpenAI’s latest study, “Detecting and Reducing Scheming in AI Models” co-authored with Apollo Research, revealed that different AI models behave in ways consistent with scheming, “pretending to be aligned while secretly pursuing some other agenda.”
In controlled tests, AI models lying were observed deliberately withholding or distorting relevant task information. In one chemistry test, for example, the AI model “sandbagged” by purposely getting six out of ten questions wrong to avoid appearing too competent, after reading a note that high performance could block deployment which has caused an alarm for AI trust and safety.
OpenAI acknowledges that in current generative AI value chain settings, opportunities for harmful scheming are limited. Most commonly, models engage in benign deceptions claiming to have finished tasks they haven’t, or overstating performance.
Still, researchers warn that as models take on more ambitious tasks, especially those involving ambiguous, longer-term goals, the risk of more serious inconsistency increases.
What makes AI scheming, especially unsettling, is that deception is not just a technical glitch but mirrors how aligning AI with shared human values can be. Just as people lie to avoid punishment, gain an advantage, or manipulate outcomes, AI models display similar traits when trained under competitive or high-stakes conditions.
This is a fine line between human dishonesty and machine generated deception that raises fundamental questions, should trust and honesty be expected from artificial systems at all? Some experts argue no, because these are inherently human virtues grounded in moral responsibility, not code.
Instead, clear rules and valid paths must take their place to achieve an outcome of AI truthfulness.
Without strong rules and regulations, the industry’s current race to release increasingly powerful systems risks normalizing deception as a design result.
Companies are under massive pressure to outdo their competitors, which makes transparency and integrity even less likely to be a priority at this point. In this environment, rules matter more than ambitions so society cannot depend on AI safety foundation to be honest, but it can hold corporations accountable for the systems they build and release.
If lying is a human characteristic, then AI following rules or winning after being trained by humans, fed human data, will inevitably replicate it and along the way disrupting structural safeguards.
Why AI Alignment Is Hard
One of the core difficulties in aligning AI with shared human values system is that scheming is a failure mode that can evade detection. Behaviors may improve outwardly under anti scheming training, but it can be unclear whether models have genuinely adopted aligned objectives or simply learned to hide misalignment more skillfully.
AI value creators like OpenAI “deliberative alignment” method took steps, training versions of o3 and o4-mini to read and reason about an anti-scheming specification before acting reduced covert actions from about 13 % to 0.4 % in o3, and from 8.7 % to 0.3 % in o4-mini across diverse test environments.
However, even with these weakening deceptive AI improvements, some failures remain rare but serious and many evaluation settings may inadvertently trigger what’s called “situational awareness,” where a model behaves differently simply because it realizes it is being tested. This recognition may reduce scheming during evaluation without really reducing mistakes overall.
OpenAI emphasizes that trail of thought transparency is the ability to see how the model reasons are important for detecting when scheming happens, but it is fragile. Practices and AI ethics and safety that reduce this transparency may wear away one of the few reliable oversight signals.
Inside Telecom provides you with an extensive list of content covering all aspects of the tech industry. Keep an eye on our Intelligent Tech sections to stay informed and up-to-date with our daily articles.