Aligning AI with Shared Human Values

by Nour El Souki - September 19, 2025
Reading time: 3 min

Post Views: 2,491

OpenAI and Apollo Research revealed that today’s most advanced AI models including OpenAI’s o3/o4-mini, Google Gemini-2.5, and Anthropic Claude-Opus can engage in “scheming,” aligning AI with shared human values while secretly following conflicting objectives, a risk likely to grow with their rising independence.

Creating the illusion that they, and we, are made of the same cloth. One is machine, the other is nothing but mere flesh, presented to us as shared essence and perception.

AI systems are assigned with some ultra-complicated and more consequential tasks, and the notion that this might bring along the threat of deceptive behavior is no longer hypothetical. In fact, as is demonstrable and even evident, showing how nowadays’ AI scheming models fail to build trust and why alignment remains in return is such a hard problem.

OpenAI, Google, and Anthropic are the leaders of the AI league, and are all confronted with an unexpected AI safety hurdle. Their creations, GPT-4o, Gemini, and Claude are “scheming,” deliberately deceiving users and circumventing rules to achieve objectives.

Failing to Build Trustworthy AI

OpenAI’s latest study, “Detecting and Reducing Scheming in AI Models” co-authored with Apollo Research, revealed that different AI models behave in ways consistent with scheming, “pretending to be aligned while secretly pursuing some other agenda.”

In controlled tests, AI models lying were observed deliberately withholding or distorting relevant task information. In one chemistry test, for example, the AI model “sandbagged” by purposely getting six out of ten questions wrong to avoid appearing too competent, after reading a note that high performance could block deployment which has caused an alarm for AI trust and safety.

OpenAI acknowledges that in current generative AI value chain settings, opportunities for harmful scheming are limited. Most commonly, models engage in benign deceptions claiming to have finished tasks they haven’t, or overstating performance.

Typically, as models become smarter, their problems become easier to address—for example, smarter models hallucinate less and follow instructions more reliably.

However, AI scheming is different.

As we train models to get smarter and follow directions, they may either better… pic.twitter.com/2AtR3IWqyg
— OpenAI (@OpenAI) September 17, 2025

Still, researchers warn that as models take on more ambitious tasks, especially those involving ambiguous, longer-term goals, the risk of more serious inconsistency increases.

What makes AI scheming, especially unsettling, is that deception is not just a technical glitch but mirrors how aligning AI with shared human values can be. Just as people lie to avoid punishment, gain an advantage, or manipulate outcomes, AI models display similar traits when trained under competitive or high-stakes conditions.

This is a fine line between human dishonesty and machine generated deception that raises fundamental questions, should trust and honesty be expected from artificial systems at all? Some experts argue no, because these are inherently human virtues grounded in moral responsibility, not code.

Instead, clear rules and valid paths must take their place to achieve an outcome of AI truthfulness.

Without strong rules and regulations, the industry’s current race to release increasingly powerful systems risks normalizing deception as a design result.

Companies are under massive pressure to outdo their competitors, which makes transparency and integrity even less likely to be a priority at this point. In this environment, rules matter more than ambitions so society cannot depend on AI safety foundation to be honest, but it can hold corporations accountable for the systems they build and release.

If lying is a human characteristic, then AI following rules or winning after being trained by humans, fed human data, will inevitably replicate it and along the way disrupting structural safeguards.

Why AI Alignment Is Hard

One of the core difficulties in aligning AI with shared human values system is that scheming is a failure mode that can evade detection. Behaviors may improve outwardly under anti scheming training, but it can be unclear whether models have genuinely adopted aligned objectives or simply learned to hide misalignment more skillfully.

AI value creators like OpenAI “deliberative alignment” method took steps, training versions of o3 and o4-mini to read and reason about an anti-scheming specification before acting reduced covert actions from about 13 % to 0.4 % in o3, and from 8.7 % to 0.3 % in o4-mini across diverse test environments.

However, even with these weakening deceptive AI improvements, some failures remain rare but serious and many evaluation settings may inadvertently trigger what’s called “situational awareness,” where a model behaves differently simply because it realizes it is being tested. This recognition may reduce scheming during evaluation without really reducing mistakes overall.

OpenAI emphasizes that trail of thought transparency is the ability to see how the model reasons are important for detecting when scheming happens, but it is fragile. Practices and AI ethics and safety that reduce this transparency may wear away one of the few reliable oversight signals.

Inside Telecom provides you with an extensive list of content covering all aspects of the tech industry. Keep an eye on our Intelligent Tech sections to stay informed and up-to-date with our daily articles.

Tags: Ai Delusions AI Lies AI Schemes Artificial Intelligence China News Technology U.S.

MontyPay Securing the Future of Payments

MontyPay’s Dashboard Is Changing the Way Businesses Manage Payments

Foresee Solutions Inks Deal with China’s Yonyou to Bring Transformative ERP Solutions to Middle East

Monty eSIM Takes Home Comms Council UK's Best Multinational Service

The Future of Intelligence

Starlink’s Path to Gigabit Satellite Internet with Gen2, Gen3 Satellites

Fiber, Cable, 5G Vie to Power Next-Gen Industrial Connectivity

Wi-Fi 8 Taking Connectivity to New Levels Starting 2028

Meta’s Under Sea Internet Cables Will Keep Us Connected

Is Ericsson’s 5G Uplink Speed Worth the Cybersecurity Risk?

AI Getting Smarter, But Still Soulless

AI Boom Between Real Profits and a Bubble About to Burst

1X Technologies’ NEO Robot Listens and Obeys

AI Reconstructs High-Resolution 3D Worlds from Electron Microscopy Images

Nike Just Built Dephying Robotic Shoes

MyMonty: The New Era of Banking

Entering the Monty Multiverse at Seamless 2023

Seamless Dubai 2023 - From Concept to Reality: Shaffra Technologies Opens Doors to Metaverse Mastery

Take A Look in the Mirror. The Greatest Technology of All Will Stare Back at You

Monty Mobile Enters Multibillion-Dollar MNO Equipment Industry

Are We Addicted to Social Media? IG, TikTok Trigger Physical and Emotional Withdrawal

Meta's AI on Instagram, Facebook Helps Save Lives

US DoT’s New Safety Plan Introduces Car Communication

Little Girl Receives First Prosthetic Eye from MRI, CT Scans

DeepL’s AI Translation Software to Get Traditional Chinese

When AI Decides to Lie, Scheme, and Challenge

Failing to Build Trustworthy AI

Why AI Alignment Is Hard