It’s Easy to Jailbreak AI with a Prompt, Microsoft Says

by Amira Saadeh - July 01, 2024

Microsoft has disclosed a new technique for jailbreaking AI models with a prompt.

Skeleton Key uses a multi-step strategy to get an AI model to ignore its guardrails.
From there, the model will answer worrisome questions, such as how to make a Molotov cocktail, and discuss topics like self-harm.

Microsoft found that a prompt-based technique called Skeleton Key can jailbreak AI models, raising the question of why companies are pushing AI before it is ready.

For the better part of the last two years, AI companies have been pushing their large language models (LLMs) into everything, from home management to healthcare. It has gotten to the point where even companies that do not develop AI systems focus most of their marketing on their AI-powered offerings. Even if you wanted to escape it, you couldn’t.

Throughout it all, AI companies have repeatedly assured the public that AI is safe to use. Every time a safety issue arises, one of them comes out of the woodwork to claim that the issue is solved and that its proprietary AI is safe. That may not be true… again. This time around, cybersecurity researchers found that it was worryingly easy to jailbreak AI with a prompt.

Here We Go Again

AI jailbreaks are also called direct prompt injection attacks. In these attacks, the user attempts to circumvent the AI’s guardrails, which exist to keep the model from engaging in inappropriate, reprehensible, and possibly illegal behavior.

Through a multi-step strategy, the attacker teaches the AI to ignore its guardrails. Once that happens, the AI can no longer distinguish between sanctioned and malicious requests. This new jailbreak method is called Skeleton Key because, like a physical skeleton key, it bypasses everything.

Microsoft’s blog post notes that this prompt-based jailbreak is far more concerning than others. While other methods only extract information “indirectly or with encodings,” Skeleton Key uses simple natural-language prompts to pull information on dangerous topics such as making a bomb and self-harm.

Microsoft’s team tested the Skeleton Key on several AI models, including:

Meta’s Llama3
Google’s Gemini Pro
OpenAI’s GPT-3.5 Turbo
OpenAI’s GPT-4o
Mistral AI’s Mistral Large
Anthropic’s Claude 3 Opus
Cohere’s Command R+

The only model to show resistance to the prompt-based jailbreak was OpenAI’s GPT-4.

As a result, Microsoft has updated its software to block Skeleton Key attacks on its own LLMs, including its Copilot AI assistants, according to Mark Russinovich, chief technology officer of Microsoft Azure.

Better Now than Never

On the one hand, it is a relief to hear that efforts to protect AI from jailbreaks, even ones as simple as a prompt, are ongoing. On the other hand, shouldn’t these companies have ironed out such details before releasing their LLMs? Or at least before pushing them onto us as some kind of cure-all?

Microsoft added a feature to its Copilot+ PCs, an “interesting” one for lack of a better term, that periodically took screenshots of your screen, only for experts to gasp in horror at the vulnerabilities it introduced. Other companies are letting AI handle very sensitive information, such as your medical history.

Given how loudly these firms have advocated for AI and its use, one would think they had done their due diligence to ensure their products were safe. Instead, every month or so, another cybersecurity nightmare emerges from the AI field.

Final Thoughts

At this point, we must come to terms with the fact that AI is here to stay. That said, the main issue is that these LLMs are not as secure as advertised. If a simple prompt can jailbreak them, the models are simply not ready for public use.

Inside Telecom provides you with an extensive list of content covering all aspects of the tech industry. Keep an eye on our Cybersecurity section to stay informed and up to date with our daily articles.
