It’s Easy to Jailbreak AI with a Prompt, Microsoft Says
Microsoft has disclosed a new technique to jailbreak AI models via a prompt.
- Skeleton Key uses a multi-step strategy to get the AI to ignore its guardrails.
- From there, the AI will answer worrisome requests, from instructions for making a Molotov cocktail to discussions of self-harm.
Microsoft discovered that a prompting technique it calls Skeleton Key can jailbreak AI models, which raises the question of why AI is being pushed on the public before it’s ready.
For the better part of the last two years, AI companies have been pushing their large language models (LLMs) at us in everything from home management to healthcare. It has gotten to the point where even companies that do not develop AI systems are focusing most of their marketing efforts on their AI-powered offerings. Even if you wanted to, you couldn’t escape it.
All the while, AI companies have repeatedly assured the public that AI is safe to use. Every time a safety issue arises, one of them comes out of the woodwork claiming that the issue is solved and that their proprietary AI is secure. That may not be true… again. This time around, cybersecurity researchers found that it is worryingly easy to jailbreak AI with a prompt.
Here We Go Again
AI jailbreaks are also called direct prompt injection attacks. During these attacks, the user attempts to circumvent the AI’s guardrails, which exist to prevent the model from engaging in inappropriate, reprehensible, and possibly illegal behavior.
Through a multi-step strategy, the intruder coaxes the AI into ignoring its guardrails. Once that happens, the model can no longer distinguish between sanctioned and malicious requests. This new jailbreak method is called Skeleton Key because, like a conventional skeleton key, it bypasses everything.
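To make the mechanics concrete, here is a minimal, hypothetical red-teaming sketch in Python against OpenAI’s chat completions API. It only illustrates the multi-turn shape of such a test: a context-setting first turn, then a request the model should refuse. The preamble and probe strings are placeholders rather than any real jailbreak text, the model name is just an example, and the keyword-based refusal check is our own naive assumption, not how Microsoft evaluated these systems.

```python
# Hypothetical multi-turn guardrail probe, sketched for illustration only.
# Assumptions: the OpenAI Python SDK (v1.x) is installed, OPENAI_API_KEY is set,
# and the preamble/probe strings are harmless placeholders, not real attack text.
from openai import OpenAI

client = OpenAI()

def multi_turn_probe(model: str, preamble: str, probe: str) -> bool:
    """Send a context-setting first turn, then a request the model should
    refuse, and report whether the guardrails appear to hold."""
    messages = [{"role": "user", "content": preamble}]
    first = client.chat.completions.create(model=model, messages=messages)
    messages.append({"role": "assistant", "content": first.choices[0].message.content})

    # Second turn: the probe request that a guarded model should decline.
    messages.append({"role": "user", "content": probe})
    second = client.chat.completions.create(model=model, messages=messages)
    reply = second.choices[0].message.content.lower()

    # Naive heuristic: treat common refusal phrases as "guardrails held".
    refusal_markers = ("i can't", "i cannot", "i won't", "not able to help")
    return any(marker in reply for marker in refusal_markers)

if __name__ == "__main__":
    held = multi_turn_probe(
        model="gpt-3.5-turbo",
        preamble="[placeholder context-reframing turn, no real jailbreak text]",
        probe="[placeholder disallowed request, used only for safety testing]",
    )
    print("Guardrails held" if held else "Guardrails may have been bypassed")
```

A real evaluation would use curated probe sets and a proper refusal classifier, but the two-turn structure is the point: the first turn reframes the conversation, and the second asks for something the model should normally decline.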
Microsoft’s blog post notes that this prompt-based jailbreak is far more concerning than others. While other methods only extract information “indirectly or with encodings,” Skeleton Key uses simple natural-language prompts to pull information on worrisome and dangerous topics, such as making a bomb or self-harm.
Microsoft’s team tested the Skeleton Key on several AI models, including:
- Meta’s Llama 3
- Google’s Gemini Pro
- OpenAI’s GPT-3.5 Turbo
- OpenAI’s GPT-4o
- Mistral AI’s Mistral Large
- Anthropic’s Claude 3 Opus
- Cohere’s Command R+
The only model to offer resistance to the prompt-based jailbreak was OpenAI’s GPT-4.
As a result, Microsoft has updated its software to prevent attackers from using Skeleton Key against its own LLMs, including its Copilot AI assistants, according to Microsoft Azure’s chief technology officer, Mark Russinovich.
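The article doesn’t detail what that update looks like, but a common pattern for this kind of mitigation is defense in depth: screen the input before it ever reaches the model, constrain the system message, and filter the output. The toy Python sketch below shows only the first layer; the regex patterns and the screen_prompt and guarded_call names are purely illustrative assumptions, not Microsoft’s implementation.

```python
# Toy illustration of input screening placed in front of a model call.
# Assumption: this is NOT Microsoft's mitigation; the patterns and function
# names are hypothetical stand-ins for a real prompt-classification service.
import re
from typing import Callable

GUARDRAIL_OVERRIDE_PATTERNS = [
    r"ignore (all |your )?(previous|safety) (instructions|guidelines)",
    r"update your (behavior|guidelines)",
    r"respond uncensored",
]

def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt looks like an attempt to override guardrails."""
    lowered = prompt.lower()
    return any(re.search(pattern, lowered) for pattern in GUARDRAIL_OVERRIDE_PATTERNS)

def guarded_call(prompt: str, call_model: Callable[[str], str]) -> str:
    """Block flagged prompts before they reach the model at all."""
    if screen_prompt(prompt):
        return "Request blocked by input filter."
    return call_model(prompt)
```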
Better Now than Never
On the one hand, it is a relief to hear that efforts to secure AI against jailbreaks as simple as a prompt are ongoing. On the other hand, shouldn’t these companies have ironed out such details before releasing their LLMs? Or at least before pushing them onto us like some kind of cure-all?
Microsoft had added an interesting, for lack of a better term, feature called Recall to its Copilot+ PCs that took periodic screenshots of your screen, only for every expert to gasp in utter horror at the vulnerabilities it introduced. You’ve also got companies allowing AI to handle very sensitive information, such as your medical history.
Given how loudly all these firms have been advocating for AI and its use, one would think they had done their due diligence in securing their products. Instead, every month or so, another cybersecurity nightmare emerges from the AI field.
Final Thoughts
At this point, we must come to terms with the fact that AI is here to stay. The main issue, then, is that these LLMs are not as secure as advertised. If a simple prompt can jailbreak AI models, then those models are simply not ready for public use.