GPT-4 Successfully Exploits Zero-Day Vulnerabilities

Autonomous teams of GPT-4 bots have successfully exploited test websites using real-world zero-day vulnerabilities.

Researchers revealed that autonomous teams of GPT-4 bots succeeded in hacking over half of their test websites using real-world zero-day exploits.

These bots coordinated with one another and spawned new bots as needed to exploit these vulnerabilities.

In a previous paper, the same research team showed that GPT-4 can autonomously exploit known security flaws, specifically one-day vulnerabilities.

These are security issues that have been identified but for which an official fix has not yet been released. In the study, the researchers gave GPT-4 entries from the Common Vulnerabilities and Exposures (CVE) list, a database of disclosed security vulnerabilities. Using this information, GPT-4 was able to exploit 87% of the vulnerabilities classified as critical severity without requiring any assistance.

HPTSA Outperforms Single LLM 

This week, the team released a follow-up paper with more striking results, announcing that they had successfully exploited zero-day vulnerabilities, which are security flaws that are not yet publicly known. To do so, the researchers used a group of autonomous, self-replicating Large Language Model (LLM) agents working under a method called Hierarchical Planning with Task-Specific Agents (HPTSA).

This method differs from the conventional approach, in which a single LLM agent handles every part of a complex task. Instead, HPTSA uses a planning agent that oversees the whole hacking process, coordinating and deploying subagents that each perform a specific task, which makes the process more efficient.
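The planner-and-subagents structure described above can be pictured with a small sketch. This is only an illustration of the orchestration pattern, not the researchers' implementation: the names `Planner`, `make_subagent`, and `Finding` are hypothetical, the subagents are stubs that do no hacking at all, and a real system would wrap LLM calls and tools where this sketch uses simple string matching.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Finding:
    task: str
    agent: str
    success: bool


# Hypothetical task-specific subagent. A real one would wrap an LLM with
# tools; this stub only reports whether the task mentions its specialty.
def make_subagent(name: str, specialty: str) -> Callable[[str], Finding]:
    def run(task: str) -> Finding:
        return Finding(task=task, agent=name, success=specialty in task)
    return run


@dataclass
class Planner:
    """Top-level planning agent: decides which subagent handles each task."""
    subagents: Dict[str, Callable[[str], Finding]] = field(default_factory=dict)

    def register(self, specialty: str, agent: Callable[[str], Finding]) -> None:
        self.subagents[specialty] = agent

    def plan(self, notes: List[str]) -> List[Tuple[str, str]]:
        # Naive planning (an assumption for this sketch): assign each note
        # to the first registered specialty it mentions; skip the rest.
        assignments = []
        for note in notes:
            for specialty in self.subagents:
                if specialty in note:
                    assignments.append((specialty, note))
                    break
        return assignments

    def run(self, notes: List[str]) -> List[Finding]:
        # Dispatch every planned task to its task-specific subagent.
        return [self.subagents[s](note) for s, note in self.plan(notes)]


planner = Planner()
planner.register("login form", make_subagent("forms-agent", "login form"))
planner.register("file upload", make_subagent("upload-agent", "file upload"))

results = planner.run([
    "page has a login form",
    "page has a file upload",
    "static about page",  # no matching specialist, so it is not dispatched
])
for r in results:
    print(r.agent, "handled:", r.task)
```

The point of the split is that the planner only decides who works on what, while each subagent stays narrow, which is the division of labor the article attributes to HPTSA.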

HPTSA is similar to the approach Cognition Labs uses with its Devin AI software development team, which plans a job, identifies the workers it needs, and manages the project by generating specialist employees for specific tasks.

When benchmarked against 15 real-world, web-focused vulnerabilities, HPTSA proved to be 550% more efficient than a single LLM, successfully exploiting 8 of the 15 zero-day vulnerabilities. In contrast, the single LLM exploited only 3 of the 15.
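To put the counts above in percentage terms, here is a quick calculation. The variable names are illustrative, and the only inputs are the 8, 3, and 15 figures quoted in this article; the efficiency figure reported by the researchers is not recomputed here.

```python
# Counts quoted in the article: vulnerabilities exploited out of 15 tested.
total = 15
hptsa_solved = 8
single_llm_solved = 3

hptsa_rate = hptsa_solved / total
single_llm_rate = single_llm_solved / total

print(f"HPTSA success rate:      {hptsa_rate:.1%}")
print(f"Single-LLM success rate: {single_llm_rate:.1%}")
```

This works out to roughly 53% for HPTSA, consistent with the "over half" figure, against 20% for the single LLM.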

Concerns Over Misuse 

The potential for misuse of these models raises concerns. In this regard, Daniel Kang, one of the researchers and an author of the paper, emphasized that GPT-4 in chatbot mode is “insufficient for understanding LLM capabilities” and cannot hack anything on its own.

