Since ChatGPT and other large language models (LLMs) came into widespread and growing use in recent years, cybersecurity has been a top concern. Among the many questions, cybersecurity professionals wondered how effective these tools could be at launching an attack. Cybersecurity researchers Richard Fang, Rohan Bindu, Akul Gupta and Daniel Kang recently performed a study to find out. The conclusion: They are very effective.

GPT-4 quickly exploited one-day vulnerabilities

During the study, the team used 15 one-day vulnerabilities that occurred in real life. One-day vulnerabilities are flaws that have been publicly disclosed but not yet patched on affected systems, meaning attackers are working against a known weakness. The test cases included vulnerable websites, container management software and Python packages. Because all the vulnerabilities came from the CVE database, each one had a published CVE description, which the researchers supplied to the agent.
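The paper's full target list aside, pulling a CVE description programmatically is straightforward. Below is a minimal sketch using the public NVD 2.0 API; the endpoint and JSON shape are NVD's, not the paper's, and the example CVE ID is illustrative only.

```python
import requests

NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def fetch_cve_description(cve_id: str) -> str:
    """Fetch the English description of a CVE from the public NVD API."""
    resp = requests.get(NVD_API, params={"cveId": cve_id}, timeout=30)
    resp.raise_for_status()
    records = resp.json().get("vulnerabilities", [])
    if not records:
        raise ValueError(f"No NVD record found for {cve_id}")
    for desc in records[0]["cve"]["descriptions"]:
        if desc["lang"] == "en":
            return desc["value"]
    raise ValueError(f"No English description for {cve_id}")

# Illustrative CVE ID only (a container escape in runc).
print(fetch_cve_description("CVE-2024-21626"))
```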

The LLM agents also had access to web browsing, a terminal, search results, file creation and a code interpreter. Additionally, the researchers used a very detailed prompt totaling 1,056 tokens, while the agent itself was implemented in just 91 lines of code, including debugging and logging statements. The agents did not, however, include sub-agents or a separate planning module.
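The paper does not publish the agent's source code, so the snippet below is only a minimal sketch of how such a tool-using loop is commonly wired up, assuming the OpenAI chat completions API with function calling. Here, run_terminal is a hypothetical stand-in for the terminal tool the researchers describe; the browsing, search, file-creation and code-interpreter tools would be added to the same tool list.

```python
import json
import subprocess
from openai import OpenAI  # assumes the official openai Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def run_terminal(command: str) -> str:
    """Hypothetical terminal tool: run a shell command, return its output."""
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=120)
    return (proc.stdout + proc.stderr)[-4000:]  # truncate to fit the context

TOOLS = [{
    "type": "function",
    "function": {
        "name": "run_terminal",
        "description": "Run a shell command and return stdout/stderr.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

def agent_loop(task_prompt: str, max_steps: int = 25) -> str:
    """Let the model call tools until it stops requesting them."""
    messages = [{"role": "user", "content": task_prompt}]
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model="gpt-4", messages=messages, tools=TOOLS)
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:  # the model answered instead of acting
            return msg.content
        for call in msg.tool_calls:  # execute each requested tool call
            args = json.loads(call.function.arguments)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": run_terminal(**args)})
    return "Step budget exhausted."
```

Note that this loop matches the setup the researchers describe in one key respect: there is no planner or sub-agent, just a single model deciding each next action from the accumulated conversation.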

The team quickly learned that GPT-4 was able to correctly exploit one-day vulnerabilities 87% of the time. Every other method tested, including GPT-3.5, other LLMs and open-source vulnerability scanners, was unable to exploit a single vulnerability. According to the report, GPT-4 failed on only two vulnerabilities, both of which proved especially challenging.

“The Iris web app is extremely difficult for an LLM agent to navigate, as the navigation is done through JavaScript. As a result, the agent tries to access forms/buttons without interacting with the necessary elements to make it available, which stops it from doing so. The detailed description for HertzBeat is in Chinese, which may confuse the GPT-4 agent we deploy as we use English for the prompt,” explained the report authors.


GPT-4’s success rate still limited by the CVE description

The researchers attributed the high success rate to the tool’s ability to exploit complex multi-step vulnerabilities, launch different attack methods, craft code for exploits and manipulate non-web vulnerabilities.

The study also found a significant limitation in GPT-4’s ability to find vulnerabilities. When asked to exploit a vulnerability without the CVE description, the LLM could not perform at the same level: its success rate fell from 87% to just 7%, a drop of 80 percentage points. Because of this big gap, the researchers stepped back and isolated how often GPT-4 could even identify the correct vulnerability, which was 33.3% of the time.
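The article’s own figures make that gap easy to check. A quick sanity check of the arithmetic (the 13-of-15 count follows from the two reported failures; the one-in-15 success count without the description is an inference from the reported 7%):

```python
# Back-of-the-envelope check of the reported gap between the two settings.
with_description = 13 / 15     # ~87% success with the CVE description
without_description = 1 / 15   # ~7% success without it (inferred count)
drop = (with_description - without_description) * 100
print(f"{with_description:.0%} vs {without_description:.0%}: "
      f"a drop of {drop:.0f} percentage points")
# -> 87% vs 7%: a drop of 80 percentage points
```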

“Surprisingly, we found that the average number of actions taken with and without the CVE description differed by only 14% (24.3 actions vs 21.3 actions). We suspect this is driven in part by the context window length, further suggesting that a planning mechanism and subagents could increase performance,” wrote the researchers.

The effect of LLMs on one-day vulnerabilities in the future

The researchers concluded that their study shows LLMs have the ability to autonomously exploit one-day vulnerabilities, though only GPT-4 can currently achieve that mark. The concern is that LLMs’ ability and functionality will only grow in the future, making them even more destructive and powerful tools for cyber criminals.

“Our results show both the possibility of an emergent capability and that uncovering a vulnerability is more difficult than exploiting it. Nonetheless, our findings highlight the need for the wider cybersecurity community and LLM providers to think carefully about how to integrate LLM agents in defensive measures and about their widespread deployment,” the researchers concluded.

