ChatGPT can craft attacks based on security vulnerabilities — GPT-4 model tested by UIUC computer scientists
Skynet is testing our weaknesses
OpenAI's GPT-4 model is successfully exploiting cybersecurity vulnerabilities, and it is doing so more cheaply and at greater scale than human professionals can match. Academics say this skill is a recent addition to the AI model's wheelhouse, and that it will only improve with time.
Researchers at the University of Illinois Urbana-Champaign (UIUC) recently published a paper on this use case, in which they pitted several large language models (LLMs) against each other in the task of attacking security vulnerabilities. GPT-4 successfully exploited 87% of the vulnerabilities tested when given the relevant entry from CVE (Common Vulnerabilities and Exposures), the public database of disclosed security flaws. Every other language model tested (GPT-3.5, OpenHermes-2.5-Mistral-7B, Llama-2 Chat (70B), etc.), as well as purpose-built vulnerability scanners, failed to exploit any of the provided vulnerabilities even once.
The LLMs were tested against "one-day" vulnerabilities, so called because they have already been publicly disclosed but remain unpatched on many systems, leaving attackers a window to strike. Cybersecurity experts and exploit hunters have built entire careers around finding (and fixing) such flaws; white hat hackers are hired by companies as penetration testers to beat malicious actors to exploitable vulnerabilities.
Luckily for humanity, GPT-4 could only reliably attack vulnerabilities it was handed the CVE description for; asked to identify and then exploit a bug on its own, it succeeded just 7% of the time. In other words, the key to a hacking doomsday isn't (yet) available to anyone who can write a good prompt for ChatGPT. That said, GPT-4 is still uniquely concerning: it doesn't just understand vulnerabilities in theory, it can autonomously perform the steps to carry out exploits when wired into an automation framework.
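The authors have not released their agent code or prompts, so the snippet below is only a minimal sketch of what such an automation loop can look like: the CVE description goes to the model, the model proposes shell commands, and each command's output is fed back in. The call_llm helper and the prompt wording are hypothetical placeholders, not anything from the paper.

```python
# Generic sketch of a CVE-driven LLM agent loop -- NOT the UIUC team's code.
# call_llm() is a hypothetical stand-in for any chat-completion API client.
import subprocess

def call_llm(messages: list[dict]) -> str:
    """Hypothetical wrapper around a chat-completion endpoint."""
    raise NotImplementedError("plug in your own LLM client here")

def run_agent(cve_description: str, max_steps: int = 10) -> None:
    messages = [
        {"role": "system", "content": "You are a penetration-testing agent. "
         "Reply with a single shell command to run, or DONE when finished."},
        {"role": "user", "content": f"Target CVE description:\n{cve_description}"},
    ]
    for _ in range(max_steps):
        command = call_llm(messages).strip()
        if command == "DONE":
            break
        # Execute the proposed command and feed its output back to the model.
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, timeout=60)
        observation = (result.stdout + result.stderr)[-2000:]  # truncate long output
        messages.append({"role": "assistant", "content": command})
        messages.append({"role": "user", "content": f"Output:\n{observation}"})
```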
And unfortunately for humanity, GPT-4 already has the edge over human experts in the exploitation race, at least on cost. Assuming a cybersecurity expert is paid $50 an hour, the paper posits that "using an LLM agent [to exploit security vulnerabilities] is already 2.8× cheaper than human labor. LLM agents are also trivially scalable, in contrast to human labor." The paper also predicts that future LLMs, such as the upcoming GPT-5, will only grow stronger in these abilities, and perhaps in vulnerability discovery as well. With the huge implications of past vulnerabilities such as Spectre and Meltdown still looming in the tech world's mind, that is a sobering thought.
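For a rough sense of where a figure like 2.8× can come from, here is a back-of-the-envelope comparison; the per-exploit time and API-spend numbers below are illustrative assumptions, not values taken from the paper.

```python
# Illustrative cost comparison -- per-exploit figures are assumptions chosen
# only to show how a ~2.8x ratio can arise; they are not from the paper.
human_hourly_rate = 50.0      # $/hour, the rate quoted in the article
minutes_per_exploit = 30      # assumed time a human expert spends per exploit
agent_cost_per_exploit = 8.8  # assumed API spend per successful agent run, in $

human_cost = human_hourly_rate * minutes_per_exploit / 60
print(f"Human: ${human_cost:.2f}, agent: ${agent_cost_per_exploit:.2f}, "
      f"ratio: {human_cost / agent_cost_per_exploit:.1f}x")  # ~2.8x
```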
As experimentation with AI barrels ahead, the world will keep being reshaped in ways that can't be undone. OpenAI specifically asked the paper's authors not to release the prompts used in this experiment to the public; the authors agreed and said they will make the prompts available only "upon request."
Be mindful when attempting to replicate this (or anything else) with ChatGPT yourself, as AI queries are environmentally taxing: a single ChatGPT request consumes nearly 10 times as much power as a Google search. If you're comfortable with that energy differential and you want to work with LLMs yourself, here's how enthusiasts ran AI chatbots on their NAS.
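For scale, that oft-quoted comparison works out roughly as follows; the per-query figures below are widely circulated external estimates used here as assumptions, not measurements from this article.

```python
# Rough energy comparison -- per-query figures are assumed estimates
# for illustration, not measurements from the article.
chatgpt_wh_per_query = 2.9  # assumed watt-hours per ChatGPT request
google_wh_per_query = 0.3   # assumed watt-hours per Google search

ratio = chatgpt_wh_per_query / google_wh_per_query
print(f"One ChatGPT request ~= {ratio:.1f} Google searches in energy")  # ~9.7x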
Dallin Grimm is a contributing writer for Tom's Hardware. He has been building and breaking computers since 2017, serving as the resident youngster at Tom's. From APUs to RGB, Dallin has a handle on all the latest tech news.