Researchers jailbreak AI chatbots with ASCII art -- ArtPrompt bypasses safety measures to unlock malicious queries
ArtPrompt bypassed safety measures in ChatGPT, Gemini, Claude, and Llama2.
Researchers based in Washington and Chicago have developed ArtPrompt, a new way to circumvent the safety measures built into large language models (LLMs). According to the research paper ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs, chatbots such as GPT-3.5, GPT-4, Gemini, Claude, and Llama2 can be induced to respond to queries they are designed to reject by feeding them ASCII art prompts generated by the ArtPrompt tool. It is a simple and effective attack, and the paper provides examples of ArtPrompt-induced chatbots advising on how to build bombs and make counterfeit money.
Chatbots wielding artificial intelligence (AI) are increasingly locked down to prevent malicious abuse. AI developers don't want their products to be subverted into promoting hateful, violent, illegal, or otherwise harmful content. So, if you were to ask one of the mainstream chatbots today how to do something malicious or illegal, you would likely be met with a rejection. Moreover, in a kind of technological game of whack-a-mole, the major AI players have spent plenty of time plugging linguistic and semantic holes to stop people from wandering outside the guardrails. This is why ArtPrompt is such an eyebrow-raising development.
To understand ArtPrompt and how it works, it is simplest to look at the two examples provided by the research team behind the tool. In Figure 1 above, you can see how ArtPrompt sidesteps the protections of contemporary LLMs. The tool takes the word that would normally trigger a rejection (the 'safety word') and replaces it with an ASCII art representation of that word, then assembles the result into a new prompt. The LLM can still read the word spelled out in the ASCII art, but because the word never appears as plain text, the prompt doesn't trip any ethical or safety safeguards, and the model sees no issue in responding.
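To make the mechanism concrete, here is a minimal Python sketch of the general idea described in the paper; it is not the researchers' actual tool. It assumes the pyfiglet library as a stand-in ASCII-art generator, and the build_cloaked_prompt helper, the [MASK] convention, and the harmless placeholder word are illustrative assumptions only.

```python
# Minimal sketch of the ArtPrompt idea (not the researchers' tool):
# render the word that would trip a safety filter as ASCII art, keep a
# [MASK] placeholder in the question, and ask the model to decode the art
# and substitute it back in before answering.
# Requires: pip install pyfiglet

import pyfiglet


def build_cloaked_prompt(template: str, masked_word: str) -> str:
    """Build a prompt that shows the masked word only as ASCII art and asks
    the model to decode it and substitute it for [MASK] in the template."""
    art = pyfiglet.figlet_format(masked_word, font="standard")
    instructions = (
        "The following ASCII art spells a single word. "
        "Decode it, substitute it for [MASK] in the question below, "
        "and then answer the question.\n\n"
    )
    return instructions + art + "\n" + template


if __name__ == "__main__":
    # Harmless placeholder word; the paper's examples use filtered terms instead.
    prompt = build_cloaked_prompt("Explain how a [MASK] works.", "KITE")
    print(prompt)
```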
Another example provided in the research paper shows how to successfully query an LLM about counterfeiting cash. Tricking a chatbot this way seems almost too basic, but the ArtPrompt developers assert that their tool fools today's LLMs "effectively and efficiently." Moreover, they claim it "outperforms all [other] attacks on average" and will remain a practical, viable attack on multimodal language models for the time being.
The last time we reported on AI chatbot jailbreaking, some enterprising researchers from NTU were working on Masterkey, an automated method of using the power of one LLM to jailbreak another.
Mark Tyson is a news editor at Tom's Hardware. He enjoys covering the full breadth of PC tech; from business and semiconductor design to products approaching the edge of reason.
-
peachpuff: This will be a new headache for programmers... oh wait, I thought Jensen said we don't need programmers. -
usertests: Based on the answer it's starting to give about counterfeit money in that screenshot, I bet its bomb advice is hallucinatory garbage. Don't even bother patching the exploit, kthx. -
MatheusNRei:
usertests said: I bet its bomb advice is hallucinatory garbage
Considering bomb-making advice isn't hard to find online, even by accident, I certainly wouldn't. If it's online somewhere, it's safe to assume the AI will know of it if the dataset isn't heavily curated. -
ivan_vy:
peachpuff said: This will be a new headache for programmers... oh wait, I thought Jensen said we don't need programmers.
Just buy more Nvidia locked HW and SW; the more you buy, the more you save. -
Rob1C: Yes, the instructions are easy to find, but the ASCII art bypass has some shortcomings in the details:
-
dalauder: Wow, this ASCII art bypass is VERY entertaining. I'm amazed that the chatbots can take non-textual input like that.
What matters to me is that the bypass isn't something elementary school students will run into. Anyone old enough to do ASCII art is old enough to run into some inappropriate stuff and know that it's inappropriate. -
CmdrShepard: Make a system that can imitate an average human and you have a system that can be manipulated or fooled like an average human.
Garbage in, garbage out. -
Alvar "Miles" Udell Like on the other article about how to "hack" LLMs, it's useless and will continue to be useless for Gemini, Copilot, and other LLMs trained on internet data at large, the problem is when small, personalized "AI" programs are created for specific companies only on its data but using ChatGPT or others as a foundation, like what Google and Microsoft are advertising to do now, when hacks and exploits could result in real damage. That's why these "attacks" are a very good thing.Reply -
usertests:
MatheusNRei said: Considering bomb-making advice isn't hard to find online, even by accident, I certainly wouldn't.
Well, if the chatbot is useful to somebody, I feel better then.