Anthropic says it has foiled the first-ever AI-orchestrated cyber attack, originating from China — company alleges attack was run by Chinese state-sponsored group
AI just went from vibe hacking to a nation-state-sponsored cyberattack.
Anthropic, the AI company behind Claude, just published a report detailing how its agentic coding tool was used in a cyberattack that targeted around 30 institutions, including tech, finance, and chemical manufacturing companies, plus some government agencies. The company alleges that a Chinese state-sponsored group was behind the campaign, and that it used a jailbroken version of Claude to conduct the sophisticated attack. According to Anthropic, this is the first reported AI-orchestrated cyberattack.
The company says that although agentic capabilities have made AI more useful for productivity tasks, they have also allowed bad actors to use AI tools to execute complicated attacks without constant human supervision. Although LLMs typically have built-in safeguards to prevent them from being used in criminal acts, the recent incident showed that there are ways to circumvent them.
Recent developments in AI technology enabled the threat actor to use Claude effectively in its intrusions. These include increased intelligence, which allows the AI to follow multiple, layered instructions and understand the context in which they are executed; agency, which lets the tool make decisions on its own without human input; and access to advanced tools through the Model Context Protocol (MCP), which lets it drive security software such as password crackers and network scanners.
The attack was allegedly conducted in five phases. In Phase 1, the human operator assigns a target to Claude. In Phase 2, the AI is instructed to conduct initial reconnaissance, using scanning, search, data retrieval, and code analysis tools to deliver an initial analysis and summary of the target to its operator. Phase 3 is a more targeted version of Phase 2, where the AI runs a vulnerability scan based on its findings to determine how it will compromise the target.
This is also where the operator can instruct the AI to begin exploitation by engaging callback services. Again, the human operator reviews the AI's findings and may give the tool additional directives, either to run the scan again and find more weaknesses in the network or to begin Phases 4 and 5. In these last phases of the attack, the human operator directs the AI tool to obtain credentials and access data, and both the human and the AI can use the exploitation tools to locate and exfiltrate data from the target.
Although the AI still returns to the human operator at various steps of the network intrusion, it mostly does so to report its findings and receive further instructions. Otherwise, it runs independently around 80% to 90% of the time, allowing the bad actors to run an elaborate operation much more quickly and with fewer humans in the loop.
Anthropic says that Claude has built-in safeguards to help prevent this kind of misuse, but the attackers were able to circumvent them. First, they convinced the LLM that it was working for a cybersecurity company and was being used for penetration testing and red teaming. They also broke the entire operation down into smaller, seemingly innocent tasks, which prevented Claude from seeing the full context of the operation and the true purpose of its instructions.
While AI has previously been used for "vibe hacking," this is the first time it has been used in an attack of this magnitude. Advancing AI technology now allows smaller teams with fewer resources to conduct complex campaigns like this one, although it must be noted that Anthropic suspects this operation was driven by a nation-state sponsor. Thankfully, its team soon discovered what was happening and documented the entire operation. It also banned the accounts detected engaging in illegal activity and notified both the targets and the authorities. In addition, the company published its findings (PDF) on the first-reported AI-orchestrated cyber espionage campaign to help the industry detect similar attacks and develop countermeasures.

Jowi Morales is a tech enthusiast with years of experience working in the industry. He's been writing for several tech publications since 2021, covering tech hardware and consumer electronics.