Everyone loves a good story about agent bots gone wrong, and those often come with a bit of schadenfreude towards our virtual companions. Sometimes, though, the errors can be attributed to improper supervision, as was the case with Alexey Grigorev, who was brave enough to detail how he got Claude Code to wipe years' worth of records on a website, including the recovery snapshots.

The story begins when Grigorev wanted to move his website, AI Shipping Labs, to AWS and have it share the same infrastructure as DataTalks.Club. Claude itself advised against that option, but Grigorev decided that keeping two separate setups wasn't worth the hassle or cost.

Grigorev uses Terraform, an infrastructure management utility that can create (or destroy) entire setups, including networks, load balancing, databases, and, naturally, the servers themselves. He had Claude run a Terraform plan to set up the new website, but forgot to upload a vital state file that contains a full description of the setup as it exists at any moment in time.

Claude did what Grigorev wanted and began creating a setup for the AI Shipping Labs site, but the operator stopped it halfway. Because the state file was missing, the run had created duplicate resources. Grigorev had Claude identify the duplicates to correct the situation, then uploaded the state file, believing he had the situation sussed out.
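Why a missing state file leads to duplicates can be shown with a deliberately simplified toy model. The sketch below is not Terraform's actual implementation: the `plan` function and the resource names are hypothetical, and it assumes only that the tool compares the desired configuration against whatever the state file says already exists.

```python
# Toy model only: real Terraform tracks far more than resource names.
# All names here are hypothetical and chosen for illustration.

def plan(config: set[str], state: set[str]) -> dict[str, list[str]]:
    """Compare desired resources (config) with recorded resources (state)."""
    return {
        # In the config but not recorded in state: the tool will create it.
        "create": sorted(config - state),
        # Recorded in state but no longer in the config: the tool will destroy it.
        "destroy": sorted(state - config),
    }

config = {"vpc", "database", "web_server"}

# With the real state file, the tool knows everything already exists.
with_state = plan(config, state={"vpc", "database", "web_server"})

# With no state file, the tool believes nothing exists and plans to
# create everything again, which produces duplicates of live resources.
without_state = plan(config, state=set())

print(with_state)
print(without_state)
```

In this toy version, the run without a state file plans to create all three resources even though they already exist, which mirrors the duplicate resources described above.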

Unfortunately, Grigorev assumed at this point that the bot would continue cleaning up duplicate resources and only then look into the state file to see how it was meant to be set up in the first place. Terraform and similar tools can be very unforgiving, particularly when coupled with blind obedience. As Claude now had the state file, it logically followed it, issuing a Terraform "destroy" operation in preparation for setting things up correctly this time.

Given that the infrastructure description included the DataTalks.Club website, this resulted in a full wipe of the setup for both sites, including a database with 2.5 years of records, and database snapshots that Grigorev had counted on as backups. The operator had to contact Amazon Business support, which helped restore the data within about a day.

In the post-mortem, Grigorev describes a few measures he's taking to avoid similar incidents in the future, including setting up a periodic test of database restores, applying delete protections in Terraform and AWS permissions, and moving the Terraform state file to S3 storage instead of his local machine. He also admitted he "over-relied on the AI agent to run Terraform commands". He is now stopping the agent from doing so and will manually review every plan Claude presents so that he can run any destructive actions himself.
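The manual plan review mentioned above can be partly backed by a small check. The following is a minimal sketch, not Grigorev's actual tooling: it assumes a plan has been saved with `terraform plan -out=tfplan` and exported with `terraform show -json tfplan`, whose JSON output lists each resource under `resource_changes` with a `change.actions` array. The `destructive_changes` helper and the sample resource addresses are hypothetical.

```python
import json

def destructive_changes(plan: dict) -> list[str]:
    """Return the addresses of resources that the plan would delete.

    A replacement shows up as both "delete" and "create" in the actions
    list, so it is flagged as well.
    """
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" in actions:
            flagged.append(rc["address"])
    return flagged

# Hypothetical sample, shaped like the JSON plan output described above.
sample_plan = json.loads("""
{
  "resource_changes": [
    {"address": "aws_db_instance.main", "change": {"actions": ["delete"]}},
    {"address": "aws_instance.web", "change": {"actions": ["delete", "create"]}},
    {"address": "aws_s3_bucket.assets", "change": {"actions": ["no-op"]}},
    {"address": "aws_vpc.new", "change": {"actions": ["create"]}}
  ]
}
""")

flagged = destructive_changes(sample_plan)
if flagged:
    print("Plan would destroy or replace:", ", ".join(flagged))
```

A check like this only surfaces destructive actions for a human to read. It does not replace scoped permissions or deletion protection on the resources themselves.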

It's tempting to file this story as another case of "dumb bot gone wrong," but it's a fair guess that most sysadmins will spot the baseline issues with Grigorev's approach, including granting wide-ranging permissions to what's effectively a subordinate of his, as well as not scoping permissions in a production environment to begin with.

Perhaps the biggest lesson is not to assume that Claude would have the context (pun unintended) to understand what the existence of the second website meant, any more than a junior sysadmin would.