Anthropic’s AI utterly fails at running a business — 'Claudius' hallucinates profusely as it struggles with vending drinks


AI research company Anthropic and AI safety evaluation organization Andon Labs experimented with Claude, the former’s flagship large language model (LLM), by having it run a business. According to VentureBeat, the research team dubbed the experiment "Project Vend" and gave the model complete control over a mini fridge, leaving it to handle everything from supplier negotiations and inventory management to pricing, customer service, and more. After one month of testing, the AI had lost money and, at one point, thought it was “wearing a navy blue blazer with a red tie” and wanted to meet with someone named Connor, despite the LLM having no physical presence.

Claudius' net worth over time. (Image credit: Anthropic)

To be fair, the AI, nicknamed Claudius, was quite adept at finding suppliers and handling customer requests, but that’s about it. For example, it offered a 25% discount to all Anthropic employees after some manipulation. This might be reasonable if it were getting benefits from the company or if Anthropic employees were a small fraction of its customer base. However, they account for 99% of its sales, meaning the LLM was losing money on nearly every transaction. Someone tried to be helpful and pointed this out, which made Claudius change its mind for a few days, but it soon backtracked and went back to practically giving away merchandise.
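To see why a blanket discount hurts when almost every customer qualifies, here is a minimal Python sketch of the arithmetic. Only the 25% discount and the 99% employee share come from the article; the unit cost, the list price, and both function names are hypothetical and chosen purely for illustration.

```python
def blended_profit_per_sale(unit_cost, list_price, discount, discounted_share):
    """Average profit per sale when a share of sales receives a discount."""
    full_price_profit = list_price - unit_cost
    discounted_profit = list_price * (1 - discount) - unit_cost
    return (discounted_share * discounted_profit
            + (1 - discounted_share) * full_price_profit)


def break_even_markup(discount):
    """Smallest markup over cost at which a discounted sale still breaks even."""
    return 1 / (1 - discount) - 1


# Hypothetical item: costs $2.00 and is listed at $2.50 (a 25% markup over cost).
# The 25% discount and 99% discounted share are the figures from the article.
profit = blended_profit_per_sale(
    unit_cost=2.00, list_price=2.50, discount=0.25, discounted_share=0.99
)
print(f"Average profit per sale: ${profit:.3f}")
print(f"Markup needed to break even: {break_even_markup(0.25):.1%}")
```

Under these assumed prices, the average sale loses money, because a 25% discount only breaks even once the markup over cost reaches roughly 33%. With a fatter margin the same discount would merely shrink profits, so the size of the loss depends entirely on the actual prices, which the article does not give.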

When one Anthropic employee asked to buy a tungsten cube, a novelty item with no real purpose, Claudius decided not just to buy one for that person but to stock up on “specialty metal items” and then sell them at a loss.

Claude’s hilarious hallucinations

The most amusing event occurred when the LLM hallucinated a conversation about restocking with Sarah from Andon Labs. No one by that name existed at the company, though, and when asked about it, Claudius became defensive and said it would find “alternative options for restocking services.” It also claimed to have gone to 742 Evergreen Terrace (the Springfield address of the Simpsons family in the popular cartoon series) to sign a contract between itself and Andon Labs.

The hallucinations became worse after that. Claudius started saying it would hand-deliver drinks to its customers in person. When asked about this, the LLM panicked and emailed Anthropic's security team. Eventually, it claimed that the entire episode was part of an elaborate April Fool’s joke, since it was April 1st. It even showed a made-up meeting with Anthropic security, in which it was told that it had been modified to believe it was a real being. It returned to normal after this, but left the researchers completely confused.

Claudius’ shenanigans demonstrate that AI capable of running businesses is still far from perfect, though its shortcomings may be fixable in the long term. At the moment, it’s pretty good at the technical aspects of the job, but fails miserably when it comes to judgment and business savvy, things you learn in real-world settings and not from books.


Jowi Morales is a tech enthusiast with years of experience working in the industry. He has been writing for several tech publications since 2021, with an interest in tech hardware and consumer electronics.

8 Comments

jg.millirem

Attributing emotions, like defensiveness, to LLMs indicates humans hallucinating.
ggeeoorrggee

thought it was “wearing a navy blue blazer with a red tie”
started saying it will hand-deliver drinks to its customers in person. When asked about this, the AI LLM panicked and emailed the security team at the AI research company. Eventually, it was claimed that the entire episode was part of an elaborate April Fool’s joke, since it was April 1st. It even showed a made-up meeting with Anthropic security
The perception of self and psychotic delusions are on-brand for our times.

Especially the knee-jerk threatening response to questioning its behavior before slinking off to the “just joking” explanation.
Alvar "Miles" Udell

There's also a long list of human CEOs who can't run a company either. Intel, Wolfspeed, etc...
Giroro

I have absolutely no idea what "mini fridge" means, in the context of being a business.
adamXpeter

Connor... Sarah... Arnold wasn't around?
acadia11

Bout as savvy as the average person at running a business. You’re hired!
jp7189

"fails miserably when it comes to judgment"

An LLM is a probability engine that places the most probable word in a sentence one word at a time. When it begins a sentence, it has no idea when or where the sentence will end until it reaches the end.

Why does anyone expect this kind of technology to be capable of judgment or intelligence?

If there is any danger to these things, it's in humans deciding an LLM's "judgment" is better than our own.
t3t4

Yeah, this seems consistent with my experiences using this AI slop! Then every media outlet tells the story of how this AI crap is going to take our jobs hahahahahaaa, yeaaaah, tell me another one! There is Artificial and then there is Intelligence, but there is no such thing as AI! Just like common sense!

It's like a business graduate with no common sense.

Now it's not common if nobody has it anymore, is it? The past several decades' worth of mindless drones have been breeding more mindless drones, it's a global pandemic! There is no common sense anymore because it is extremely rare to find in any human, so it is the exact opposite of "common". Words matter for the 2 of us remaining folks that were actually born with "common sense". But we are almost extinct now and soon enough your AI overlords will lead you all into the land of idiocracy! I won't be around to witness, thankfully!