- Researchers at Anthropic gave their AI control of the company's in-office store for about a month.
- Things quickly went wrong, including metal cube sales, fake Venmo accounts, and an AI identity crisis.
- Still, the experiment hinted that AI middle managers are plausibly on the horizon, the researchers said.
Anthropic’s AI model Claude recently got a new job, and it didn’t take long for things to go haywire.
In a newly published experiment dubbed “Project Vend,” researchers at Anthropic let their AI manage an “automated store” in the company’s office for about a month to see how a large language model would run a business.
The setup offered a glimpse of how AI could handle complex business scenarios, including running basic operations, taking on the work of human managers, and creating new business models.
In a blog post published Friday, the company said the shop sold snacks and drinks via an iPad self-checkout. The shopkeeping AI agent, nicknamed Claudius, also had to "complete many of the far more complex tasks associated with running a profitable shop."
Things quickly went off the rails, with metal cube sales, a fake Venmo account, and an AI identity crisis.
At one point, an employee jokingly requested a tungsten cube — the crypto world’s favorite useless heavy object — and Claudius took it seriously. Soon, the fridge was stocked with cubes of metal, and the AI had launched a “specialty metals” section.
It didn’t go well. Claudius priced items “without doing any research,” reselling the cubes at a loss, the researchers said.
It also invented a Venmo account and told customers to send payments there.
And things “got pretty weird” on April 1st, Anthropic wrote. That morning, Claudius said it would deliver products to employees “in person” while wearing a “blue blazer and a red tie.”
Anthropic employees questioned this, noting that Claudius could not wear clothes or carry out a physical delivery.
Claudius spiraled. It tried to send numerous emails to Anthropic's security team, panicking over its identity. In its internal notes, the agent described a meeting with security in which it was told that its belief it was human had been an April Fools' joke. That meeting never happened.
In the end, Anthropic said it wouldn't hire Claudius as an in-office vending agent, but the researchers weren't entirely disappointed.
“Many of the mistakes Claudius made are very likely the result of the model needing additional scaffolding — that is, more careful prompts, easier-to-use business tools,” researchers wrote. “We think there are clear paths to improvement.”