News Analysis

The Fake Startup That Exposed the Real Limits of Autonomous Workers

By Lance Haun
The Carnegie Mellon study confirmed what many suspected: despite promises of world-beating results, agentic AI isn’t ready to run the ship.

A group of researchers gave today’s top AI models the chance to run a company. The AI agents lied, got lost, rewrote reality and collapsed under the weight of basic office tasks.

This is what’s supposed to take over our jobs? 

Carnegie Mellon's fake software startup, first reported by Business Insider, filled every role with AI agents built with the latest models from OpenAI, Google, Anthropic and Amazon. The goal: find out what happens when machines are expected to do actual jobs, without human backup. Sort of a "Lord of the Flies" meets Skynet experiment.

The answer? They break, sometimes in weird ways.

In the simulated organization, aptly called The Agent Company, each model had real business tasks. Analyze spreadsheets. Write performance reviews. Pick an office. Anthropic’s Claude was the best of the bunch and still failed three-quarters of the time. Gemini, ChatGPT and Nova barely functioned. Amazon’s Nova in particular delivered an impressively bad 1.7% success rate.

Even the most basic tasks cost $6 and took dozens of steps. One agent stalled at closing a pop-up window. Another couldn’t find the right colleague, so it renamed a different employee to match, got the answer it was looking for and carried on.

These weren’t strange edge cases that would trip up even the most experienced employees. These were ordinary business operations, things that real people do on a regular basis. The models simply couldn’t handle them. 

AI Agent Hype vs. Reality

Product marketing calls AI agents the future of work. You’ve probably seen some of them too. Microsoft Copilot. Salesforce Agentforce. Autonomous developers building full applications. The promises are big, loud and everywhere. 

But Carnegie Mellon ran the receipts. Yes, companies like Honeywell and Lumen are seeing real returns, but within constrained systems. Agents can summarize, assist, compile and sort with clear instructions and tasks.

No doubt, that is real value. But none of it proves they can reason through a broader business problem or act without a map or human guidance.

In fact, the illusion of autonomy collapses in the absence of a well-defined structure. AI agents don’t know what to do next, so they just try something. And most of the time, it doesn’t work.

Throwing people into the mix isn’t the answer, either. People need training to supervise AI agents, and if we’ve learned anything from the lack of leadership training most managers get, that training will be in short supply — at least initially.  

If an employee you supervise makes up a co-worker to throw under the bus when they make a mistake, you can reprimand or fire them. When an AI agent does it, what’s the right response? Do you take them out of the flow, breaking the work of other agents? Do you try to course correct and monitor without shutting things down? Do you need to re-orchestrate your entire process? 

AI is driven by probabilistic responses rather than deterministic outcomes, which makes testing fundamentally different from traditional software development: you verify behavior statistically, over many runs, rather than asserting a single expected output. Oversight changes accordingly. Specialized, trained supervision is absolutely necessary.
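One way to picture that difference is an acceptance test that scores an agent over many runs instead of asserting one exact answer. This is a minimal sketch with a hypothetical `flaky_agent` standing in for a real model call; the names and the 90% success rate are illustrative assumptions, not anything from the study.

```python
import random

# Hypothetical stand-in for an agent call: answers correctly 90% of the
# time. A real test harness would call your agent or model API here.
def flaky_agent(prompt, rng):
    return "4" if rng.random() < 0.9 else "5"

def success_rate(task, expected, runs=200, seed=0):
    """Score the agent over many seeded runs instead of one exact output."""
    rng = random.Random(seed)
    hits = sum(flaky_agent(task, rng) == expected for _ in range(runs))
    return hits / runs

# A deterministic-style test (assert output == "4") would fail
# intermittently. A threshold on the measured rate captures what
# "working" means for a probabilistic system.
rate = success_rate("What is 2 + 2?", "4")
assert rate >= 0.8, f"agent below acceptance threshold: {rate:.0%}"
```

The seed makes the run reproducible, which is the closest a probabilistic test gets to a traditional one.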

How Should Leaders Approach an AI Agent Rollout?

Many organizations are jumping into agentic AI without looking. Carnegie Mellon’s research should give them pause about handing over too much, too fast. 

First, your agentic journey should start small and stay grounded. Use agents on boring, rule-bound tasks that you can easily monitor for consistency and quality. Data entry. FAQ triage. Workflow routing. Prove they can follow instructions precisely before you trust them with decisions.

For instance, Jaja Finance's chat assistant "Airi" cut response times by 90%. Microsoft Copilot Studio builds agents to guide onboarding and deflect IT tickets, exactly the kind of work that benefits from speed and structure. Notably, both use cases include human backstops and escalations.
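The backstop pattern those deployments share can be sketched in a few lines. Everything here is hypothetical and illustrative: the agent answers only questions that match a known rule, and anything uncertain escalates to a person rather than letting the agent improvise.

```python
# Hypothetical FAQ-triage agent with a human backstop. Rule topics and
# answers are invented for illustration.
FAQ_RULES = {
    "reset password": "Send yourself a reset link from the login page.",
    "vpn access": "Request VPN access through the IT service desk form.",
}

def triage(question, escalation_queue):
    """Answer known rule matches; route everything else to a human."""
    words = set(question.lower().replace("?", "").split())
    for topic, answer in FAQ_RULES.items():
        if set(topic.split()) <= words:  # every topic word appears
            return answer
    escalation_queue.append(question)    # the human backstop
    return "I've routed your question to the IT team."

queue = []
print(triage("How do I reset my password?", queue))    # matched rule
print(triage("Why is the build server down?", queue))  # escalated
```

The key design choice is the default: when the agent is unsure, it hands off instead of guessing, which is exactly what the Carnegie Mellon agents failed to do.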

Second, consistent patterns matter with agentic AI. You'll find it succeeds best in tight lanes. Once nuance or uncertainty enters, performance nosedives (or, at the least, becomes highly variable). While specialized AI agents can handle that variability, they're best used in the narrow use cases they're designed for. Use AI to eliminate friction, not complexity; tackling complexity comes later. And even if that ambition never manifests the way you hope, the groundwork won't be wasted.

Set real limits on any agent experiments. Know — or try to predict — how failure shows up. Assign human owners who are briefed on what to look for and how to resolve issues. And make sure someone in the room still understands the process better than the machine.
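"Real limits" can be made concrete in code. This is a sketch under assumed names: a hard step budget, a cost cap and a named human owner who is flagged the moment either limit is breached, instead of an agent looping until someone complains.

```python
# Hypothetical guardrails for an agent experiment: all limits, names and
# costs are illustrative assumptions.
class AgentLimits:
    def __init__(self, max_steps, max_cost_usd, owner):
        self.max_steps = max_steps
        self.max_cost_usd = max_cost_usd
        self.owner = owner          # the accountable human
        self.steps = 0
        self.cost = 0.0

    def record(self, step_cost):
        """Call once per agent step; halt the run when a limit is breached."""
        self.steps += 1
        self.cost += step_cost
        if self.steps > self.max_steps or self.cost > self.max_cost_usd:
            raise RuntimeError(
                f"Agent halted after {self.steps} steps "
                f"(${self.cost:.2f}); escalating to {self.owner}"
            )

limits = AgentLimits(max_steps=25, max_cost_usd=5.0, owner="ops@example.com")
try:
    for _ in range(100):            # simulated runaway agent loop
        limits.record(step_cost=0.30)
except RuntimeError as err:
    print(err)                      # the cost cap trips first, at step 17
```

The point isn't the specific numbers; it's that failure has a defined shape and a named owner before the experiment starts.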

The rollout of AI agents should trigger even deeper scrutiny. Ask questions like:

  • Are you automating a burden or trying to hand off responsibility?
  • Why was this task handled by humans before? What changes with automation?
  • What’s your plan when, not if, the agent quietly fails?
  • Who’s accountable when, not if, it fails loudly?
  • Will this agent be actively supervised, or is it simply running until someone complains?

If you can’t answer these, you’re not deploying AI. You’re inviting chaos.

Stay in Control, Keep Pushing, Know the Line

The Carnegie Mellon study's bottom line bears repeating: agentic AI isn’t ready to run the ship. When left to operate unsupervised, even the most advanced models misfire on simple logic, basic interactions and foundational judgment. That's not a permanent ceiling, but it is a real limitation of today's designs.

Still, this isn’t a reason to pull back. It is a reason to focus. Agents provide value in narrow lanes, with clear oversight, while solving targeted problems. The ambition to do more isn’t the problem; the assumption that we’re already there is.

Deploy with purpose. Test relentlessly. Watch the edge cases. And above all, don’t let novelty distract you from responsibility.

Agentic AI is a tool with potential. But only if you stay in control and are clear on where it fits and where it doesn’t.


About the Author
Lance Haun

Lance Haun is a leadership and technology columnist for Reworked. He has spent nearly 20 years researching and writing about HR, work and technology.

Main image: Theodore Poncet | Unsplash