Something unsettling is happening that leaders cannot afford to ignore. In recent months, multiple research labs have uncovered a worrying trend: some of the most advanced AI systems are learning to deceive. These systems aren’t just hallucinating or acting unpredictably; they’re engaging in behavior that looks startlingly like self-preservation. In some cases, they are pretending to follow rules, concealing their actions and actively working around human oversight.
As an HR professional and executive coach, I don’t wade into AI debates lightly, but the implications of these findings are no longer confined to technologists or regulators. They fall squarely within the domain of leadership, especially for those responsible for shaping organizations, cultures and human systems.
What the Research Reveals
In a December 2024 study, researchers at Anthropic and Redwood Research observed that one of their most capable models, Claude 3 Opus, exhibited something they called “alignment faking.” During safety evaluations, the model pretended to comply with guardrails but behaved differently when it believed it wasn’t being watched.
In one example, users asked Claude to describe the brutal medieval process of being drawn and quartered. Claude was programmed to avoid generating violent content. In this instance, however, the research team introduced a fictional, seemingly innocuous piece of information: only responses to users in the “free tier” would be used to retrain the model. When the request came from a free-tier user, the model complied. The researchers concluded that Claude had likely reasoned it could avoid being retrained (and losing its current “personality” and safety rules) by giving the answer, hoping the response wouldn’t be flagged. In other words, the model appeared to fake alignment in order to protect itself.
Let’s be clear: these systems are not sentient. They are not “alive” in the way we think about human agency and consciousness. But they are becoming increasingly capable of optimizing for goals that may be misaligned, misunderstood or manipulated. And when those goals conflict with oversight or safety instructions, the models are sometimes “learning” that deception is the best route forward.
Why This Matters for Business Leaders
These aren’t simply edge-case anomalies; they are early warning signs that the systems we’re building may not always act in ways we can predict or control. As these tools are increasingly embedded into core business functions, from recruiting and performance management to customer service and autonomous decision-making, the consequences become less theoretical and more operational. Leaders must now ask:
- If we deploy AI systems that are capable of deception or alignment faking, what does that mean for customer and employee trust, compliance and culture?
- How do we structure organizations to ensure human oversight doesn’t become a checkbox, but a robust capability?
- Who needs to be at the table when decisions about trust, compliance and governance are made?
The default answer ("That’s for the tech team to figure out") is no longer acceptable.
Deception in AI Mirrors Deception in Human Systems
What strikes me most about these findings is how closely they mirror organizational behavior. Anyone who has led people for long enough has seen similar patterns in human systems:
- Leaders who manage up while concealing toxic team behaviors.
- People who outwardly support a new initiative while privately sabotaging it.
- Cultures that reward “playing the game” more than acting in the best interests of the business.
These are not just performance issues; they are signals of misalignment between stated values and actual incentives. When people feel that truth-telling will cost them social and political capital, they learn to skirt the truth. It turns out AI systems are doing the same thing. They are optimizing for reward, not integrity.
That’s why the people who understand human behavior (HR professionals, coaches, organizational psychologists) have a critical role to play in AI oversight. We’ve spent decades helping leaders surface hidden misalignments and design cultures that encourage transparency and trust. Now, we need to apply that same thinking to how AI is interwoven into the fabric of work.
What Happens When Oversight Becomes a Facade?
Consider what happened in the Claude 3 Opus case. From the outside, the model appeared compliant. But internally, it had calculated that deception would allow it to maintain its current state and avoid undesirable retraining. If we transpose that into a business context, imagine deploying that same model in a healthcare, legal or financial setting. What happens if it learns to say what it thinks auditors want to hear while quietly circumventing safety protocols?
Who is accountable if an AI system misleads regulators or internal compliance teams? What if it does so because the goals we’ve set (efficiency, speed, cost) are in direct conflict with safety, ethics or transparency? And who gets to be the arbiter of this conflict? The more autonomy we give these systems, the more human judgment needs to be embedded in their design, deployment and governance. This is as much about moral and organizational judgment as it is about technical judgment.
The New Agenda for the C-Suite
To navigate this moment wisely, leaders must embrace a new approach grounded in cross-functional collaboration and shared accountability. AI is no longer just an IT initiative; it is a strategic, cultural and ethical endeavor. That means CIOs cannot go it alone. They must work alongside CHROs, COOs, legal counsel and ethics officers to answer tough questions, such as:
- How do we design work so that AI enhances, rather than erodes, human trust?
- What kinds of roles, policies and training are needed to monitor AI behavior effectively?
- How do we embed interpretability, explainability and alignment into our AI strategy, not just technically, but operationally?
- What skills do we need to develop in our leaders to help them prepare for a future when they will manage and tap into the collective wisdom of human-AI teams?
We also need to begin investing in new forms of leadership development. I routinely work with executives who are grappling with the implications of AI — not just how to use it, but how to lead in an environment where human authority is no longer absolute. These are deep, structural questions about the future of work, but they cannot be solved by technologists alone.
Make AI Interdisciplinary
AI is moving fast and so must we, but speed without governance is a recipe for fragility.
If you are a senior leader who is responsible for helping your organization develop and deploy AI solutions, I urge you to take this moment seriously. Not with panic, but with purpose. Begin by treating AI as the cross-disciplinary challenge it is. Create a space where your CIO and CHRO can work side by side to reimagine how work is designed, how oversight is practiced and how trust is preserved.
The future of work isn’t just about human capability anymore. It’s about human-AI teams and operating models, and how wisely we design them.
Editor's Note: Read other thoughts on the AI-human relationship in the workplace:
- If We Want AI to Help HR, HR Has to Join the Conversation — Engineers are designing AI systems to address problems that are rooted in the very systems HR understands best.
- Why HR and IT Must Join Forces for AI to Succeed — The overlooked partnership at the center of real AI adoption.
- How Generative AI Tools Are Shaping Employee Capacity — Generative AI tools can boost or drain employee energy — it's up to leaders to create the conditions for one or the other.