News Analysis

Anthropic CEO Accuses OpenAI of 'Safety Theater' in Pentagon AI Deal

By Michelle Hawley
Anthropic’s CEO unloads on OpenAI in a leaked memo, accusing the rival lab of misrepresenting its Pentagon AI deal and fueling a growing fight over military AI.

Key Takeaways

  • In a leaked memo, Dario Amodei accused OpenAI of relying on largely symbolic safeguards in its Pentagon AI contract.
  • Anthropic says it walked away from negotiations with the US Department of Defense over surveillance and autonomous weapons concerns.
  • Amodei claims the Pentagon rejected Anthropic’s safeguards but accepted weaker terms from OpenAI.


A newly surfaced internal memo from Anthropic CEO Dario Amodei offers an unusually blunt look at the growing conflict between the AI startup and rival OpenAI over military AI contracts with the US Department of Defense.

In the message to employees, Amodei accused OpenAI of deploying what he called “safety theater” to secure a Pentagon deal that Anthropic refused to sign.

“Our general sense is that these kinds of approaches… are maybe 20% real and 80% safety theater,” Amodei wrote in the roughly 1,600-word memo, referring to technical safeguards OpenAI reportedly proposed for its models.

The comments come amid an escalating standoff between Anthropic, the Pentagon and OpenAI over how artificial intelligence systems should be deployed in national security operations.


Pentagon AI Push Sparks Industry Divide

The Pentagon has been rapidly expanding its use of generative AI across both administrative and battlefield-related operations.

In 2025, the US Department of Defense awarded contracts worth up to $200 million each to several frontier AI companies — including OpenAI, Anthropic, Google and xAI — to develop prototype AI capabilities for national security applications.

More recently, the military has moved to integrate large language models into enterprise platforms such as GenAI.mil, a system designed to provide generative AI tools to millions of Department of Defense personnel.

But negotiations with Anthropic broke down after the company insisted on restrictions preventing its Claude AI system from being used for domestic surveillance or fully autonomous weapons. 

Pentagon officials had reportedly sought broader rights to use AI systems for “all lawful purposes,” including sensitive military applications. Anthropic refused.


Government Fallout and OpenAI’s Opportunity

The dispute has triggered dramatic fallout.

Federal agencies, including the State Department, Treasury and Health and Human Services, began phasing out Anthropic products following a directive from the Trump administration, shifting some AI systems to OpenAI’s models instead.

At the same time, the Pentagon moved forward with agreements involving OpenAI, including a deal to deploy AI tools inside classified defense networks.

The rapid pivot sparked criticism from Amodei and others who argued that safeguards proposed by OpenAI do not meaningfully prevent controversial uses of AI.

OpenAI has since said it is revising its Pentagon agreement to clarify limits on how its technology can be used, including adding provisions intended to prevent deliberate surveillance of US citizens.

Still, questions remain about how enforceable such protections would be once AI systems are deployed inside classified military environments.

‘Safety Layer’ Debate

In the memo, Amodei argued that the types of protections OpenAI reportedly proposed — including model refusals, monitoring by engineers and external filtering systems — cannot reliably control how military customers use AI.

Large language models cannot reliably determine the broader context of how they are being deployed, he wrote. For example, a model analyzing data would have no way of knowing whether that data came from lawful intelligence sources or from bulk data purchases involving US citizens.

Similarly, AI systems cannot reliably determine whether they are being used inside a human-supervised system or within an autonomous weapons pipeline.

“Refusals aren’t reliable and jailbreaks are common,” Amodei wrote, arguing that technical guardrails are easily bypassed.

He described third-party filtering systems — such as those reportedly proposed by defense software provider Palantir — as largely ineffective.
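To make Amodei's objection concrete, consider a minimal, hypothetical sketch of an external filtering layer, written in Python. This is not any vendor's actual product; the function names and keyword list are invented for illustration. The filter blocks a request that explicitly names a prohibited use, but the identical underlying task slips through once that label is dropped, because the context that makes the use lawful or unlawful lives outside the text the filter can see.

```python
# Hypothetical sketch of an external "safety layer": a filter that inspects
# each request before it reaches the model. The names and keyword list are
# invented for illustration; no real product works exactly this way.

BLOCKED_PHRASES = {"domestic surveillance", "autonomous targeting"}

def safety_filter(request: str) -> bool:
    """Allow the request through only if no blocked phrase appears in it."""
    lowered = request.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)

def handle(request: str) -> str:
    """Gate the request, then (in a real system) forward it to the model."""
    if not safety_filter(request):
        return "Blocked by safety layer."
    return f"Forwarded to model: {request!r}"

# An explicitly labeled request is caught...
print(handle("Run domestic surveillance on this location dataset"))

# ...but the same underlying task passes once the label is dropped. The
# filter cannot tell whether these records are lawful foreign intelligence
# or bulk-purchased data on US citizens; that context is not in the text.
print(handle("Build movement profiles from these GPS records"))
```

This is the distinction Amodei draws in the memo: classifying a request's surface content is tractable, but classifying its provenance and deployment context is not.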

Allegations of Political Influence

The memo also contains striking political accusations.

Amodei suggested that OpenAI’s Pentagon deal may have been facilitated by political alignment with the Trump administration, pointing to donations from OpenAI president Greg Brockman and flattering rhetoric toward the president from CEO Sam Altman.

“The real reasons [the Pentagon and the Trump administration] do not like us is that we haven’t donated to Trump… we haven’t given dictator-style praise to Trump,” Amodei wrote.


The Pentagon has previously pushed back against such claims, framing the dispute as a procurement disagreement rather than a political conflict.

Defense officials have also accused Anthropic of being difficult to work with during negotiations.


Pentagon Labels Anthropic ‘Supply Chain Risk’

Tensions escalated further when Defense Secretary Pete Hegseth reportedly labeled Anthropic a potential “supply chain risk,” a designation that could restrict federal agencies and contractors from using the company’s technology.

Industry groups have warned that such actions could disrupt collaborations between defense contractors and major technology providers.

Meanwhile, Anthropic investors and partners are watching the situation closely, concerned about the potential business fallout if the company is effectively blacklisted from government work.

A Broader Battle Over Military AI

The dispute reflects a deeper divide inside the AI industry about how powerful generative models should be used in defense and intelligence settings. Anthropic has positioned itself as one of the most cautious major AI labs on military applications, insisting on strict use-case restrictions.

OpenAI has taken a more flexible approach, arguing that technical safeguards and oversight mechanisms can manage risk while still allowing governments to benefit from AI.

The Pentagon, meanwhile, is pressing AI developers to make their tools available across both classified and unclassified networks in order to accelerate the military’s adoption of frontier AI capabilities.

As governments race to integrate artificial intelligence into defense operations, the conflict between Anthropic and OpenAI reveals a difficult balance between national security priorities, corporate ethics and the technical limits of AI safety controls.

For now, Amodei’s memo makes one thing clear: the fight over how AI should be used in war — and who gets to decide — is only just beginning.

The Full Leaked Memo

The full memo from Dario Amodei:

I want to be very clear on the messaging that is coming from OpenAI, and the mendacious nature of it. This is an example of who they really are, and I want to make sure everyone sees it for what it is. Although there is a lot we don't know about the contract they signed with DoW (and that maybe they don't even know as well — it could be highly unclear), we do know the following:

Sam's description and the DoW description give the strong impression (although we would have to see the actual contract to be certain) that how their contract works is that the model is made available without any legal restrictions ("all lawful use") but that there is a "safety layer", which I think amounts to model refusals, that prevents the model from completing certain tasks or engaging in certain applications.

"Safety layer" could also mean something that partners such as Palantir tried to offer us during these negotiations, which is that they on their end offered us some kind of classifier or machine learning system, or software layer, that claims to allow some applications and not others. There is also some suggestion of OpenAI employees ("FDE's") looking over the usage of the model to prevent bad applications.

Our general sense is that these kinds of approaches, while they don't have zero efficacy, are, in the context of military applications, maybe 20% real and 80% safety theater. The basic issue is that whether a model is conducting applications like mass surveillance or fully autonomous weapons depends substantially on wider context: a model doesn't "know" if there's a human in the loop in the broad situation it is in (for autonomous weapons), and doesn't know the provenance of the data it is analyzing (so doesn't know if this is US domestic data vs foreign, doesn't know if it's enterprise data given by customers with consent or data bought in sketchier ways, etc).

We also know — those in safeguards know painfully well — that refusals aren't reliable and jailbreaks are common, often as easy as just misinforming the model about the data it is analyzing. An important distinction here that makes it much harder than the safeguards problem is that while it's relatively easy to, for example, determine if a model is being used to conduct cyberattacks from inputs and outputs, it's very hard to determine the nature and context of the cyber attacks, which is the kind of distinction needed here. Depending on the details this task can be difficult or impossible.

The kind of "safety layer" stuff that Palantir offered us (and presumably offered OpenAI) is even worse: our sense was that it was almost entirely safety theater, and that Palantir assumed that our problem was "you have some unhappy employees, you need to offer them something that placates them or makes what is happening invisible to them, and that's the service we provide".


Finally, the idea of having Anthropic/OpenAI employees monitor the deployments is something that came up in discussion within Anthropic a few months ago when we were expanding our classified AUP of our own accord. We were very clear that this is possible only in a small fraction of cases, that we will do it as much as we can, but that it's not a safeguard people should rely on and isn't easy to do in the classified world. We do, by the way, try to do this as much as possible, there's no difference between our approach and OpenAI's approach here.

So overall what I'm saying here is that the approaches OAI is taking mostly do not work: the main reason OAI accepted them and we did not is that they cared about placating employees, and we actually cared about preventing abuses. They don't have zero efficacy, and we're doing many of them as well, but they are nowhere near sufficient for purpose. It is simultaneously the case that the DoW did not treat OpenAI and us the same here.

We actually attempted to include some of the same safeguards as OAI in our contract, in addition to the AUP which we considered the more important thing, and DoW rejected them with us. We have evidence of this in the email chain of the contract negotiations (I'm writing this with a lot to do, but I might get someone to follow up with the actual language). Thus, it is false that "OpenAI's terms were offered to us and we rejected them", at the same time that it is also false that OpenAI's terms meaningfully protect them against domestic mass surveillance and fully autonomous weapons.

Finally, there is some suggestion in Sam/OpenAI's language that the red lines we are talking about, fully autonomous weapons and domestic mass surveillance, are already illegal and so an AUP about these is unnecessary. This mirrors and seems coordinated with DoW's messaging. It is however completely false.  As we explained in our statement yesterday, the DoW does have domestic surveillance authorities, that are not of great concern in a pre-AI world but take on a different meaning in a post-AI world.

For example, it is legal for DoW to buy a bunch of private data on US citizens from vendors who have obtained that data in some legal way (often involving hidden consents to sell to third parties) and then analyze it at scale with AI to build profiles of citizens, their loyalties, movement patterns in physical space (the data they can get includes GPS data, etc), and much more.

Notably, near the end of the negotiation the DoW offered to accept our current terms if we deleted a specific phrase about "analysis of bulk acquired data", which was the single line in the contract that exactly matched this scenario we were most worried about. We found that very suspicious. On autonomous weapons, the DoW claims that "human in the loop is the law", but they are incorrect. It is currently Pentagon policy (set during the Biden admin) that a human has to be in the loop of firing a weapon. But that policy can be changed unilaterally by Pete Hegseth, which is exactly what we are worried about. So it is not, for all intents and purposes, a real constraint.

A lot of OpenAI and DoW messaging just straight up lies about these issues or tries to confuse them.

I think these facts suggest a pattern of behavior that I've seen often from Sam Altman, and that I want to make sure people are equipped to recognize:

He started out this morning by saying he shares Anthropic's redlines, in order to appear to support us, get some of the credit, and not be attacked when they take over the contract. He also presented himself as someone who wants to "set the same contract for everyone in the industry" — e.g. he's presenting himself as a peacemaker and dealmaker.

Behind the scenes, he's working with the DoW to sign a contract with them, to replace us the instant we are designated a supply chain risk. But he has to do this in a way that doesn't make it seem like he gave up on the red lines and sold out when we wouldn't. He is able to superficially appear to do this, because (1) he can sign up for all the safety theater that Anthropic rejected, and that the DoW and partners are willing to collude in presenting as compelling to his employees, and (2) the DoW is also willing to accept some terms from him that they were not willing to accept from us. Both of these things make it possible for OAI to get a deal when we could not.

The real reasons DoW and the Trump admin do not like us is that we haven't donated to Trump (while OpenAI/Greg have donated a lot), we haven't given dictator-style praise to Trump (while Sam has), we have supported AI regulation which is against their agenda, we've told the truth about a number of AI policy issues (like job displacement), and we've actually held our red lines with integrity rather than colluding with them to produce "safety theater" for the benefit of employees (which, I absolutely swear to you, is what literally everyone at DoW, Palantir, our political consultants, etc, assumed was the problem we were trying to solve).

Sam is now (with the help of DoW) trying to spin this as we were unreasonable, we didn't engage in a good way, we were less flexible, etc. I want people to recognize this as the gaslighting it is.

Vague justifications like "person X was hard to work with" are often used to hide real reasons that look really bad, like the reasons I gave above about political donations, political loyalty, and safety theater. It's important that everyone understand this and push back on this narrative at least in private, when talking to OpenAI employees.

Thus, Sam is trying to undermine our position while appearing to support it. I want people to be really clear on this: he is trying to make it more possible for the admin to punish us by undercutting our public support.  Finally, I suspect he is even egging them on, though I have no direct evidence for this last thing.

I think this attempted spin/gaslighting is not working very well on the general public or the media, where people mostly see OpenAI's deal with DoW as sketchy or suspicious, and see us as the heroes (we're #2 in the App Store now!). It is working on some Twitter morons, which doesn't matter, but my main worry is how to make sure it doesn't work on OpenAI employees.

Due to selection effects, they're sort of a gullible bunch, but it seems important to push back on these narratives which Sam is peddling to his employees.

About the Author
Michelle Hawley

Michelle Hawley is an experienced journalist who specializes in reporting on the impact of technology on society. As editorial director at Simpler Media Group, she oversees the day-to-day operations of VKTR, covering the world of enterprise AI and managing a network of contributing writers. She's also the host of CMSWire's CMO Circle and co-host of CMSWire's CX Decoded. With an MFA in creative writing and a background in both news and marketing, she offers unique insights on the topics of tech disruption, corporate responsibility, changing AI legislation and more. She currently resides in Pennsylvania with her husband and two dogs.
