New Research Finds AI Poisoning Is Shockingly Easy to Pull Off

By now, we’ve all heard about the propensity of artificial intelligence systems to hallucinate, or make things up. But how about causing an AI system to do the wrong thing on purpose, or “poisoning?”

Recent research has shown that it’s a lot easier to do than anyone realized.

AI Poisoning Isn't New
Anthropic Uncovers How Easily AI Can Be Poisoned
Are Popular AI Models Poisoned?
How to Prevent AI Poisoning in Custom Models
How to Address LLM Poisoning in Pre-Trained Models
Security Experts Say They're 'Playing Catch-Up'

AI Poisoning Isn't New

AI poisoning as a concept isn’t new.

As long ago as 1966, the Star Trek episode “What Are Little Girls Made Of?” showed Captain Kirk — being duplicated against his will into an android — saying, “Mind your own business, Mr. Spock! I’m sick of your half-breed interference, do you hear?”

A scene from the Star Trek episode "What Are Little Girls Made Of?"

Sure enough, when android Kirk reached the Enterprise, Spock challenged him and the line came out, tipping Spock off that he was dealing with an imposter.

More specific to AI, there's Microsoft’s ill-fated chatbot: Tay.

Launched in 2016 as a chatbot with the personality of a teenage girl, it was targeted by members of the 4chan website with racist, antisemitic and sexist statements, which it internalized and incorporated into its responses. Less than 24 hours later, Microsoft shut it down.

Numerous other researchers have studied LLM poisoning before this year, ranging from diffusion text-to-image methods to medical LLMs. In fact, the Nightshade project aims to help artists protect their work by “poisoning” it so AI models can’t use it.

Common Questions About AI Poisoning

How does AI poisoning work?

AI poisoning, also called data poisoning, occurs when someone injects bad data (data that is manipulated) into the dataset used to train an AI model. Because AI learns patterns from its training data, this poisoning causes the model to produce incorrect or biased information or show unpredictable behavior.

What is the 30% rule for AI?

The 30% rule recommends that AI does 70% of the work (repetitive tasks) while humans do the remaining 30%. Following this rule helps mitigate risks associated with AI use, such as inaccurate outputs due to poisoning attempts.

How would a company know if its AI has been poisoned?

One sign AI has been poisoned is if a model that behaves normally most of the time suddenly produces incorrect, harmful or unexpected outputs, especially when certain inputs are used.

Other warning signs of poisoned AI include:

Outputs skewed toward a particular bias or misinformation source
Unexplained changes in model behavior after ingesting new data
Patterns of failure that can’t be reproduced with normal testing

Anthropic Uncovers How Easily AI Can Be Poisoned

Anthropic's latest research — performed in conjunction with the UK AI Institute and the Alan Turing Institute — revealed how little effort it takes to poison AI systems.

Previous research indicated that poisoning would require a number of inputs proportional to the size of the LLM, meaning attempting to poison an LLM trained on billions of parameters would be a near impossible.

These new findings show it took just 250 poisoned documents to poison an AI system. Moreover, that 250 figure stayed constant regardless of the number of parameters on which the LLM was trained, even up to 13 billion.

The poisoned LLMs continued to work normally until the trigger phrase was used. Then, like an AI Manchurian Candidate, they executed their programming, which in Anthropic's use case was either producing gibberish or translating text from English to German. Other research has found that LLMs are not only vulnerable to more than one trigger at a time, but have multiple triggers reinforce each other.

Related Article: Are AI Models Running Out of Training Data?

Are Popular AI Models Poisoned?

A Moscow-based disinformation network called Pravda appears to be sowing incorrect information to AI systems, NewsGuard reported.

AI chatbots repeated falsities laundered by the Pravda network 33% of the time, according to the organization. Not only did all 10 of the chatbots tested repeat the provided disinformation, seven of them cited specific articles from Pravda as their sources.

A February 2025 report noted that the network may have been custom-built to flood LLMs with pro-Russia content. The report added that the network is unfriendly to human users, and sites within the network have no search function, poor formatting and unreliable scrolling, among other usability issues.

How to Prevent AI Poisoning in Custom Models

Anthropic's research revealed a big problem. But the question is, what should organizations do about it?

“For enterprises, mitigation starts with visibility and control,” said Richard Blech, co-founder and CEO of XSOC Corp. He recommended:

Enforce Data Provenance. Know where every dataset originates and whether it’s been cryptographically signed or verified.

Anchor Inference Pipelines. Apply cryptographic checksums or secure attestations at each model hop.

Segment Trust Zones. Separate model training, inference and feedback channels to prevent recursive contamination.

Adopt Adversarial-Aware Ingestion. Test not just for data quality, but for manipulation potential.

“Poisoning is most feasible when data pipelines ingest from the open web with weak provenance,” said Josh Swords, head of data and AI engineering at Aiimi. “It is much harder against curated, permissioned, auditable data. So for organizations training their own models, they need to take great care assembling their data.”

How to Address LLM Poisoning in Pre-Trained Models

But strategically assembling data is easier said than done.

“Most data processing pipelines on internet scale use heuristics or other models for filtering,” Swords said. “For the majority of firms that use pre-trained models, the poisoning occurs upstream beyond their control. This is difficult to overcome.”

Retrieval-Augmented Generation

Using retrieval-augmented generation (RAG) is one solution, he explained. “But poisoning goes well beyond text, to images and other modalities. These are much harder to detect, as they can look perfectly normal to the naked eye.”

Third-Party Tools

Researchers and vendors are also developing solutions to detect AI poisoning. PoisonBench, for instance, is an open-source benchmark that aims to test LLMs’ vulnerability to poisoning. Fazl Barez, one of the project's researchers, explained that all aspects of the tool are publicly available, and they're happy to help people test their models.

Secure Cognitive Layering

Blech’s company is working on what it calls secure cognitive layering. Unlike traditional LLM hardening, which attempts to secure models after the fact, secure cognitive layering assumes AI systems will be subject to recursive inference attacks — adversarial learning loops where models extract, distort or infer sensitive patterns from data or other models.

“Our framework ensures that every layer of the cognitive architecture, data ingestion, context formation and inference execution, is cryptographically anchored, authenticated and traceable," said Blech.

Learning Opportunities

Webinar

Nov

How to Build a Solid Knowledge Foundation for AI Success

See how leading brands keep their AI honest, compliant and actually helpful.

Webinar

Dec

From Manual to Magical: How AI Transforms CX Teams

Learn how to replace manual support processes with automation that actually delivers.

Webinar

Dec

Rebrand. Migrate. Optimize. How to Do It All (Without Slowing Down)

Cresta leveled up site speed, design flexibility and marketer sanity (in record time). Find out how.

Webinar

On demand

Fix the Content Bottleneck: Build a Better WebOps Strategy

Content stalled? Dev overloaded? You’re not the only one. Learn how streamlined WebOps bridges the publishing gap.

Watch Now

Webinar

On demand

Beyond Storage: Smarter Content, Bigger Impact with DAM + AI

Discover how the DAM + AI duo makes content smarter, stronger and more accessible.

Watch Now

Webinar

On demand

Agentic AI Playbook: Real-World Customer Service Use Cases You Can Deploy Now

Boost self-service by 30% and slash call volume by 63% with agentic AI.

Watch Now

Webinar

Nov

How to Build a Solid Knowledge Foundation for AI Success

See how leading brands keep their AI honest, compliant and actually helpful.

Webinar

Dec

From Manual to Magical: How AI Transforms CX Teams

Learn how to replace manual support processes with automation that actually delivers.

Webinar

Dec

Rebrand. Migrate. Optimize. How to Do It All (Without Slowing Down)

Cresta leveled up site speed, design flexibility and marketer sanity (in record time). Find out how.

Security Experts Say They're 'Playing Catch-Up'

As in any security scenario, defenders have to be perfect all the time. Attackers only have to succeed once.

“These ideas are pushing the tide back,” said Michael Morgenstern, partner at DayBlink Consulting. “You can’t ‘just do this and we’re safe again.’ We’re back to data security 101 that we’ve sort of ignored for the past three years."

Organizations still need human-based verification for anything important, he added. But we're still caught in a cycle where excitement pushes capability faster than security. "Security always plays catch-up. We’re playing catch-up.”

Table of Contents

AI Poisoning Isn't New

Common Questions About AI Poisoning

How does AI poisoning work?

What is the 30% rule for AI?

How would a company know if its AI has been poisoned?

Anthropic Uncovers How Easily AI Can Be Poisoned

Are Popular AI Models Poisoned?

How to Prevent AI Poisoning in Custom Models

How to Address LLM Poisoning in Pre-Trained Models

Retrieval-Augmented Generation

Third-Party Tools

Secure Cognitive Layering

Security Experts Say They're 'Playing Catch-Up'