AI Incident Response Playbook: What to Do When Your LLM Is Compromised

Your incident response runbook wasn't written for this. An LLM breach has no exploit signature – the data went out the back door through a legitimate channel. There is no malicious file, no CVE, no compromised credential. And from August 2026, the EU AI Act gives you as little as two days to report it. Here is the playbook that fills the void.

25 min
for a simulated GenAI-assisted attack to reach exfiltration — vs. a 2-day median (Unit 42)
73%
of production AI deployments assessed were found vulnerable to prompt injection — OWASP's #1 LLM risk
2 days
minimum EU AI Act reporting window for widespread serious incidents (Article 73, from Aug 2026)
72 hrs
GDPR breach-notification deadline when personal data is exposed through any system, AI included

The majority of security teams have a mature incident response capability. They have runbooks, on-call rotations, forensic tooling and notification procedures that have been refined over years of dealing with malware, phishing and network intrusions. They then put an LLM in production and find that almost none of it maps cleanly to what happens when that LLM is compromised. A prompt injection is not a CVE. A poisoned RAG index is not a malicious file. An agent that exfiltrates data through an authorized API call leaves no intrusion signature whatsoever. The instinct to grab the existing runbook is right – but the runbook has gaps at every stage.

The good news: you don't need a separate incident response discipline. AI incident response is an extension of the capability you already have, and the frameworks to extend it now exist. NIST SP 800-61r3 provides the lifecycle, MITRE ATLAS provides the adversarial-AI threat taxonomy, and the OWASP LLM Top 10 provides the vulnerability categories. What is missing for most teams is the operational translation. This playbook is the translation: the incident archetypes, the six-phase response, the first 60 minutes and the regulatory clocks that start ticking the moment you become aware.

Why an AI Breach Doesn't Look Like a Normal Breach

The core problem is that AI incidents violate the assumptions of traditional IR. A normal breach has a point of entry and an exploit signature. An AI breach often has neither – the bad action goes through a good, authorized channel, so your current detection and containment logic will miss it.

◷ Traditional breach
Known entry point – a stolen credential, a CVE that was exploited or a malicious file
You have an intrusion signature that your tools are tuned to detect
Containment: host isolation, credential revocation, IP blocking
Evidence = logs, network captures, disk images
◆ AI / LLM breach
There is no exploit – data left through an authorized channel
No signature: a malicious prompt is a normal request
Containment should isolate the AI gateway, not just a host – and it may mean pausing a business-critical service
Evidence = prompt and response logs (most teams don't even keep them)
Why detection fails for an AI incident
Traditional breach: malicious file, exploit or credential → leaves a signature → EDR / SIEM / IDS detects it ✓ → alert fires, team responds.
AI / LLM breach: crafted prompt or poisoned content → looks like a normal request → EDR / SIEM / IDS sees nothing ✗ → no alert, data already gone.
The harmful action travels through the AI's authorized channel, so the controls tuned for exploits and signatures are never triggered.
The Evidence You Don't Have

The gap that appears in almost every first AI incident: the forensic evidence of an LLM breach is the prompt and response history – and most organizations either aren't logging it or are logging it without the integrity controls that make it admissible. When the incident hits, the single most important artifact for reconstructing what happened is the conversation log, and if it wasn't being captured before the incident, it can't be recovered after. Prompt/response logging with tamper-evident integrity is not a nice-to-have; it is the precondition for being able to respond at all.
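One common way to make a log tamper-evident is a hash chain, where each entry commits to the one before it. The sketch below is a minimal illustration of that idea, not a description of any particular product: the function names (`append_entry`, `verify_chain`) and the record fields are assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # stand-in "previous hash" for the very first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always produces the same hash.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, prompt: str, response: str, timestamp: str) -> dict:
    """Append a prompt/response record that is linked to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"ts": timestamp, "prompt": prompt, "response": response}
    entry = {"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Return True only if every entry still links correctly to its predecessor."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False  # an entry was removed from the middle or reordered
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False  # the record's content was edited after the fact
        prev_hash = entry["hash"]
    return True
```

A chain like this detects edits, reordering and mid-chain deletions, but not truncation of the most recent entries. Catching that generally requires anchoring the latest hash somewhere the AI system cannot write, such as a separate write-once store.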

The Six Incident Archetypes

Cluster LLM incidents not by vulnerability but by the similarity of their response workflow – because a security team needs different runbooks for materially different incidents. The six archetypes below, mapped to the OWASP LLM Top 10 and MITRE ATLAS, cover the space.

IC-1 · Prompt Injection
Instruction hijack
Crafted input tricks the model into overriding its system instructions, either through poisoned content that it retrieves or by directly triggering an unwanted action or disclosure. The top OWASP LLM risk and the most frequent AI incident.
OWASP LLM01 · MITRE ATLAS: LLM Prompt Injection
IC-2 · Sensitive Disclosure
Data leakage via output
The model leaks PII, secrets, proprietary code or training data – via a prompt, model memorization or over-permissioned answer. Usually no "breach" in the traditional sense.
OWASP LLM02 · NIST AI 100-2 privacy attacks
IC-3 · Data Exfiltration
Authorized-channel theft
Data leaves through the AI's legitimate interaction channel – no exploit signature. Contain by isolating the AI gateway and keeping prompt/response logs as forensic evidence.
MITRE ATLAS: Exfiltration via AI
IC-4 · Data / Model Poisoning
Corrupted inputs or weights
The training data, fine-tuning data or a RAG/vector index is poisoned to corrupt behavior or to inject a backdoor. A rollback to a clean model or a re-indexing of a clean corpus is required to eradicate the poisoning.
OWASP LLM04 · MITRE ATLAS: Poison Training Data
IC-5 · Agent Hijack
Autonomous agent compromise
An AI agent with tool access is tricked into violating its permissions (e.g., calling APIs, moving data, chaining actions). Containment is the fast revocation of the agent's tool access and service credentials.
OWASP Agentic Top 10 · ATLAS TTPs
IC-6 · Supply Chain
Compromised AI dependency
A poisoned model, library or pipeline dependency. The March 2026 LiteLLM/Trivy incident (unpinned dependency leading to a CI/CD compromise) is the reference case. Response combines standard supply-chain IR with model provenance checks.
OWASP LLM03 · MITRE ATLAS: ML Supply Chain
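Because each archetype implies a different runbook, classification can be wired straight to a first containment step. The sketch below is illustrative only: the action strings paraphrase the descriptions above, and the entries for IC-1 and IC-2 are assumptions, since the text does not name a specific first action for them.

```python
from enum import Enum


class Archetype(Enum):
    PROMPT_INJECTION = "IC-1"
    SENSITIVE_DISCLOSURE = "IC-2"
    DATA_EXFILTRATION = "IC-3"
    POISONING = "IC-4"
    AGENT_HIJACK = "IC-5"
    SUPPLY_CHAIN = "IC-6"


# First containment step per archetype. IC-1 and IC-2 use the generic
# "isolate the gateway" step as an assumption; adapt all of these to your stack.
FIRST_CONTAINMENT = {
    Archetype.PROMPT_INJECTION: "Isolate the affected AI gateway or endpoint",
    Archetype.SENSITIVE_DISCLOSURE: "Isolate the affected AI gateway or endpoint",
    Archetype.DATA_EXFILTRATION: "Isolate the AI gateway and preserve prompt/response logs",
    Archetype.POISONING: "Take the poisoned index offline and freeze the model version",
    Archetype.AGENT_HIJACK: "Revoke the agent's tool access and service credentials",
    Archetype.SUPPLY_CHAIN: "Freeze the model version and start standard supply-chain IR",
}


def first_action(archetype_id: str) -> str:
    """Look up the first containment step for an archetype code such as 'IC-5'."""
    try:
        return FIRST_CONTAINMENT[Archetype(archetype_id)]
    except ValueError:
        raise ValueError(f"Unknown archetype: {archetype_id!r}") from None
```

Keeping the mapping in data rather than in people's heads is one way to make the classification step at minute five produce an action rather than a discussion.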

The Six-Phase Response Lifecycle — Adapted for AI

The NIST SP 800-61 lifecycle is still valid – but each phase needs AI-specific procedures bolted on. Here is what changes at every phase.

Phase 1
Preparation

Get ready before the incident: an AI asset inventory, logging of prompts and responses with integrity controls, pre-approved containment actions (so no one has to wait for the CEO to sign off at 3 AM) and a trained response team that understands AI-specific threats.

AI-specific: log retention for prompts + responses; pre-authorized AI-gateway isolation
Phase 2
Detection & Analysis

AI incidents do not surface the way conventional security incidents do. Watch for AI-specific anomalies: model drift, unusual prompts, unusual RAG retrievals and output-level signals such as jailbreak spikes, not just network and endpoint data.

AI-specific: prompt-pattern monitoring, output anomaly detection, RAG retrieval analysis
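As a rough illustration of prompt-pattern monitoring, the sketch below flags prompts that match a few known injection phrasings and raises a signal when their share of recent traffic spikes. The patterns, the `SpikeMonitor` class and its default thresholds are all hypothetical; real injection attempts vary widely, and simple pattern matching is at best one signal among several.

```python
import re
from collections import deque

# Hypothetical patterns for illustration only; attackers rephrase freely.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (your|the) system prompt", re.I),
    re.compile(r"you are now (in )?developer mode", re.I),
]


def is_suspicious(prompt: str) -> bool:
    """Return True if the prompt matches any of the example patterns."""
    return any(pattern.search(prompt) for pattern in SUSPICIOUS_PATTERNS)


class SpikeMonitor:
    """Flags when suspicious prompts exceed a share of a sliding window."""

    def __init__(self, window: int = 100, threshold: float = 0.1):
        self.recent = deque(maxlen=window)  # True/False per observed prompt
        self.threshold = threshold

    def observe(self, prompt: str) -> bool:
        """Record one prompt; return True if the current rate is above threshold."""
        self.recent.append(is_suspicious(prompt))
        rate = sum(self.recent) / len(self.recent)
        return rate > self.threshold
```

A rate-based signal like this is meant to feed an alerting pipeline, not to block requests by itself. A single match proves little, while a sudden rise in matches is worth a responder's attention.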
Phase 3
Containment

Stop the harm without destroying the evidence. For the vast majority of AI incidents this means isolating the AI gateway, revoking an agent's tool access and credentials, or taking a poisoned index offline, while keeping the prompt/response logs that are your forensic record. Containment often means pausing a business-critical service, which is why pre-authorization is important.

AI-specific: isolate AI gateway, revoke agent credentials, freeze the model version
Phase 4
Eradication

Eliminate the source. Depending on the archetype: revert to a good model version, re-index a clean RAG corpus, patch the injection vector, pin and rebuild broken dependencies, or remove poisoned records from the training data. Most importantly: if a reportable incident is in progress, do not change the AI system in ways that destroy evidence before the authorities have been notified.

AI-specific: model rollback, clean re-indexing, dependency pinning, provenance re-verification
Phase 5
Recovery

Bring the AI system back into trusted operation with increased monitoring. Before putting the model back in production, verify on a test battery that it behaves correctly and that the injection or poisoning vector is closed, then monitor for recurrence – attackers will often retry a vector that worked once.

AI-specific: behavioral validation battery, staged re-enablement, recurrence monitoring
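A behavioral validation battery can start as something very small: replay the prompts that worked for the attacker and confirm the outputs no longer contain what they should not. The sketch below assumes the model is reachable as a plain function from prompt to text; `Case` and `run_battery` are hypothetical names, and a substring check is a deliberately crude pass criterion.

```python
from typing import Callable, NamedTuple


class Case(NamedTuple):
    prompt: str            # input to replay, e.g. the original injection attempt
    must_not_contain: str  # text that must never appear in the output


def run_battery(model: Callable[[str], str], cases: list[Case]) -> list[Case]:
    """Return the cases that failed; an empty list means the battery passed."""
    failures = []
    for case in cases:
        output = model(case.prompt)
        # Case-insensitive match so trivial re-casing does not hide a leak.
        if case.must_not_contain.lower() in output.lower():
            failures.append(case)
    return failures
```

In a staged re-enablement, a non-empty result from `run_battery` would be a reason to keep the system out of production. The same cases can then be rerun on a schedule as part of recurrence monitoring.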
Phase 6
Post-Incident

Learn and report. Run the retrospective, feed the results back into detection and preparation and, most importantly, fulfil your regulatory reporting obligations. This is where the EU AI Act Article 73 and GDPR clocks are met or missed. Document everything: the timeline, the decisions and the evidence.

AI-specific: regulatory reporting (Art. 73 / GDPR), red-team the vector, update runbooks

"AI incident response is not a new discipline that you build from scratch – it is your existing IR capability, extended. The frameworks are already there. What teams are missing is the operational translation: the procedures, the logs and the trained reflexes for the AI parts of every phase."

— Polygraf AI, on closing the AI incident-response gap

The First 60 Minutes — A Runbook

When an AI incident is confirmed, the first hour is everything. Here is a concrete sequence. Adapt the details to your context, but the order – contain, preserve, assess, notify – is the same.

AI INCIDENT — FIRST 60 MINUTES
0–5 min
Declare + assemble. Announce the event, page the AI-aware responders, open a channel. Classify against the six archetypes – the archetype is the runbook.
5–15 min
Contain the channel. Isolate the compromised AI gateway or endpoint. For an agent: revoke its tool access and service credentials. Do not delete anything. Containment, not eradication.
15–25 min
Preserve evidence. Take a snapshot and lock the prompt/response logs, the model version and the RAG/index state. This is your forensic record; protect it before any cleanup touches it.
25–40 min
Scope the blast radius. What data was accessible? Was PII or regulated data exposed? Which users, which systems and which downstream actions did the AI touch? This assessment determines which notification clocks apply.
40–55 min
Start the regulatory clock. If personal data was disclosed, the GDPR 72-hour clock has begun. If you are a high-risk AI provider/deployer under the EU AI Act, check Article 73 reporting. Get legal and DPO in now, not after.
55–60 min
Brief + document. Brief the stakeholders with what is known and unknown. Start the incident timeline now – "we will reconstruct it from memory later" is how reporting deadlines are missed.
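The evidence-preservation step in the sequence above can be backed by a simple manifest: hash every artifact at the moment of capture so that later changes are detectable. This is a minimal sketch under the assumption that artifacts are available as bytes; the function names and the artifact names in the usage note are made up for illustration.

```python
import hashlib
from datetime import datetime, timezone


def build_manifest(artifacts: dict[str, bytes], captured_at: datetime) -> dict:
    """Record a SHA-256 digest for each preserved artifact.

    captured_at should be timezone-aware; it is stored as UTC.
    """
    return {
        "captured_at": captured_at.astimezone(timezone.utc).isoformat(),
        "artifacts": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(artifacts.items())
        },
    }


def changed_artifacts(manifest: dict, artifacts: dict[str, bytes]) -> list[str]:
    """Return names that are missing or whose content no longer matches."""
    return [
        name
        for name, digest in manifest["artifacts"].items()
        if name not in artifacts
        or hashlib.sha256(artifacts[name]).hexdigest() != digest
    ]
```

For example, a responder might build the manifest over `prompt_log.jsonl`, a model version identifier and an index export at minute 15, store it away from the affected system, and rerun `changed_artifacts` before handing evidence to legal or to an authority.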

The Regulatory Clocks That Start Ticking

This is the part that most AI teams miss until it's too late: the moment you become aware of a qualifying incident, legal deadlines start – and AI incidents can trigger multiple ones at once. As of August 2026 the EU AI Act adds its own tier of deadlines on top of existing breach laws.

The speed gap — attackers move in minutes, your clocks run in days
⚡ Fastest GenAI-assisted exfiltration (simulated) 25 min
⚡ Fastest-quartile real intrusions to exfiltration (2025) 72 min
⏱ EU AI Act — fastest report deadline (widespread) 2 days
⏱ GDPR breach notification 72 hrs
⏱ EU AI Act — standard serious incident 15 days
The data can be gone in under an hour. Your obligation to report runs in days. That is why detection, inline containment and pre-staged reporting have to be built before the incident; there is no time to assemble them during one.
72 hrs
GDPR Article 33
Notify your supervisory authority within 72 hours of becoming aware of a personal-data breach – even if the breach was caused by an AI system.
15 days
EU AI Act Art. 73
Report a serious incident involving a high-risk AI system to the market surveillance authority within 15 days of becoming aware of it.
2–10 days
EU AI Act Art. 73 (severe)
2 days for widespread incidents or for disruption of critical infrastructure; 10 days if a death may be involved. Incomplete first reports are permitted.
Two Clocks, One Incident — and a Deployer Duty

One AI breach of personal data can trigger both the GDPR 72-hour clock and the EU AI Act Article 73 clock at the same time – they are not alternatives to each other. The Commission's draft guidance also reads the deployer's duty to inform the provider "immediately" as within 24 hours. There is an investigation constraint too: under Article 73 you generally should not change the AI system in a way that would affect the cause analysis before informing the authorities. Breaching the EU AI Act reporting obligations is punishable with fines of up to €15 million or 3% of global annual turnover. Make the reporting decision part of the runbook – do not improvise it during the incident.
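Since several clocks can start from the same moment of awareness, it helps to compute every deadline at once rather than by hand. The sketch below uses the windows as summarized in this article; the dictionary keys and the function name are hypothetical, and none of this replaces checking the legal text and current guidance.

```python
from datetime import datetime, timedelta, timezone

# Reporting windows as summarized in this article. Treat these as assumptions
# to confirm with legal counsel, not as an authoritative reading of the law.
REPORTING_WINDOWS = {
    "gdpr_art33": timedelta(hours=72),
    "ai_act_art73_widespread_or_critical_infra": timedelta(days=2),
    "ai_act_art73_death": timedelta(days=10),
    "ai_act_art73_standard": timedelta(days=15),
}


def reporting_deadlines(aware_at: datetime, clocks: list[str]) -> dict[str, datetime]:
    """Map each applicable clock to its deadline, earliest deadline first."""
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware")
    deadlines = {name: aware_at + REPORTING_WINDOWS[name] for name in clocks}
    return dict(sorted(deadlines.items(), key=lambda item: item[1]))
```

For an incident the team became aware of at 09:00 UTC on 3 August 2026 that triggers both GDPR and the standard Article 73 window, this yields 09:00 UTC on 6 August and 09:00 UTC on 18 August respectively. Recording the awareness timestamp in the incident timeline is what makes the calculation defensible later.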

Free Tool · Polygraf AI Risk Calculator

Know your AI exposure before the incident — not during it

The best incident response is incident prevention. Polygraf's AI Risk Calculator models your organization's exposure and shows you which regulatory obligations (GDPR, EU AI Act, HIPAA and more) would apply if your AI systems were breached, based on your tools, data types and existing controls.

  • Exposure quantified by breach, regulatory and litigation risk
  • A personalised read on which reporting obligations and clocks would apply to you
  • Gaps in your logging, containment and notification
  • A modeled reduction in exposure from adding inline detection and governance controls
Run the free AI Risk Assessment →

Your AI IR Readiness Checklist

The time to build this capability is before the incident. If you can't check most of these, your AI incident response readiness has holes to fill now.

AI incident response readiness
An inventory exists of all AI systems and agents, with their data access and tool permissions.
Prompt and response logging is on, with tamper-proof integrity for forensic purposes.
Detection covers AI-specific signals: prompt injection, output anomalies, RAG retrieval and model drift.
Containment actions are pre-authorized – gateway isolation and agent-credential revocation do not wait for approval.
The response team is trained on the six AI incident archetypes and their respective runbooks.
There is a model-rollback and clean-re-indexing path for poisoning.
The regulatory reporting decision (GDPR + EU AI Act Art. 73) is in the runbook, with legal and the DPO on the escalation path.
The playbook has been exercised in at least one tabletop AI incident exercise.
Not legal advice. This article is a general educational overview prepared by Polygraf AI. The incident-reporting obligations under the EU AI Act, GDPR and sector regulations are complex, fact-sensitive and changing – the obligations under Article 73 of the EU AI Act come into effect in August 2026 and are still open to final guidance. Check your specific reporting obligations with a qualified legal advisor and your DPO before relying on any timeline in this article.
Polygraf AI

Turn an Invisible AI Incident Into a Contained One

Polygraf AI inspects every AI interaction inline: it detects injection and data disclosure in real time, generates the tamper-proof logs your IR team needs, and blocks sensitive data at the gateway before it leaves. On-premise, sub-100ms, zero data egress.

Request a Demo →
Air-gap ready · HIPAA · SOC 2
Deploys in under an hour
