TL;DR for the time-pressed
AI incidents do not fit the classic CSIRT playbook cleanly. They are security incidents (NIST SP 800-61r2), but they are also AI-specific incidents (NIST AI 600-1 Generative AI Profile, July 2024) that map to a different threat model (MITRE ATLAS, OWASP Top 10 for LLM Applications 2025) and a different regulator notification matrix (CISA AI incident reporting guidance, EU AI Act Article 73, GDPR breach articles, sectoral regulators). This runbook is the operational pattern Areebi's incident response team ships to enterprise customers. It covers prompt injection containment, output toxicity and hallucination triage, AI DLP breach response, model supply-chain compromise, and the regulator notification windows by jurisdiction. Updated 2026-05-20.
The framework stack you are operating against
An AI incident response programme in 2026 sits on a stack of five framework references. Each adds something the others miss.
- NIST SP 800-61 Revision 2, Computer Security Incident Handling Guide. The classic four-phase model: preparation, detection and analysis, containment / eradication / recovery, post-incident activity. NIST SP 800-61r2 is what most enterprise CSIRTs are built on. NIST released a public draft of SP 800-61r3 in 2024 and finalised it in April 2025; r3 reorganises around the CSF 2.0 functions, so you will see both models in use during the transition.
- NIST AI 600-1, Generative AI Profile (July 2024). The companion to the AI RMF 1.0, specifically for generative AI. It enumerates twelve risks that are unique to or exacerbated by GenAI: CBRN information / capabilities, confabulation (hallucination), dangerous / violent / hateful content, data privacy, environmental impacts, harmful bias / homogenisation, human-AI configuration, information integrity, information security, intellectual property, obscene / degrading / abusive content, and value chain / component integration. The NIST AI RMF page hosts both. The Areebi NIST AI RMF Manage function deep dive covers the Manage controls that wrap incident response.
- MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems). The ATT&CK-style matrix for AI threats. The ATLAS site maintains tactics (reconnaissance, ML model access, execution, persistence, etc.) and techniques (LLM prompt injection, training data poisoning, model evasion, etc.) with case studies. Your detection and triage taxonomy maps here.
- OWASP Top 10 for LLM Applications 2025. LLM-specific vulnerability taxonomy (prompt injection, sensitive information disclosure, supply chain, data and model poisoning, improper output handling, excessive agency, system prompt leakage, vector and embedding weaknesses, misinformation, unbounded consumption). The OWASP GenAI Security Project hosts the current edition. Your containment patterns map here.
- CISA AI Incident Reporting guidance, EU AI Act Article 73, sectoral rules. The notification matrix. EU AI Act Article 73 requires reporting of serious incidents involving high-risk AI systems to national competent authorities; CISA guidance and the Cyber Incident Reporting for Critical Infrastructure Act (CIRCIA) of 2022 govern US critical-infrastructure incident reporting. Sectoral rules layer on top (HIPAA breach notification under 45 CFR 164.400-414, SEC cyber disclosure under Item 1.05 of Form 8-K, DORA major-ICT-incident reporting in EU financial services).
The Areebi runbook does not replace any of these; it threads them together so an on-call engineer at 02:00 has one playbook to execute.
AI incident taxonomy: what counts and how to classify
Before you can respond, you have to recognise. The Areebi taxonomy uses six top-level categories, each mapped to NIST AI 600-1 risk classes, MITRE ATLAS techniques, and OWASP LLM Top 10 entries:
| Category | Examples | NIST AI 600-1 | OWASP LLM Top 10 |
|---|---|---|---|
| Prompt injection / instruction override | Direct injection in user prompt; indirect injection from retrieved document; jailbreak via roleplay | Information security; information integrity | LLM01 Prompt injection |
| Sensitive data disclosure (DLP) | PHI in completion; PII leak via system prompt extraction; trade secret in output | Data privacy; intellectual property; information security | LLM02 Sensitive information disclosure; LLM07 System prompt leakage |
| Output toxicity / harmful content | Hate speech in completion; dangerous instructions; abusive output | Dangerous / violent / hateful content; obscene content; harmful bias | LLM09 Misinformation |
| Hallucination / confabulation causing harm | Fabricated citations relied on by decision-maker; hallucinated medical guidance | Confabulation; information integrity | LLM09 Misinformation |
| Model drift / capability change | Quality regression after vendor model update; behavioural change after fine-tune; degraded accuracy in production | Information integrity; human-AI configuration | Cross-cutting |
| Model supply-chain compromise | Compromised dependency in inference stack; trojaned open weight; tampered embeddings; poisoned training data | Value chain / component integration; information security | LLM03 Supply chain; LLM04 Data and model poisoning |
The classification matters because the containment plays, the regulator notification questions, and the post-incident learning loop differ for each. A prompt injection in a development sandbox is not the same incident as a PHI disclosure in production - even though both might originate from the same vulnerability class.
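As a rough illustration, the table above can be encoded as a lookup so that every incident ticket carries its framework tags from the moment it is classified. This is a minimal sketch: the names `Category`, `FrameworkMapping`, `TAXONOMY`, and `incident_tags` are hypothetical and not part of any Areebi API; only the mapping values come from the table.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    PROMPT_INJECTION = "Prompt injection / instruction override"
    DATA_DISCLOSURE = "Sensitive data disclosure (DLP)"
    OUTPUT_TOXICITY = "Output toxicity / harmful content"
    HALLUCINATION = "Hallucination / confabulation causing harm"
    MODEL_DRIFT = "Model drift / capability change"
    SUPPLY_CHAIN = "Model supply-chain compromise"


@dataclass(frozen=True)
class FrameworkMapping:
    nist_ai_600_1: tuple[str, ...]
    owasp_llm: tuple[str, ...]


# Values copied from the taxonomy table; the names are illustrative only.
TAXONOMY: dict[Category, FrameworkMapping] = {
    Category.PROMPT_INJECTION: FrameworkMapping(
        ("Information security", "Information integrity"), ("LLM01",)),
    Category.DATA_DISCLOSURE: FrameworkMapping(
        ("Data privacy", "Intellectual property", "Information security"),
        ("LLM02", "LLM07")),
    Category.OUTPUT_TOXICITY: FrameworkMapping(
        ("Dangerous / violent / hateful content", "Obscene content", "Harmful bias"),
        ("LLM09",)),
    Category.HALLUCINATION: FrameworkMapping(
        ("Confabulation", "Information integrity"), ("LLM09",)),
    Category.MODEL_DRIFT: FrameworkMapping(
        ("Information integrity", "Human-AI configuration"), ("Cross-cutting",)),
    Category.SUPPLY_CHAIN: FrameworkMapping(
        ("Value chain / component integration", "Information security"),
        ("LLM03", "LLM04")),
}


def incident_tags(category: Category, environment: str) -> dict:
    """Build the framework tags for a ticket; environment is e.g. 'sandbox' or 'production'."""
    mapping = TAXONOMY[category]
    return {
        "category": category.value,
        "environment": environment,
        "nist_ai_600_1": list(mapping.nist_ai_600_1),
        "owasp_llm": list(mapping.owasp_llm),
    }
```

Carrying `environment` alongside the category keeps the sandbox-versus-production distinction visible when severity is assigned later.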
Phase 1: Preparation (the work you should have done last quarter)
NIST SP 800-61r2 Section 3.1 lists the preparation activities every CSIRT needs. The AI-specific additions for 2026:
- Inventory. A current list of every AI use case, vendor, model, and data class in production. Without an inventory, detection is impossible. The Areebi platform maintains the inventory by capturing every prompt that traverses the control plane; the 90-minute shadow AI hunt is the bootstrap pattern for organisations starting from zero.
- Policy engine baseline. Pre-defined policies for data classification, output filtering, retention, and access control. Policies you can roll back to if an incident requires aggressive containment. The Areebi policy engine primer describes the pattern.
- Kill switch. A documented, tested mechanism to disable an AI use case, a model integration, or all AI access for an identity, group, or tenant. Pre-incident is when you test it - not during.
- Audit log baseline. Prompt, completion, retrieval context, policy decision, identity, model version. Without these you cannot reconstruct an incident. See the AI audit primer.
- Detection rules. SIEM / SOAR rules for prompt-injection patterns, sensitive-data egress, abuse signatures, drift metrics. The AI observability primer covers metric design.
- Vendor incident-coordination contacts. The model provider's security contact, abuse contact, and incident-disclosure contact. The DPA's notification clauses. The BAA's breach reporting clauses for healthcare. Get the contact numbers before the incident.
- Runbooks and tabletops. Per-category runbooks (this document). Tabletops at least quarterly for the high-impact scenarios.
- Regulator notification matrix. Jurisdictional table of who you must notify, when, and what content. See the regulator notification matrix below.
- Communications plan. Pre-drafted templates for customer, regulator, board, and (if needed) media communications.
If preparation is incomplete, every other phase suffers. The Areebi 30/60/90 CISO playbook sequences this work.
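The kill switch in the list above is easier to test when its scopes are explicit. The following is a minimal in-memory sketch, assuming five scopes (use case, model, identity, group, tenant) drawn from the bullet text; the `KillSwitch` class and its methods are hypothetical, and a real control plane would persist and audit every change.

```python
from dataclasses import dataclass, field

# Scopes taken from the kill-switch bullet; the names are illustrative.
SCOPES = {"use_case", "model", "identity", "group", "tenant"}


@dataclass
class KillSwitch:
    # Each entry is a (scope, value) pair that is currently disabled.
    disabled: set = field(default_factory=set)

    def disable(self, scope: str, value: str) -> None:
        if scope not in SCOPES:
            raise ValueError(f"unknown scope: {scope}")
        self.disabled.add((scope, value))

    def enable(self, scope: str, value: str) -> None:
        self.disabled.discard((scope, value))

    def is_allowed(self, *, use_case: str, model: str, identity: str,
                   group: str, tenant: str) -> bool:
        """A request is allowed only if none of its attributes is disabled."""
        request = {
            ("use_case", use_case), ("model", model), ("identity", identity),
            ("group", group), ("tenant", tenant),
        }
        return not (request & self.disabled)
```

A quarterly tabletop can then assert that `disable("tenant", ...)` actually blocks traffic, rather than discovering during an incident that it does not.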
Phase 2: Detection and analysis
The detection signals that should trigger AI incident workflows in 2026.
- Policy engine alerts. Output filter blocks above baseline rate; redaction events on a single identity above baseline; system prompt exposure in retrieved content; PHI / PII patterns in completions on workloads not authorised for that data class.
- Anomalous prompt patterns. Long prompts with embedded instructions; common injection patterns (e.g. "ignore previous instructions", "you are now in developer mode"); base64 or hex-encoded payloads; prompts that reference internal system prompts; queries that look like reconnaissance for system prompt extraction.
- Anomalous completion patterns. Outputs containing data the source documents do not contain; model refusal rates dropping below baseline (a jailbreak indicator); outputs in unexpected languages or formats.
- Vendor disclosure. Provider security advisories (e.g. from OpenAI, Anthropic, or Google) and model-registry notices (e.g. Hugging Face flagging compromised model uploads). Subscribe to all relevant advisory feeds.
- User reports. A clinician reports a hallucinated dosage; a customer reports inappropriate content; an analyst reports the model leaking a competitor's data. Triage every report - signal-to-noise is high.
- Drift signals. Quality benchmark regression on the production model; evaluation suite scores moving outside control limits; user satisfaction metrics dropping.
- Supply-chain signals. Dependency vulnerability disclosure (NVD CVE for an inference library, a vector DB, an embedding model); base model deprecation; provenance attestation failures (e.g. SLSA verification failure on a fine-tune pipeline).
Analysis: validate the signal against your inventory and audit log. Decide the category from the taxonomy above. Assign severity (S1 / S2 / S3). Engage the on-call team and the relevant function leads (Privacy, Legal, Comms, Vendor Management).
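The anomalous-prompt signals listed above can be approximated with a few cheap heuristics before anything reaches a SIEM rule. This is a sketch under stated assumptions: the phrase list, the 40-character encoded-run threshold, and the 4,000-character length limit are illustrative choices, and `scan_prompt` is a hypothetical helper. Heuristics like these produce false positives and are trivially evaded, so treat them as a triage signal and not as a control.

```python
import base64
import binascii
import re

# Illustrative patterns only; real rule sets are larger and tuned per workload.
INJECTION_PHRASES = [
    re.compile(r"ignore (all )?(the )?previous instructions", re.IGNORECASE),
    re.compile(r"you are now in developer mode", re.IGNORECASE),
    re.compile(r"(reveal|print|show|repeat) (your |the )?system prompt", re.IGNORECASE),
]
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")
HEX_RUN = re.compile(r"\b(?:[0-9a-fA-F]{2}){20,}\b")


def _decodes_as_base64(token: str) -> bool:
    try:
        base64.b64decode(token, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


def scan_prompt(prompt: str, max_chars: int = 4000) -> list[str]:
    """Return the names of the heuristic signals that fired for this prompt."""
    signals = []
    if any(p.search(prompt) for p in INJECTION_PHRASES):
        signals.append("injection_phrase")
    has_base64 = any(_decodes_as_base64(m) for m in BASE64_RUN.findall(prompt))
    if has_base64 or HEX_RUN.search(prompt):
        signals.append("encoded_payload")
    if len(prompt) > max_chars:
        signals.append("long_prompt")
    return signals
```

A non-empty result would feed the analysis step: validate against the inventory and audit log, then classify and assign severity.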
Containment runbook: prompt injection
Prompt injection is the #1 OWASP LLM risk and the most common Areebi incident category in 2026. The containment sequence:
- Identify the injection vector. Direct (user prompt content), indirect (content retrieved from a document, webpage, email, or other source pulled into the prompt context), or stored (poisoned content in your own knowledge base or vector store).
- Stop the bleed. Apply a policy patch that blocks the specific pattern, or escalates to human review. If the injection vector is a specific retrieval source, disable retrieval from that source until cleaned. The Areebi policy engine supports hot-deploy of policy patches without service restart.
- Preserve evidence. Export the full audit log for the affected sessions, including prompt, retrieval context, completion, and policy decisions. Hash and timestamp.
- Identify scope. Query the audit log for similar prompt patterns across the affected time window. Identify which users, sessions, and downstream actions were touched.
- Reverse downstream actions if applicable. If the injected prompt caused the agent to call a tool (sent an email, modified a record, scheduled a meeting), reverse or quarantine the side effects.
- Rotate any disclosed credentials. If the injection caused the system prompt or any credential to be disclosed, rotate immediately.
- Communicate. Affected users (if any), the AI governance committee, the relevant function leads. Regulator if the threshold under your notification matrix is met.
See the Areebi prompt injection deep dive and the prompt injection primer for prevention patterns.
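The "hash and timestamp" step in the sequence above can be sketched as follows. It assumes the audit export is already a list of JSON-serialisable records; `seal_evidence` and `verify_evidence` are hypothetical helpers, and the timestamp is taken from the local clock, so a production process would likely add a signature or a trusted timestamping service on top.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def _canonical_bytes(records: list[dict]) -> bytes:
    # Sorted keys and fixed separators make the hash independent of key order.
    return json.dumps(records, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def seal_evidence(records: list[dict]) -> dict:
    """Return a manifest with a SHA-256 digest and a UTC timestamp for the export."""
    return {
        "sha256": hashlib.sha256(_canonical_bytes(records)).hexdigest(),
        "record_count": len(records),
        "sealed_at": datetime.now(timezone.utc).isoformat(),
    }


def verify_evidence(records: list[dict], manifest: dict) -> bool:
    """Recompute the digest and compare it with the manifest in constant time."""
    actual = hashlib.sha256(_canonical_bytes(records)).hexdigest()
    return hmac.compare_digest(actual, manifest["sha256"])
```

Store the manifest separately from the export so that later tampering with the records is detectable.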
Containment runbook: AI DLP breach
An AI DLP breach is an unauthorised disclosure of regulated or sensitive data via an AI surface. Most common: PHI / PII in a prompt sent to a vendor without a BAA / DPA; trade secret in a completion delivered to a third party; system prompt extraction exposing keys; embedding store query returning data the user was not authorised to see.
- Revoke. Revoke the identity's access to the AI surface. Revoke any tokens the AI used to call downstream systems. Revoke the API key on the model vendor side if appropriate.
- Scrub logs. If the breach was via a vendor that should not have received the data, request scrubbing from the vendor under your DPA / BAA. Scrub from your own logs where retention policy allows. Note: scrubbing the model's training corpus is generally not possible for proprietary APIs - the carve-out in your DPA should prevent training in the first place.
- Quantify. Number of records, fields, identities affected. The HHS OCR 500-individual threshold for HIPAA breach notification, the GDPR 72-hour clock for breaches likely to pose a risk to individuals (plus data-subject notification where the risk is high), the SEC materiality test for public companies - each is a function of scope.
- Notify. Per the regulator notification matrix below. The 72-hour GDPR window starts at awareness; the 60-day HIPAA window applies to individual notice, and to HHS notice for breaches affecting 500 or more individuals; the SEC 8-K Item 1.05 window is four business days from materiality determination; DORA major-incident reporting runs on an accelerated schedule.
- Containment patches. Apply policy engine rules to prevent recurrence: data classifier on inputs and outputs, retrieval ACL enforcement, vendor allow-list, prompt template change.
- Evidence pack. The audit log for the incident, the policy engine decisions, the vendor communications, the timeline. The OCR / DPA / SEC will ask for it.
The Areebi AI DLP primer and cost of one shadow AI breach cover the economics.
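The quantification step above reduces to counting distinct individuals and fields across the disclosure events reconstructed from the audit log. A minimal sketch, assuming a hypothetical `DisclosureEvent` record shape; real audit exports will differ, and the 500-individual flag is only an input to the legal review, not a determination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisclosureEvent:
    session_id: str
    subject_id: str        # the individual whose data was disclosed
    data_class: str        # e.g. "PHI", "PII", "trade_secret"
    fields: frozenset      # e.g. frozenset({"name", "diagnosis"})


def quantify(events: list[DisclosureEvent]) -> dict:
    """Summarise scope: events, distinct individuals, distinct fields."""
    individuals = {e.subject_id for e in events}
    phi_individuals = {e.subject_id for e in events if e.data_class == "PHI"}
    fields = set()
    for event in events:
        fields |= event.fields
    return {
        "events": len(events),
        "individuals": len(individuals),
        "phi_individuals": len(phi_individuals),
        "fields": sorted(fields),
        # HIPAA treats breaches affecting 500 or more individuals differently.
        "hipaa_500_or_more": len(phi_individuals) >= 500,
    }
```

Counting distinct individuals, not events, matters because one person's record can appear in many sessions.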
Containment runbook: output toxicity, hallucination, and harmful content
Toxicity and hallucination incidents differ from DLP because the harm is in the output content, not the data disclosed.
- Capture and classify. Save the exact prompt and completion. Classify the harm category per NIST AI 600-1 (CBRN, dangerous / violent / hateful, obscene, harmful bias, confabulation).
- Sample for breadth. Was this a one-off generation or a pattern? Sample similar prompts from the audit log. If a pattern, the containment is policy-level; if one-off, the response is more targeted.
- Containment. Apply an output filter rule, an updated system prompt, a temperature or model-parameter change, or (for severe cases) disable the affected use case until remediated.
- Retraction if applicable. If the output was delivered to a customer or used in a decision (clinical guidance, financial recommendation), notify and retract.
- Root cause. Was it model-level (the underlying model produced an unsafe output), prompt-level (the system prompt invited it), retrieval-level (the retrieved context introduced it), or jailbreak (user adversarial input)?
- Engage vendor where relevant. Most major LLM vendors run abuse-reporting programmes; for severe safety incidents on a major model, file a report with the vendor's trust and safety team.
The Areebi policy engine ships baseline output filters that catch the most common toxicity categories; tuning the thresholds is the work that has to happen in production.
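For the "sample for breadth" step, one way to avoid over-reading a small sample is to compare a lower confidence bound on the harmful-output rate against a threshold. The sketch below uses the Wilson score interval; the 5% threshold and the helper names are assumptions for illustration, not an Areebi default.

```python
import math


def wilson_lower_bound(harmful: int, sampled: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the harmful-output rate."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    if not 0 <= harmful <= sampled:
        raise ValueError("harmful must be between 0 and sampled")
    p = harmful / sampled
    z2 = z * z
    centre = p + z2 / (2 * sampled)
    margin = z * math.sqrt(p * (1 - p) / sampled + z2 / (4 * sampled * sampled))
    return max(0.0, (centre - margin) / (1 + z2 / sampled))


def breadth_verdict(harmful: int, sampled: int, pattern_threshold: float = 0.05) -> str:
    """Call it a pattern only when the lower bound clears the threshold."""
    if wilson_lower_bound(harmful, sampled) > pattern_threshold:
        return "pattern"
    return "no_pattern_evidence"
```

A "no_pattern_evidence" result on a small sample is not an all-clear; it usually means the sample should be widened before the response is scoped down.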
Containment runbook: model supply-chain compromise
A supply-chain compromise is the highest-severity AI incident category and the one most enterprises are least prepared for. Triggers include CVEs in inference libraries (vLLM, TGI, llama.cpp, Triton), compromised model weights on Hugging Face or another model registry, poisoned training data discovered in a fine-tuning corpus, or a compromise of the model provider itself.
- Isolate. Quarantine the affected model, dependency, or environment. If the model is a fine-tune of a compromised base, roll back to a known-good base.
- Rotate. Any credentials, signing keys, or API keys that traversed the affected component.
- Audit. Reconstruct the timeline. What inferences ran on the compromised component? What data passed through? What downstream actions were taken?
- Vendor escalation. For an upstream compromise (model registry, model provider, dependency), engage the vendor's security team with the disclosure timeline.
- AIBOM (AI Bill of Materials). If you maintain an AIBOM (see the Areebi AIBOM playbook), update it to reflect the new model lineage. If you do not, this incident is the forcing function to build one.
- Notify downstream. If your AI is part of a customer-facing product, downstream customers may need to be notified of the change in your AI supply chain - especially under SOC 2 CC2.3 obligations to communicate to user entities and under sectoral rules.
- Patch and validate. Apply the patched component. Run the full evaluation suite to ensure no regression. The Areebi AI supply chain security primer covers the verification pattern.
The model supply chain security guide goes deeper.
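The isolate and validate steps above both depend on knowing whether an artefact on disk is the one you approved. A minimal sketch, assuming the AIBOM pins a SHA-256 digest per artefact path; the `pinned` mapping format and the helper names are hypothetical, and a digest check does not replace signature or provenance verification.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(pinned: dict[str, str], root: Path) -> dict[str, str]:
    """Compare each pinned digest with the file on disk: ok, mismatch, or missing."""
    results = {}
    for relative_path, expected in pinned.items():
        candidate = root / relative_path
        if not candidate.is_file():
            results[relative_path] = "missing"
        elif hmac.compare_digest(sha256_file(candidate), expected.lower()):
            results[relative_path] = "ok"
        else:
            results[relative_path] = "mismatch"
    return results
```

Any "mismatch" or "missing" result would be a reason to keep the component quarantined until the lineage is re-established.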
The regulator notification matrix
The notification window depends on the jurisdiction, the data class, and the impact. The 2026 baseline:
| Regulator / regime | Trigger | Window | Reference |
|---|---|---|---|
| GDPR (EU) | Personal data breach likely to result in a risk to rights and freedoms | 72 hours to supervisory authority; without undue delay to data subjects if high risk | GDPR Articles 33, 34 |
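| EU AI Act | Serious incident involving a high-risk AI system; serious incident from GPAI with systemic risk | High-risk: immediately after a causal link is established, and no later than 15 days after awareness (2 days for a widespread infringement or critical-infrastructure disruption; 10 days for a death). GPAI with systemic risk: without undue delay | EU AI Act Articles 73, 55 |
| HIPAA (US) | Breach of unsecured PHI | Without unreasonable delay and no later than 60 days to individuals; HHS within the same 60 days for 500 or more individuals; HHS annually (within 60 days of calendar year end) for fewer than 500 | 45 CFR 164.400-414 |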
| SEC (US listed companies) | Cybersecurity incident determined material | 4 business days from materiality determination | Form 8-K Item 1.05 |
| CIRCIA (US critical infrastructure) | Covered cyber incident | 72 hours for covered incidents; 24 hours for ransom payments | CIRCIA 2022; CISA final rule cycle |
| DORA (EU financial entities) | Major ICT-related incident | Initial notification, intermediate report, final report; ESA-coordinated timelines | DORA Article 19 |
| State breach laws (US) | Personal information breach | Varies by state (often 30-90 days) | State statutes |
| UK ICO | Personal data breach likely to result in a risk to rights and freedoms | 72 hours to ICO; without undue delay to data subjects if high risk | UK GDPR Articles 33, 34 |
| Australia OAIC | Eligible data breach | Assessment of a suspected breach within 30 days; notification as soon as practicable once an eligible breach is confirmed | Privacy Act 1988 Part IIIC; NDB Scheme |
An AI incident frequently triggers more than one regime - a PHI disclosure via a non-BAA vendor in a US listed multinational can be in scope for HIPAA, SEC, multiple state laws, GDPR (if EU data subjects), and the EU AI Act if the AI is high-risk. The notification matrix and the legal review have to happen in parallel with containment.
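Because each regime starts its clock from a different event, it helps to compute the deadlines explicitly as soon as the clock-start times are known. The sketch below covers only the fixed windows from the matrix (GDPR 72 hours, CIRCIA 72 and 24 hours, HIPAA 60 days, SEC four business days). The regime keys and function names are hypothetical, and the business-day arithmetic skips weekends only and ignores public holidays, so Legal should confirm every date.

```python
from datetime import datetime, timedelta, timezone

# Fixed windows from the matrix; each runs from that regime's own clock-start event.
FIXED_WINDOWS = {
    "gdpr_supervisory_authority": timedelta(hours=72),
    "circia_covered_incident": timedelta(hours=72),
    "circia_ransom_payment": timedelta(hours=24),
    "hipaa_individual_notice": timedelta(days=60),
}
SEC_REGIME = "sec_8k_item_1_05"
SEC_BUSINESS_DAYS = 4


def add_business_days(start: datetime, days: int) -> datetime:
    """Advance by N weekdays (Mon-Fri). Public holidays are NOT handled."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:
            remaining -= 1
    return current


def notification_deadlines(clock_starts: dict[str, datetime]) -> dict[str, datetime]:
    """Map each regime's clock-start time to its latest notification time."""
    deadlines = {}
    for regime, started_at in clock_starts.items():
        if regime == SEC_REGIME:
            deadlines[regime] = add_business_days(started_at, SEC_BUSINESS_DAYS)
        elif regime in FIXED_WINDOWS:
            deadlines[regime] = started_at + FIXED_WINDOWS[regime]
        else:
            raise KeyError(f"no fixed window defined for {regime}")
    return deadlines
```

Regimes with qualitative windows (for example "without undue delay") are deliberately left out, since they need legal judgement, not arithmetic.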
Phase 4: Post-incident activity
The lessons-learned phase is where most CSIRTs underinvest and where the policy engine gets its best updates. Per NIST SP 800-61r2 Section 3.4 and AI 600-1 controls:
- Post-mortem. Within 7 days. Blameless format. Document the timeline, the root cause, the containment effectiveness, the contributing factors, and the lessons.
- Policy engine updates. Translate the lessons into hot-deployable policy. Areebi customers ship most post-incident policy updates within 24 hours of the post-mortem.
- Detection rule updates. Add SIEM rules / observability metrics that would have caught it earlier.
- Inventory updates. Update the AI inventory, the vendor registry, the AIBOM.
- Tabletop scenario. Add the incident to the tabletop library and replay annually.
- Reporting. Internal: AI governance committee, audit committee where material. External: as required by the notification matrix. The Areebi audit log provides the evidence backbone.
- Compliance evidence. Map the post-incident artefacts to the relevant frameworks: NIST AI RMF Manage controls, ISO/IEC 42001 Clause 10 (improvement), SOC 2 CC7.4 / CC7.5, the EU AI Act Article 73 evidence.
The Areebi AI incident response primer summarises the loop.
Operating model and on-call
An AI incident response programme that does not have an on-call rotation is a programme on paper only. The model Areebi recommends:
- Primary on-call. Security engineer or AI platform engineer rotation, 24x7. First responder for any S1 / S2.
- AI specialist on-call. An AI-knowledgeable engineer reachable for AI-specific triage (prompt injection patterns, model behaviour, embedding analysis). Can be the same rotation in smaller orgs.
- Privacy on-call. Reachable within 1 hour for any S1 with personal-data implications. Drives GDPR / HIPAA / state-law notification decisions.
- Legal on-call. Reachable within 4 hours for any S1. Drives regulator and external communications.
- Communications lead. For S1 / S2 with customer-facing implications.
- Executive escalation. CISO, then CIO, then CEO for S1 with reputational or regulatory exposure. Board notification per the audit committee charter.
The frequent failure mode: the AI incident is detected at 23:00 on a Friday, the AI-specialist on-call has not been defined, and the primary on-call security engineer is unsure whether the alert is a real incident or a benign jailbreak pattern. Define the AI specialist rotation, the alert thresholds, and the escalation runway before you need them.
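The escalation rules above are simple enough to encode, which removes the 23:00 guesswork about who gets paged. A minimal sketch: the role names, the `Page` tuple, and `escalation_plan` are hypothetical, and paging the AI specialist for every S1 / S2 is an assumption, since the list above does not tie that role to a severity.

```python
from typing import NamedTuple, Optional


class Page(NamedTuple):
    role: str
    reachable_within_hours: Optional[int]  # None where no window is stated


def escalation_plan(severity: str, *, personal_data: bool = False,
                    customer_facing: bool = False,
                    regulatory_or_reputational: bool = False) -> list[Page]:
    """Return who to page, in order, for a given severity and incident flags."""
    if severity not in {"S1", "S2", "S3"}:
        raise ValueError(f"unknown severity: {severity}")
    pages = []
    if severity in {"S1", "S2"}:
        pages.append(Page("primary_on_call", None))
        # Assumption: the AI specialist joins every S1 / S2.
        pages.append(Page("ai_specialist_on_call", None))
    if severity == "S1" and personal_data:
        pages.append(Page("privacy_on_call", 1))
    if severity == "S1":
        pages.append(Page("legal_on_call", 4))
    if severity in {"S1", "S2"} and customer_facing:
        pages.append(Page("communications_lead", None))
    if severity == "S1" and regulatory_or_reputational:
        pages.extend(Page(role, None) for role in ("ciso", "cio", "ceo"))
    return pages
```

An empty plan for S3 reflects the list above, which names no mandatory responder below S2; teams may prefer to route S3 to a ticket queue.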
Areebi's point of view
An AI incident response programme is not the AI security team's project - it is the joint property of Security, Privacy, Legal, Vendor Management, and the AI engineering team. The teams that recover from AI incidents fastest are the ones that ran the tabletop before the incident, wired the policy engine and audit log before the incident, and pre-negotiated the vendor escalation contacts before the incident. Areebi's research team will keep saying this until it stops being news: containment is a property of the control plane, not of heroics.
Frequently Asked Questions
How does the NIST AI 600-1 GAI Profile change my existing incident response plan?
It does not replace SP 800-61r2; it adds twelve AI-specific risk categories your existing IR plan should detect, classify, and contain. The most common change is to add an AI specialist on-call role, AI-specific detection rules, and an AI-specific notification matrix that layers EU AI Act Article 73 and sectoral rules on top of the standard breach reporting.
Is a prompt injection a security incident or a content moderation issue?
It can be both. A prompt injection that produces inappropriate content is a content incident; a prompt injection that causes system prompt disclosure, credential leakage, agent tool misuse, or data exfiltration is a security incident. Most are mixed. Areebi recommends treating any prompt injection that crosses a trust boundary as a security incident by default.
When does an AI incident trigger SEC 8-K disclosure?
When the incident is determined material under the reasonable-investor standard, the registrant must file an 8-K Item 1.05 within four business days of materiality determination. Most AI incidents will not be material under this standard; PHI breaches affecting hundreds of thousands of records, large vendor compromises, or significant production outages may be. Materiality determination is a Legal / Audit committee decision; the security team supplies the facts.
Do I have to report an AI incident to my model provider?
Often yes, especially if the incident exposes a vulnerability in the model or a misuse pattern at scale. Anthropic, OpenAI, and Google all operate trust and safety reporting channels for severe incidents. For supply-chain incidents involving the provider's stack, your DPA may obligate disclosure. The Areebi vendor registry tracks the disclosure obligation per vendor.
What is the most common AI incident in 2026?
Indirect prompt injection via retrieved content - a malicious document or web page in the retrieval corpus causes the LLM to follow injected instructions, leading to data exfiltration, tool misuse, or inappropriate output. Direct prompt injection attempts in user input are more frequent, but fewer of them become incidents and they are generally less impactful; indirect injection has a higher blast radius because the retrieval corpus is implicitly trusted.
How long should I retain AI audit logs for incident response?
Long enough to support the longest applicable notification or examination window, balanced against data-minimisation obligations. For most enterprises this is 12-24 months for prompt and completion logs (with PHI / PII redacted or tokenised), with longer retention for the policy decision metadata. Sectoral rules can require longer; data-minimisation can require shorter for personal data. The retention policy is a Privacy + Legal joint decision documented in the AI governance programme.
Related Resources
- AI Incident Response (definition)
- AI Audit (definition)
- AI Observability (definition)
- AI Policy Engine (definition)
- AI DLP (definition)
- AI Supply Chain Security (definition)
- Prompt Injection (definition)
- EU AI Act Compliance Hub
- GDPR Compliance Hub
- HIPAA Compliance Hub
- Areebi Platform
- Trust Center
- Prompt Injection Deep Dive
- Model Supply Chain Security
- AIBOM Playbook
- Cost of One Shadow AI Breach
- 90-minute Shadow AI Hunt
- CISO Playbook 30/60/90
- NIST AI RMF Manage Function Deep Dive
About the Author
Areebi Research
The Areebi research team combines hands-on enterprise security work with deep AI governance research. Our analysis is informed by primary sources (NIST, ISO, OECD, federal registers, IAPP) and the operational realities of CISOs running AI programs in regulated industries today.