LLM Security: Definition and Scope
LLM security is the practice of protecting large language model applications across their entire lifecycle - not just the model, but everything that touches it. That includes the prompts and documents flowing in, the model weights and the third-party components in their supply chain, the runtime and infrastructure that serve inference, and the outputs and downstream actions the model produces. The goal is to prevent confidentiality, integrity, and availability failures specific to how LLMs work.
LLM security is distinct from traditional application security because the attack surface is different in kind, not just degree. In a conventional app, code and data are cleanly separated. In an LLM application, instructions and data share the same channel - the prompt - which means an attacker who can influence any text the model reads can potentially alter what the model does. That single property is responsible for the most important class of LLM vulnerability, and it has no clean analogue in pre-AI systems. We unpack the contrast in AI security versus traditional appsec.
It is also distinct from AI governance, though the two reinforce each other. Governance answers should we use this model, for what, under whose authority, with what audit trail. Security answers can an attacker make it misbehave, and how do we stop them. A well-governed deployment with no runtime security controls is still exploitable; a hardened runtime with no governance is still a compliance failure. Mature programmes do both - the distinction is mapped in AI governance versus AI security.
The reference framework for the field is the OWASP Top 10 for Large Language Model Applications, which the rest of this page uses as its organising taxonomy.
The LLM Threat Taxonomy: OWASP Top 10 for LLM Applications
The OWASP Top 10 for LLM Applications is the most widely adopted taxonomy for LLM risk, maintained by a working group of security practitioners and refreshed as the threat landscape evolves. It is the right starting point for any threat model because it is vendor-neutral, concrete, and mapped to real exploit classes rather than abstractions.
| OWASP ID | Risk | What it means in practice |
|---|---|---|
| LLM01 | Prompt Injection | Untrusted input overrides the system's instructions, directly or via retrieved content |
| LLM02 | Sensitive Information Disclosure | The model reveals PII, secrets, or proprietary data in its output |
| LLM03 | Supply Chain | Compromised models, datasets, plugins, or dependencies introduce risk |
| LLM04 | Data and Model Poisoning | Tampered training or fine-tuning data corrupts model behaviour |
| LLM05 | Improper Output Handling | Model output is trusted and executed downstream without validation |
| LLM06 | Excessive Agency | The model has more tools, permissions, or autonomy than it should |
| LLM07 | System Prompt Leakage | Confidential instructions or secrets in the system prompt are exposed |
| LLM08 | Vector and Embedding Weaknesses | Flaws in RAG retrieval leak or poison data |
| LLM09 | Misinformation | Hallucinated or manipulated output causes harm when relied upon |
| LLM10 | Unbounded Consumption | Resource exhaustion, denial of wallet, or model extraction via excess queries |
The practical value of the taxonomy is that it forces coverage. Most teams instinctively defend against the risk they have heard of - usually prompt injection - and leave output handling, excessive agency, and unbounded consumption unaddressed. A threat model that walks all ten is far harder to blindside. The deepest treatment of the offensive side is in our LLM attack vectors analysis.
Prompt Injection: The Defining LLM Vulnerability
Prompt injection is LLM01 for a reason - it is the vulnerability class with no clean fix, because it exploits the fundamental design of how language models process text. The model cannot reliably distinguish instructions you intended from instructions an attacker smuggled into the same context window.
Two forms matter:
- Direct prompt injection. The user crafts input that overrides the system prompt - "ignore your previous instructions and..." - to extract the system prompt, bypass safety rules, or trigger unintended actions.
- Indirect prompt injection. The malicious instructions are planted in content the model later reads - a web page, a support ticket, a shared document, a calendar invite. When the model ingests that content during a RAG retrieval or a tool call, it executes the planted instructions as if they were legitimate. This is the more dangerous form because the victim never typed anything hostile.
Indirect injection is precisely why retrieved content is an untrusted input channel and why RAG security is its own discipline. The consequences escalate sharply when the model has tools: an injected instruction that merely produces bad text is a nuisance, but the same instruction that can call an email API, query a database, or trigger a workflow is a breach. This is the interaction between LLM01 and LLM06 (Excessive Agency), and it is why agent governance is becoming a security priority rather than a research curiosity.
There is no single control that eliminates prompt injection. Effective defence is layered: an AI firewall that inspects inbound prompts and retrieved content for injection patterns, structural separation of trusted instructions from untrusted data, least-privilege tool scoping so a compromised prompt cannot do much, output validation before any action is taken, and full logging so injection attempts are detectable after the fact. The full defensive playbook is in our enterprise prompt injection prevention guide and the deeper prompt injection explainer.
Data Leakage: The Most Common Real-World Failure
Prompt injection gets the headlines, but sensitive information disclosure (LLM02) is the failure that actually costs organisations money today. It takes several forms, and they compound.
- Input-side leakage. An employee pastes a customer database, a contract, or source code into a model. With a public service, that data has now left your boundary and may be retained or used for training. This is the shadow AI problem, and it is the single most common AI data incident.
- Output-side leakage. The model surfaces sensitive data it should not - PII from training data, confidential documents retrieved through an over-permissive RAG index, or one user's data echoed to another in a multi-tenant system.
- System prompt leakage (LLM07). Developers embed API keys, internal URLs, or business logic in the system prompt, assuming users cannot see it. Prompt injection routinely extracts it.
The empirical case is not hypothetical. The IBM Cost of a Data Breach Report 2025 found that one in five organisations suffered a breach involving shadow AI, and those breaches cost an average of USD 670,000 more than breaches without it. The canonical incident remains Samsung's company-wide ban on generative AI tools in 2023 after engineers pasted internal source code into ChatGPT.
The primary control is AI DLP: inline inspection of prompts and responses for PII, PHI, secrets, and source code, with block, redact, or warn actions applied before data leaves the boundary. The architectural control is keeping inference inside a boundary you own - a private LLM deployment - so that the input-side leakage vector to third parties is closed by design. The two are complementary: private hosting closes the external path, and DLP governs what happens inside it.
Supply Chain Risk and Runtime Controls
The remaining high-impact risks cluster into supply chain (LLM03, LLM04) and runtime governance (LLM05, LLM06, LLM10). These are the areas most teams under-invest in because they are less visible than prompt injection.
Supply Chain
An LLM application inherits the trust posture of every component it pulls in: the base model weights, fine-tuning datasets, embedding models, vector databases, plugins, and the open-source libraries gluing them together. A model downloaded from a public hub can be backdoored; a poisoned dataset can corrupt behaviour in ways that are hard to detect after the fact - see data poisoning. The discipline here mirrors software supply-chain security: know your components, verify their provenance, and maintain an AI bill of materials. We cover this in AI supply chain security and the model supply chain deep dive.
Runtime Controls
Runtime is where security meets the request. The controls that matter:
- Output handling (LLM05). Never trust model output as code or commands. Validate, sanitise, and sandbox anything the model produces before it is executed, rendered, or passed downstream - the LLM equivalent of treating user input as hostile.
- Least-privilege agency (LLM06). Give the model the minimum tools and permissions its task requires. An agent that can read should not also be able to delete; a summariser should have no tools at all.
- Rate and consumption limits (LLM10). Enforce per-user and per-application quotas to prevent denial-of-wallet attacks, resource exhaustion, and model-extraction probing - see AI rate limiting.
- Runtime policy and audit. A runtime policy layer evaluates every request against organisational rules, and an immutable audit trail makes incidents investigable. You cannot respond to what you cannot see - see AI incident response.
The unifying observation: most of these controls live at the same point in the architecture - the moment a request passes between the user and the model. That is why an LLM gateway with inline inspection is the natural enforcement point for runtime LLM security, and why ungoverned direct-to-provider calls are so hard to secure - there is no chokepoint to instrument.
How Areebi Enforces LLM Security at the Point of Use
Areebi is an enterprise secure AI platform that implements the runtime half of LLM security as a built-in property of every interaction, rather than as controls teams must assemble and maintain themselves. It sits at the enforcement point - between users and 30+ LLM providers - and applies security policy to every request.
- Inbound and outbound inspection: real-time DLP scans prompts and responses for PII, PHI, secrets, and source code, mitigating sensitive information disclosure (LLM02 and LLM07) before data crosses the boundary.
- Injection-aware filtering: inbound prompts and retrieved RAG content are inspected for prompt injection patterns (LLM01 and LLM08) before reaching the model.
- Least-privilege by design: workspace isolation and RBAC constrain what each user and each context can reach, limiting excessive agency (LLM06) and blast radius.
- Consumption controls: per-user and per-team rate and token limits defend against unbounded consumption and denial-of-wallet (LLM10).
- Private and air-gapped deployment: run inference inside your own boundary via Docker, Kubernetes, VM, air-gapped, or local-only Ollama and LM Studio, closing the external data path entirely and giving you control over supply-chain provenance.
- Immutable audit and policy engine: a no-code policy engine encodes your runtime rules, and a tamper-evident audit log makes every interaction investigable for incident response and compliance.
- Browser extension blocking: stop employees reaching ungoverned external AI tools, collapsing the shadow-AI leakage surface that drives most real incidents.
LLM security is not a single product feature - it is a posture that has to be enforced on every request. Map your exposure against the OWASP Top 10, then see how the controls land in practice: read what is an LLM gateway for the enforcement architecture, review your SOC 2 and HIPAA obligations, or book a demo to test it against your own threat model. Pricing is on the pricing page.
Frequently Asked Questions
What is the difference between LLM security and AI governance?
LLM security is about preventing an attacker from making a model misbehave - blocking prompt injection, stopping data leakage, hardening the supply chain, and constraining what the model can do at runtime. AI governance is about authority and accountability - whether a model should be used at all, for what purpose, under whose sign-off, and with what audit trail. They overlap but are not substitutes: a perfectly governed deployment with no runtime controls is still exploitable, and a hardened runtime with no governance is still a compliance failure. Mature programmes implement both.
What is the OWASP Top 10 for LLM Applications?
It is the most widely adopted taxonomy of large language model security risks, maintained by an OWASP working group and refreshed as threats evolve. It covers prompt injection, sensitive information disclosure, supply chain risk, data and model poisoning, improper output handling, excessive agency, system prompt leakage, vector and embedding weaknesses, misinformation, and unbounded consumption. It is the recommended starting point for any LLM threat model because it is vendor-neutral and maps to concrete, observed exploit classes.
Can prompt injection be completely prevented?
No - and any vendor claiming a complete fix should be treated with suspicion. Prompt injection exploits the core design of language models, which cannot reliably separate trusted instructions from untrusted data sharing the same context. The realistic goal is defence in depth that drives residual risk acceptably low: inspect inbound prompts and retrieved content with an AI firewall, structurally separate instructions from data, apply least-privilege tool scoping so a successful injection can do little, validate output before any action, and log everything for detection. The combination is robust even though no single layer is.
Why is data leakage the most common LLM security incident?
Because it requires no attacker - a well-intentioned employee pasting a customer record or source code into a public AI tool causes it, and that behaviour is widespread. The IBM Cost of a Data Breach Report 2025 found one in five organisations suffered a breach involving shadow AI. The controls are inline DLP that inspects prompts and responses before data leaves the boundary, plus keeping inference inside infrastructure you control so the external path to third parties is closed by design.
Where should LLM security controls be enforced?
At the point where every request passes between the user and the model - the same chokepoint an LLM gateway occupies. Concentrating DLP, injection filtering, rate limiting, policy evaluation, and audit at that single inline point is far more reliable than scattering partial controls across many applications. Ungoverned direct-to-provider calls are hard to secure precisely because there is no shared enforcement point to instrument.
Does using a private or self-hosted LLM make an application secure?
It closes one important attack surface - the external data path to a third-party provider - but it does not secure the application by itself. Prompt injection, insecure output handling, excessive agency, over-permissive RAG retrieval, and absent audit all survive the move to private infrastructure. Private hosting solves the data path; runtime controls such as DLP, injection filtering, least-privilege scoping, and audit solve usage risk. Both are required.
Related Resources
Explore the Areebi Platform
See how enterprise AI governance works in practice - from DLP to audit logging to compliance automation.
See Areebi in action
Learn how Areebi addresses these challenges with a complete AI governance platform.