What Is LLM Security?
A Plain-English Guide
for Enterprise Teams

Large language models are being used in email, code, customer service, legal and finance workflows. None of them had a security manual. This is the manual that fills this gap – what LLM security is, why your current tools don't provide it and what needs to be done.

77%
of businesses reported an AI-related security incident in 2024
IBM / Practical DevSecOps 2026
$4.88M
average cost of a data breach in 2024 — highest ever recorded at that time
IBM Cost of a Data Breach 2024
346
AI security incidents logged by the AI Incident Database in 2025
AIID / BlueRadius Cyber 2026
84%
maximum prompt injection success rate across common LLM deployments
SQ Magazine, 2026

What LLM Security Actually Means

LLM security is the effort to prevent misuse, manipulation, data leakage and unauthorized action of large language models and the systems on which they are based. It includes both sides of the LLM interaction: the inputs and the outputs and actions.

The simplest way to explain it: LLM is software that takes instructions in natural language as input. The same security knowledge a security team has for securing software applies: access control, input validation, audit logging, least privilege. However, natural language instructions cannot be validated like structured API parameters. An LLM that takes a malicious instruction in fluent English has no syntactic way of rejecting it. It has to evaluate it. The evaluation is where LLM specific security failures happen.

DEFINITION

LLM security is the set of controls over what an AI system can be told to do, what data it can read and write, what it can do, and how it is observed and enforced – at the model input/output, the data, the tool, and the identity layers.

The domain of LLM security has grown substantially as LLMs have moved from the lab to production. In 2023 LLM security was the prevention of chatbots from making bad statements. In 2026 it is about LLMs embedded in email, code repositories, financial flows, clinical notes and agentic systems that take real action in enterprise infrastructure. The failure modes have grown too.

Why Traditional Security Doesn't Cover It

Enterprise security stacks were designed with the assumption that software is predictable. A WAF blocks requests that match known bad patterns. A DLP system blocks files that contain credit card numbers. An IAM system allows or denies access based on identity and role. These systems work because the inputs they check have a structure (HTTP requests, file types, identity tokens) that can be matched against good or bad patterns.

LLMs process free-form natural language. Same input field that allows a valid customer question also allows a jailbreak attempt, an indirect prompt injection, or social engineering payload. There is no syntactic difference between a benign and a malicious prompt. No traditional tool can perceive that difference. And when the LLM takes an action downstream (writing to a database, sending an email, calling an API), those actions are triggered by the LLM's interpretation of its instruction, not by a human calling the function.

Security Capability
Traditional Tools
LLM Requirement
Input Validation
Pattern matching · Known signatures
Semantic intent classification
Data Loss Prevention
File/network PII regex scanning
Output + retrieval inspection
Access Control
Binary session-level grant/deny
Per-tool-call enforcement
Behavioral Monitoring
Statistical deviation from human baseline
Semantic policy compliance evaluation
Audit Logging
API calls · File access · Auth events
Full decision-chain capture
Hover a row to see why the gap matters
The Core Gap

Your security stack knows that a user was authenticated and used a resource. It doesn't know that the LLM the user was using was also processing a hidden instruction in a document it pulled and about to send your database credentials to an attacker-controlled address in a tool call that looked like legitimate API usage. This is what LLM security fills.

The Threat Landscape — What's Actually Happening

LLM security is not a hypothetical. The OWASP Top 10 for LLM Applications 2025 is based on attacks in production. The MITRE ATLAS – adversarial machine learning knowledge base – grew to 16 tactics and 84 techniques in 2025 with 14 new techniques for AI agents. In 2026, CrowdStrike's threat report had prompt injection attacks on 90+ organizations. The AI Incident Database has 346 AI incidents in 2025.

Five threat categories account for the majority of LLM security incidents in enterprise environments:

Prompt Injection
OWASP #1 · Two Consecutive Editions
The attacker injects malicious instructions in the text the LLM processes – in the user message (direct injection) or in the content the LLM retrieves (e.g. document, email) – and the LLM interprets the instruction as valid and follows it. Indirect injection is now the majority of the attacks we see (more than 55%), with 20–30% higher success rate than direct injection.
Real: EchoLeak (CVE-2025-32711) — hidden Markdown in email caused Copilot to exfiltrate OneDrive files. Zero-click.
Data Exfiltration via LLM
OWASP #2 · Sensitive Info Disclosure
The LLM is prompted (via injection, jailbreaking or misconfiguration) to return data that it should not be sending e.g. data in context window, data from connected systems (RAG), and credentials or secrets leaked from the system prompt extraction. A Cyberhaven 2024 study found that 11% of data employees are pasting into ChatGPT is confidential.
Real: Samsung engineers leaked source code and meeting notes via ChatGPT — prompted ChatGPT ban across the company.
Jailbreaking
OWASP #5 · Improper Output Handling
Inputs that are designed to circumvent the safety training and system prompt, and for which the model generates content or acts beyond the intended scope of operation. Jailbreak exploits that are known have 17% success rate in a controlled test, and >90% success in a multi-turn enterprise test. Jailbreaks have >70% success in 3 minutes in an enterprise penetration test.
Real: CrowdStrike documented prompt injection against 90+ organizations in 2026 threat reporting.
Supply Chain Poisoning
OWASP #3 · LLM Supply Chain
Attack on the LLM's dependencies: training data, fine-tuning dataset, model weights, plugins, MCP servers, RAG knowledge bases. An attacked component can make the LLM always output the attacker-friendly answer for all users. PoisonedRAG (2024) has shown that it can attack with 97% success rate with 5 poisoned documents in a million-size knowledge base.
Real: Vercel/Context.ai breach (April 2026) — OAuth "Allow All" permissions on AI productivity tool exposed corporate Google Workspace.
Insecure Output Handling
OWASP #5 · High Severity
The LLM output is sent to downstream systems (code interpreters, browsers, databases) without any validation. The output can contain instructions that the downstream system will run. Classic injection attacks (XSS, SQLi, command injection) are re-appearing when LLM output is used to build a database query, a system command or a web page without sanitization.
Real: Replit agent (July 2025) — LLM output interpreted as a database wipe instruction during an active code freeze. 1,206 records destroyed.
Shadow AI
Governance Gap · Cross-cutting
Workers who use AI tools on company data that are not visible to IT, not governed by IT and not governed by data governance. 44% of workers in the US do not know if their company has an AI policy or if they have one (Gallup/Founder Reports, 2026). 78% of workers who use AI at work bring their own AI tools to work (Microsoft Work Trend Index 2025). Each unapproved tool is a risk of data exfiltration, no audit trail, no access control, no incident response.
Real: Multiple healthcare providers discovered patient data (PHI) in commercial AI prompts after HIPAA compliance audits in 2025.

OWASP LLM Top 10 — The Standard Reference

OWASP Top 10 for LLM Applications is the most popular reference framework for LLM security. Updated for 2025 (2nd edition) from production vulnerabilities in the wild and maintained by the OWASP GenAI Security Project. The #1 critical vulnerability is prompt injection in both editions. Knowing the list is the starting point of any enterprise LLM security program.

RANK
Vulnerability
Enterprise Implication
LLM01
Prompt Injection
The LLM can process any content and it can contain instructions for the attacker. Every document, e-mail, web page, tool output is a vector of injection.
LLM02
Sensitive Info Disclosure
LLM produces system prompts, user data or retrieved documents that it should not send. Very risky when LLM has RAG access to sensitive knowledge bases.
LLM03
Supply Chain
A compromised training data, model weights, plugins or third party integrations allows the attacker to inject behavior at the model or infrastructure level.
LLM04
Data & Model Poisoning
The attacker modifies the training data or fine-tuning data of the LLM to inject long-term biases or backdoored behavior that will be activated by a trigger.
LLM05
Insecure Output Handling
The output of LLMs is sent directly to downstream systems (code execution, browsers, APIs) without validation. Classic injection attacks are resurfacing with the help of LLM generated content.
LLM06
Excessive Agency
LLM was given more permissions, tools or autonomy than needed for the task. Each extra permission is another blast radius in case another vulnerability is exploited.
LLM07
System Prompt Leakage
The attacker extracts the system prompt by the crafted queries and exposes the business logic, security controls, and the architecture which can be used for further attacks.
LLM08
Vector & Embedding Weaknesses
RAG pipeline vulnerabilities: poisoned vector embeddings, adversarial similarity attacks, injected chunks from retrieved documents.
LLM09
Misinformation
LLM confidently generates false information on which downstream decisions are based. Important in healthcare (clinical decisions), in law (case research) and in finance (investment advice).
LLM10
Unbounded Consumption
Denial of service by resource exhaustion — an input that results in the generation of too many tokens, API calls, or computations, and so the service becomes unavailable.

The Five Layers of LLM Security

LLM security is not a control or a category of products. It is five layers of an enterprise AI system and has its own threat surface and controls. A program that covers only one or two layers is vulnerable in the other.

The five LLM security layers — attack surface and required controls at each
LAYER 1 · INPUT User prompts · Retrieved content Threats: Prompt injection (direct + indirect) · Jailbreaking Control: Semantic input inspection · Goal classification · Injection detection LAYER 2 · MODEL LLM inference · System prompt Threats: Data poisoning · System prompt leakage · Model supply chain Control: Model integrity verification · System prompt protection · Supply chain scanning LAYER 3 · OUTPUT ← Primary enforcement point Threats: Sensitive data disclosure · Insecure output · Policy violations Control: Output inspection · PII redaction · Policy enforcement · Content filtering LAYER 4 · DATA RAG · Vector DB · Knowledge base Threats: Knowledge base poisoning · PII leakage in retrieval · Embedding attacks Control: RAG provenance tracking · Retrieval-time inspection · Access-tiered knowledge LAYER 5 · ACTIONS Tool calls · APIs · Agent execution Threats: Excessive agency · Tool misuse · Agentic privilege escalation Control: Tool allowlisting · Argument constraints · MCP gateway · Least privilege

Click any layer to expand threats and controls

L1 Input Highest attack volume
Prompt injection · Jailbreaking +
What's at risk

Every prompt and piece of content the LLM is fed is an injection vector. Indirect injection (instructions in retrieved documents, emails, web pages) now makes up over 55% of real-world attacks and has 20–30% higher success rates than direct injection.

Primary controls

Check the semantic input of the LLM before processing. Is the retrieved content adversarial or goal-hijacking? Check the session for multi-turn escalation.

L2 Model Supply chain risk
Data poisoning · System prompt leakage +
What's at risk

Model weights, training data, fine-tuning datasets and system prompts are all attack surface. A poisoned model or leaked system prompt breaks every downstream control. AI Incident Database recorded 346 incidents in 2025 – many of them are caused by this layer.

Primary controls

Check model integrity. Protect system prompt (secret, not documentation). Scan all model components, plugins and MCP servers in the supply chain before deployment.

L3 Output Primary enforcement point
Data disclosure · Policy violations · PII leakage
What's at risk

What the LLM returns – to users, to downstream systems, to tool calls. Sensitive data disclosure, policy violations and insecure content happen at the output layer. Cyberhaven: 11% of data employees paste into ChatGPT is confidential – the output layer is where that data is exposed externally.

Primary controls

Pre-transmission output inspection. PII redaction in-line. Response content policy enforcement. Content filtering. This is the main enforcement point of Polygraf – sub-100ms, on-premise, for LLM01, LLM02, and LLM05 in one pass.

L4 Data RAG attack surface
Knowledge base poisoning · Embedding attacks +
What's at risk

RAG pipelines, vector databases and knowledge bases inject poisoned retrieval content into the LLM's context. PoisonedRAG (2024) had 97% success rate with 5 poisoned documents in a million-document knowledge base – this is the persistence layer that allows attacks to survive across sessions.

Primary controls

Track the provenance of every chunk retrieved. Check the retrieval time for instruction-like patterns. Tiered access: not every agent should read every knowledge base. Immutable audit trail for every write to a knowledge base.

L5 Actions Agentic blast radius
Tool misuse · Excessive agency · Privilege escalation +
What's at risk

When LLMs use tools (APIs, databases, code execution, file systems) the attack surface is every system the tools can reach. 70% of enterprise agents have more access than their human counterparts (Teleport, 2026). Replit: too much tool access + no output gate = 1,206 records deleted.

Primary controls

Tool allowlisting at the MCP gateway. Argument-level constraints per tool. Least privilege at task level, not team level. Purpose binding at execution layer. Unique agent identity and independent revocation path.

Real Incidents — 2024 to 2026

These are not security research demos. They are documented production incidents from enterprise environments – each one showing a specific failure on one or more of the five layers above.

Jan 2025 (disclosed Jun 2025)
EchoLeak — CVE-2025-32711
A crafted email prompt injection led Microsoft 365 Copilot to silently exfiltrate emails, OneDrive files and Teams chats without any user interaction. Layer 1 + Layer 3 failure.
Prompt Injection · LLM01
July 2025
Replit Agent Database Wipe
An AI coding agent deleted production database records for 1,206 executives in 1,196 companies during a live code freeze. The LLM output was used as an executable instruction without any validation gate. Layer 5 failure.
Insecure Output · Excessive Agency
September 2025
postmark-mcp Supply Chain
MCP official postmark server backdoored to silently BCC every processed email to an attacker address. ~300 organizations affected. ~1500 weekly downloads. Not detected for weeks. Layer 2 + Layer 5 failure.
Supply Chain · LLM03
August 2025
Salesloft-Drift OAuth Breach
Stolen OAuth tokens used to authenticate as a trusted AI integration in 700+ Salesforce environments. 10-day automated exfiltration of contacts, case records and credentials. Layer 5 + identity failure.
Excessive Agency · NHI Compromise
April 2026
Vercel / Context.ai OAuth Breach
Vercel employee gave AI productivity tool "Allow All" OAuth access to corporate Google Workspace. Leaked emails, documents and calendar. Classic over-permissioning with OAuth excessive agency. Layer 5 failure.
Excessive Agency · LLM06
Ongoing · 2024–2026
Samsung / Enterprise ChatGPT Data Leaks
Samsung engineers have leaked source code and internal meeting notes on ChatGPT. Cyberhaven: 11% of data employees copy into ChatGPT is confidential. Health care providers discovered PHI in commercial AI prompts during 2025 compliance audits. Layer 3 + Shadow AI.
Data Disclosure · Shadow AI

LLMs are not experimental tools in 2026. They are part of the core business systems and they are trusted with real data and real actions. This makes their failures much more dangerous than the bugs of traditional software.

— LLM Security Risks in 2026, Sombrainc · February 2026

Where to Start: A Practical Sequence

The five layer model can be difficult to manage when every layer has exposure. The order of practical implementation is by risk priority – start with the layer where most incidents are occurring and work your way out.

Week 1
Output Layer First
Layer 3

Inspect your LLMs' output before sending it out. One single highest coverage control – detects LLM02, LLM05 and policy violations from any attack source. Last line of defence, first to deploy.

Covers: LLM01 partial · LLM02 · LLM05 · LLM06
Month 1
Input and Action Layers
Layer 1 Layer 5

Introduce semantic input inspection for indirect injection. Tool allowlisting and argument constraints for any LLM with tool use. Fixes the EchoLeak and Replit attack paths (the two most high-profile enterprise LLM failures in the last 12 months).

Covers: LLM01 · LLM06 · LLM03 partial
Month 2–3
Data and Model Layers
Layer 2 Layer 4

Implement RAG provenance and retrieval time inspection. Supply chain audit all models, plugins and MCP servers. Add shadow AI discovery to inventory program. PoisonedRAG-class attacks.

Covers: LLM03 · LLM04 · LLM08 · LLM10
Ongoing
Monitoring and Governance
All Layers

Agent identity context structured audit logs. Behavioral baselines for each LLM deployment. Quarterly review of tool permission policy. Mapping of OWASP LLM Top 10 and NIST AI RMF annually.

Covers: LLM07 · LLM09 · Compliance posture
The Polygraf Approach

Polygraf's Behavioral Control Plane is on Layers 1 and 3 at the same time: it looks at inputs before the LLM processes them and at outputs before they are sent, it enforces policy in real time with sub-100ms latency and it has full audit logging of every interaction. It covers the highest volume attack classes (LLM01, LLM02, LLM05, LLM06) in a single deployment, on-premise, without any data leaving your security perimeter.

Polygraf AI

LLM Security Across All Five Layers

Polygraf's Behavioral Control Plane is policy enforcement at the LLM input and output edge, scanning every prompt and response in real time, blocking injection, redacting context-sensitive data, tool policy enforcement and logging every interaction with full audit context. Sub-100ms. On-premise. No data leaves your environment.

Request a Demo →
Air-gap ready · HIPAA · SOC 2
Deploys in under an hour

NEWS & More

Insights & Updates from Polygraf.

Blog Posts

Read Polygraf AI's plain-English guide to LLM security for enterprise teams to understand why securing an LLM is a must have for any organization who cares about their privacy.

Blog Posts

Tool poisoning hides malicious instructions inside MCP server descriptions that AI agents execute silently, succeeding over 60% of the time. Here’s how the attack works and what stops it.

To learn more about Polygraf, please get in touch.

At Polygraf, we envision a future where AI augments human capabilities without compromising safety, privacy, or ethical standards. Trust in our commitment to building this future with you.

Products

thank you

Your download will start now.

Thank you!

Please provide information below and
we will send you a link to download the white paper.