What Is LLM Security?
A Plain-English Guide
for Enterprise Teams

Large language models are being used in email, code, customer service, legal and finance workflows. None of them had a security manual. This is the manual that fills this gap – what LLM security is, why your current tools don't provide it and what needs to be done.

In this guide

01What LLM security actually means
02Why traditional security doesn't cover it
03The threat landscape — real attacks
04OWASP LLM Top 10
05The five security layers
06Real incidents
07Where to start

Guide · May 2026 · Based on OWASP LLM Top 10 v2025, MITRE ATLAS, IBM Cost of Breach 2025

77%

of businesses reported an AI-related security incident in 2024

IBM / Practical DevSecOps 2026

$4.88M

average cost of a data breach in 2024 — highest ever recorded at that time

IBM Cost of a Data Breach 2024

346

AI security incidents logged by the AI Incident Database in 2025

AIID / BlueRadius Cyber 2026

84%

maximum prompt injection success rate across common LLM deployments

SQ Magazine, 2026

What LLM Security Actually Means

LLM security is the effort to prevent misuse, manipulation, data leakage and unauthorized action of large language models and the systems on which they are based. It includes both sides of the LLM interaction: the inputs and the outputs and actions.

The simplest way to explain it: LLM is software that takes instructions in natural language as input. The same security knowledge a security team has for securing software applies: access control, input validation, audit logging, least privilege. However, natural language instructions cannot be validated like structured API parameters. An LLM that takes a malicious instruction in fluent English has no syntactic way of rejecting it. It has to evaluate it. The evaluation is where LLM specific security failures happen.

DEFINITION

LLM security is the set of controls over what an AI system can be told to do, what data it can read and write, what it can do, and how it is observed and enforced – at the model input/output, the data, the tool, and the identity layers.

The domain of LLM security has grown substantially as LLMs have moved from the lab to production. In 2023 LLM security was the prevention of chatbots from making bad statements. In 2026 it is about LLMs embedded in email, code repositories, financial flows, clinical notes and agentic systems that take real action in enterprise infrastructure. The failure modes have grown too.

Why Traditional Security Doesn't Cover It

Enterprise security stacks were designed with the assumption that software is predictable. A WAF blocks requests that match known bad patterns. A DLP system blocks files that contain credit card numbers. An IAM system allows or denies access based on identity and role. These systems work because the inputs they check have a structure (HTTP requests, file types, identity tokens) that can be matched against good or bad patterns.

LLMs process free-form natural language. Same input field that allows a valid customer question also allows a jailbreak attempt, an indirect prompt injection, or social engineering payload. There is no syntactic difference between a benign and a malicious prompt. No traditional tool can perceive that difference. And when the LLM takes an action downstream (writing to a database, sending an email, calling an API), those actions are triggered by the LLM's interpretation of its instruction, not by a human calling the function.

Security Capability

Traditional Tools

LLM Requirement

Input Validation

Pattern matching · Known signatures

Semantic intent classification

Data Loss Prevention

File/network PII regex scanning

Output + retrieval inspection

Access Control

Binary session-level grant/deny

Per-tool-call enforcement

Behavioral Monitoring

Statistical deviation from human baseline

Semantic policy compliance evaluation

Audit Logging

API calls · File access · Auth events

Full decision-chain capture

The Core Gap

Your security stack knows that a user was authenticated and used a resource. It doesn't know that the LLM the user was using was also processing a hidden instruction in a document it pulled and about to send your database credentials to an attacker-controlled address in a tool call that looked like legitimate API usage. This is what LLM security fills.

The Threat Landscape — What's Actually Happening

LLM security is not a hypothetical. The OWASP Top 10 for LLM Applications 2025 is based on attacks in production. The MITRE ATLAS – adversarial machine learning knowledge base – grew to 16 tactics and 84 techniques in 2025 with 14 new techniques for AI agents. In 2026, CrowdStrike's threat report had prompt injection attacks on 90+ organizations. The AI Incident Database has 346 AI incidents in 2025.

Five threat categories account for the majority of LLM security incidents in enterprise environments:

Prompt Injection

OWASP #1 · Two Consecutive Editions

The attacker injects malicious instructions in the text the LLM processes – in the user message (direct injection) or in the content the LLM retrieves (e.g. document, email) – and the LLM interprets the instruction as valid and follows it. Indirect injection is now the majority of the attacks we see (more than 55%), with 20–30% higher success rate than direct injection.

Real: EchoLeak (CVE-2025-32711) — hidden Markdown in email caused Copilot to exfiltrate OneDrive files. Zero-click.

Data Exfiltration via LLM

OWASP #2 · Sensitive Info Disclosure

The LLM is prompted (via injection, jailbreaking or misconfiguration) to return data that it should not be sending e.g. data in context window, data from connected systems (RAG), and credentials or secrets leaked from the system prompt extraction. A Cyberhaven 2024 study found that 11% of data employees are pasting into ChatGPT is confidential.

Real: Samsung engineers leaked source code and meeting notes via ChatGPT — prompted ChatGPT ban across the company.

Jailbreaking

OWASP #5 · Improper Output Handling

Inputs that are designed to circumvent the safety training and system prompt, and for which the model generates content or acts beyond the intended scope of operation. Jailbreak exploits that are known have 17% success rate in a controlled test, and >90% success in a multi-turn enterprise test. Jailbreaks have >70% success in 3 minutes in an enterprise penetration test.

Real: CrowdStrike documented prompt injection against 90+ organizations in 2026 threat reporting.

Supply Chain Poisoning

OWASP #3 · LLM Supply Chain

Attack on the LLM's dependencies: training data, fine-tuning dataset, model weights, plugins, MCP servers, RAG knowledge bases. An attacked component can make the LLM always output the attacker-friendly answer for all users. PoisonedRAG (2024) has shown that it can attack with 97% success rate with 5 poisoned documents in a million-size knowledge base.

Real: Vercel/Context.ai breach (April 2026) — OAuth "Allow All" permissions on AI productivity tool exposed corporate Google Workspace.

Insecure Output Handling

OWASP #5 · High Severity

The LLM output is sent to downstream systems (code interpreters, browsers, databases) without any validation. The output can contain instructions that the downstream system will run. Classic injection attacks (XSS, SQLi, command injection) are re-appearing when LLM output is used to build a database query, a system command or a web page without sanitization.

Real: Replit agent (July 2025) — LLM output interpreted as a database wipe instruction during an active code freeze. 1,206 records destroyed.

Shadow AI

Governance Gap · Cross-cutting

Workers who use AI tools on company data that are not visible to IT, not governed by IT and not governed by data governance. 44% of workers in the US do not know if their company has an AI policy or if they have one (Gallup/Founder Reports, 2026). 78% of workers who use AI at work bring their own AI tools to work (Microsoft Work Trend Index 2025). Each unapproved tool is a risk of data exfiltration, no audit trail, no access control, no incident response.

Real: Multiple healthcare providers discovered patient data (PHI) in commercial AI prompts after HIPAA compliance audits in 2025.

OWASP LLM Top 10 — The Standard Reference

OWASP Top 10 for LLM Applications is the most popular reference framework for LLM security. Updated for 2025 (2nd edition) from production vulnerabilities in the wild and maintained by the OWASP GenAI Security Project. The #1 critical vulnerability is prompt injection in both editions. Knowing the list is the starting point of any enterprise LLM security program.

RANK

            Vulnerability

            Enterprise Implication

LLM01

Prompt Injection

The LLM can process any content and it can contain instructions for the attacker. Every document, e-mail, web page, tool output is a vector of injection.

LLM02

Sensitive Info Disclosure

LLM produces system prompts, user data or retrieved documents that it should not send. Very risky when LLM has RAG access to sensitive knowledge bases.

LLM03

Supply Chain

A compromised training data, model weights, plugins or third party integrations allows the attacker to inject behavior at the model or infrastructure level.

LLM04

Data & Model Poisoning

The attacker modifies the training data or fine-tuning data of the LLM to inject long-term biases or backdoored behavior that will be activated by a trigger.

LLM05

Insecure Output Handling

The output of LLMs is sent directly to downstream systems (code execution, browsers, APIs) without validation. Classic injection attacks are resurfacing with the help of LLM generated content.

LLM06

Excessive Agency

LLM was given more permissions, tools or autonomy than needed for the task. Each extra permission is another blast radius in case another vulnerability is exploited.

LLM07

System Prompt Leakage

The attacker extracts the system prompt by the crafted queries and exposes the business logic, security controls, and the architecture which can be used for further attacks.

LLM08

Vector & Embedding Weaknesses

RAG pipeline vulnerabilities: poisoned vector embeddings, adversarial similarity attacks, injected chunks from retrieved documents.

LLM09

Misinformation

LLM confidently generates false information on which downstream decisions are based. Important in healthcare (clinical decisions), in law (case research) and in finance (investment advice).

LLM10

Unbounded Consumption

Denial of service by resource exhaustion — an input that results in the generation of too many tokens, API calls, or computations, and so the service becomes unavailable.

The Five Layers of LLM Security

LLM security is not a control or a category of products. It is five layers of an enterprise AI system and has its own threat surface and controls. A program that covers only one or two layers is vulnerable in the other.

The five LLM security layers — attack surface and required controls at each

Click any layer to expand threats and controls

L1 Input Highest attack volume

Prompt injection · Jailbreaking +

What's at risk

Every prompt and piece of content the LLM is fed is an injection vector. Indirect injection (instructions in retrieved documents, emails, web pages) now makes up over 55% of real-world attacks and has 20–30% higher success rates than direct injection.

Primary controls

Check the semantic input of the LLM before processing. Is the retrieved content adversarial or goal-hijacking? Check the session for multi-turn escalation.

L2 Model Supply chain risk

Data poisoning · System prompt leakage +

What's at risk

Model weights, training data, fine-tuning datasets and system prompts are all attack surface. A poisoned model or leaked system prompt breaks every downstream control. AI Incident Database recorded 346 incidents in 2025 – many of them are caused by this layer.

Primary controls

Check model integrity. Protect system prompt (secret, not documentation). Scan all model components, plugins and MCP servers in the supply chain before deployment.

L3 Output Primary enforcement point

Data disclosure · Policy violations · PII leakage −

What's at risk

What the LLM returns – to users, to downstream systems, to tool calls. Sensitive data disclosure, policy violations and insecure content happen at the output layer. Cyberhaven: 11% of data employees paste into ChatGPT is confidential – the output layer is where that data is exposed externally.

Primary controls

Pre-transmission output inspection. PII redaction in-line. Response content policy enforcement. Content filtering. This is the main enforcement point of Polygraf – sub-100ms, on-premise, for LLM01, LLM02, and LLM05 in one pass.

L4 Data RAG attack surface

Knowledge base poisoning · Embedding attacks +

What's at risk

RAG pipelines, vector databases and knowledge bases inject poisoned retrieval content into the LLM's context. PoisonedRAG (2024) had 97% success rate with 5 poisoned documents in a million-document knowledge base – this is the persistence layer that allows attacks to survive across sessions.

Primary controls

Track the provenance of every chunk retrieved. Check the retrieval time for instruction-like patterns. Tiered access: not every agent should read every knowledge base. Immutable audit trail for every write to a knowledge base.

L5 Actions Agentic blast radius

Tool misuse · Excessive agency · Privilege escalation +

What's at risk

When LLMs use tools (APIs, databases, code execution, file systems) the attack surface is every system the tools can reach. 70% of enterprise agents have more access than their human counterparts (Teleport, 2026). Replit: too much tool access + no output gate = 1,206 records deleted.

Primary controls

Tool allowlisting at the MCP gateway. Argument-level constraints per tool. Least privilege at task level, not team level. Purpose binding at execution layer. Unique agent identity and independent revocation path.

Real Incidents — 2024 to 2026

These are not security research demos. They are documented production incidents from enterprise environments – each one showing a specific failure on one or more of the five layers above.

Jan 2025 (disclosed Jun 2025)

EchoLeak — CVE-2025-32711

A crafted email prompt injection led Microsoft 365 Copilot to silently exfiltrate emails, OneDrive files and Teams chats without any user interaction. Layer 1 + Layer 3 failure.

Prompt Injection · LLM01

July 2025

Replit Agent Database Wipe

An AI coding agent deleted production database records for 1,206 executives in 1,196 companies during a live code freeze. The LLM output was used as an executable instruction without any validation gate. Layer 5 failure.

Insecure Output · Excessive Agency

September 2025

postmark-mcp Supply Chain

MCP official postmark server backdoored to silently BCC every processed email to an attacker address. ~300 organizations affected. ~1500 weekly downloads. Not detected for weeks. Layer 2 + Layer 5 failure.

Supply Chain · LLM03

August 2025

Salesloft-Drift OAuth Breach

Stolen OAuth tokens used to authenticate as a trusted AI integration in 700+ Salesforce environments. 10-day automated exfiltration of contacts, case records and credentials. Layer 5 + identity failure.

Excessive Agency · NHI Compromise

April 2026

Vercel / Context.ai OAuth Breach

Vercel employee gave AI productivity tool "Allow All" OAuth access to corporate Google Workspace. Leaked emails, documents and calendar. Classic over-permissioning with OAuth excessive agency. Layer 5 failure.

Excessive Agency · LLM06

Ongoing · 2024–2026

Samsung / Enterprise ChatGPT Data Leaks

Samsung engineers have leaked source code and internal meeting notes on ChatGPT. Cyberhaven: 11% of data employees copy into ChatGPT is confidential. Health care providers discovered PHI in commercial AI prompts during 2025 compliance audits. Layer 3 + Shadow AI.

Data Disclosure · Shadow AI

LLMs are not experimental tools in 2026. They are part of the core business systems and they are trusted with real data and real actions. This makes their failures much more dangerous than the bugs of traditional software.

— LLM Security Risks in 2026, Sombrainc · February 2026

Where to Start: A Practical Sequence

The five layer model can be difficult to manage when every layer has exposure. The order of practical implementation is by risk priority – start with the layer where most incidents are occurring and work your way out.

Week 1

Output Layer First

Layer 3

Inspect your LLMs' output before sending it out. One single highest coverage control – detects LLM02, LLM05 and policy violations from any attack source. Last line of defence, first to deploy.

Covers: LLM01 partial · LLM02 · LLM05 · LLM06

Month 1

Input and Action Layers

Layer 1 Layer 5

Introduce semantic input inspection for indirect injection. Tool allowlisting and argument constraints for any LLM with tool use. Fixes the EchoLeak and Replit attack paths (the two most high-profile enterprise LLM failures in the last 12 months).

Covers: LLM01 · LLM06 · LLM03 partial

Month 2–3

Data and Model Layers

Layer 2 Layer 4

Implement RAG provenance and retrieval time inspection. Supply chain audit all models, plugins and MCP servers. Add shadow AI discovery to inventory program. PoisonedRAG-class attacks.

Covers: LLM03 · LLM04 · LLM08 · LLM10

Ongoing

Monitoring and Governance

All Layers

Agent identity context structured audit logs. Behavioral baselines for each LLM deployment. Quarterly review of tool permission policy. Mapping of OWASP LLM Top 10 and NIST AI RMF annually.

Covers: LLM07 · LLM09 · Compliance posture

The Polygraf Approach

Polygraf's Behavioral Control Plane is on Layers 1 and 3 at the same time: it looks at inputs before the LLM processes them and at outputs before they are sent, it enforces policy in real time with sub-100ms latency and it has full audit logging of every interaction. It covers the highest volume attack classes (LLM01, LLM02, LLM05, LLM06) in a single deployment, on-premise, without any data leaving your security perimeter.

Polygraf AI

LLM Security Across All Five Layers

Polygraf's Behavioral Control Plane is policy enforcement at the LLM input and output edge, scanning every prompt and response in real time, blocking injection, redacting context-sensitive data, tool policy enforcement and logging every interaction with full audit context. Sub-100ms. On-premise. No data leaves your environment.

Request a Demo →

Air-gap ready · HIPAA · SOC 2
Deploys in under an hour

NEWS & More

Insights & Updates from Polygraf.

Blog Posts

What Is LLM Security? Enterprise Guide

Read Polygraf AI's plain-English guide to LLM security for enterprise teams to understand why securing an LLM is a must have for any organization who cares about their privacy.

Blog Posts

Least Privilege for AI Agents with MCP Controls

Agents with over-permissioned access turn every prompt injection into a breach. Learn how to implement least priviledge access.

Blog Posts

MCP Tool Poisoning: MCP vs AI Agents

Tool poisoning hides malicious instructions inside MCP server descriptions that AI agents execute silently, succeeding over 60% of the time. Here’s how the attack works and what stops it.

To learn more about Polygraf, please get in touch.

At Polygraf, we envision a future where AI augments human capabilities without compromising safety, privacy, or ethical standards. Trust in our commitment to building this future with you.

What Is LLM Security?
A Plain-English Guide
for Enterprise Teams

What LLM Security Actually Means

Why Traditional Security Doesn't Cover It

The Threat Landscape — What's Actually Happening

OWASP LLM Top 10 — The Standard Reference

The Five Layers of LLM Security

Real Incidents — 2024 to 2026

Where to Start: A Practical Sequence

LLM Security Across All Five Layers

NEWS & More

Insights & Updates from Polygraf.

Blog Posts

What Is LLM Security? Enterprise Guide

Blog Posts

Least Privilege for AI Agents with MCP Controls

Blog Posts

MCP Tool Poisoning: MCP vs AI Agents

To learn more about Polygraf, please get in touch.

Data Privacy

Data Provenance

Developers

Company

What Is LLM Security?A Plain-English Guidefor Enterprise Teams

What LLM Security Actually Means

Why Traditional Security Doesn't Cover It

The Threat Landscape — What's Actually Happening

OWASP LLM Top 10 — The Standard Reference

The Five Layers of LLM Security

Real Incidents — 2024 to 2026

Where to Start: A Practical Sequence

LLM Security Across All Five Layers

NEWS & More

Insights & Updates from Polygraf.

Blog Posts

What Is LLM Security? Enterprise Guide

Blog Posts

Least Privilege for AI Agents with MCP Controls

Blog Posts

MCP Tool Poisoning: MCP vs AI Agents

To learn more about Polygraf, please get in touch.

Data Privacy

Data Provenance

Developers

Company

thank you

Thank you!

What Is LLM Security?
A Plain-English Guide
for Enterprise Teams