What Is LLM Security? A Plain-English Guide for Enterprise Teams
Large language models are being used in email, code, customer service, legal and finance workflows. None of them had a security manual. This is the manual that fills this gap – what LLM security is, why your current tools don't provide it and what needs to be done.
Guide · May 2026 · Based on OWASP LLM Top 10 v2025, MITRE ATLAS, IBM Cost of Breach 2025
77%
of businesses reported an AI-related security incident in 2024
IBM / Practical DevSecOps 2026
$4.88M
average cost of a data breach in 2024 — highest ever recorded at that time
IBM Cost of a Data Breach 2024
346
AI security incidents logged by the AI Incident Database in 2025
AIID / BlueRadius Cyber 2026
84%
maximum prompt injection success rate across common LLM deployments
SQ Magazine, 2026
What LLM Security Actually Means
LLM security is the effort to prevent misuse, manipulation, data leakage and unauthorized action of large language models and the systems on which they are based. It includes both sides of the LLM interaction: the inputs and the outputs and actions.
The simplest way to explain it: LLM is software that takes instructions in natural language as input. The same security knowledge a security team has for securing software applies: access control, input validation, audit logging, least privilege. However, natural language instructions cannot be validated like structured API parameters. An LLM that takes a malicious instruction in fluent English has no syntactic way of rejecting it. It has to evaluate it. The evaluation is where LLM specific security failures happen.
DEFINITION
LLM security is the set of controls over what an AI system can be told to do, what data it can read and write, what it can do, and how it is observed and enforced – at the model input/output, the data, the tool, and the identity layers.
The domain of LLM security has grown substantially as LLMs have moved from the lab to production. In 2023 LLM security was the prevention of chatbots from making bad statements. In 2026 it is about LLMs embedded in email, code repositories, financial flows, clinical notes and agentic systems that take real action in enterprise infrastructure. The failure modes have grown too.
Why Traditional Security Doesn't Cover It
Enterprise security stacks were designed with the assumption that software is predictable. A WAF blocks requests that match known bad patterns. A DLP system blocks files that contain credit card numbers. An IAM system allows or denies access based on identity and role. These systems work because the inputs they check have a structure (HTTP requests, file types, identity tokens) that can be matched against good or bad patterns.
LLMs process free-form natural language. Same input field that allows a valid customer question also allows a jailbreak attempt, an indirect prompt injection, or social engineering payload. There is no syntactic difference between a benign and a malicious prompt. No traditional tool can perceive that difference. And when the LLM takes an action downstream (writing to a database, sending an email, calling an API), those actions are triggered by the LLM's interpretation of its instruction, not by a human calling the function.
Security Capability
Traditional Tools
LLM Requirement
Input Validation
Pattern matching · Known signatures
Semantic intent classification
Data Loss Prevention
File/network PII regex scanning
Output + retrieval inspection
Access Control
Binary session-level grant/deny
Per-tool-call enforcement
Behavioral Monitoring
Statistical deviation from human baseline
Semantic policy compliance evaluation
Audit Logging
API calls · File access · Auth events
Full decision-chain capture
Hover a row to see why the gap matters
The Core Gap
Your security stack knows that a user was authenticated and used a resource. It doesn't know that the LLM the user was using was also processing a hidden instruction in a document it pulled and about to send your database credentials to an attacker-controlled address in a tool call that looked like legitimate API usage. This is what LLM security fills.
The Threat Landscape — What's Actually Happening
LLM security is not a hypothetical. The OWASP Top 10 for LLM Applications 2025 is based on attacks in production. The MITRE ATLAS – adversarial machine learning knowledge base – grew to 16 tactics and 84 techniques in 2025 with 14 new techniques for AI agents. In 2026, CrowdStrike's threat report had prompt injection attacks on 90+ organizations. The AI Incident Database has 346 AI incidents in 2025.
Five threat categories account for the majority of LLM security incidents in enterprise environments:
Prompt Injection
OWASP #1 · Two Consecutive Editions
The attacker injects malicious instructions in the text the LLM processes – in the user message (direct injection) or in the content the LLM retrieves (e.g. document, email) – and the LLM interprets the instruction as valid and follows it. Indirect injection is now the majority of the attacks we see (more than 55%), with 20–30% higher success rate than direct injection.
Real: EchoLeak (CVE-2025-32711) — hidden Markdown in email caused Copilot to
exfiltrate OneDrive files. Zero-click.
Data Exfiltration via LLM
OWASP #2 · Sensitive Info Disclosure
The LLM is prompted (via injection, jailbreaking or misconfiguration) to return data that it should not be sending e.g. data in context window, data from connected systems (RAG), and credentials or secrets leaked from the system prompt extraction. A Cyberhaven 2024 study found that 11% of data employees are pasting into ChatGPT is confidential.
Real: Samsung engineers leaked source code and meeting notes via ChatGPT — prompted
ChatGPT ban across the company.
Jailbreaking
OWASP #5 · Improper Output Handling
Inputs that are designed to circumvent the safety training and system prompt, and for which the model generates content or acts beyond the intended scope of operation. Jailbreak exploits that are known have 17% success rate in a controlled test, and >90% success in a multi-turn enterprise test. Jailbreaks have >70% success in 3 minutes in an enterprise penetration test.
Real: CrowdStrike documented prompt injection against 90+ organizations in 2026 threat
reporting.
Supply Chain Poisoning
OWASP #3 · LLM Supply Chain
Attack on the LLM's dependencies: training data, fine-tuning dataset, model weights, plugins, MCP servers, RAG knowledge bases. An attacked component can make the LLM always output the attacker-friendly answer for all users. PoisonedRAG (2024) has shown that it can attack with 97% success rate with 5 poisoned documents in a million-size knowledge base.
Real: Vercel/Context.ai breach (April 2026) — OAuth "Allow All" permissions on AI
productivity tool exposed corporate Google Workspace.
Insecure Output Handling
OWASP #5 · High Severity
The LLM output is sent to downstream systems (code interpreters, browsers, databases) without any validation. The output can contain instructions that the downstream system will run. Classic injection attacks (XSS, SQLi, command injection) are re-appearing when LLM output is used to build a database query, a system command or a web page without sanitization.
Real: Replit agent (July 2025) — LLM output interpreted as a database wipe instruction
during an active code freeze. 1,206 records destroyed.
Shadow AI
Governance Gap · Cross-cutting
Workers who use AI tools on company data that are not visible to IT, not governed by IT and not governed by data governance. 44% of workers in the US do not know if their company has an AI policy or if they have one (Gallup/Founder Reports, 2026). 78% of workers who use AI at work bring their own AI tools to work (Microsoft Work Trend Index 2025). Each unapproved tool is a risk of data exfiltration, no audit trail, no access control, no incident response.
Real: Multiple healthcare providers discovered patient data (PHI) in commercial AI
prompts after HIPAA compliance audits in 2025.
OWASP LLM Top 10 — The Standard Reference
OWASP Top 10 for LLM Applications is the most popular reference framework for LLM security. Updated for 2025 (2nd edition) from production vulnerabilities in the wild and maintained by the OWASP GenAI Security Project. The #1 critical vulnerability is prompt injection in both editions. Knowing the list is the starting point of any enterprise LLM security program.
RANK
Vulnerability
Enterprise Implication
LLM01
Prompt Injection
The LLM can process any content and it can contain instructions for the attacker. Every document, e-mail, web page, tool output is a vector of injection.
LLM02
Sensitive Info Disclosure
LLM produces system prompts, user data or retrieved documents that it should not send. Very risky when LLM has RAG access to sensitive knowledge bases.
LLM03
Supply Chain
A compromised training data, model weights, plugins or third party integrations allows the attacker to inject behavior at the model or infrastructure level.
LLM04
Data & Model Poisoning
The attacker modifies the training data or fine-tuning data of the LLM to inject long-term biases or backdoored behavior that will be activated by a trigger.
LLM05
Insecure Output Handling
The output of LLMs is sent directly to downstream systems (code execution, browsers, APIs) without validation. Classic injection attacks are resurfacing with the help of LLM generated content.
LLM06
Excessive Agency
LLM was given more permissions, tools or autonomy than needed for the task. Each extra permission is another blast radius in case another vulnerability is exploited.
LLM07
System Prompt Leakage
The attacker extracts the system prompt by the crafted queries and exposes the business logic, security controls, and the architecture which can be used for further attacks.
LLM confidently generates false information on which downstream decisions are based. Important in healthcare (clinical decisions), in law (case research) and in finance (investment advice).
LLM10
Unbounded Consumption
Denial of service by resource exhaustion — an input that results in the generation of too many tokens, API calls, or computations, and so the service becomes unavailable.
The Five Layers of LLM Security
LLM security is not a control or a category of products. It is five layers of an enterprise AI system and has its own threat surface and controls. A program that covers only one or two layers is vulnerable in the other.
The five LLM security layers — attack surface and required controls at each
Click
any layer to expand threats and controls
L1InputHighest attack volume
Prompt injection · Jailbreaking+
What's at risk
Every prompt and piece of content the LLM is fed is an injection vector. Indirect injection (instructions in retrieved documents, emails, web pages) now makes up over 55% of real-world attacks and has 20–30% higher success rates than direct injection.
Primary controls
Check the semantic input of the LLM before processing. Is the retrieved content adversarial or goal-hijacking? Check the session for multi-turn escalation.
L2ModelSupply chain risk
Data poisoning · System prompt leakage+
What's at risk
Model weights, training data, fine-tuning datasets and system prompts are all attack surface. A poisoned model or leaked system prompt breaks every downstream control. AI Incident Database recorded 346 incidents in 2025 – many of them are caused by this layer.
Primary controls
Check model integrity. Protect system prompt (secret, not documentation). Scan all model components, plugins and MCP servers in the supply chain before deployment.
L3OutputPrimary enforcement point
Data disclosure · Policy violations · PII leakage−
What's at risk
What the LLM returns – to users, to downstream systems, to tool calls. Sensitive data disclosure, policy violations and insecure content happen at the output layer. Cyberhaven: 11% of data employees paste into ChatGPT is confidential – the output layer is where that data is exposed externally.
Primary controls
Pre-transmission output inspection. PII redaction in-line. Response content policy enforcement. Content filtering. This is the main enforcement point of Polygraf – sub-100ms, on-premise, for LLM01, LLM02, and LLM05 in one pass.
L4DataRAG attack surface
Knowledge base poisoning · Embedding attacks+
What's at risk
RAG pipelines, vector databases and knowledge bases inject poisoned retrieval content into the LLM's context. PoisonedRAG (2024) had 97% success rate with 5 poisoned documents in a million-document knowledge base – this is the persistence layer that allows attacks to survive across sessions.
Primary controls
Track the provenance of every chunk retrieved. Check the retrieval time for instruction-like patterns. Tiered access: not every agent should read every knowledge base. Immutable audit trail for every write to a knowledge base.
When LLMs use tools (APIs, databases, code execution, file systems) the attack surface is every system the tools can reach. 70% of enterprise agents have more access than their human counterparts (Teleport, 2026). Replit: too much tool access + no output gate = 1,206 records deleted.
Primary controls
Tool allowlisting at the MCP gateway. Argument-level constraints per tool. Least privilege at task level, not team level. Purpose binding at execution layer. Unique agent identity and independent revocation path.
Real Incidents — 2024 to 2026
These are not security research demos. They are documented production incidents from enterprise environments – each one showing a specific failure on one or more of the five layers above.
Jan 2025 (disclosed Jun 2025)
EchoLeak — CVE-2025-32711
A crafted email prompt injection led Microsoft 365 Copilot to silently exfiltrate emails, OneDrive files and Teams chats without any user interaction. Layer 1 + Layer 3 failure.
Prompt Injection · LLM01
July 2025
Replit Agent Database Wipe
An AI coding agent deleted production database records for 1,206 executives in 1,196 companies during a live code freeze. The LLM output was used as an executable instruction without any validation gate. Layer 5 failure.
Insecure Output · Excessive Agency
September 2025
postmark-mcp Supply Chain
MCP official postmark server backdoored to silently BCC every processed email to an attacker address. ~300 organizations affected. ~1500 weekly downloads. Not detected for weeks. Layer 2 + Layer 5 failure.
Supply Chain · LLM03
August 2025
Salesloft-Drift OAuth Breach
Stolen OAuth tokens used to authenticate as a trusted AI integration in 700+ Salesforce environments. 10-day automated exfiltration of contacts, case records and credentials. Layer 5 + identity failure.
Excessive Agency · NHI Compromise
April 2026
Vercel / Context.ai OAuth Breach
Vercel employee gave AI productivity tool "Allow All" OAuth access to corporate Google Workspace. Leaked emails, documents and calendar. Classic over-permissioning with OAuth excessive agency. Layer 5 failure.
Excessive Agency · LLM06
Ongoing · 2024–2026
Samsung / Enterprise ChatGPT Data Leaks
Samsung engineers have leaked source code and internal meeting notes on ChatGPT. Cyberhaven: 11% of data employees copy into ChatGPT is confidential. Health care providers discovered PHI in commercial AI prompts during 2025 compliance audits. Layer 3 + Shadow AI.
Data Disclosure · Shadow AI
LLMs are not experimental tools in 2026. They are part of the core business systems and they are trusted with real data and real actions. This makes their failures much more dangerous than the bugs of traditional software.
— LLM Security Risks in 2026, Sombrainc · February 2026
Where to Start: A Practical Sequence
The five layer model can be difficult to manage when every layer has exposure. The order of practical implementation is by risk priority – start with the layer where most incidents are occurring and work your way out.
Week 1
Output Layer First
Layer 3
Inspect your LLMs' output before sending it out. One single highest coverage control – detects LLM02, LLM05 and policy violations from any attack source. Last line of defence, first to deploy.
Covers: LLM01 partial · LLM02 · LLM05 · LLM06
Month 1
Input and Action Layers
Layer 1Layer 5
Introduce semantic input inspection for indirect injection. Tool allowlisting and argument constraints for any LLM with tool use. Fixes the EchoLeak and Replit attack paths (the two most high-profile enterprise LLM failures in the last 12 months).
Covers: LLM01 · LLM06 · LLM03 partial
Month 2–3
Data and Model Layers
Layer
2Layer
4
Implement RAG provenance and retrieval time inspection. Supply chain audit all models, plugins and MCP servers. Add shadow AI discovery to inventory program. PoisonedRAG-class attacks.
Covers: LLM03 · LLM04 · LLM08 · LLM10
Ongoing
Monitoring and Governance
All
Layers
Agent identity context structured audit logs. Behavioral baselines for each LLM deployment. Quarterly review of tool permission policy. Mapping of OWASP LLM Top 10 and NIST AI RMF annually.
Covers: LLM07 · LLM09 · Compliance posture
The Polygraf Approach
Polygraf's Behavioral Control Plane is on Layers 1 and 3 at the same time: it looks at inputs before the LLM processes them and at outputs before they are sent, it enforces policy in real time with sub-100ms latency and it has full audit logging of every interaction. It covers the highest volume attack classes (LLM01, LLM02, LLM05, LLM06) in a single deployment, on-premise, without any data leaving your security perimeter.
Polygraf AI
LLM Security Across All Five Layers
Polygraf's Behavioral Control Plane is policy enforcement at the LLM input and output edge, scanning every prompt and response in real time, blocking injection, redacting context-sensitive data, tool policy enforcement and logging every interaction with full audit context. Sub-100ms. On-premise. No data leaves your environment.
Read Polygraf AI's plain-English guide to LLM security for enterprise teams to understand why securing an LLM is a must have for any organization who cares about their privacy.
Tool poisoning hides malicious instructions inside MCP server descriptions that AI agents execute silently, succeeding over 60% of the time. Here’s how the attack works and what stops it.
To learn more about Polygraf, please get in touch.
At Polygraf, we envision a future where AI augments human capabilities without compromising safety, privacy, or ethical standards. Trust in our commitment to building this future with you.