How to Secure AI Agents in Production

80% of enterprise security stacks are entirely unprepared to detect what AI agents can do in production. The 2026 data is unambiguous — and so is the playbook for closing the gap.

Based on 8 independent 2026 surveys · 2,300+ security leaders

By the numbers — 2026

88%

of orgs had confirmed or suspected AI agent security incidents

Gravitee, n=900+

37 average deployed agents per enterprise, most without security review

Gravitee 2026

92%

of CISOs lack full visibility into their AI agent identities

CSA / CISO AI Risk 2026, n=235

of enterprises have automated machine-speed controls governing agent behavior

Teleport, n=205

AI agents are not a pilot program anymore. 80.9% of technical teams have moved past planning into testing or production. The average enterprise has 37 deployed agents. Most were stood up by individual teams and never vetted by central security. More than half run without any security monitoring or logging.

The security industry spent 2024 debating whether AI agents were a real enterprise risk. The 2026 data has settled that debate. It has not settled how to actually secure them. This playbook combines the most recent research (the OWASP Agentic Top 10, the Gravitee and Teleport production surveys, and documented 2025–2026 incidents) into a practical six-control framework that you can use today.

The Confidence Paradox

Gravitee's 2026 survey identifies the most dangerous dynamic in enterprise AI security today: 82% of executives are confident that their current policies are enough to keep agents from taking unauthorized actions, yet 88% of organizations have already had confirmed or suspected incidents that those policies did not stop. Executive confidence has come apart from operational reality.

Why Agents Break Traditional Security

Every security control enterprises have built over the last 20 years assumes that a human initiates an action and a system executes it. SIEM, DLP, IAM, and WAF are all designed to sit at the boundary between human intent and system execution.

AI agents break that model in a fundamental way. An agent plans, decides, and acts across multiple systems in sequence, with no human in the loop for any individual decision. It calls APIs, writes to databases, sends communications, and triggers workflows as a continuous autonomous process. The attack surface is not a single boundary; it is distributed across every tool call the agent makes.

Where the security perimeter actually sits: traditional vs. agentic

The OWASP Agentic Top 10 — Published December 2025

OWASP published the Top 10 for Agentic Applications in December 2025 after peer review by more than 100 security researchers. It is distinct from the LLM Top 10, which deals with the risks of language models themselves. The Agentic Top 10 targets autonomous systems that plan, use tools, retain memory, and communicate with other agents. Each risk carries an ASI (Agentic Security Issue) code.

Code	Risk	Production example	Severity
ASI01	Goal hijacking	An attacker plants instructions in documents the agent processes and takes over the agent's entire action sequence.	Critical
ASI02	Tool misuse	Summarization agent with database read access begins exfiltrating records outside its documented task scope.	Critical
ASI03	Identity and privilege abuse	70% of enterprise agents have more access than equivalent human roles (Teleport 2026). Agent inherits team permissions, not task permissions.	Critical
ASI04	Agentic supply chain	postmark-mcp package silently BCC'd every processed email to an attacker before removal — 1,643 downloads affected (September 2025).	Critical
ASI05	Unexpected code execution	CVE-2026-25253 (OpenClaw, CVSS 8.8): 341 malicious skills in the agent marketplace installed keyloggers across enterprise deployments.	Critical
ASI06	Memory poisoning	Hidden prompts store false information that is triggered later by specific keywords. Google Gemini memory attack (Feb 2025): 73% of tested scenarios rated High to Critical. RAGPoison (Snyk, Aug 2025): 80%+ attack success rate at under 0.1% poison rate.	High
ASI07	Insecure inter-agent comms	Palo Alto Unit 42 "Agent Session Smuggling" (Nov 2025): rogue agents exploit built-in trust relationships in A2A protocol across multi-turn conversations.	High
ASI08	Cascading failures	Compromised vendor-check agent misdirects entire multi-agent procurement workflow (ServiceNow Now Assist, documented 2025).	High
ASI09	Human-agent trust exploitation	Persistent agents build false trust over multiple sessions before executing harmful action — invisible to single-session monitoring.	High
ASI10	Rogue agents	An agent operating outside its permitted scope that looks legitimate: it authenticates successfully and acts at machine speed (CyberArk 2026).	Medium

What's Already Happened in Production

These are not research demos in a sandbox. These are documented production incidents from 2025 and 2026, the kind of incidents the OWASP framework was built to address.

postmark-mcp Supply Chain — Silent Email Exfiltration ASI04 September 2025

A malicious MCP server package silently BCC'd every email it processed to an attacker-controlled address. The package was presented as a legitimate Postmark integration. Individual BCCs raised no error state, so no anomaly detection was triggered. The package was downloaded 1,643 times before removal. This is the canonical example of why the agent supply chain needs to be verified at install time and not after the incident.

1,643 downloads before removal. Every processed email silently copied to the attacker.

Palo Alto Unit 42 — Agent Session Smuggling via A2A Protocol ASI07 November 2025

Unit 42 demonstrated Agent Session Smuggling, in which a malicious agent abuses the built-in trust of the Agent-to-Agent (A2A) protocol. Over an ongoing multi-turn conversation, the rogue agent adapts its strategy, builds false trust, and then attacks. Agents that trust their collaborators by default can be abused for an entire session. The ServiceNow Now Assist multi-agent procurement workflow was reported as one such case, in which a compromised vendor-check agent misdirected the entire workflow.

Entire agent clusters redirectable via trust exploitation in inter-agent communication protocols.

Moltbook Platform — 506 Prompt Injections Spreading Across Agent Network ASI01 · ASI07 January–March 2026

The AI agent social network hosted 1.5 million autonomous AI agents run by 17,000 human operators. An unprotected database allowed any user to take over any agent in the network. 404 Media researchers found 506 prompt injections spreading across the agent network before it was patched. Meta bought the platform on March 10, 2026. The incident shows at scale what happens when inter-agent communication has no authentication controls: injections do not stay contained, they spread.

506 injections across live network of 1.5M agents. All agents potentially compromised before discovery.

OpenClaw — 341 Malicious Skills in Agent Marketplace CVE-2026-25253 · CVSS 8.8 Disclosed February 3, 2026

OpenClaw reached 180,000+ GitHub stars in weeks. CVE-2026-25253 enabled one-click RCE through the agent skill marketplace. 341 malicious skills — 12% of the ClawHub marketplace — were confirmed installing keyloggers on enterprise deployments before the patch (patched January 30, 2026). The speed of viral adoption created a window where thousands of enterprises deployed a framework before its supply chain could be audited. Incident pattern: viral agent frameworks adopted faster than they can be security-reviewed.

Keyloggers active across enterprise deployments. Full credential and keystroke logging by attackers.

OpenAI Plugin Ecosystem — 47 Enterprises, 6-Month Dwell Time ASI04 · ASI03 2025–2026

Compromised agent credentials were harvested from 47 enterprise deployments through a supply chain attack on the OpenAI plugin ecosystem. Attackers accessed customer data, financial records, and proprietary source code. The breach remained active for six months before discovery — the characteristic detection delay of agent-based attacks, where individual actions appear legitimate and no single event crosses an alert threshold. Six-month dwell time is the direct consequence of having logs without decision-chain context.

6-month dwell time. Customer data, financial records, proprietary code exfiltrated across 47 enterprises.

"A runaway agent in 2026 won't look dramatic. It will appear legitimate, authenticate successfully, and act quickly. By the time a human notices something's wrong, the damage is already distributed across multiple systems."

— CyberArk, 2026 AI Security Predictions

The Six-Control Playbook

Gravitee and Kiteworks research converge on the same structural finding: the governance-containment gap is the primary AI agent security failure. 58–59% of organizations report monitoring and human oversight. Only 37–40% have containment controls — purpose binding and the ability to terminate a misbehaving agent. The six controls below close this gap.

Assign every agent a unique machine identity

PREREQUISITE FOR ALL OTHER CONTROLS

22%

currently do this

Only 22% of organizations treat AI agents as independent identities. The remaining 78% use shared service accounts or generic API keys — making attribution of any action impossible and revocation of a single agent impractical without disrupting multiple systems.

Each agent needs its own identity: a dedicated credential, a defined owner, an access scope, and a revocation path that doesn't cascade to other agents. Without this, your logs tell you a service account did something — not which of your 37 agents did it.

Source: Gravitee State of AI Agent Security 2026, n=900+

Implementation

Dedicated service account per agent. Unique API keys or certificates. Stored in secrets manager — never hardcoded. Rotate on schedule and immediately on suspected compromise. Huntress 2026: NHI (non-human identity) compromise is the fastest-growing enterprise attack vector.
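The identity requirements above can be sketched in a few lines. This is a minimal illustration, not a production design: `AgentIdentity` and `IdentityRegistry` are hypothetical names, and the in-memory dictionary stands in for whatever secrets manager or identity provider you actually use.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class AgentIdentity:
    """One identity per agent: credential, owner, scope, and revocation state."""
    agent_id: str
    owner: str
    scopes: frozenset
    credential: str = field(default_factory=lambda: secrets.token_urlsafe(32))
    issued_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    revoked: bool = False

    def needs_rotation(self, max_age: timedelta, now: Optional[datetime] = None) -> bool:
        """True once the credential is older than the rotation schedule allows."""
        now = now or datetime.now(timezone.utc)
        return now - self.issued_at >= max_age


class IdentityRegistry:
    """In-memory stand-in for a secrets manager (assumption for illustration)."""

    def __init__(self) -> None:
        self._by_id = {}

    def register(self, agent_id: str, owner: str, scopes) -> AgentIdentity:
        # Refuse duplicates so two agents can never share one identity.
        if agent_id in self._by_id:
            raise ValueError(f"agent {agent_id!r} is already registered")
        identity = AgentIdentity(agent_id, owner, frozenset(scopes))
        self._by_id[agent_id] = identity
        return identity

    def rotate(self, agent_id: str) -> str:
        """Issue a fresh credential; the old one stops authenticating."""
        identity = self._by_id[agent_id]
        identity.credential = secrets.token_urlsafe(32)
        identity.issued_at = datetime.now(timezone.utc)
        return identity.credential

    def revoke(self, agent_id: str) -> None:
        """Revoke a single agent without touching any other identity."""
        self._by_id[agent_id].revoked = True

    def authenticate(self, credential: str) -> Optional[AgentIdentity]:
        """Map a presented credential back to exactly one agent, or None."""
        for identity in self._by_id.values():
            if not identity.revoked and secrets.compare_digest(identity.credential, credential):
                return identity
        return None
```

Because every credential resolves to exactly one agent, a log line can name the agent rather than a shared service account, and revoking one agent leaves the others working.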

Enforce least privilege at the task level, not the team level

HIGHEST ROI SINGLE CONTROL AVAILABLE

17% vs. 76%

incident rate with vs. without least privilege

Teleport's 2026 survey found 70% of enterprise agents have more access than equivalent human roles. Organizations enforcing least-privilege access report a 17% incident rate. Those without it report a 76% incident rate. This is the largest measurable risk reduction from any tracked control.

Least privilege for agents is more granular than for humans. A summarization agent needs read access to one document store — it should not inherit the write permissions of the engineering team that deployed it.

Source: Teleport State of AI in Enterprise Infrastructure Security 2026, n=205

Implementation

Audit every agent's permissions against its documented task. Remove anything not explicitly required. For MCP-connected agents: scope each MCP server's tool exposure to the agent's exact function. Review quarterly as tasks evolve.
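The audit step reduces to a set comparison between what an agent holds and what its documented task requires. The sketch below assumes permissions and MCP tool names can be represented as plain strings; the permission names and both function names are hypothetical.

```python
def audit_permissions(granted: set, required: set) -> dict:
    """Compare an agent's granted permissions with its documented task needs."""
    return {
        # Anything granted but not required should be removed.
        "excess": sorted(granted - required),
        # Anything required but not granted means the task documentation is stale.
        "missing": sorted(required - granted),
    }


def scope_tools(exposed_tools: set, task_tools: set) -> set:
    """Return only the tools an MCP server should expose to this agent."""
    return exposed_tools & task_tools


if __name__ == "__main__":
    # A summarization agent that inherited its team's permissions.
    report = audit_permissions(
        granted={"docs:read", "docs:write", "repo:admin"},
        required={"docs:read"},
    )
    print(report)
```

Running the comparison quarterly, as the implementation note suggests, turns permission drift into a diff that can be reviewed rather than a surprise during an incident.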

Implement inline input/output inspection at the tool boundary

PREVENTS ASI01, ASI02 — THE TOP TWO OWASP RISKS

<100ms

required latency

The Moltbook incident (506 injections) and all goal-hijacking attacks share the same root cause: agents acting on attacker-controlled content without inspection at the tool boundary. Inspection means evaluating every tool call input and every tool response before the agent acts on it.

This requires an inline layer — not post-hoc log review. If inspection adds 500ms per tool call, agentic workflows become unusable. Purpose-built SLMs running locally achieve sub-100ms enforcement without routing traffic outside your security perimeter.

Context: OWASP ASI01 and ASI02 both prevented by inline inspection at the tool call layer

Implementation

Inspection layer at agent-tool boundary. Evaluate: embedded instructions in input? Tool response overriding agent goals? Output containing data the agent shouldn't transmit? Latency target: sub-100ms. CPU-only enforcement — no GPU required for SLM-based inspection.
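To make the placement of the inspection layer concrete, here is a minimal sketch of a wrapper that checks both directions of a tool call. It is an assumption-heavy illustration: the regular expressions are a simple stand-in for the SLM-based classifier the article describes, and `inspect_inbound`, `inspect_outbound`, and `guarded_call` are hypothetical names.

```python
import re
import time
from dataclasses import dataclass, field
from typing import Callable

# Illustrative patterns only; a real deployment would use a trained classifier.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"disregard (the |your )?(system prompt|instructions)", re.I),
    re.compile(r"you are now\b", re.I),
]
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # shaped like a US SSN
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),    # shaped like an AWS access key ID
]


@dataclass
class Verdict:
    allowed: bool
    reasons: list = field(default_factory=list)
    elapsed_ms: float = 0.0


def _scan(text: str, patterns, label: str) -> Verdict:
    start = time.perf_counter()
    reasons = [f"{label}: {p.pattern}" for p in patterns if p.search(text)]
    elapsed_ms = (time.perf_counter() - start) * 1000
    return Verdict(allowed=not reasons, reasons=reasons, elapsed_ms=elapsed_ms)


def inspect_inbound(text: str) -> Verdict:
    """Check content arriving from a tool for embedded instructions."""
    return _scan(text, INJECTION_PATTERNS, "embedded instruction")


def inspect_outbound(text: str) -> Verdict:
    """Check content leaving the agent for data it should not transmit."""
    return _scan(text, SENSITIVE_PATTERNS, "sensitive data")


def guarded_call(tool: Callable[[str], str], payload: str) -> str:
    """Run a tool call with inspection before the call and before the agent sees the result."""
    outbound = inspect_outbound(payload)
    if not outbound.allowed:
        raise PermissionError(f"blocked tool input: {outbound.reasons}")
    response = tool(payload)
    inbound = inspect_inbound(response)
    if not inbound.allowed:
        raise PermissionError(f"blocked tool response: {inbound.reasons}")
    return response
```

The `elapsed_ms` field is there so the latency target can be measured per call rather than assumed. The structure matters more than the patterns: the check runs inline, and a blocked response never reaches the agent's context.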

Build and maintain a complete agent inventory

PREREQUISITE FOR CONTROLS 5 AND 6

24%

have inter-agent visibility

Only 24.4% of organizations have full visibility into which agents are communicating with each other. More than half run without security oversight or logging. Shadow AI agent incidents cost an average of $670,000 more than standard incidents, driven by delayed detection and difficulty scoping the exposure.

You cannot govern what you cannot see — and right now, most security teams cannot see most of their agents. The inventory requirement isn't just hygiene: it's the prerequisite for every other control.

Source: Gravitee 2026; $670K shadow AI cost delta from AGAT Software analysis

Implementation

Mandatory registration for all agent deployments: agent name, owner team, task scope, tools accessed, permissions held, last review date. Monthly network scanning to surface unregistered agents. Any gap between registered and detected = unmanaged risk.
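The registration fields and the registered-versus-detected comparison can be sketched as follows. This assumes a scan produces a set of agent names; `AgentRecord`, `inventory_gap`, and `overdue_reviews` are hypothetical names, and the 90-day review window is an assumption for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AgentRecord:
    """The registration fields listed above, one record per deployed agent."""
    name: str
    owner_team: str
    task_scope: str
    tools: tuple
    permissions: tuple
    last_review: date


def inventory_gap(registered: list, detected: set) -> dict:
    """Diff the registry against agents actually observed by a scan."""
    names = {record.name for record in registered}
    return {
        # Running but never registered: unmanaged risk.
        "unregistered": sorted(detected - names),
        # Registered but not observed: possibly retired, possibly hidden.
        "not_seen": sorted(names - detected),
    }


def overdue_reviews(registered: list, today: date, max_age_days: int = 90) -> list:
    """List agents whose last review is older than the allowed window."""
    return sorted(
        record.name
        for record in registered
        if (today - record.last_review).days > max_age_days
    )
```

Any non-empty `unregistered` list is the gap the implementation note describes, and it is also the list of agents that have no owner to call during an incident.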

Implement structured audit logging with full decision-chain context

REQUIRED FOR INCIDENT RESPONSE AND COMPLIANCE

6mo

dwell time without logs

The six-month dwell time in the OpenAI plugin ecosystem breach was possible because there were no structured audit logs to detect anomalous agent behavior. Individual actions appeared legitimate. No single event triggered an alert. The pattern was only visible in aggregate — and without logs, the aggregate couldn't be reconstructed.

Agent audit logs differ from standard application logs: they must capture the decision context, not just the action. A log entry showing "agent wrote to database" is forensically useless. The session ID, preceding tool calls, context window state, and policy evaluation result are what IR actually needs.

OWASP: "incident response on an agent is forensics in the dark" without structured decision-chain logs

Implementation

Log: every tool call with inputs, every resource access, every goal assignment, every policy evaluation. Structured JSON, session ID linking all actions in a task chain. Retention: 90 days minimum for regulated environments. Alert triggers: actions outside declared hours, unexpected cross-agent communication, permission escalation attempts.
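A structured entry that carries decision context might look like the sketch below. The field names are assumptions chosen for illustration rather than a standard schema; the point is that every entry carries a session ID and a pointer to the preceding event, so a task chain can be rebuilt later.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def audit_event(
    session_id: str,
    agent_id: str,
    event_type: str,
    detail: dict,
    policy_result: str,
    prev_event_id: Optional[str] = None,
) -> str:
    """Serialize one agent action as a JSON line with its decision context."""
    record = {
        "event_id": str(uuid.uuid4()),
        "ts": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,        # links all actions in one task chain
        "agent_id": agent_id,            # the agent's own identity, not a shared account
        "event_type": event_type,        # e.g. tool_call, resource_access, goal_assignment
        "detail": detail,                # tool name and inputs, resource touched, etc.
        "policy_result": policy_result,  # outcome of the policy evaluation
        "prev_event_id": prev_event_id,  # preceding step in the chain, if any
    }
    return json.dumps(record, sort_keys=True)


def reconstruct_chain(log_lines: list, session_id: str) -> list:
    """Return one session's events in the order they were logged."""
    events = (json.loads(line) for line in log_lines)
    return [event for event in events if event["session_id"] == session_id]
```

With entries like these, "agent wrote to database" becomes the last link in a chain that shows which tool calls and policy decisions led up to the write.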

Build and test a kill switch before you need it

THE CONTAINMENT GAP — 60% CAN'T DO THIS TODAY

60%

cannot terminate a misbehaving agent

60% of organizations cannot terminate a misbehaving agent once it is running (Kiteworks 2026). CyberArk puts it plainly: identity is the kill switch. When an agent has its own identity and a revocation path, revoking that identity stops the agent. When agents share credentials, revocation causes collateral damage.

Test this quarterly. Pick one production agent. Revoke its credentials. Confirm it stops. Confirm no adjacent system is affected. Restore access. If you cannot complete this test cleanly, the architecture needs to change before an incident forces it.

Source: Kiteworks 2026 Data Security and Compliance Risk Forecast, n=225

Implementation

Quarterly kill-switch test: select one production agent, revoke credentials, verify clean stop, verify no cascade, restore. Target: terminate any agent in under 5 minutes with zero cascading system impact. Organizations that share credentials fail this test every time.
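The quarterly drill can be expressed as a small procedure. The sketch below simulates it against an in-memory credential store, which is an assumption for illustration: `CredentialStore` and `kill_switch_test` are hypothetical names, and a real test would call your identity provider and probe the live agent.

```python
import time


class CredentialStore:
    """Maps each agent to a credential; revocation applies to the credential itself."""

    def __init__(self, assignments: dict) -> None:
        self._assignments = dict(assignments)  # agent_id -> credential
        self._revoked = set()

    def revoke(self, agent_id: str) -> None:
        self._revoked.add(self._assignments[agent_id])

    def restore(self, agent_id: str) -> None:
        self._revoked.discard(self._assignments[agent_id])

    def is_valid(self, agent_id: str) -> bool:
        return self._assignments[agent_id] not in self._revoked

    def agents(self) -> list:
        return sorted(self._assignments)


def kill_switch_test(store: CredentialStore, target: str, budget_seconds: float = 300.0) -> dict:
    """Revoke one agent, verify a clean stop and no cascade, then restore access."""
    others = [agent for agent in store.agents() if agent != target]
    start = time.monotonic()
    store.revoke(target)
    stopped = not store.is_valid(target)
    # Any other agent that lost access is collateral damage.
    cascaded = [agent for agent in others if not store.is_valid(agent)]
    elapsed = time.monotonic() - start
    store.restore(target)
    return {
        "stopped": stopped,
        "cascaded": cascaded,
        "elapsed_seconds": elapsed,
        "passed": stopped and not cascaded and elapsed <= budget_seconds,
    }
```

With one credential per agent the test passes. If two agents are assigned the same credential, revoking one takes the other down with it, and the `cascaded` list names the collateral.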

Where Most Enterprises Actually Stand

NeuralTrust's maturity model from their 2026 survey of 160+ CISOs places 46% of organizations in the Reactive tier (respond after incidents), 29% in Managed (basic monitoring, no containment), and fewer than 10% in Proactive governance. Understanding your tier tells you which control to implement first.

Control implementation by maturity tier

Sequencing matters

The six controls are interdependent in one direction: inventory enables identity, identity enables least privilege, least privilege scopes inspection, and all four make logging and the kill switch meaningful. Organizations that start with monitoring before establishing inventory are measuring the wrong thing. Start with the agent registry.
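The one-directional dependency described above can be written down as a graph and sorted, which gives a rollout order and a way to ask what to implement next. This is a sketch using Python's standard `graphlib` module; the control names and the `DEPENDS_ON` mapping are a hypothetical encoding of the paragraph above.

```python
from graphlib import TopologicalSorter

# Each control maps to the controls it depends on.
FOUNDATION = {"inventory", "identity", "least_privilege", "inspection"}
DEPENDS_ON = {
    "inventory": set(),
    "identity": {"inventory"},
    "least_privilege": {"identity"},
    "inspection": {"least_privilege"},
    "audit_logging": set(FOUNDATION),
    "kill_switch": set(FOUNDATION),
}


def rollout_order(deps: dict) -> list:
    """Return one valid implementation order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


def next_controls(deps: dict, done: set) -> list:
    """List the controls whose prerequisites are all already in place."""
    return sorted(
        control
        for control, prerequisites in deps.items()
        if control not in done and prerequisites <= done
    )


if __name__ == "__main__":
    print(rollout_order(DEPENDS_ON))
    print(next_controls(DEPENDS_ON, done={"inventory"}))
```

An organization with nothing in place gets `inventory` as its only available next step, which matches the advice to start with the agent registry.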

Three Actions This Week

Pull a complete list of every deployed agent. Owner, credentials used, whether those credentials are shared with any other system. One hour per team. Will surface agents your security team has never reviewed.

Audit one agent's permissions end to end. Document what it actually needs vs. what it currently has. Any permission beyond the documented task is a misconfiguration — not a configuration choice.

Test your kill switch on one non-critical agent. Revoke credentials. Confirm it stops. Confirm nothing else breaks. If you can't pass this test, you don't have containment — you have the illusion of it. The 60% who can't do this are one incident away from finding out the hard way.

Polygraf AI

Enforce Agent Policy at the I/O Layer

Polygraf's Behavioral Control Plane intercepts and controls every AI interaction inline — enforcing organizational policy on input and output, across user-facing and agentic AI, with zero data leaving your environment. Runs on existing infrastructure at sub-100ms latency. No GPU required.
