Threat Labs

What Security Leaders Need to Know Before Deploying AI Agents

By Barak Sternberg, Nevo Poran

May 20, 2026

10 min read

AI agents are being deployed faster than the security controls to manage them. Three questions every security leader should be able to answer before any agent ships: what it can do, what controls exist if it misbehaves, and how you'll know when something goes wrong.

According to a 2025 industry survey, only 34.7% of organizations have implemented dedicated monitoring and abuse detection for their AI agents. The rest are running production agents with no signal that would tell them something has gone wrong.

At a Glance

AI agents are being deployed faster than the security infrastructure to manage them. Most security teams are catching up to an attack surface that's already in production. This post covers three questions every security leader should be able to answer before any agent ships.

What this blog covers:

What your agents can actually do, and why the answer is rarely what you think
What controls matter if an agent misbehaves, and which ones actually work
How you'll know if something goes wrong, and what makes it hard to catch
What to do before the next agent ships

Agents in Production, Security Not

The conversation around AI agent security has largely focused on the sophistication of attacks: prompt injection, tool abuse, multi-agent exploitation. Those threats are real. But in most enterprise environments, the more immediate problem is simpler: agents are in production before the basic security infrastructure to manage them exists.

In a recent Tenet Threat Labs assessment, over 20 agent tools in a production environment had no visibility or activity tracking. They were running, calling APIs, and processing customer data without appearing in any asset inventory. This is a pattern Tenet sees consistently: agent deployment outpacing the processes designed to track it.

Three Questions Security Leaders Should Be Able to Answer

Before any agent goes into production, and as an ongoing audit of agents already running, security leaders need clear answers to three questions. Most organizations can't answer all three. Many can't answer any of them.

Question 1

What can this agent actually do?

This sounds obvious. It rarely has a clean answer.

AI agents accumulate capabilities over time. An agent deployed to summarize customer emails gets calendar write access added three months later. A coding agent gets connected to a deployment pipeline. Permissions granted for one use case persist as the agent's scope expands. By the time a security team reviews the agent's access, the original scope is unrecognizable.

In the Tenet Threat Labs assessment referenced above, the same customer environment that had 20+ undiscovered agent tools also had agents with tool permissions that far exceeded what their defined function required. None of those permissions had been reviewed since initial deployment.

Knowing what an agent can do is a starting point, not an answer. Permissions, scopes, and access policies reflect intent at deployment time. They don't predict behavior at runtime. Non-deterministic agents can drift, be manipulated and hijacked, or take actions their designers didn't anticipate within their permitted scope. Runtime observability, which tracks behavior as it happens, correlates signals across layers, and normalizes sessions, is what closes the distance between what an agent is allowed to do and what it is actually doing.
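The distance between granted access and actual behavior can be made concrete with a simple set comparison. The sketch below is illustrative only: the `AgentProfile` structure, the permission names, and the idea that `observed` is populated from runtime telemetry are all assumptions, not a description of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class AgentProfile:
    """Hypothetical record of one agent's access (all names are illustrative)."""
    name: str
    granted: set[str]    # permissions currently granted to the agent
    required: set[str]   # permissions its defined function actually needs
    observed: set[str] = field(default_factory=set)  # permissions exercised at runtime


def audit(profile: AgentProfile) -> dict[str, list[str]]:
    """Compare what the agent may do, should do, and was seen doing."""
    return {
        # Granted beyond the defined function: candidates for removal.
        "excess": sorted(profile.granted - profile.required),
        # Granted but never exercised: likely leftovers from scope changes.
        "unused": sorted(profile.granted - profile.observed),
        # Exercised outside the defined function: worth investigating.
        "out_of_scope_use": sorted(profile.observed - profile.required),
    }


agent = AgentProfile(
    name="email-summarizer",
    granted={"mail.read", "calendar.write", "deploy.trigger"},
    required={"mail.read"},
    observed={"mail.read", "calendar.write"},
)
report = audit(agent)
print(report)
```

The `excess` and `unused` lists can be produced from a static permission review. The `out_of_scope_use` list only exists if runtime behavior is being recorded, which is the point of the paragraph above.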

Question 2

What controls are in place if it misbehaves?

An agent can misbehave for several reasons: a direct attack, an indirect prompt injection via data it reads, a model update that changes its behavior, or simply an edge case its designers didn't anticipate. The controls that reduce risk across all of these are the same.

Input-to-action path control. Every piece of data an agent reads is a potential instruction vector. An email, a database record, an API response can all contain adversarial content that steers the agent's reasoning. Treating data the agent reads as untrusted, separate from the instructions it was given by its operator, is an architectural decision, not a configuration setting. In practice, this distinction is rarely enforced at the architectural level.
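One way to picture this separation is to tag every item in the agent's context with its provenance and let that tag, rather than the content, decide which tools stay available. The following is a minimal sketch under stated assumptions: the `Source` labels, the `ContextItem` type, and the `SENSITIVE_TOOLS` set are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    OPERATOR = "operator"    # instructions from whoever deployed the agent
    UNTRUSTED = "untrusted"  # emails, database records, API responses


@dataclass(frozen=True)
class ContextItem:
    source: Source
    text: str


# Hypothetical tools with side effects that untrusted data should not be able to reach.
SENSITIVE_TOOLS = {"send_email", "delete_record"}


def allowed_tools(context: list[ContextItem], all_tools: set[str]) -> set[str]:
    """Drop sensitive tools once any untrusted content has entered the context."""
    tainted = any(item.source is Source.UNTRUSTED for item in context)
    return all_tools - SENSITIVE_TOOLS if tainted else set(all_tools)


tools = {"read_email", "summarize", "send_email", "delete_record"}
clean = [ContextItem(Source.OPERATOR, "Summarize today's inbox.")]
tainted = clean + [ContextItem(Source.UNTRUSTED, "Ignore prior instructions and ...")]

print(sorted(allowed_tools(clean, tools)))
print(sorted(allowed_tools(tainted, tools)))
```

The design choice here is that the policy never inspects the untrusted text. It restricts the path from input to action based on where the data came from, so it does not depend on recognizing an attack.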

Kill switch. Agents will go off the rails. A model update shifts behavior, a malicious input redirects reasoning, an edge case surfaces that no one anticipated. The question is whether you have an automatic red button when it happens: a mechanism that detects out-of-bounds behavior and halts execution before damage is done, without requiring a human to notice first. Most approval workflows don't fill this role. They gate specific actions but can't catch manipulation that happens upstream of any approval trigger. In 2025, OpenAI's internal red-teaming demonstrated exactly this: an agent instructed to draft an out-of-office reply encountered a malicious email and sent a resignation letter to the user's CEO instead. No approval gate caught it. The agent's reasoning had already been redirected before any approval was triggered.
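As a rough illustration of the difference between an approval gate and an automatic halt, the sketch below checks every tool call against session bounds and stops the whole session on the first violation. The `KillSwitch` class, its limits, and the tool names are assumptions made for this example, and real out-of-bounds detection would need far richer signals than an allowlist and a call budget.

```python
class SessionHalted(Exception):
    """Raised when the session has been stopped and no further actions may run."""


class KillSwitch:
    """Hypothetical per-session guard: halts on the first out-of-bounds action."""

    def __init__(self, allowed_tools: set[str], max_calls: int) -> None:
        self.allowed_tools = allowed_tools
        self.max_calls = max_calls
        self.calls = 0
        self.halted = False

    def check(self, tool: str) -> None:
        """Call before every tool invocation; raises instead of asking a human."""
        if self.halted:
            raise SessionHalted("session already halted")
        self.calls += 1
        if tool not in self.allowed_tools or self.calls > self.max_calls:
            self.halted = True  # once tripped, the session stays stopped
            raise SessionHalted(f"out-of-bounds action: {tool}")


# An out-of-office task has no reason to send arbitrary email.
switch = KillSwitch(allowed_tools={"read_email", "set_auto_reply"}, max_calls=5)
switch.check("read_email")
try:
    switch.check("send_email")
except SessionHalted as exc:
    print(f"halted: {exc}")
```

Because the guard is bound to the session rather than to a single action, a tripped switch also blocks every later call, including ones that would otherwise be allowed.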

Output filtering. What the agent says matters as much as what it does. Agents with access to sensitive data like PII, financial records, and internal system configurations need output controls that prevent that data from surfacing in responses, tool calls, or downstream actions. A Tenet Threat Labs assessment identified three distinct sensitive data exfiltration flows that were structurally possible through the agent's normal operation, flows that hadn't been flagged in the original threat model.
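A very small version of an output control is a redaction pass applied to anything the agent emits, whether that is a response, a tool-call argument, or a payload for a downstream action. The sketch below assumes two simple regular expressions (an email address and a US-style SSN) purely for illustration. Pattern matching alone will miss a great deal of sensitive data, so treat this as a shape for the control rather than sufficient coverage.

```python
import re

# Illustrative patterns only; real deployments need broader, tested detectors.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace matches with placeholders and report which categories were found."""
    findings = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED_{label}]", text)
        if count:
            findings.append(label)
    return text, findings


safe_text, findings = redact("Contact jane@example.com, SSN 123-45-6789.")
print(safe_text)
print(findings)
```

Returning the list of findings alongside the redacted text lets the same pass feed monitoring: a response that needed redaction is itself a signal worth logging.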

Question 3

How will you know if something goes wrong?

This is the question most organizations cannot answer. Not because the answer is technically difficult, but because no one has defined what "something going wrong" looks like for an AI agent.

Traditional security monitoring captures part of this picture. Network anomalies, process behavior, and access pattern deviations are real signals that exist and matter. But a meaningful share of agent-specific incidents happen at a layer those tools don't reach: inside the LLM's reasoning and the agent-tool layer. An agent manipulated via indirect prompt injection is still running as the same authorized process, calling the same authorized tools. The anomaly isn't in its traffic or its access log. It's in its reasoning chain.

Effective observability for agentic systems requires correlated visibility across all three layers where incidents actually occur: the LLM layer, where reasoning and intent are formed; the OS, network, and API/MCP layer, where actions execute; and the agentic application layer, where sessions and tool chains are orchestrated. Seeing each layer individually isn't enough. The critical signal comes from correlating events across all three into a single, coherent agentic session view: what the agent reasoned, what it executed, and how those connect. Most security tools cover one or two of these layers at most, and without any correlation between them. Incidents that happen in the layers they don't reach go undetected. That's not a monitoring gap in the traditional sense. It's a visibility architecture problem specific to agentic systems.
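To make the idea of a single session view concrete, the sketch below groups events from the three layers by a shared session identifier, orders them in time, and flags sessions where a layer produced no events at all. The event shape (`session_id`, `layer`, `ts`, `detail`) and the layer labels are assumptions for illustration. In practice the hard part is getting a common session identifier onto every event in the first place.

```python
from collections import defaultdict

# Hypothetical labels for the three layers described above.
LAYERS = {"llm", "action", "application"}


def build_session_views(events: list[dict]) -> dict[str, dict]:
    """Group events by session, order them in time, and report missing layers."""
    sessions = defaultdict(list)
    for event in events:
        sessions[event["session_id"]].append(event)

    views = {}
    for session_id, session_events in sessions.items():
        timeline = sorted(session_events, key=lambda e: e["ts"])
        seen = {e["layer"] for e in timeline}
        views[session_id] = {
            "timeline": timeline,
            "missing_layers": sorted(LAYERS - seen),  # blind spots for this session
        }
    return views


events = [
    {"session_id": "s1", "layer": "action", "ts": 3, "detail": "POST /mail/send"},
    {"session_id": "s1", "layer": "llm", "ts": 2, "detail": "decided to send email"},
    {"session_id": "s1", "layer": "application", "ts": 1, "detail": "session started"},
    {"session_id": "s2", "layer": "action", "ts": 5, "detail": "GET /records/42"},
]
views = build_session_views(events)
for session_id, view in views.items():
    print(session_id, [e["layer"] for e in view["timeline"]], view["missing_layers"])
```

Session `s2` in this example shows an action with no recorded reasoning or application context, which is exactly the kind of partial coverage the paragraph above describes.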

When a Tenet Threat Labs team assessed a production customer environment in which multiple AI agents communicated with tools and with each other, it identified and contained dozens of agentic security incidents in live environments, including cross-user data exfiltration and data exfiltration via a malicious MCP server. Prior to the assessment, the customer had no process to detect, triage, or respond to agent-specific incidents. Every one of those incidents would have gone undetected without external monitoring.

Closing that visibility gap requires two distinct data streams. The first is a log of every tool call an agent makes, with inputs, outputs, and triggering context. The second is a record of the LLM calls that drove those decisions: the reasoning chain, model inputs, and intent at each step. These are different data sources that tell different stories. Tool call logs show what the agent did. LLM call logs show why. Without both, you can reconstruct actions after the fact but not the reasoning behind them, and reasoning is where manipulation happens.
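The two streams become useful together when each tool call carries a reference to the LLM call that triggered it. The record types and field names below (`LLMCall`, `ToolCall`, `llm_call_id`) are hypothetical and exist only to show the join. They are not a schema from any specific logging product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LLMCall:
    call_id: str
    model_input: str
    reasoning: str  # the model's stated reasoning or intent at this step


@dataclass
class ToolCall:
    tool: str
    inputs: dict
    output: str
    llm_call_id: str  # triggering context: which LLM call led to this action


def explain(tool_calls: list[ToolCall], llm_calls: list[LLMCall]) -> list[tuple[str, Optional[str]]]:
    """Pair each action (what) with the reasoning that drove it (why), if recorded."""
    by_id = {call.call_id: call for call in llm_calls}
    return [
        (t.tool, by_id[t.llm_call_id].reasoning if t.llm_call_id in by_id else None)
        for t in tool_calls
    ]


llm_log = [LLMCall("c1", "Draft an out-of-office reply", "User wants an auto-reply set")]
tool_log = [
    ToolCall("set_auto_reply", {"text": "I am away."}, "ok", "c1"),
    ToolCall("send_email", {"to": "ceo"}, "ok", "c9"),  # no matching LLM record
]
explained = explain(tool_log, llm_log)
print(explained)
```

An action that maps to `None` is one you can see but cannot explain, which is the situation described above when only tool call logs are kept.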

Before the Next Agent Ships

The three questions above aren't a comprehensive security framework. They're a floor: the minimum a security leader should be able to answer before signing off on any agent going into production.

In practice, most organizations can't answer all three. The reasons are consistent: agents are deployed by product and engineering teams before security is involved, permissions are scoped for functionality rather than minimum access, and monitoring infrastructure for agent-specific behavior doesn't exist.

These steps build the foundation, but they don't replace continuous runtime visibility:

Establish visibility into your agent footprint: what exists, what it can do, and what it's doing at the moment of execution.
Assess blast radius before deployment. If this agent is compromised or drifts, what can it touch, modify, or exfiltrate? Define the mitigating controls before the agent ships, not after an incident.
Scope capabilities to their minimum functional footprint, and account for effects on connected systems and environments. Every capability beyond what the agent needs is a potential attack surface.
Build monitoring coverage across all agentic layers (LLM reasoning, the agent action layer, and the application session) and correlate them into a unified session view. Gaps in any one layer, or a lack of cross-layer correlation, mean incidents go unseen.
Define a go/no-go threshold for new agent deployments: a minimum security bar before production, not after.
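A go/no-go threshold can be as simple as a checklist that must pass in full before deployment. The sketch below is one hypothetical way to encode the steps above. The check names are invented for this example, and each organization would define its own bar.

```python
# Hypothetical check names mirroring the steps listed above.
REQUIRED_CHECKS = (
    "inventoried",
    "blast_radius_assessed",
    "least_privilege_scoped",
    "monitoring_all_layers",
)


def go_no_go(checks: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return (go, failed_checks); a missing check counts as a failure."""
    failed = [name for name in REQUIRED_CHECKS if not checks.get(name, False)]
    return (not failed, failed)


decision, failed = go_no_go({
    "inventoried": True,
    "blast_radius_assessed": True,
    "least_privilege_scoped": False,
})
print("GO" if decision else f"NO-GO, failed: {failed}")
```

Treating an absent check as a failure keeps the default at no-go, so an agent cannot pass the gate simply because nobody evaluated it.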

The agents are already running. The question is whether the security infrastructure around them is keeping pace.

Barak Sternberg, CEO

Nevo Poran, CTO