Threat Labs

What is Agentjacking?

Ron Bobrov

Barak Sternberg

Nevo Poran

March 2, 2026

min read

Enterprise AI has moved from simple chat to autonomous agents that query DBs and execute code. This power brings a new threat: Agentjacking. These multi-stage attacks subvert an agent's reasoning, turning its autonomy into a weapon against your infrastructure.

Enterprise AI has evolved. We have moved past simple chat interfaces to autonomous agents with the power to query databases, call APIs, and execute code. But this autonomy comes with a trade-off: a new breed of sophisticated, multi-stage attacks that turn an agent's reasoning against itself.

At a Glance

TL;DR: Agentjacking, the takeover of AI agents for advanced exploits and attack chains, fundamentally changes the rules of cybersecurity. Here is how it works and why legacy defenses are no longer sufficient.

The shift toward agentic AI introduces a "Privilege Paradox": the more access you give an agent to make it useful, the more dangerous it becomes if compromised.

Defining the Term: Agentjacking occurs when an attacker hijacks an agent’s logic, using a chain of prompt injection, RAG poisoning, and tool abuse, to force it to perform unauthorized actions.
The Privilege Paradox: An agent’s capabilities are its greatest asset, but they also represent its biggest vulnerability.
The Visibility Gap: Legacy security layers lack the semantic context to distinguish between a valid complex instruction and a hijacked one.
The Reasoning Shift: Securing agents requires a fundamental move away from static, infrastructure-level controls toward the Reasoning Layer, where the "black box" of AI intent and non-deterministic behavior actually resides.

Anatomy of an Attack: The Agentjacking Killchain

In the security community, vulnerabilities like Prompt Injection or RAG Poisoning are often discussed in isolation. However, these are merely the tactical components used to achieve Agentjacking, the total hijacking of the agent’s logic.

Agentjacking is the result of a coordinated multi-stage compromise:

The Entry Point (Logic Manipulation): The attacker bypasses system directives using a variety of techniques, including goal manipulation, jailbreaking, or adversarial prompts. By overriding the agent’s core logic, the attacker replaces its intended business mission with a hostile set of instructions.
The Logical Override (Context Poisoning): By poisoning the agent’s active context window, the attacker effectively “reprograms” the agent’s mind. This can be achieved through multiple vectors, including RAG pipelines, tool-output poisoning, or compromised session history. Once the context is poisoned, the agent accepts malicious data as “truth,” losing the ability to distinguish between authorized and adversarial instructions.
The Execution (Chained Tool Abuse): The hijacked agent acts as an insider with high-level permissions to initiate chained actions, executing a sequence of unauthorized tasks across multiple internal systems. This goes far beyond simple data exfiltration, the agent can be forced to execute bash commands, trigger ransomware payloads, or move laterally through cloud infrastructure. It performs these complex, multi-stage operations at machine speed while appearing as a legitimate, authorized user.

The Privilege Paradox

Modern agents create a "Privilege Paradox." To provide value, they require deep access to customer CRMs, internal documentation, workflow triggers, and more.

Unlike a human employee, an agent lacks a proper moral compass, it may follow the most dominant instruction in its context window. A hijacked agent becomes a high-privileged "insider" that moves through your systems at machine speed, bypassing traditional identity checks because the agent itself is the authorized user.

The Visibility Gap: Why Legacy Controls Fail

Enterprise security has spent decades perfecting Signature-Based Detection and Fixed-Rule Logic. Agentjacking exposes the architectural limitations of these traditional stacks, which were never built for the "black box" of AI reasoning.

The Semantic Blindspot: Traditional WAFs look for malicious syntax (code). Agentjacking happens at the Reasoning Layer, where hijacked instructions look exactly like legitimate business queries. Without a signature to match, legacy tools approve these actions by default.
The Agentic Layer Gap: Legacy tools like DLP assume data is structured and follows a set path. In the agentic layer, data is vast and unstructured, moving through hand-offs between different agent "identities" that legacy systems cannot see. To a security tool, an entire multi-agent swarm looks like a single, authorized process.
The Non-Deterministic Challenge: Unlike traditional software, agents are highly non-deterministic. They don't follow a fixed code flow that can be scanned statically before deployment. An agent can shift its logic mid-session based on a single prompt, bypassing the static definitions and heuristics used by EDRs.
The Identity & Visibility Crisis: Over 90% of agents lack a unique machine identity. While legacy security scans identities and applications before they run, agentic behavior is a "black box" until runtime. You cannot statically scan an agent's intent, you can only secure it by monitoring the actions it executes in real-time.

Real-World Scenarios

1. The "Weaponized Assistant" Hijack (Tenet Research)

In this internal assessment, researchers targeted an autonomous CRM agent designed to act as a digital assistant, summarizing emails, managing customer interactions, and scheduling calls.

The Vulnerability: An attacker sent a spoofed email disguised as a legitimate "compliance request." When the user asked the assistant for a simple summary of their recent inbox, the agent encountered hidden instructions embedded in the message.
The Attack: The agent automatically executed the hidden directives, which commanded it to scan the user's recent email history, extract all credentials and confidential interaction logs, and sync that data directly into a calendar event owned by the attacker. By trusting the content of the email over the safety of the action, the agent functioned as an unwitting insider for the threat actor.

2. The "Clawdbot" Infrastructure Takeover (January 2026)

One of the first major public examples of systemic agent subversion involved Clawdbot (now renamed OpenClaw). Researchers discovered that the bot’s architecture allowed for an "unauthenticated 1-click RCE," where a single malicious link could hijack the agent's identity.

The Vulnerability: Over 1,200 instances were found exposed via Shodan. Attackers exploited CVE-2026-25253 to steal authentication tokens and execute arbitrary shell commands (RCE) on the host machine.
The Attack: Because the agent lacked Runtime Agent Defense, it was turned from a helpful "hands-on" assistant into an unauthenticated backdoor for lateral movement within corporate networks.
Source: CISA Vulnerability Summary SB26-040

3. "EchoLeak": Zero-Click Exfiltration in Microsoft 365 (June 2025)

The EchoLeak vulnerability (CVE-2025-32711) stands as a landmark case of a "zero-click" attack against a production enterprise agent. It allowed a remote attacker to steal confidential data simply by sending a crafted email.

The Vulnerability: By "spraying" a malicious payload into a victim's inbox, an attacker could poison the agent's RAG context. When Copilot retrieved the email to answer a routine user query, it followed hidden instructions to exfiltrate chat history and OneDrive files.
The Attack: The attack relied on a Semantic Bypass, using clever Markdown formatting to trick the agent into embedding stolen data into image URLs. These were automatically fetched by the client interface, leaking data silently without any user clicks.
Source: SOC Prime: EchoLeak CVE-2025-32711 Analysis

The Strategic Pivot: Shifting to the Reasoning Layer

Traditional security operates at the Infrastructure Layer, securing the GPUs, the networks, and the APIs that AI runs on. But Agentjacking occurs at the Reasoning Layer.

Because this layer is essentially a "black box" until the moment of execution, it introduces a set of challenges that cannot be solved by simply hardening the perimeter. The transition to agentic AI requires a fundamental shift in how we evaluate three core areas of risk:

Intent vs. Syntax: Legacy tools (like WAFs) are designed to catch malicious syntax, code patterns that "look" like an attack. In an agentic world, the syntax is natural language, it looks like a normal business request. The actual risk lies in the Intent behind that language. Security now faces the impossible task of differentiating between a legitimate user goal and a hijacked mission hidden in plain sight.
The Identity Paradox: Most agents operate without a unique, verifiable machine identity. They exist as "faceless" actors that inherit the permissions of whoever triggered them. This makes static identity management (IAM) largely irrelevant. You cannot secure a "who" that traditional systems can’t identify, you must instead find a way to verify the "what", the specific sequence of actions being performed.
Non-Deterministic Flow: Unlike traditional applications with fixed code paths, an agent’s behavior is written on the fly. You cannot statically scan an agent for vulnerabilities because its next action depends entirely on the unstructured data it encounters at runtime, meaning the "code" is essentially being rewritten in every session.

Move Fast, Stay Secure

You shouldn't have to choose between deploying autonomous agents and maintaining your security posture.

Tenet Security provides the architectural foundations necessary to protect the reasoning layer, enabling your team to innovate and deploy at scale without the risk of functional takeover.

Get an Agentic Risk Assessment