Skip to content
Agent Month

How to fix: prompt injection / agent following malicious instructions

Cause

Content the agent ingests (a web page, a document, a ticket) contains instructions that hijack its behavior.

The fix

  1. 1Treat anything the agent reads from outside as untrusted, adversarial input.
  2. 2Constrain tool permissions to the minimum the task needs — no standing access to destructive actions.
  3. 3Route internal system access through audited MCP servers with scoped permissions and logging.
  4. 4Add human-in-the-loop confirmation for high-impact or irreversible actions.
  5. 5Monitor and log tool calls so injection attempts are visible after the fact.

Prevent it

Design agents with least-privilege tools, audited access, and human gates on risky actions so injection can’t cause real damage.

Frequently asked questions

What causes “prompt injection / agent following malicious instructions”?

Content the agent ingests (a web page, a document, a ticket) contains instructions that hijack its behavior.

How do I prevent “prompt injection / agent following malicious instructions” from recurring?

Design agents with least-privilege tools, audited access, and human gates on risky actions so injection can’t cause real damage.