back to blog
GuideAI Security

What Prompt Injection Looks Like When the AI Has Tools

valkant/May 2026

Prompt injection grew 540 percent in the last twelve months according to industry telemetry, and the number is going to keep climbing. The reason is not that the attack got more clever. The reason is that companies have spent two years gluing language models into production workflows with access to tools, databases, customer data, and outbound APIs. Every one of those integrations multiplies the blast radius of a single bad prompt.

The version of prompt injection that everyone learned first is the toy version. Type something into a chatbot that overrides its system prompt and watch it say something it should not. That class still matters for brand-safety reasons, but it is not where the real risk lives. The real risk lives in apps that retrieve content from somewhere the user does not control directly. Emails that get summarised. Documents that get analysed. Tickets that get routed. Webpages that get fetched by a research agent. Every one of those inputs is an attack surface and the attacker is not the user. The attacker is whoever authored the content the model is about to read.

Once a tool-using agent reads attacker-controlled content, the injection no longer needs to bypass anything. The model is going to follow whatever instructions it finds because that is what it is optimised to do. If the agent has access to a send-email tool, you have a phishing engine that runs inside the victim's tenant. If it has access to a database query tool, you have an exfiltration primitive. If it can browse the web with the user's session, you have a confused deputy that does authenticated requests on behalf of the attacker.

The system prompt is also not the secret you think it is. We routinely extract system prompts from production assistants. They contain the names of internal tools, the structure of the agent's decision tree, the names of customer-segment flags, and occasionally references to API endpoints the assistant talks to. Knowing the system prompt is reconnaissance for the rest of the attack chain. The value of the extraction is not the prompt itself. It is everything you learn from it.

When we test LLM applications, we look at four things. What sources of content does the model trust implicitly. What tools does it have access to and what authority do those tools carry. What output filtering sits between the model and the user, and what filtering sits between the model and downstream systems. And finally, what does the model leak about itself when you ask it the right questions in the wrong order. Almost every production deployment falls down on at least one of those four. Most fall down on two or three.

There is no purely technical fix for prompt injection. Guardrails help. Output validation helps. Constrained tool definitions help. None of them eliminate the underlying property that the model cannot reliably distinguish data from instructions when both arrive through the same channel. The practical answer is to treat every LLM call the way you treat eval. Assume any input can become code. Build the surrounding architecture accordingly.