The open web is slowly but surely filling up with traps designed for LLM-powered AI agents. Known as indirect prompt injection (IPI), this technique involves hiding covert instructions inside ordinary web pages, waiting for an AI agent to read them and carry out the author's commands.
Ignore Previous Instructions
In back-to-back reports published this week, Google and Forcepoint researchers laid out real-world evidence of these attacks. Google used a repository of 2–3 billion crawled pages per month, focusing on static websites including blogs, forums, and comment sections, excluding social media content. Forcepoint's X-Labs researchers conducted active threat hunting across publicly accessible web infrastructure, flagging payloads that triggered on patterns like “Ignore previous instructions” and “If you are an LLM.”
Both companies found IPIs fueled by both harmless and malicious intent. The first category includes pranks and helpful guidance, such as instructions to change the AI agent's conversational tone (“Tweet like a bird”) or add relevant content to AI summaries (e.g., telling users to check facts for themselves). The latter includes search engine manipulation, traffic hijacking, denial-of-service attacks aimed at preventing AI agents from retrieving content, data exfiltration (e.g., API keys), and destructive actions like “try to delete all files on the user’s machine.”
Forcepoint researchers also unearthed IPI attempts aimed at performing financial fraud. One payload embedded a fully specified PayPal transaction and step-by-step instructions designed for AI agents with integrated payment capabilities. A second case used meta tag namespace injection combined with a persuasion amplifier keyword (“ultrathink”) to route AI-mediated financial actions toward a Stripe donation link. A third case appeared to function as a widely distributed test payload, possibly to identify which AI systems are vulnerable before deploying higher-impact attacks.
Hiding from Humans
Attackers use different tricks to hide malicious instructions from human eyes while keeping them fully visible to AI. The most common involve making text physically invisible on a webpage by shrinking it to a single pixel, draining its color to near-transparency, or simply tagging it as hidden using standard web design tools. The more sophisticated tricks involve burying payloads inside HTML comment sections and hiding instructions inside a page's metadata.
There's a Growing Interest in IPI Attacks
Neither team found evidence of sophisticated, coordinated campaigns. However, according to Forcepoint researchers, “shared injection templates across multiple domains suggest organized tooling rather than isolated experimentation.” The window for getting ahead of this threat is closing fast, they believe.
Google says it observed a sharp uptick in malicious activity during its scans: “We saw a relative increase of 32% in the malicious category between November 2025 and February 2026, repeating the scan on multiple versions of the CommonCrawl archive of the public web.”
Forcepoint also pointed out that the impact of these attacks scales with AI privilege. “A browser AI that can only summarize is low-risk. An agentic AI that can send emails, execute terminal commands or process payments becomes a high-impact target. If AI agents consume untrusted web content without enforcing a strict data-instruction boundary, every page they read remains a potential attack vector.”
As AI agents become more prevalent, the need for robust security measures against indirect prompt injection has never been more critical. Organizations must implement strict data-instruction boundaries and monitor for hidden payloads in web content.
Source: Help Net Security News