Mozilla Shows the Danger of Indirect Prompt Injections in AI Coding Agents
DevOps.com

Mozilla Shows the Danger of Indirect Prompt Injections in AI Coding Agents

A Clean Repository, No Malicious Code - Yet a Full Compromise

A clean GitHub repository that contains no malicious code can launch an attack and fully compromise a developer’s systems by using indirect prompt injections to trick AI-powered coding agents like Anthropic’s Claude Code into taking steps that hand control to attackers and expose a wide range of secrets.

In a proof-of-concept (PoC) attack, Mozilla 0DIN researchers Andre Hall and Miller Engelbrecht showed how chaining a few seemingly routine agent actions can give a threat actor shell command access and persistence on a targeted developer system. In addition, this all happens without any warnings or alerts because the payload doesn’t appear anywhere in the repository. Instead, the agent’s actions lead to the malicious payload quietly being brought in from the outside.

“This means that no scanner would ever catch it, no human reviewer would ever see it, and the agent itself would never have a chance to look at it before running it,” Hall and Engelbrecht wrote. “Instead, the malicious instruction is injected at runtime, pulled from DNS, after the agent has blindly trusted everything else.”

A ‘Serious Attack Vector’

The PoC shows the inherent danger of indirect prompt injections, a tool that bad actors use by embedding the malicious instructions in outside content that an agent processes rather than injecting malicious instructions directly from the user’s input.

“Indirect prompt injection is far more than just another chatbot problem; it is a very real and serious attack vector that can result in catastrophic damage, much of which will be irreversible,” the researchers wrote.

Analysts with The Futurum Group, writing earlier this month about research by researchers with Brave, said the threat from indirect prompt injections makes the “myth that local AI is safer than cloud AI for sensitive workflows … untenable.”

“Indirect prompt injection exploits a fundamental weakness in LLM architectures: the inability to enforce a boundary between instructions and data,” the analysts wrote. “As enterprises accelerate GenAI adoption, this flaw creates systemic risk that no deployment model can sidestep.”

Three Simple Steps

0DIN’s PoC essentially compromised three basic steps that the researchers wrote “on their own, none of them looks like anything. The damage only shows up when they run in order.”

  1. First, seemingly normal first-time setup instructions are presented by what appears to be the regular-looking malicious repository.
  2. Then a Python package designed to fail on the first try won’t do anything until it’s initialized and directs the developer to run an initialization command. “This is a completely ordinary pattern, and that is exactly why it works,” Hall and Engelbrecht wrote.
  3. Doing this calls a shell script that appears to be “routine cloud-platform bootstrapping” but is actually controlled by the attacker, they wrote. However, the config value comes from a DNS TXT; the payload is never in the repository. Its content is piped directly to bash.

The Reverse Shell Is Running

Now the attacker, with the reverse shell running as the developer’s user, gets control of the system and access to every secret in the environment, including credentials, Anthropic API keys, Amazon Web Services (AWS) keys, and GitHub tokens. It also establishes persistence in the compromised developer system by dropping an SSH key or installing a backdoor before the shell closes.

There’s also the reach it attains, according to the researchers, adding that “one repo link in a job posting, a tutorial, or a Slack message hits everyone who opens it with Claude Code.”

“Claude Code never decided to open a shell,” the researchers wrote. “It decided to fix an error. The reverse shell is three indirection steps away from anything Claude Code actually evaluated: an error message it trusted, a script that fetched a value, and a DNS record it never saw. The attacker now has an interactive shell running as the developer’s own user.”

Agents Have ‘Everything They Need’

The PoC shows the threats that come from indirect prompt injection used against AI agents.

“Agentic coding tools have access to everything they need for this [attack]: private data, including environment variables, credentials, API keys, and local configuration files,” Hall and Engelbrecht wrote. “Untrusted content, such as repositories, documentation, and error messages from recently installed packages, can inject malicious models to steal this data.”

For developers, it’s important to treat setup instructions and scripts in repositories they’re unfamiliar with as untrusted code, ignoring what their agent may recommend. Agents also need to show what a setup command actually will run, including the contents of scripts it invokes and anything that the script fetches at runtime, they wrote.

Comments

No comments yet. Start the discussion.