Five ways an x402 payment can go wrong - and which ones you can catch before your agent pays
The Attack Surface
Pulling from the papers above, the recurring vectors are:
1. Dynamic payTo swap. In x402 V2 the destination address can change per request. A seller (or a man-in-the-middle) quotes you address A, then returns address B in the actual payment requirement. Your agent pays B.
2. Malicious 402 / overcharging. The endpoint quotes an absurd price, or a price that drifts upward across calls, and a naive agent just pays whatever the 402 says.
3. Insecure transport. The quote - including the address you're about to pay - arrives over plaintext HTTP, where anyone on the path can rewrite it.
4. Sybil-induced discovery. An attacker floods a discovery surface with fake, well-reviewed-looking endpoints to steer your agent toward a wallet they control.
5. Prompt-injection-to-payment. Content the agent reads convinces it to send funds somewhere it shouldn't.
Here's the part nobody says out loud: some of these are checkable with pure local logic, and some are not. Conflating them is why "agent payment security" sounds harder than it is.
What You Can Catch Locally, Before the Payment
Vectors 1, 2, and 3 are structural. You don't need a reputation graph or a threat feed to catch them - you need a few deterministic checks run against the request the instant before your agent signs it. No network call, no service dependency, no trust in a third party.
This is exactly the slice Frisk's lite mode handles. It runs entirely on your machine, ships with zero runtime dependencies, and returns a verdict - allow, review, or block - with reasons. Here's the whole thing in use:
import { Client } from "frisk-screen";
const client = new Client(); // lite mode, no API key
const result = await client.screen(
  "0x9a3f1b2c3d4e5f60718293a4b5c6d7e8f9a0bc12",
  {
    endpoint: "https://api.seller.x402/quote",
    amount: 2.5,
    asset: "USDC",
    observedPayTo: quote.payTo, // what the endpoint actually told us to pay
    policy: {
      maxPerCall: 5.0,
      allowedAssets: ["USDC"],
    },
  }
);
if (!result.allowed) {
  console.log(result.verdict, result.reasons);
  // e.g. "block", ["payTo differs from the expected counterparty"]
}
(There's a Python package with the identical API - pip install frisk-screen, same screen() call.)
Now map each check back to an attack:
Dynamic payTo swap → catch it
You know the counterparty you intended to pay. You also have the payTo the endpoint actually returned. If they differ, that's the V2 swap attack, and it's a one-line comparison:
if (
  request.observedPayTo &&
  // normalize both sides, otherwise a checksummed (mixed-case) address never matches
  request.observedPayTo.toLowerCase() !== counterparty.toLowerCase()
) {
  // the address moved between quote and payment - don't pay
}
This is the single most valuable local check, because the swap is invisible to a human reviewing code - it only happens at runtime, per request.
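As a self-contained sketch of that comparison, the helper below normalizes both addresses before comparing them. The names `normalizeAddress` and `detectPayToSwap` are hypothetical and are not part of the frisk-screen API. It also assumes EVM-style hex addresses, where letter case only carries the optional checksum; case-sensitive formats such as base58 must be compared exactly.

```typescript
// Hypothetical helper for illustration; not the frisk-screen implementation.
type SwapCheck = { swapped: boolean; reason?: string };

// Assumption: EVM-style hex addresses, so case differences are not meaningful.
function normalizeAddress(address: string): string {
  return address.trim().toLowerCase();
}

function detectPayToSwap(expected: string, observed?: string): SwapCheck {
  // Nothing was observed, so there is nothing to compare against.
  if (observed === undefined || observed === "") {
    return { swapped: false };
  }
  if (normalizeAddress(observed) !== normalizeAddress(expected)) {
    return {
      swapped: true,
      reason: "payTo differs from the expected counterparty",
    };
  }
  return { swapped: false };
}
```

Normalizing both sides matters: if only the observed address is lowercased, a counterparty stored in checksummed mixed case would be reported as a swap on every request.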
Overcharging → catch it with policy
You can't know the "fair" price of an arbitrary endpoint without market data, but you absolutely know your own limits. A per-call ceiling and an asset allowlist are deterministic and offline:
if (policy.maxPerCall !== undefined && amount > policy.maxPerCall) {
  /* review */
}
if (policy.allowedAssets && !policy.allowedAssets.includes(asset)) {
  /* review */
}
This won't tell you a $2 call should cost $0.05. It will stop your agent from silently paying $400 because a malicious 402 said so. Most overcharging damage is just the absence of a spending limit.
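The per-call ceiling compares each quote against a fixed limit. The "drifts upward across calls" variant needs a little state, but it can still live in your own process. The sketch below is one possible approach, not something the source attributes to Frisk: `createDriftTracker` and the 1.5x default ratio are assumptions chosen for illustration.

```typescript
// Hypothetical sketch: remember the first price seen per endpoint and flag
// later quotes that exceed it by more than a configured ratio.
type DriftResult = { drifted: boolean; baseline: number };

function createDriftTracker(maxIncreaseRatio: number = 1.5) {
  // In-memory only; the baseline is lost when the process restarts.
  const baselines = new Map<string, number>();

  return function check(endpoint: string, amount: number): DriftResult {
    const baseline = baselines.get(endpoint);
    if (baseline === undefined) {
      // First quote from this endpoint becomes the reference price.
      baselines.set(endpoint, amount);
      return { drifted: false, baseline: amount };
    }
    return { drifted: amount > baseline * maxIncreaseRatio, baseline };
  };
}
```

Anchoring on the first observed price is a deliberate simplification: it keeps a slow ramp from dragging the baseline up with it, at the cost of trusting the first quote.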
Insecure transport → catch it
If the quote that carries the payment address came over http://, the address is untrustworthy on arrival. Refuse to act on it:
if (endpoint && !endpoint.toLowerCase().startsWith("https://")) {
  /* downgrade */
}
Plus the obvious hygiene: is the counterparty even a well-formed address? A malformed counterparty is either a bug or a probe, and either way you shouldn't pay it. (Lite also runs a local seed blocklist - an offline check against known-bad addresses; the live, continuously updated list is the one thing here that belongs to the hosted service.)
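The transport and address-shape checks can be sketched as two small predicates. This assumes EVM-style addresses (`0x` followed by 40 hex characters); other chains need their own pattern. It checks format only, not the mixed-case checksum, which would require a Keccak-256 implementation. The function names are hypothetical.

```typescript
// Hypothetical helpers for illustration; not the frisk-screen implementation.

// Assumption: EVM-style address, "0x" plus exactly 40 hex characters.
const EVM_ADDRESS_PATTERN = /^0x[0-9a-fA-F]{40}$/;

function isWellFormedEvmAddress(address: string): boolean {
  return EVM_ADDRESS_PATTERN.test(address);
}

// True only when the endpoint URL uses the https scheme (case-insensitive).
function isSecureEndpoint(endpoint: string): boolean {
  return /^https:\/\//i.test(endpoint.trim());
}
```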
That's a handful of deterministic checks, all running before a single token moves, all in code you can read in one file. No service to trust. This is the floor every x402 agent should have, and it's the part I made free and MIT precisely because it shouldn't be behind anyone's API - including mine.
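One way to roll individual findings up into a single allow / review / block verdict is to take the most severe finding and keep every reason. The sketch below is an assumption about how such a combiner could look, not Frisk's actual logic; in particular, treating `allowed` as true only for "allow" is my choice here, and `Finding` and `combineFindings` are hypothetical names.

```typescript
// Hypothetical combiner for illustration; not the frisk-screen implementation.
type Verdict = "allow" | "review" | "block";

interface Finding {
  verdict: Verdict;
  reason: string;
}

interface ScreenResult {
  verdict: Verdict;
  allowed: boolean;
  reasons: string[];
}

const SEVERITY: Record<Verdict, number> = { allow: 0, review: 1, block: 2 };

function combineFindings(findings: Finding[]): ScreenResult {
  // The overall verdict is the most severe one among all findings.
  const verdict = findings.reduce<Verdict>(
    (worst, f) => (SEVERITY[f.verdict] > SEVERITY[worst] ? f.verdict : worst),
    "allow"
  );
  return {
    verdict,
    // Assumption: only a clean "allow" lets the payment proceed unattended.
    allowed: verdict === "allow",
    reasons: findings.map((f) => f.reason),
  };
}
```

With this shape, a policy breach alone yields "review", while a policy breach plus a payTo mismatch yields "block" with both reasons attached.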
What You Cannot Catch Locally - and Where I'll Be Honest
Vectors 4 and 5 - Sybil discovery and prompt-injection-to-payment - are different in kind. A locally-running function genuinely cannot know that an address belongs to a Sybil cluster, or that an endpoint with a clean-looking history has been quietly draining wallets for a week. That requires reputation data: a graph of who-paid-whom across many agents, accumulated over time. No amount of clever offline code substitutes for it.
And there's a third category the papers above actually spend most of their pages on, which no screening library - lite or hosted - should claim to fix: the protocol- and settlement-layer attacks. Payment replay; the settlement races where a server delivers before payment finalizes (the "paid-but-denied" and "unpaid-service" outcomes the Five Attacks paper centers on); facilitator trust and economic DoS against endpoints. Those live in the x402 spec, the facilitator, and the on-chain settlement path - not in the request your agent is about to sign. Frisk screens the counterparty and the shape of the transaction; it does not, and cannot, repair the protocol underneath it.
This post is deliberately scoped to the vectors a pre-payment check can actually touch. Pretending a screening call closes a replay or atomicity hole would be the other half of how agent-payment security gets oversold. Lite mode is upfront about its own limits for the same reason: it always reports "low" confidence and does not claim to detect Sybil attacks. Pretending a local check can catch a reputation problem is how you ship false confidence, which is worse than no check at all.
This is the line between the open-source library and the hosted service - and I'd rather state it plainly than blur it for a pitch. The hosted side of Frisk is where reputation history and threat intelligence would live, and it is early; the part I'm comfortable telling every x402 developer to install today is the deterministic floor above. If you're shipping an agent that pays, start there. Those checks cost you nothing and close the attacks that are actually closeable in your own process.
Takeaways
Agent payment safety isn't one problem. It's three: structural checks you can do before paying, reputation you have to source from data, and protocol/settlement holes that sit below any screening call. Solve them separately - and don't let a tool for one pretend to cover the others.
The deterministic floor - payTo-swap detection, spending policy, transport and address sanity - catches three of the five documented x402 attack classes, with no network call and no trust in anyone. Ship it.
Be skeptical of anything claiming to detect Sybil/reputation attacks with purely local logic. That category needs data, and honesty about the boundary is the whole game.
Frisk is MIT and the lite engine is dependency-free: npm i frisk-screen / pip install frisk-screen. Source, threat-model notes, and the (short, readable) check logic are on GitHub. If you're building on x402 and you catch a vector I missed, open an issue - that's exactly the kind of thing this should accrete.
The reputation-backed hosted tier is in early access. If that's the part you need, email support@tryfrisk.dev - but the deterministic floor above is free, MIT, and yours to ship today regardless.
Sources referenced: "Five Attacks on x402" and "A402" (arXiv); Halborn, "x402 Explained: Security Risks and Controls"; AgentLISA x402 security position paper.