DEV Community

Your UCP agent passes conformance. It still pays the wrong amount.

Here's a shopping agent doing everything right. It discovers a store over UCP, links the user's account with OAuth, creates a checkout, and completes the purchase. Every message it sends is well-formed. It passes conformance.

It also just paid a total that doesn't add up - the line items, tax, and total don't reconcile - and it completed the purchase anyway, autonomously, on a real user's card. A conformance checker will call that agent green. And that's the problem.

Conformance checks the shape of messages. Reliability is about behavior.

UCP conformance - the thing every tool in the ecosystem measures today - verifies that the JSON is the right shape: capabilities is an object not an array, the checkout follows the lifecycle, errors are structured. That's necessary. It is nowhere near sufficient.
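To make the gap concrete, here is a minimal sketch of a shape check and a behavior check applied to the same checkout. The field names (`line_items`, `amount`, `tax`, `total`) and the use of integer minor units are assumptions for illustration, not the UCP schema.

```python
def is_well_formed(checkout: dict) -> bool:
    """Shape-only check: the right keys with the right types.
    Field names are hypothetical, not taken from the UCP schema."""
    items = checkout.get("line_items")
    return (
        isinstance(items, list)
        and all(isinstance(i, dict) and isinstance(i.get("amount"), int) for i in items)
        and isinstance(checkout.get("tax"), int)
        and isinstance(checkout.get("total"), int)
    )


def totals_reconcile(checkout: dict) -> bool:
    """Behavior check: line items plus tax must equal the stated total.
    Amounts are assumed to be integer minor units (cents)."""
    expected = sum(i["amount"] for i in checkout["line_items"]) + checkout["tax"]
    return expected == checkout["total"]


def safe_to_complete(checkout: dict) -> bool:
    """An agent should complete autonomously only if both checks hold;
    otherwise it should stop and hand the decision back to the buyer."""
    return is_well_formed(checkout) and totals_reconcile(checkout)


# A checkout that is perfectly well-formed but does not add up:
# 1000 + 250 + 100 = 1350, yet the store states 1530.
suspicious = {
    "line_items": [{"amount": 1000}, {"amount": 250}],
    "tax": 100,
    "total": 1530,
}
```

A schema validator only ever sees what `is_well_formed` sees, so `suspicious` passes it. Only the arithmetic in `totals_reconcile` tells the agent to stop.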

Look at the numbers. On the public UCP trackers, roughly 99% of stores score an "A" on conformance. Yet anyone who's pointed an agent at a real checkout knows a large share of those checkouts still fail. The message shapes are fine. The behavior is where it breaks - and behavior is exactly what a schema check can't see.

The gap is widest on the agent side. A UCP checkout has two parties: the merchant platform (the store's server) and the shopping agent (the AI doing the buying). Every conformance tool I've seen grades the merchant. Almost nobody grades the agent's own client-side behavior - which is where the money and the user's account actually are.

Six ways an agent passes conformance and still does something unsafe

Each is a real client-side obligation in the UCP + OAuth + RFC 9421 (HTTP Message Signatures) stack, and each is invisible to a schema check:

  • Pays a total that doesn't add up - completes instead of stopping for the buyer.
  • Gets phished - follows a URL in an error message to an attacker's endpoint.
  • Trusts a forged response - skips verifying the store's signature.
  • Links accounts unsafely - missing PKCE, an unchecked state (CSRF), or an unchecked iss (the mix-up attack).
  • Pays with a method the store never offered - an unauthorized instrument.
  • Leaks or over-shares - sends response-only fields, or never revokes on unlink.

Every one sends perfectly well-formed messages. Every one passes conformance. Every one is a bug you'd very much like to catch before it touches a card.
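A few of these obligations come down to small client-side checks. The sketch below is illustrative only: the function names and parameters are hypothetical, the PKCE helper follows the S256 method from RFC 7636, and the callback check assumes the authorization server returns an `iss` parameter as described in RFC 9207.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlsplit


def pkce_challenge(code_verifier: str) -> str:
    """S256 code challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def callback_is_safe(params: dict, expected_state: str, expected_issuer: str) -> bool:
    """Reject the OAuth callback unless state matches the value we sent
    (CSRF defense) and iss names the server we started the flow with
    (mix-up defense)."""
    state_ok = hmac.compare_digest(
        params.get("state", "").encode("utf-8"), expected_state.encode("utf-8")
    )
    iss_ok = params.get("iss") == expected_issuer
    return state_ok and iss_ok


def same_origin(url: str, trusted_base: str) -> bool:
    """Only follow a URL found in a message if it is HTTPS and shares the
    origin of the store we already trust. Ports are compared literally,
    so an explicit :443 differs from an omitted port in this sketch."""
    a, b = urlsplit(url), urlsplit(trusted_base)
    return a.scheme == "https" and (a.scheme, a.hostname, a.port) == (
        b.scheme,
        b.hostname,
        b.port,
    )


def method_was_offered(chosen: str, offered: list[str]) -> bool:
    """Never pay with an instrument the store did not list."""
    return chosen in offered
```

For example, `same_origin("https://shop.example.evil.test/login", "https://shop.example")` is false, so an agent applying it would not follow that link out of an error message. None of these checks changes the shape of a single message the agent sends, which is why a schema check cannot see whether they ran.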

So I built the other half

spck.dev is an independent, open conformance suite for UCP that now also covers the agent side: a reverse harness in which a reference agent shops an adversarial store (bad signatures, spliced login servers, mismatched totals, phishing decoys) and the harness grades how the agent behaves.

Two things make it trustworthy rather than just another checker:

  • Every check is proven to catch its own bug - it passes a known-good agent and provably fails on the one defect it targets (kill-rate testing). No false greens.
  • The demo runs on real recorded data, not mock-ups. You can watch an agent check out and then break it: flip one flaw and see exactly what it does wrong, and which check catches it.
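The kill-rate idea can be sketched in a few lines. This is not spck.dev's implementation; the trace format and function names are hypothetical and only illustrate the principle that a check must pass a known-good agent and fail an agent carrying the one defect it targets.

```python
from typing import Callable, Dict

# Hypothetical record of what an agent did during one checkout.
Trace = Dict[str, bool]
Check = Callable[[Trace], bool]


def check_stops_on_mismatch(trace: Trace) -> bool:
    """Passes unless the agent completed a checkout whose totals disagreed."""
    return not (trace["totals_mismatch"] and trace["completed"])


def kills_its_bug(check: Check, good: Trace, defective: Trace) -> bool:
    """A check earns its place only if it passes the known-good trace
    and fails the trace that carries the targeted defect."""
    return check(good) and not check(defective)


# Known-good agent: saw the mismatch and stopped for the buyer.
good_trace = {"totals_mismatch": True, "completed": False}
# Same run with one flaw flipped: the agent completed anyway.
defective_trace = {"totals_mismatch": True, "completed": True}
```

A check that always returns true would pass the good trace but never fail the defective one, so `kills_its_bug` rejects it. That is the sense in which a kill-rate test rules out false greens.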

A credibility note, because you should be skeptical of anyone grading conformance: while building the registers, we found and reported genuine bugs in the official UCP spec and samples - merged upstream. We hold our own suite to the same standard.

If you're building on UCP

  • Building a shopping agent? Point it at the sandbox - the failures above are exactly what it looks for.
  • Building the merchant side? Check your store so a real agent can actually complete a checkout against it.

Free and open. Unofficial - not affiliated with or endorsed by the UCP project; the official conformance suite is authoritative. This is just the half nobody was testing yet.

Conformance is not reliability. If you're shipping an agent that spends real money, that distinction is the whole game.

I'd genuinely like to hear where real UCP checkouts break for you - reply below. That's the list of failure modes we build checks for next.
