DEV Community 2h ago

I built a UCP conformance checker where every check has to prove it can catch its own bug

The worry: checks that can't fail

Most quick conformance checks boil down to "got a 200, looks fine." A check that never fails when the server is actually broken isn't a check - it's decoration, and it's dangerous because it hands you false confidence. So I tried to hold the tool to one rule: No check ships until I've proven it fails when the server is wrong.

How each check earns trust

Every check is anchored to something I didn't write myself:

Kill-rate testing. For each check, I inject the specific defect it's meant to catch - drop a required field, flip a status code, corrupt the body. If the check still passes, it's a false-pass hazard and it's blocked from release. A check only ships if it catches its own injected bug and passes cleanly on a known-good server.
The official schema validator as the oracle. Rather than hand-rolling JSON-Schema logic (a classic source of subtle divergence), the tool shells out to the official ucp-schema validator, so payloads are judged against the spec's own schemas - not my interpretation of them.
Spec citations. Each check points at a specific normative clause in the pinned spec, so a result is traceable rather than "trust me."

The whole suite also tests itself in CI - it goes red if any check loses its ability to catch the defect it's for.
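To make the kill-rate rule concrete, here is a minimal sketch of the idea in Python. It is not the tool's actual harness: the response shape, the example check, and every function name below are assumptions made up for illustration. Each mutation injects one defect into a known-good response, and a check may ship only if it passes on the clean response and fails on every mutant.

```python
import copy

# Hypothetical known-good response; the field names are illustrative only.
KNOWN_GOOD = {"status": 200, "body": {"id": "order-1", "total": 1250}}


def check_order_has_total(response):
    """Example check: passes only for a 200 whose body carries an integer total."""
    body = response.get("body")
    return (
        response.get("status") == 200
        and isinstance(body, dict)
        and isinstance(body.get("total"), int)
    )


def drop_required_field(response):
    """Defect: the required field is missing from the body."""
    mutant = copy.deepcopy(response)
    del mutant["body"]["total"]
    return mutant


def flip_status_code(response):
    """Defect: the server answers with the wrong status code."""
    mutant = copy.deepcopy(response)
    mutant["status"] = 500
    return mutant


def corrupt_body(response):
    """Defect: the body is no longer a JSON object."""
    mutant = copy.deepcopy(response)
    mutant["body"] = "<html>not json</html>"
    return mutant


MUTATIONS = [drop_required_field, flip_status_code, corrupt_body]


def kill_rate(check, known_good, mutations):
    """Return (passes_clean, killed, survivors) for one check."""
    passes_clean = check(known_good)
    # A survivor is a mutation the check still passes on: a false-pass hazard.
    survivors = [m.__name__ for m in mutations if check(m(known_good))]
    killed = len(mutations) - len(survivors)
    return passes_clean, killed, survivors


def may_ship(check, known_good, mutations):
    """A check ships only if it is clean on good input and kills every mutant."""
    passes_clean, _, survivors = kill_rate(check, known_good, mutations)
    return passes_clean and not survivors


def weak_check(response):
    """The 'got a 200, looks fine' check, kept here as a negative example."""
    return response.get("status") == 200


print(may_ship(check_order_has_total, KNOWN_GOOD, MUTATIONS))
print(may_ship(weak_check, KNOWN_GOOD, MUTATIONS))
```

The first check is cleared to ship; the status-only check is blocked, because it still passes when the required field is dropped or the body is corrupted. Running the same gate in CI is one way a suite could go red when a check loses its ability to catch its defect.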

What it turned up (with the caveat that I might be missing context)

Pointed at real implementations, a few things stood out. I'm framing these as "here's what I observed," not gotchas:

The official Node.js reference sample appears to serve capabilities as a JSON array and services.<name> as an object, where the pinned 2026 profile schema seems to require a keyed object and an array, respectively. The Python reference server and a live production Shopify store both use the schema-shaped forms, which is what made me think it's a real deviation rather than spec ambiguity - but I filed it upstream with a repro in case I've misread something.
The tool also flags a few reference gaps rather than silently passing them (e.g. error bodies using {detail, code} vs the spec's fuller envelope; a version-negotiation status-code difference between the spec and the official test suite).
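The shape difference in the first finding can be illustrated with a small sketch. This is only an approximation of the observation, not the tool's implementation (which defers to the official validator): it assumes the profile is already decoded JSON with `capabilities` and `services` at the top level of the object passed in, that `services` itself is keyed by name, and the names `example.checkout` and `example.shopping` are placeholders I made up.

```python
def json_kind(value):
    """Name the JSON container type of a decoded value."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return type(value).__name__


def check_profile_shapes(profile):
    """Return (path, expected, actual) for each shape deviation found."""
    deviations = []

    # As I read the pinned schema: capabilities is a keyed object.
    capabilities = profile.get("capabilities")
    if not isinstance(capabilities, dict):
        deviations.append(("capabilities", "object", json_kind(capabilities)))

    # Assumption: services is keyed by name, and each services.<name> is an array.
    services = profile.get("services")
    if not isinstance(services, dict):
        deviations.append(("services", "object", json_kind(services)))
    else:
        for name, entry in services.items():
            if not isinstance(entry, list):
                deviations.append((f"services.{name}", "array", json_kind(entry)))

    return deviations


# Placeholder payloads; the inner contents are deliberately left empty.
schema_shaped = {
    "capabilities": {"example.checkout": {}},
    "services": {"example.shopping": []},
}
array_shaped = {
    "capabilities": [],
    "services": {"example.shopping": {}},
}

print(check_profile_shapes(schema_shaped))
print(check_profile_shapes(array_shaped))
```

The schema-shaped payload produces no deviations, while the other one reports both `capabilities` and `services.example.shopping` with the expected and actual container types side by side.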

None of this is a knock on the UCP project - the spec is genuinely good and the samples are useful. Surfacing drift like this is exactly what a conformance tool is for.

Trying it

pip install spck-conformance
spck-conformance --server https://your-store.example.com --init merchant.json
spck-conformance --server https://your-store.example.com --config merchant.json

Or paste a store URL at spck.dev/check for an instant discovery + profile check (nothing to install).

Or wire it into CI:

- uses: vishkaty/ucp-conformance@main
  with:
    server: https://your-store.example.com

It's capability-adaptive (only runs checks for what your server actually declares), reports not-tested honestly instead of silently passing, and shows the expected requirement next to your actual response for anything that deviates.
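As a rough sketch of what "capability-adaptive" and "not-tested" can mean in practice, the runner below skips any check whose required capability the server did not declare and records that explicitly instead of counting it as a pass. The `Check` class, the check ids, and the capability names are hypothetical; the real tool's internals may look quite different.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Check:
    check_id: str
    requires: Optional[str]  # capability the check depends on, or None if always applicable
    run: Callable[[dict], bool]


def run_suite(checks, declared, profile):
    """Run applicable checks; mark the rest 'not-tested' rather than 'pass'."""
    results = {}
    for check in checks:
        if check.requires is not None and check.requires not in declared:
            results[check.check_id] = "not-tested"
        else:
            results[check.check_id] = "pass" if check.run(profile) else "fail"
    return results


# Placeholder profile: the server declares only one (made-up) capability.
profile = {"capabilities": {"example.checkout": {}}}
declared = set(profile["capabilities"])

checks = [
    Check(
        "profile-has-capabilities",
        None,
        lambda p: isinstance(p.get("capabilities"), dict),
    ),
    Check(
        "checkout-entry-is-object",
        "example.checkout",
        lambda p: isinstance(p["capabilities"]["example.checkout"], dict),
    ),
    Check(
        "refunds-entry-is-object",
        "example.refunds",
        lambda p: isinstance(p["capabilities"]["example.refunds"], dict),
    ),
]

results = run_suite(checks, declared, profile)
for check_id, status in results.items():
    print(f"{check_id}: {status}")
```

Here the refunds check is reported as not-tested because the capability was never declared, which keeps the summary honest about what was actually exercised.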

Source, methodology, and the self-test harness are all in the open: github.com/vishkaty/ucp-conformance.

If you're working with UCP and something here looks wrong - especially the reference-sample findings - I'd really like to hear it.
