I built a UCP conformance checker where every check has to prove it can catch its own bug
The worry: checks that can't fail
Most quick conformance checks boil down to "got a 200, looks fine." A check that never fails when the server is actually broken isn't a check - it's decoration, and it's dangerous because it hands you false confidence. So I tried to hold the tool to one rule: No check ships until I've proven it fails when the server is wrong.
How each check earns trust
Every check is anchored to something I didn't write myself:
Kill-rate testing. For each check, I inject the specific defect it's meant to catch - drop a required field, flip a status code, corrupt the body. If the check still passes, it's a false-pass hazard and it's blocked from release. A check only ships if it catches its own injected bug and passes cleanly on a known-good server.
The official schema validator as the oracle. Rather than hand-rolling JSON-Schema logic (a classic source of subtle divergence), it shells out to the official ucp-schema validator, so payloads are judged against the spec's own schemas - not my interpretation of them.
Spec citations. Each check points at a specific normative clause in the pinned spec, so a result is traceable rather than "trust me."
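Delegating to an external validator can be as simple as a subprocess call whose exit status is the verdict. The sketch below is hypothetical. It assumes the validator reads the document on stdin and exits 0 for valid input, which may not match the real tool's interface. `STAND_IN` is a placeholder command that only checks whether the input is a JSON object, so the example runs without the real validator installed.

```python
import subprocess
import sys
from typing import Sequence


def oracle_accepts(validator_cmd: Sequence[str], payload: str) -> bool:
    """Return True if the external validator accepts `payload`.

    Assumes (hypothetically) that the validator reads the JSON document on
    stdin and reports its verdict via exit status: 0 = valid, non-zero = invalid.
    """
    result = subprocess.run(
        list(validator_cmd),
        input=payload,
        capture_output=True,  # captured so a real checker could attach diagnostics
        text=True,
        check=False,  # a non-zero exit is a verdict, not a crash
    )
    return result.returncode == 0


# Hypothetical stand-in for the real validator: exits 0 only if stdin
# parses as a JSON object. Swap in the real command in practice.
STAND_IN = [
    sys.executable,
    "-c",
    "import json, sys; sys.exit(0 if isinstance(json.load(sys.stdin), dict) else 1)",
]

if __name__ == "__main__":
    print(oracle_accepts(STAND_IN, '{"capabilities": {}}'))  # True
    print(oracle_accepts(STAND_IN, "[1, 2]"))  # False
```

The point of the design is that the checker never re-implements schema semantics. It only asks the oracle a yes/no question and reports the answer.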
The whole suite also tests itself in CI - it goes red if any check loses its ability to catch the defect it's for.
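Kill-rate testing is essentially mutation testing applied to the checks themselves. Here is a minimal sketch of the gate. It assumes a response is a plain dict with `status` and `body`. The `Check` type, the `id`/`status` body fields and the `<spec clause>` placeholders are all hypothetical and are not the tool's actual internals.

```python
import copy
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Response = Dict[str, Any]  # hypothetical shape: {"status": int, "body": dict}


@dataclass(frozen=True)
class Check:
    name: str
    clause: str  # the normative clause the check cites (placeholder here)
    passes: Callable[[Response], bool]
    mutate: Callable[[Response], Response]  # injects the defect this check targets


def kill_rate_gate(check: Check, known_good: Response) -> bool:
    """Shippable only if the check passes on a known-good response AND
    fails once its own defect has been injected."""
    clean_ok = check.passes(known_good)
    # Mutate a deep copy so the known-good fixture is never altered.
    mutant = check.mutate(copy.deepcopy(known_good))
    return clean_ok and not check.passes(mutant)


def self_test(checks: List[Check], known_good: Response) -> List[str]:
    """Names of checks that fail the gate; CI should go red if non-empty."""
    return [c.name for c in checks if not kill_rate_gate(c, known_good)]


def drop_field(field: str) -> Callable[[Response], Response]:
    """Build a mutator that removes one field from the response body."""
    def mutate(resp: Response) -> Response:
        resp["body"].pop(field, None)
        return resp
    return mutate


GOOD: Response = {"status": 200, "body": {"id": "co_123", "status": "ready"}}

has_id = Check(
    name="checkout-has-id",
    clause="<spec clause>",
    passes=lambda r: r["status"] == 200 and "id" in r["body"],
    mutate=drop_field("id"),
)
# The "got a 200, looks fine" anti-pattern: it cannot see the missing field.
status_only = Check(
    name="status-is-200",
    clause="<spec clause>",
    passes=lambda r: r["status"] == 200,
    mutate=drop_field("id"),
)

if __name__ == "__main__":
    print(self_test([has_id, status_only], GOOD))  # ['status-is-200']
```

`self_test` is the piece a CI job would call. A non-empty result means some check has lost the ability to catch its own defect. The deliberately weak `status_only` check shows the kind of check the gate exists to reject.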
What it turned up (with the caveat that I might be missing context)
Pointed at real implementations, a few things stood out. I'm framing these as "here's what I observed," not gotchas:
The official Node.js reference sample appears to serve capabilities as a JSON array and services.<name> as an object, where the pinned 2026 profile schema seems to require a keyed object and an array, respectively. The Python reference server and a live production Shopify store both use the schema-shaped forms, which is what made me think it's a real deviation rather than spec ambiguity - but I filed it upstream with a repro in case I've misread something.
A few reference gaps it flags rather than silently passing (e.g. error bodies using {detail, code} vs the spec's fuller envelope; a version-negotiation status-code difference between the spec and the official test suite).
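A shape check for the capabilities/services finding might look like the following. It encodes the shapes as described in this write-up (a keyed object for `capabilities`, an array per `services.<name>`). They are not copied from the schema itself. The sample profiles and names like `checkout` and `shopping` are made up. I'm also assuming both keys sit at the top level of the object passed in.

```python
from typing import Any, Dict, List


def json_type(value: Any) -> str:
    """Name a parsed JSON value's type the way JSON Schema does."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, bool):  # test bool before numbers: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if value is None:
        return "null"
    return "string"


def profile_shape_issues(profile: Dict[str, Any]) -> List[str]:
    """Flag capabilities/services shapes that differ from the assumed ones:
    `capabilities` a keyed object, each `services.<name>` an array."""
    issues: List[str] = []
    caps = profile.get("capabilities")
    if caps is not None and not isinstance(caps, dict):
        issues.append(f"capabilities: expected object, got {json_type(caps)}")
    services = profile.get("services", {})
    if not isinstance(services, dict):
        issues.append(f"services: expected object, got {json_type(services)}")
    else:
        for name, entry in services.items():
            if not isinstance(entry, list):
                issues.append(f"services.{name}: expected array, got {json_type(entry)}")
    return issues


# Hypothetical profiles in each shape.
array_style = {"capabilities": [{"name": "checkout"}], "services": {"shopping": {"rest": {}}}}
keyed_style = {"capabilities": {"checkout": [{}]}, "services": {"shopping": [{"transport": "rest"}]}}

if __name__ == "__main__":
    print(profile_shape_issues(keyed_style))  # []
    print(profile_shape_issues(array_style))
```

Reporting "expected X, got Y" per path, rather than a bare failure, is what makes a finding like this easy to file upstream with a repro.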
None of this is a knock on the UCP project - the spec is genuinely good and the samples are useful. Surfacing drift like this is exactly what a conformance tool is for.
Trying it
pip install spck-conformance
spck-conformance --server https://your-store.example.com --init merchant.json
spck-conformance --server https://your-store.example.com --config merchant.json
Or paste a store URL at spck.dev/check for an instant discovery + profile check (nothing to install).
Or wire it into CI:
- uses: vishkaty/ucp-conformance@main
  with:
    server: https://your-store.example.com
It's capability-adaptive (it only runs checks for what your server actually declares), it reports untested checks as "not tested" instead of silently passing them, and for anything that deviates it shows the expected requirement next to your server's actual response.
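The capability-adaptive, three-outcome reporting could be sketched as follows. This is an illustration under assumptions, not the tool's code. Checks are grouped by a hypothetical capability name, and each returns `None` on success or an `(expected, actual)` pair on deviation. Checks for undeclared capabilities are recorded as not tested rather than counted as passes.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional, Set, Tuple

# A check returns None on success, or (expected, actual) when the server deviates.
CheckFn = Callable[[], Optional[Tuple[str, str]]]


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    NOT_TESTED = "not-tested"


def run_adaptive(
    declared: Set[str], checks: Dict[str, List[Tuple[str, CheckFn]]]
) -> List[Tuple[str, Outcome, str]]:
    """Run only the checks for declared capabilities; report the rest as
    NOT_TESTED instead of letting them count as passes."""
    results: List[Tuple[str, Outcome, str]] = []
    for capability, cap_checks in checks.items():
        for name, fn in cap_checks:
            if capability not in declared:
                results.append((name, Outcome.NOT_TESTED, f"'{capability}' not declared"))
                continue
            deviation = fn()
            if deviation is None:
                results.append((name, Outcome.PASS, ""))
            else:
                expected, actual = deviation
                results.append((name, Outcome.FAIL, f"expected {expected}; got {actual}"))
    return results


# Hypothetical capabilities and checks, for illustration only.
CHECKS: Dict[str, List[Tuple[str, CheckFn]]] = {
    "checkout": [
        ("checkout-create", lambda: None),
        ("error-envelope", lambda: ("spec error envelope", "{detail, code}")),
    ],
    "orders": [("order-lookup", lambda: None)],
}

if __name__ == "__main__":
    for name, outcome, detail in run_adaptive({"checkout"}, CHECKS):
        print(f"{outcome.value:<10} {name:<16} {detail}")
```

Keeping NOT_TESTED as its own outcome is what stops a skipped check from inflating the pass count.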
Source, methodology, and the self-test harness are all in the open: github.com/vishkaty/ucp-conformance.
If you're working with UCP and something here looks wrong - especially the reference-sample findings - I'd really like to hear it.