DEV Community 2h ago

I catalogued every tell that makes a UI look AI-generated. My own tool kept failing the test.

You can always tell when an AI built a UI. You can't always say why.

I maintain StyleSeed - an open-source set of design rules that Claude Code, Cursor, and Codex read automatically, so AI-built UI stops looking generated. For the past few months my actual job has been writing that "why" down: cataloguing every tell, turning each one into a rule an agent can obey, and then dogfooding the rules to see if they hold.

They kept not holding - in ways that embarrassed me three separate times. That's what this post is about, because the failures turned out to be more useful than the rules.

The catalog: generation-one tells

These are the ones everyone recognizes, even non-designers:

The default indigo. #4F46E5 / #5E6AD2 - Tailwind's and every agent's comfort color. Zero brands chose it; every unguided agent ships it.
The icon-in-a-pale-chip. A generic Lucide line icon in an identical rounded square with a pastel background, repeated for every feature card. One is fine. Seven is a signature.
Rainbow status lists. Every row gets a differently-colored badge - including "normal," which should be grey. Color is supposed to mean look here; paint every row and nothing means anything.
Emoji as UI icons. 🚗🧺⭐ inject five uncontrolled hues each and break the single-accent rule instantly.
Centered everything + gradient headline + ✨ badge. The stock hero.
4 identical KPI cards. 3 identical pricing columns. 14px body on a 1440px screen. Pure #000 text.

Individually each is defensible. Together they compound into "an AI made this."

So I wrote rules that ban each one - with the reasoning, so a model applies them instead of pattern-matching around them. Then I tested.
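To make "a rule an agent can obey" concrete, here is a minimal sketch of a named ban expressed as a mechanical check. It is an illustration, not StyleSeed's actual implementation: the hex values come from the list above, while `BANNED_COLORS`, `normalize_hex`, and `find_banned_colors` are hypothetical names. It is also cruder than the written rule, because it flags `#000` anywhere and not just on text.

```python
import re

# Hypothetical ban list: the hex values are the ones named in the post,
# the dictionary layout and the reason strings are assumptions.
BANNED_COLORS = {
    "#4f46e5": "default indigo, the unguided agent's comfort color",
    "#5e6ad2": "default indigo, the unguided agent's comfort color",
    "#000000": "pure black",
}

# Matches 3- or 6-digit hex literals; 8-digit (alpha) values are skipped.
HEX_RE = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def normalize_hex(value: str) -> str:
    """Lowercase a hex color and expand the 3-digit shorthand (#000 -> #000000)."""
    value = value.lower()
    if len(value) == 4:
        value = "#" + "".join(ch * 2 for ch in value[1:])
    return value


def find_banned_colors(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, matched literal, reason) for each banned color."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in HEX_RE.finditer(line):
            reason = BANNED_COLORS.get(normalize_hex(match.group()))
            if reason:
                hits.append((lineno, match.group(), reason))
    return hits


css = ".btn { background: #4F46E5; color: #000; }\n.card { border-color: #e5e7eb; }"
for lineno, literal, reason in find_banned_colors(css):
    print(f"line {lineno}: {literal} is banned ({reason})")
```

Keeping the reason next to each banned value mirrors the idea of bans with reasoning: the report says why the value is a tell, not only that it matched.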

Failure #1: my own rules were teaching the rainbow

Dogfooding a mobile dashboard, the status list came out rainbow - the exact thing the project exists to prevent. I went digging expecting a model failure, and found this in my own DESIGN-LANGUAGE.md, rule 65:

Vary status states in lists to create visual interest through color diversity.

I was literally instructing agents to build rainbow lists. The agent had followed my rule perfectly. The rule was wrong.

I rewrote rule 65 (status color = severity, never decoration) and re-ran the same build: two grey "normal" rows, one amber, one red. Same agent, same prompt - only the rule changed.
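The rewritten rule boils down to making badge color a function of severity and nothing else. A small sketch, assuming three severity levels; the `Severity` enum, the color names, and the row labels are all hypothetical, chosen only to reproduce the two-grey, one-amber, one-red outcome described above:

```python
from enum import Enum


class Severity(Enum):
    NORMAL = "normal"
    WARNING = "warning"
    CRITICAL = "critical"


# Hypothetical tokens: the post only says normal rows are grey and the
# escalated rows came out amber and red.
STATUS_COLOR = {
    Severity.NORMAL: "grey",
    Severity.WARNING: "amber",
    Severity.CRITICAL: "red",
}


def badge_color(severity: Severity) -> str:
    """Color depends on severity only, never on row position or label."""
    return STATUS_COLOR[severity]


# Made-up rows for illustration.
rows = [
    ("Backup", Severity.NORMAL),
    ("Sync", Severity.NORMAL),
    ("Disk usage", Severity.WARNING),
    ("Payments API", Severity.CRITICAL),
]
colors = [badge_color(severity) for _, severity in rows]
print(colors)
```

Because the mapping has no input other than severity, two "normal" rows cannot end up with different colors, which is exactly what the rainbow version got wrong.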

Failure #2: my landing page scored 58/100 on my own quality gate

StyleSeed ships a scored quality gate - the agent reviews its UI against the rules and fixes it to a floor of 80/100 before showing you anything. At some point I did the obvious uncomfortable thing and ran the gate on StyleSeed's own landing page.

58/100.

The page whose hero copy mocks "an icon in a pale chip on every card" had twelve icon chips on it. Four accent colors. 11–13px text everywhere. A reviewer would have screenshotted it with the caption "physician, heal thyself."

We rebuilt it to pass (one accent, chips gone, type floor met). But the lesson stuck: coherence rules don't enforce themselves - the gate has to run on everything, including your own marketing.

Failure #3: the escape hatches became clichés

Here's the one I find genuinely interesting. Once you ban the generation-one tells, agents all flee through the same exits - and the exits are becoming the next tells:

Ghost index numbers (01 · 02 · 03) replacing every icon chip
UPPERCASE-overline + big-number cards repeated identically for every KPI
The text-left / visual-right hero with two pill CTAs on every product site
And the overshoot: escaping "generic" into dated - a beige paper background + serif on everything, which stops reading "designed" and starts reading "government pamphlet."

I shipped exactly this as a proud "after" example. It got rejected on sight - the generic before honestly looked better at a glance.

So the rulebook now tracks tells by generation:

gen-1 (indigo, chips, rainbow) is banned outright
gen-2 has its own rule - one signature treatment per project, varied section anatomy
gen-3 sets a modern floor - white base, serif as seasoning, keep the air

This is the actual nature of the problem: it's not a checklist, it's an arms race against convergence. Any escape route, once popular, becomes a tell.

Does it work?

The cover image is the current proof: same product, same prompt, same model - only the rules changed. The before is an authentic agent-defaults build (competent, clean, and instantly recognizable as AI). The after chose its own accent, put the real product in the hero, and varied every section's anatomy.

I also ran a three-domain stress test with zero design direction - a Korean mobile health app, a dark observability dashboard, a warm e-commerce landing page. All three came out refined and different from each other, respectively: calm teal + soft radii, signal-teal + IBM Plex + tonal dark, terracotta + a display grotesque.

The thing I feared most - that the rules would just converge everything onto a new "StyleSeed look" - didn't happen.

What actually moves the needle

After all the dogfooding, three mechanics matter more than any individual rule:

Constraints before code. A STYLESEED.md lock (mood, key color, font, surface) that the agent re-reads on every prompt. Unlocked = drift.
A scored gate with a floor. "Review your work" is ignorable; "score it, fix to ≥80, then show me" is not.
Ban lists with reasons. Positive advice ("use nice spacing") does nothing. Named bans ("no #4F46E5, no icon-chips, normal = grey") change output immediately.
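The second mechanic, the scored gate, can be sketched as a loop that refuses to hand anything back below the floor. This is an assumption-heavy illustration rather than how StyleSeed's gate is built: `run_gate`, `toy_score`, `toy_fix`, the 15-point deduction, and the round limit are all hypothetical. Only the floor of 80 comes from the post.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

FLOOR = 80  # the floor named in the post


def run_gate(
    draft: T,
    score: Callable[[T], int],
    fix: Callable[[T], T],
    max_rounds: int = 5,
) -> tuple[T, int]:
    """Score the draft and keep fixing it until it reaches FLOOR.

    Raises instead of returning a sub-floor draft, so nothing below the
    floor is ever shown.
    """
    current = draft
    for _ in range(max_rounds):
        points = score(current)
        if points >= FLOOR:
            return current, points
        current = fix(current)
    # Score the result of the last fix before giving up.
    points = score(current)
    if points >= FLOOR:
        return current, points
    raise RuntimeError(f"still at {points}/100 after {max_rounds} fix rounds")


# Toy stand-ins: a "draft" is just the list of tells it still contains.
def toy_score(tells: list[str]) -> int:
    return max(0, 100 - 15 * len(tells))  # made-up weighting


def toy_fix(tells: list[str]) -> list[str]:
    return tells[1:]  # pretend each round removes one tell


final, points = run_gate(["indigo", "icon-chips", "rainbow-status"], toy_score, toy_fix)
print(final, points)
```

The design choice worth noting is the raise at the end: a plain "review your work" step could return whatever it has, while a gate with a floor makes a sub-floor result an error instead of an output.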

It's all MIT-licensed: github.com/bitjaru/styleseed - 74 rules + the gate + skins, readable by Claude Code / Cursor / Codex via one pasted sentence. Live demo at styleseed-demo.vercel.app.

What's the tell that instantly makes you think "an AI made this"? I'm collecting them - the catalog only grows when someone names a new one.
