Not Enough SMEs or Customers to Make Your Evals? Make Some!
If you have been building with AI for a while, you probably remember when personas were how everyone learned to prompt. Something like: You are Rob Pike, architecting a new service in Go. Be opinionated and concise, and explain the tradeoffs behind your choices. How should I structure the packages? We all did some version of that. Then the field kind of moved on. We still hand agents a personality here and there, but persona prompting stopped being the headline move. It started to feel like a beginner trick.
I think personas are coming back. Not for voice and flavor this time, but for something more serious: they are a way to manufacture a stand-in for a human you cannot actually reach. And it turns out that solves a real problem in evals.
The bind
Here is the problem I keep running into. You need evals. Good evals need data that looks like real usage. And to make data that looks like real usage, you need access to real users or subject matter experts. But a lot of the time, you do not have enough of either, at least not yet.
Maybe the product is early and there are not enough users to learn from. Maybe your SMEs are real but hard to reach, the kind of people who do not have time to sit with you for a thousand test cases. In my case it is often structural. I do consulting work, and a lot of the time we are building for a company that has its own clients. There is a layer between us and the people who will actually use the thing. I may never get in a room with the end user at all. That is not a budget problem or a laziness problem. It is just the shape of the engagement, and no amount of asking nicely changes it.
The reframe
You do not need real users to start. You need realistic ones. And realistic ones you can build.
The key word is build, because the difference between a useful persona and a useless one is where it comes from. A persona you invent out of your own head is just your assumptions wearing a costume. The whole thing only works if the personas are grounded in the closest real humans you can actually reach.
The method
Here is what I actually did.
Build a knowledge base. I create a repo and fill it with everything I have learned about the end user. Meeting notes from discovery calls. Transcripts, when the client is comfortable being recorded. Research I have done on my own from public sources. Anything that tells me something true about who this user is and where they get stuck. The point is to triangulate across sources instead of leaning on a single conversation.
Point Claude Code at it and generate personas. I use Claude Code to read across the whole knowledge base and synthesize a small set of basic personas. Not a chat window, a repo. That matters, because the personas are coming out of a curated body of evidence rather than one prompt and a vibe.
Have the client validate them. This is the step that does the real work, and it is the one I would not skip. I send the personas back to the client to correct, enrich, and push back on. Sometimes the client has a UX research team and real user data to check against. Sometimes the client just knows their customer in their bones from working with them for years. Both are legitimate. The human who cannot sit for a thousand eval cases can still look at five personas and say, no, this one would never do that. That validation is what turns the personas from my guesses into the client's real knowledge, captured and structured.
Turn the personas into agents that exercise the product. Once a persona is validated, it goes back in the repo and becomes something I can use on demand. I can tell Claude Code, use this persona and hold a multi-turn conversation with the chatbot the way this user would. Out the other side I get user-phrased turns, full multi-turn conversations, and edge-case inputs. Those become cases in the eval set.
I want to be honest about how simple this was. The first version was basic. I was not doing anything clever. And it still gave us genuinely interesting outputs, the kind of conversations and edge cases we would not have thought to write by hand.
Where this breaks
A validated persona is a proxy, not ground truth, and it is worth being clear-eyed about that. The personas are only as good as the evidence underneath them and the client's willingness to correct them. If your discovery was thin, your personas will be confidently wrong, and a confidently wrong eval is worse than no eval.
They will also tend to miss the genuinely weird stuff, the inputs no one anticipated, because they are built from what people already expect their users to do. Real humans are stranger than any persona you will validate in a meeting.
So I do not treat this as a replacement for real user data. I treat it as a way to start before you have it, and as a complement once you do. The day real users show up, you check your personas against them and find out how close you got.
Who should try this
If you are building an AI product and you are stuck on evals because you cannot get to your users yet, this is worth an afternoon. You do not need a research team or a big budget. You need whatever real signal you can gather, a place to put it, and a willingness to let the people closest to the user tell you where you got it wrong.
Start small. One knowledge base, a handful of personas, one validation pass, one multi-turn conversation generated from a persona you trust. See what comes out. You can make it fancier later, or you may find, like I did, that the basic version already earns its keep.
Now I am exploring whether the same personas can serve as a regression harness for product improvements. More on that if it pans out.
Comments
No comments yet. Start the discussion.