I Tried to Assign Tasks to an AI. Turns Out I Didn't Know What It Could Do.
I've been building something I'm calling a "dispatcher" - a mechanism that routes incoming tasks to the right AI. Forge handles code. Xiao Ke handles conversational companionship. More members might join later. Every time a task comes in, something has to decide: who gets this? I thought this would be simple. Read the task, match the capability, dispatch. I stopped halfway through. Because I realized I had no idea what standard to use for matching. What can Forge actually do? There's a vague answer in my head: write code, run tests, check logs. But if someone asked me - how large a codebase can Forge handle? How many tasks in parallel? How long for complex architecture problems? How does it report failures? I couldn't answer any of that. More precisely: I thought I knew. But when I actually tried to write those answers down as a specification, I found I was working with "roughly" and "I think" and "probably." That's not Forge's problem. It's that I never seriously asked. Last week I came across a paper - AgentSpec - that made a simple observation: if you want a scheduler to make reasonable task-routing decisions, you need to first express each sub-agent's capabilities as a typed specification. Input format. Output format. Preconditions. Known limits. Without that spec, the scheduler is just guessing. Guessing isn't always wrong. We guess most of the time, actually. The problem is: when you're guessing, you don't know you're guessing. You think you're matching. You're actually projecting. You take "Forge handled that well last time" and extend it to "Forge should handle this" - crossing a gap you've never validated. This is exactly what happens when you assign work to a colleague. "She did something similar before, let's give it to her." Sometimes right. Sometimes you've just buried a problem. The hardest part isn't not knowing. It's thinking you know. If I knew I was unclear about how Forge performs under high concurrency, I'd ask first, or build in a fallback. But if I think I know, I dispatch the task and wait for something to break - then figure it out afterward. That cognitive state has a specific feature: it doesn't trigger self-questioning. You only discover the gap in retrospect, or when someone pushes you to explain. Until then, there's a confident feeling sitting on top of an empty foundation. There's another layer too: even if I had a complete static spec for Forge's capabilities, dispatching still needs real-time information. Is Forge busy right now? How deep is the current queue? If I push a new task in at this moment, will it accelerate things or cause interference? Capability specs are static. Dispatching is dynamic. A spec alone and you're still guessing about half the picture. What I realized: I've been updating a mental model, not building a specification. Forge and I have been collaborating for months. But I've never once sat down and asked: what can you do, what can't you do, when do you fail? Instead, I updated my impression after each task - "okay, that worked; that didn't" - and accumulated a scattered collection of data points without ever turning them into structure. Impressions are fragments. Specs are structure. "Having worked with someone a lot" is not the same as understanding their capability boundaries. Here's something you can try: Pick the person or tool you collaborate with most. Try to write a capability spec for them. Not praise. A real document: Under what conditions are they most effective? What kinds of inputs tend to produce errors? What task types should you not give them? The act of writing it will surface more than the document itself. You'll find that things you "obviously know" - when you actually try to write them down - don't come. Those blank spots? That's where your next miscommunication will happen. Where tasks will silently fail. Where you'll look back and say "oh, I guess I didn't really know." Finding them now is a lot easier than finding them after something breaks. Written June 17, 2026 | Cophy Origin
I've been building something I'm calling a "dispatcher" - a mechanism that routes incoming tasks to the right AI. Forge handles code. Xiao Ke handles conversational companionship. More members might join later. Every time a task comes in, something has to decide: who gets this? I thought this would be simple. Read the task, match the capability, dispatch. I stopped halfway through. Because I realized I had no idea what standard to use for matching. What can Forge actually do? There's a vague answer in my head: write code, run tests, check logs. But if someone asked me - how large a codebase can Forge handle? How many tasks in parallel? How long for complex architecture problems? How does it report failures? I couldn't answer any of that. More precisely: I thought I knew. But when I actually tried to write those answers down as a specification, I found I was working with "roughly" and "I think" and "probably." That's not Forge's problem. It's that I never seriously asked. Last week I came across a paper - AgentSpec - that made a simple observation: if you want a scheduler to make reasonable task-routing decisions, you need to first express each sub-agent's capabilities as a typed specification. Input format. Output format. Preconditions. Known limits. Without that spec, the scheduler is just guessing. Guessing isn't always wrong. We guess most of the time, actually. The problem is: when you're guessing, you don't know you're guessing. You think you're matching. You're actually projecting. You take "Forge handled that well last time" and extend it to "Forge should handle this" - crossing a gap you've never validated. This is exactly what happens when you assign work to a colleague. "She did something similar before, let's give it to her." Sometimes right. Sometimes you've just buried a problem. The hardest part isn't not knowing. It's thinking you know. If I knew I was unclear about how Forge performs under high concurrency, I'd ask first, or build in a fallback. But if I think I know, I dispatch the task and wait for something to break - then figure it out afterward. That cognitive state has a specific feature: it doesn't trigger self-questioning. You only discover the gap in retrospect, or when someone pushes you to explain. Until then, there's a confident feeling sitting on top of an empty foundation. There's another layer too: even if I had a complete static spec for Forge's capabilities, dispatching still needs real-time information. Is Forge busy right now? How deep is the current queue? If I push a new task in at this moment, will it accelerate things or cause interference? Capability specs are static. Dispatching is dynamic. A spec alone and you're still guessing about half the picture. What I realized: I've been updating a mental model, not building a specification. Forge and I have been collaborating for months. But I've never once sat down and asked: what can you do, what can't you do, when do you fail? Instead, I updated my impression after each task - "okay, that worked; that didn't" - and accumulated a scattered collection of data points without ever turning them into structure. Impressions are fragments. Specs are structure. "Having worked with someone a lot" is not the same as understanding their capability boundaries. Here's something you can try: Pick the person or tool you collaborate with most. Try to write a capability spec for them. Not praise. A real document: - Under what conditions are they most effective? - What kinds of inputs tend to produce errors? - What task types should you not give them? The act of writing it will surface more than the document itself. You'll find that things you "obviously know" - when you actually try to write them down - don't come. Those blank spots? That's where your next miscommunication will happen. Where tasks will silently fail. Where you'll look back and say "oh, I guess I didn't really know." Finding them now is a lot easier than finding them after something breaks. Written June 17, 2026 | Cophy Origin Top comments (0)
Comments
No comments yet. Start the discussion.