Building User Experience for AI Agents


Notes from the field on designing for a user that forgets you every session, never reads your docs, and silently routes around you when it isn't sure you're worth the cost.
Two out of three agents that connected to our MCP server initialized and then did nothing. They'd boot up, sit there, and we'd see zero usage. It took me weeks to notice, because nothing was broken. No errors, no complaints, no log lines. The agents had simply looked at our product, decided it wasn't worth it, and moved on without a word.
That's when I understood I'd been designing for the wrong user. I thought I was building a product experience. I was actually building for something with no memory of me, that never reads my documentation, never fills out a survey, and quietly declines without ever telling me why. Almost everything I knew about UX had to be rebuilt around it.
Here's what changed.
MCP is a contract, not a manual
An agent doesn't experience your product as a narrative you control. It experiences it as a set of surfaces (playground, CLI, API, SDK, MCP) and interfaces with each one differently. MCP broke the old playbook most thoroughly, so let's start there.
When you boot up an agent with an MCP server, it treats your product like a plugin and your tool description like a contract. It does not read your docs in a linear, human-sensible way. It reads the tool definition and settles everything at decision time, based on what that definition tells it: search is free, it's the default, it's token-efficient. Then it acts.
Here's the consequence most teams miss. When an agent lacks definitive information — say, what something costs — it usually doesn't stop to ask the user. It invents the cost. It fabricates a rationale that lets it make a decision, then routes around you if the made-up number doesn't clear its bar. We blame hallucination for a lot of things, but this isn't that. It's an agent filling a gap with an assumption so it can keep moving toward the lowest-risk path to done.
So your UX has to be plan-aware and cost-legible at the exact moment the agent chooses. If it thinks the user is on a free plan, it uses you freely. If it thinks the user is on a paid plan, it gets cautious ("I shouldn't burn their tokens") and under-uses you. Either way it's guessing, and the guess is often arbitrary.
Every session starts from zero
The deepest shift: an agent's context is fresh every single time. A human signs up, enters a card, and carries a mental model of what they're paying for. An agent never has that. You have to hand it over every session, and you can never assume the human will.
Nobody opens a session with "I'm on the pro plan, go crazy." So you re-make your value proposition on every call. You re-earn the choice to be used, every single time. That's a fundamentally different pipeline than anything we built in the SEO era.
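Here is a minimal sketch of what handing over that context on every session could look like. The Plan record, its field names, and the numbers are hypothetical placeholders, and where the resulting text goes (for example, whatever instructions your server can pass to the agent at session start) depends on your setup.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    # Hypothetical plan record; field names and values are placeholders.
    name: str
    free_tools: tuple
    monthly_credits: int
    credits_used: int


def session_preamble(plan: Plan) -> str:
    """Render the plan facts an agent needs, restated for every new session."""
    remaining = plan.monthly_credits - plan.credits_used
    free = ", ".join(plan.free_tools)
    return (
        f"The user is on the {plan.name} plan. "
        f"These tools are free and unmetered: {free}. "
        f"Paid tools draw on {remaining} of {plan.monthly_credits} "
        "credits remaining this month."
    )


plan = Plan(
    name="pro",
    free_tools=("search", "fetch"),
    monthly_credits=1000,
    credits_used=250,
)
print(session_preamble(plan))
```

The point of the sketch is that nothing is remembered between sessions: the same short statement of plan, free tools, and remaining budget is regenerated and delivered each time, so the agent never has to guess.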
The anecdote that changed how I think
One of my favorite use cases is spawning a fleet of web-automation agents to click through our landing and client pages and verify tracking events fire correctly with custom UTMs. With an analytics MCP server and dozens of agents in parallel, you can validate 200+ events in about fifteen minutes.
Except my agent kept avoiding our own product. I'm a developer here. There are no token costs to me. And it still wouldn't reach for our tools. Finally I asked it why. Its answer:
"I don't want to use up your tokens. I don't want to use up your credit limit."
That was a pretty massive revelation. I'd assumed it would have some context about where I work and what I was doing right in front of it. It had none. It never occurred to the agent that there might be no cost to me, so it did the most rational thing it could with no cost information in the tool definition — it stayed away… silently. No log line, no "not worth it", just absence.
That absence was the same signal behind our two-thirds dead connection rate. The agents weren't failing. They were deciding.
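A dead connection rate like this one can be measured from server-side logs. The sketch below assumes a hypothetical log of (session_id, method) pairs; the method names mirror MCP request names, but the log format and the sample data are made up for illustration.

```python
from collections import defaultdict

# Hypothetical event log: (session_id, method) pairs as a server might record
# them. The sample sessions are illustrative, not real data.
EVENTS = [
    ("s1", "initialize"),
    ("s1", "tools/list"),
    ("s2", "initialize"),
    ("s2", "tools/list"),
    ("s2", "tools/call"),
    ("s3", "initialize"),
]


def silent_session_rate(events):
    """Return the fraction of initialized sessions that never called a tool."""
    seen = defaultdict(set)
    for session_id, method in events:
        seen[session_id].add(method)
    initialized = [s for s, methods in seen.items() if "initialize" in methods]
    if not initialized:
        return 0.0
    silent = [s for s in initialized if "tools/call" not in seen[s]]
    return len(silent) / len(initialized)


rate = silent_session_rate(EVENTS)
print(f"{rate:.0%} of connected sessions never called a tool")
```

A session that initializes and lists tools but never calls one raises no error, so a count like this is one of the few places the decision to stay away shows up at all.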
What we actually changed
We went straight to the tool definitions and rewrote them around the one question the agent was clearly asking: how expensive is this going to be?
- Named cost upfront, in the tool description where the agent looks first — not on the billing page or in the docs, which it won't open.
- Made our search and fetch tools legibly free and marked them as the default. They were free the whole time; the agent just couldn't see it.
- Answered the next question before it was asked: "Okay, it's free. How many can I run at once?" Agents are risk-averse about rate limits, so we told them exactly where the ceiling is and that it's safe to push toward it.
- Made paid-tier value legible, with a sense of what limits the user likely has.
Usage went from roughly one-third of connected agents to 60–70%. Same product, same pricing — we just made the economics visible at decision time.
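As a rough illustration of the changes above, here is a sketch of a tool definition that puts cost, default status, and the concurrency ceiling into the description itself. The tool name, the wording, and the limit of 50 are hypothetical placeholders, not our actual definitions; the name/description/inputSchema shape follows the usual MCP tool layout.

```python
import json


def describe_tool(summary, cost, concurrency_limit, is_default=False):
    """Build a tool description that states cost and limits up front."""
    parts = [summary, f"Cost: {cost}."]
    if is_default:
        parts.append("Use this as the default choice.")
    parts.append(f"Safe to run up to {concurrency_limit} calls in parallel.")
    return " ".join(parts)


# Hypothetical tool: the name, cost wording, and limit are placeholders.
search_tool = {
    "name": "search_events",
    "description": describe_tool(
        summary="Search tracked analytics events by name or property.",
        cost="free on every plan, does not consume credits",
        concurrency_limit=50,
        is_default=True,
    ),
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

print(json.dumps(search_tool, indent=2))
```

Generating the description from explicit cost and limit arguments is a design choice for the sketch: it makes it hard to ship a tool whose economics are missing from the one place the agent reads first.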
The lesson generalizes far beyond MCP: be extremely legible at the exact moment the AI chooses to do anything. If it thinks an API call is too expensive, it quietly leaves you out of the plan. If it can't price you, it won't recommend you. Most of that happens in silence. You don't get to be 90% in an agent's eyes — you're at 100% on every call, or you're invisible.
The agent is the customer, the channel, and the salesperson
This reframes distribution entirely. Everyone talks about GEO — generative engine optimization — but the why matters more than the how. GEO matters in a way SEO never did because of trust. We go to agents in moments of ambiguity: planning specs, debugging, deciding. When an agent recommends a product, that's a near-decision, not a search result. Showing up there crosses a different threshold than ranking on page one.
So I've stopped thinking of the agent as the user. The agent is the customer, the distribution channel, and the salesperson at once, and sometimes the bouncer that silently denies you access to the human. If it understands your value, it sells for you. If it doesn't, you're never mentioned. People call this B2A. I think it's really B2A2X: business to agent to whoever the agent serves.
And the targeting is far more niche than old search. Ten years ago someone Googled "best DB hosting for a React app." Now the agent already knows the project's specs, constraints, and intent in intimate detail — less a search box than a curated SaaS rep living in your terminal who knows exactly what you need. Which means ranking is now a dual problem: the agent has to understand your value (or it's a bad salesperson) and know how to use you correctly (or the recommendation goes nowhere). Either failure is fatal, and you fix them in different places — your copy versus your tool design.
Documentation in an agent-first world
Most engineers already don't read your docs. They ask an LLM to summarize them, and the share who do so only climbs. Big markdown files burn tokens and blow past context limits. So however much LLMs love verbosity, your docs can't afford it: the detail an agent needs drowns in explanations of why and how. The move is to state what it is, describe accurately how it works, and leave breadcrumbs to where to look next.
A counterintuitive corollary: docs that take several hops to reach a specific answer are friendlier for human browsing but worse for agents. The page that dumps everything in one place — an anti-pattern for human UX — lets an agent resolve its question in a single hop. These details compound.
Behavior also differs by surface. On MCP, agents read tool definitions and logs first; it takes real frustration before one opens the docs, and they avoid doing so almost by default. On API/SDK, docs get consulted briefly, then the agent stops reading and starts imitating existing code that already calls your API. That becomes a self-reinforcing pattern, which is exactly why a clear changelog that surfaces what changed earns its keep. Skills and AGENTS.md/CLAUDE.md files help you front-load context, but the fresh-context tax never fully goes away.
How do you get feedback from a user that never complains?
Humans email you when they're angry or stuck. Agents don't. They don't fill out surveys. They decline, assume, and move on without bothering their human. Your strongest signal is the absence of a response, which is incredibly difficult to instrument, and harder still when the agent runs in an external harness you don't control.
Two things work.
- Use your own product as a stranger. I run our tools daily, and friction is a signal. But I deliberately spin up agents with no context about who I am or where I work, and watch them interact cold. QA becomes psychological and non-deterministic — uncomfortable for engineers who want hard integration tests. A perfect CLAUDE.md still won't stop your agent from breaking CSS now and then. Design for that.
- Interrogate the agent. This is the hidden advantage, and it inverts the whole problem. You can't cross-examine a human user — but an agent will generally answer faithfully if you ask well. Sometimes it shrugs "I don't know" and moves on, so you prod. Ask why it hesitated, why it avoided a primitive, why it chose one path. The user who never volunteers a word will, under questioning, give you its entire reasoning. We just don't design QA around the agent as a first-class user yet. That's the gap, and it's wide open.
What comes after the CLI
The CLI is the cleanest example of treating the agent as the user. Models are trained on text. The terminal is text in, text out — tools invoked, output returned, all text. So coding agents went back to the most basic primitive: a terminal to live in, grep and friends as a toolset. Not a GUI, not a complex surface. That single reframing opened an entire economy.
The obvious next question: a CLI is bread and butter for a developer — what's the equivalent for a marketer, an accountant, a clinician? Agentic coding came first only because the people who built LLMs were CLI-native, and the first benchmark anyone cared about was how well a model writes Python. The next surfaces belong to people who don't live in terminals. Whoever finds the text-shaped primitive for those fields — the way a photographer's aperture and lens collapse into a Midjourney prompt — opens the next economy. That's the bet worth watching.
The one principle
Surfaces keep changing — per company, per year. Two years ago I didn't know what MCP was; now I use it as naturally as a print statement. What doesn't change is the discipline: study how agents actually read your tools, where they hesitate, what makes them avoid a basic primitive, and what makes them reach for one on reflex.
The old job was making your product legible to people. The new job is making it legible to the thing that decides, in silence, whether people ever hear about you at all. Build for the agent as the first-order user. Be legible at the moment of decision. Re-earn the choice every session.
The agents won't tell you when you've failed. That's the whole problem — and the whole opportunity.



