
If you've been building with AI coding agents for the past few months, you've probably noticed the same pattern I have: the agent is smart enough to figure out what to do, but the moment it needs to interact with the real web (fetch a page, extract data, navigate a site) everything slows to a crawl. Not because the model is bad. Because the interface between the agent and the tool is wrong.
For the past year, MCP has been the default answer. Need your agent to use a tool? Ship an MCP server. It's standardized, it's typed, it has a schema. Everybody's doing it.
At TinyFish, we did it too. We shipped an MCP server. It works. Agents discover our tools, read the schema, make structured calls. It checked every box.
Then we shipped a CLI for the same capabilities. And honestly, it wasn't even close. The CLI path is faster, uses a fraction of the tokens, and produces better results on every workload that involves more than a couple of web operations.
I don't think MCP is bad. I think the ecosystem made a wrong turn when it started using a discovery protocol as an execution layer. Let me explain.

MCP solved tool discovery: a standard way for an agent to ask "what can I use?" and get a structured answer. That's genuinely useful. The mistake was routing all execution through the same interface. When every web fetch, every data extraction, every page interaction goes through MCP tool calls, three things compound:
Token overhead eats your context alive. A single MCP tool call (the schema reference, the serialized request, the JSON response envelope) costs 500 to 2,000 tokens. The actual data you wanted might be 200 tokens. That's 3 to 10x overhead on every call.
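To make that arithmetic concrete, here's a rough sketch. The envelope below is a hypothetical MCP-style tool result (not any particular server's real wire format), and character counts stand in as a crude proxy for tokens:

```shell
# Hypothetical MCP-style result envelope vs. the data you actually wanted.
envelope='{"jsonrpc":"2.0","id":7,"result":{"content":[{"type":"text","text":"$29/mo"}],"isError":false}}'
payload='$29/mo'

printf 'envelope: %s chars\n' "$(printf %s "$envelope" | wc -c)"
printf 'payload:  %s chars\n' "$(printf %s "$payload" | wc -c)"
# The payload is 6 characters; the envelope around it is over 90.
```

Multiply that ratio by every call in a multi-step task and the overhead dominates.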

Run 20 fetches in a research task. That's 15K to 40K tokens burned on protocol plumbing before the agent has done any reasoning. Claude Code's context window is not infinite. Every token wasted on MCP overhead is a token the agent can't use to think.
I keep seeing agents degrade mid-task: responses get shallower, earlier context drops out, the agent forgets what it was doing. Nine times out of ten, it's because the context window is filled up with tool call overhead, not useful information.
Nothing hits the disk. MCP servers are sandboxed. They return data to the agent, but they can't write files, can't pipe output, can't create anything that persists after the session. If your agent extracts data from 50 pages, every byte passes through the context window. There's no "save this and come back to it later." It all has to live in working memory, all at once.
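For contrast, here's what the persist-to-disk pattern looks like on the CLI side. This is a sketch: `fetch_title` is a stand-in that emits stub JSON, where a real run would call `tinyfish agent run "Extract the page title" --url "$url" --sync` instead.

```shell
# Sketch: every extraction lands on disk instead of the context window.
# fetch_title stands in for a real call, e.g.:
#   tinyfish agent run "Extract the page title" --url "$url" --sync
fetch_title() {
  printf '{"result":{"title":"stub title for %s"}}\n' "$1"
}

mkdir -p out
i=0
for url in https://example.com/a https://example.com/b https://example.com/c; do
  i=$((i+1))
  fetch_title "$url" > "out/page-$i.json"   # persists; the agent can come back later
done
ls out | wc -l   # three files on disk, ~0 tokens spent in context
```

The agent only pulls a file back into context when it actually needs to reason over it.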
No composition. This is the one that bothers me most.
tinyfish agent run "Get the product price" --url https://example.com/pricing --sync | jq '.result.price' | sort -n > prices.txt
(three tools piped together, output to disk, zero context tokens on intermediate results). You cannot do this through MCP. Each call is an isolated request-response. No piping. No redirection. No parallel execution. You lose the entire unix toolkit.
Claude Code, Cursor, OpenCode: these agents live in a terminal. They have bash, a file system, pipes and redirects. The substrate is unix.
When you give an agent a CLI tool instead of an MCP server, everything changes: output goes to disk (not context), commands compose natively, and there's zero lifecycle management. No server discovery, no connection state, no session keepalive.
A single bash pipeline replaces 5 to 10 MCP round-trips:
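A sketch of the kind of pipeline meant, with sample data standing in for real `tinyfish agent run ... --sync` output (the prices are made up):

```shell
# Three sample extraction results stand in for three real
# `tinyfish agent run ... --sync` calls; one pipeline extracts,
# cleans, sorts, and persists without touching the context window.
printf '%s\n' '{"price": 49}' '{"price": 19}' '{"price": 99}' \
  | sed 's/[^0-9]//g' \
  | sort -n > prices.txt
head -n 1 prices.txt   # -> 19 (the lowest price, read straight from disk)
```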

A pipeline like that costs ~100 tokens in context. The equivalent MCP interaction burns 10K+ across sequential tool calls, each bloating the context with request/response envelopes. This isn't subtle.
We run both interfaces on the same infrastructure: same APIs, same web capabilities, two access patterns. Here's what the data shows:

The cause is straightforward: the agent has more room to hold intermediate thinking, backtrack, and synthesize when it isn't burning context on protocol overhead. This isn't a benchmark we designed to make CLI look good. It's production telemetry from the same system, same customers, two interfaces. CLI wins because the bottleneck in most agent work today is context capacity, and CLI is 10x more efficient with that resource.
Your agent needs to analyze competitor pricing across 15 SaaS companies. Here's what happens to the context window:
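The CLI half of that comparison might look like the following sketch. The loop, company names, and `get_pricing` helper are illustrative; a real run would call `tinyfish agent run ... --sync` per company.

```shell
# Sketch of the CLI path: results accumulate in one file on disk.
# get_pricing stands in for a real call, e.g.:
#   tinyfish agent run "Get the pricing tiers" --url "$url" --sync
get_pricing() {
  printf '{"company":"%s","tiers":["free","pro"]}\n' "$1"
}

: > pricing.json                       # start clean
for co in acme globex initech; do      # ...15 companies in the real task
  get_pricing "$co" >> pricing.json    # one JSON line per company, on disk
done
wc -l < pricing.json                   # three lines here; fifteen in the real task
```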

The agent reads pricing.json: 2K tokens of clean data. It reasons over the full dataset with 95% of its context still available. Same task. Same underlying web infrastructure. Radically different agent performance.
Our CLI gives agents one powerful primitive: tinyfish agent run. You describe what you want in natural language, point it at a URL, and the TinyFish agent handles the rest: JS rendering, bot protection, CAPTCHAs, navigation, multi-step interaction.

Two modes let you pick the right output for your workflow. Streaming gives you real-time progress events. Sync waits for the complete result. Both output JSON to stdout, pipe to any unix tool, and keep the context window clean.
What about tool discovery? Fair question. Discovery is MCP's actual strength, and I'll give it full credit. If an agent has never seen a tool before, MCP's schema-based discovery is elegant.
In practice, though, this matters less than the ecosystem assumes. AI coding agents don't operate in a vacuum. They have system prompts, project configs, README files, documentation in the working directory. Claude Code reads your repo. Cursor reads your workspace. The agent learns about tools the way a developer does: through docs that already exist in the project.
If your agent doesn't know a CLI tool exists, the fix is a line in your README, not a runtime protocol layer.
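That line might be as simple as this sketch of a README entry (wording and placement illustrative):

```markdown
## Tools

- `tinyfish agent run "<task>" --url <url> [--sync]`: runs a natural-language
  web task and prints JSON to stdout; safe to pipe.
```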
MCP discovery is valuable in multi-agent orchestration where agents spin up dynamically and need to negotiate capabilities on the fly. That's real. It's also a niche case today. Most AI coding work is a single agent in a terminal session, with a toolkit defined at startup.
The March 21st CLI rush wasn't random. Browser Use, Firecrawl, Browserbase: all shipping CLIs around the same time. The ecosystem is catching up to what production usage has been showing for months: agents need CLI access to be useful for real work.
MCP will keep its place in tool discovery and multi-agent orchestration, and yes, it's evolving. Streaming, file system access, and other improvements are on the roadmap. But the architectural mismatch runs deeper than missing features. A request-response protocol optimized for discovery is being used as an execution layer in a unix environment that already has a better execution model built in.
For the core loop of AI coding (the thing most developers actually do every day with Claude Code and Cursor) CLI is the right default. It's how the terminal works. It's how unix works. It's how agents should work.
We learned this the hard way. We shipped MCP first, saw the token economics, and shipped a CLI. The CLI won. I think most teams building agent tooling will land in the same place.
npm install -g @tiny-fish/cli
tinyfish.ai: 500 free steps
No credit card. No setup. Run your first operation in under a minute.


