Context vs. Tokens in Agentic Coding: When to Use Subagents, and When to Fork

TinyFish Team · Jul 1, 2026 · 6 min read

Most guidance on agentic coding ends at a single instruction: split the work into subagents and run them in parallel. It sounds complete until the invoice arrives. The moment you start delegating real work to coding agents, you hit a tradeoff almost nobody names — and which side of it hurts you depends entirely on which cost you forgot to watch.

Here's the uncomfortable part. The tactic that keeps your orchestrating agent's context window lean is usually the same tactic that burns the most tokens overall. The two goals pull against each other. Optimize the wrong one and you'll either quietly triple your spend or suffocate your main agent's memory.

After a long, multi-step debug-and-fix run of our own, this is the model we settled on.

Three ways a coding agent can hand off work

When an agent needs to knock out a chunk of work, it has roughly three moves:

Do it inline, in the main session. Every file it reads, every search it runs, every test it executes piles up in the primary conversation.
Spawn fresh subagents, one per task. Each starts from an empty context and hands back only a summary.
Fork the session. You spin off an agent that inherits the parent's entire conversation, works in the background, and returns a summary when it's done.

On paper these feel interchangeable. They aren't. Each one lands at a different spot on two cost axes that happen to be in direct tension.

The two cost axes nobody separates

Axis 1 — the main agent's context window. This is the working memory your orchestrating agent reasons over. It inflates quickly: one multi-file investigation can dump thousands of lines of tool output straight into it. A swollen context is slower, costs more per turn, and eventually triggers summarization that quietly loses detail.

Axis 2 — total tokens spent. This is everything, added up: the main agent plus every agent it delegates to.

The compressed version:

Inline work is the worst for your main context, but the best for total token spend.
Fresh subagents are the best for keeping the main session clean, and usually the worst for total tokens.
A fork is the middle path — it protects the main context and is often far cheaper than fresh subagents when the tasks share context.

You cannot drive both axes to zero at once. So the first thing to get straight is which one you're actually optimizing for.
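To make the two axes concrete, here is a toy cost model. It is a rough sketch, not a measurement: the `Workload` fields, the token counts, and the assumption that a fork carries the parent's context exactly once are all ours, and it ignores caching, per-turn re-reading of context, and input versus output pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    discovery: int  # tokens to find and read the relevant code once
    work: int       # tokens of tool output produced per task
    summary: int    # tokens in each summary handed back to the main agent
    tasks: int      # number of related tasks


def inline(w: Workload) -> dict:
    # Everything lands in the main session; the code is discovered once.
    total = w.discovery + w.tasks * w.work
    return {"main_context": total, "total_tokens": total}


def fresh_subagents(w: Workload) -> dict:
    # The main agent keeps only summaries, but every subagent re-discovers.
    main = w.discovery + w.tasks * w.summary
    return {"main_context": main,
            "total_tokens": main + w.tasks * (w.discovery + w.work)}


def fork(w: Workload) -> dict:
    # Assumption: the fork carries the parent's context once, then runs every task.
    main = w.discovery + w.tasks * w.summary
    return {"main_context": main,
            "total_tokens": main + w.discovery + w.tasks * w.work}


# Hypothetical numbers, chosen only to show the shape of the tradeoff.
w = Workload(discovery=20_000, work=5_000, summary=500, tasks=3)
for name, strategy in [("inline", inline), ("fresh", fresh_subagents), ("fork", fork)]:
    print(name, strategy(w))
```

With these made-up numbers, inline has the smallest total and the largest main context, fresh subagents have the largest total, and the fork sits between them on total tokens while keeping the main context as small as the fresh-subagent case.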

The re-discovery tax: the hidden cost of fresh subagents

Why do fresh subagents run up the biggest token bill? Because of a charge that stays invisible until you go looking for it: re-discovery.

A fresh subagent starts knowing nothing. Before it can touch the real task, it has to re-find the relevant files, re-run the same searches, re-read the same code, and re-reach the same conclusions the main agent is already holding in memory.

Kick off three fresh subagents for three related fixes and you pay that understanding tax three separate times. If those tasks touch overlapping files — and related tasks almost always do — the same files get read three times over.
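The overlap arithmetic is easy to sketch. The file names, token sizes, and task-to-file mapping below are hypothetical, picked only to illustrate how shared files get re-read; none of them come from our actual run.

```python
# Hypothetical file sizes in tokens; illustrative only.
file_tokens = {
    "parser.py": 4_000,
    "api.py": 3_000,
    "widget.tsx": 2_500,
    "schema.py": 1_500,
}

# Three related fixes and the files each one has to read (also illustrative).
tasks = {
    "fix_parser": {"parser.py", "schema.py"},
    "fix_api_field": {"api.py", "parser.py", "schema.py"},
    "fix_component": {"widget.tsx", "api.py", "schema.py"},
}


def read_cost(files):
    """Tokens needed to read each file in `files` once."""
    return sum(file_tokens[f] for f in files)


# Fresh subagents: every task reads its own files from scratch.
fresh_total = sum(read_cost(files) for files in tasks.values())

# Shared context: each distinct file is read only once.
shared_total = read_cost(set().union(*tasks.values()))

rediscovery_tax = fresh_total - shared_total
print(fresh_total, shared_total, rediscovery_tax)
```

In this example the three fresh subagents read 21,000 tokens of files between them, while a single shared context reads 11,000. The 10,000-token difference is the re-discovery tax, and it grows with every additional task that touches the same files.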

That's the trap. Splitting related work across fresh subagents feels efficient because the parallelism is right there in front of you. But for work that shares context, it's frequently the most expensive option on the table. The parallelism is genuine. So is the redundant context-loading sitting underneath it, and that part is never free.

There's exactly one situation where fresh subagents are the correct default: independent tasks with little shared context, where you want hard isolation — separate review, separate rollback, and zero risk of two agents editing the same file at the same moment. Outside of that, they're usually the wrong reflex.

The fork: the middle path that softens the tradeoff

When your tasks share context, a fork is the move that sidesteps most of the tradeoff.

It inherits everything the parent has already worked out, so it pays no re-discovery tax. And because it runs in the background, its own tool output — the genuinely noisy part — never lands back in the main context. That's most of the upside of both extremes at once: the orchestrating session stays lean, and you don't re-buy understanding you already own.

The advantage compounds when the work is a pipeline rather than a one-shot.

In our own run, a single investigation snowballed into a chain: diagnose a wrong value showing up in the UI, fix it across a parser, an API field, and a frontend component, write a one-off data migration, wire it all together, and commit. Instead of spinning up a new agent per step — each one re-learning the codebase from scratch — we resumed the same fork across every step. Each follow-up built directly on the understanding from the last.

By the final step, that fork already knew the exact functions, the exact data shapes, and the exact edge case that had tripped us up earlier. None of it had to be re-explained or re-derived. A fresh agent would have restarted from zero every single time.
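A small simulation shows why resuming one fork compounds across a pipeline. This is a toy model under loud assumptions: the `Agent` class, the fact names, and the flat cost of 1,000 tokens per newly learned fact are all invented for illustration and do not correspond to any real agent framework.

```python
class Agent:
    """Toy agent: learning a fact it does not already know costs tokens."""

    def __init__(self, known=(), cost_per_fact=1_000):
        self.known = set(known)
        self.cost_per_fact = cost_per_fact  # hypothetical flat cost
        self.spent = 0

    def run_step(self, needed_facts):
        # Pay only for facts that are not already in context.
        missing = set(needed_facts) - self.known
        self.spent += len(missing) * self.cost_per_fact
        self.known |= missing


# What the parent already worked out during diagnosis (hypothetical).
parent_known = {"parser", "data_shape"}

# Follow-up steps and the facts each one needs (hypothetical).
pipeline = [
    {"parser", "api_field", "data_shape"},     # fix the backend
    {"component", "api_field", "data_shape"},  # fix the frontend
    {"data_shape", "edge_case"},               # write the migration
]

# One long-lived fork: inherits the parent's facts and is resumed each step.
fork = Agent(known=parent_known)
for step in pipeline:
    fork.run_step(step)

# A fresh agent per step: each one starts from nothing.
fresh_spent = 0
for step in pipeline:
    agent = Agent()
    agent.run_step(step)
    fresh_spent += agent.spent

print(fork.spent, fresh_spent)
```

Here the resumed fork pays for three new facts in total, one per step, while the fresh agents pay for eight, because each one relearns what the previous agent already knew.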

A quick decision heuristic for agentic coding

You already know where one fact lives? Don't delegate. Just grab it.
Several independent tasks, little overlap, and you want isolation? Fresh subagents.
You already hold rich context and the tasks overlap? Stay inline, or use a single fork.
You want to protect the main context and skip re-discovery? Fork — especially a long-lived one you resume across a multi-step pipeline.
Optimizing purely for total tokens? Inline beats spawning anything, because nothing gets re-loaded.
Optimizing purely for a clean main context? Push the noisy work out — but reach for a fork over fresh subagents whenever the work shares context.
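The heuristic above can be written down as a small function. Treat it as one possible encoding, not a rule engine: the `Situation` fields and the returned strategy labels are names we made up for this sketch, and real decisions will involve judgment the booleans don't capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    single_known_fact: bool = False     # you already know where the one fact lives
    tasks_overlap: bool = False         # tasks share files or understanding
    want_isolation: bool = False        # separate review, rollback, no edit conflicts
    protect_main_context: bool = False  # keeping the main session lean matters most
    multi_step: bool = False            # the work is a pipeline, not a one-shot


def choose_strategy(s: Situation) -> str:
    if s.single_known_fact:
        return "inline"  # don't delegate, just grab it
    if s.tasks_overlap:
        if s.protect_main_context:
            # Skip re-discovery and keep noisy output out of the main session.
            return "long-lived fork" if s.multi_step else "fork"
        return "inline"  # cheapest in total tokens: nothing gets re-loaded
    if s.want_isolation or s.protect_main_context:
        return "fresh subagents"  # little shared context, so little to re-discover
    return "inline"


print(choose_strategy(Situation(tasks_overlap=True, protect_main_context=True, multi_step=True)))
print(choose_strategy(Situation(want_isolation=True)))
```

The ordering of the checks carries the argument: overlap is tested before isolation, because shared context is what makes fresh subagents expensive.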

The takeaway

"Use subagents" isn't a strategy. The real question, every time, is this: are you protecting context, or protecting the token budget? Those are different goals with different answers, and the biggest waste comes from optimizing one while silently bleeding out on the other.

The single most underused move in agentic coding is the long-lived fork — one agent that accumulates context across an entire investigation-to-commit pipeline, instead of an army of fresh agents each rediscovering the world from scratch.

Re-discovery is the cost hiding in plain sight. Stop paying it twice.

FAQ

What is a coding agent?

A coding agent is an AI system that can read a codebase, run searches and tests, edit files, and carry out multi-step engineering tasks with limited human direction — rather than just autocompleting a line at a time.

What is a fork in agentic coding?

A fork is a delegated agent that inherits the parent agent's full conversation and context, then runs on its own — usually in the background — and reports back a summary. Think of it like branching a session: the child starts already knowing everything the parent figured out, so it skips the re-discovery work a fresh subagent would have to repeat. Because it runs in the background, its noisy tool output never lands back in the main context window. That's what makes it the middle path — lean main context without re-paying the understanding tax. The exact mechanics vary by tool; some frameworks call this session cloning, resuming, or background subagents.

How do you reduce token usage in agentic coding?

Keep work inline when you already hold the relevant context, since nothing has to be re-loaded. Avoid firing off multiple fresh subagents for tasks that share files — that forces each one to re-discover the same code and multiplies your token spend.

When should you use subagents vs. a fork?

Use fresh subagents for independent tasks with little shared context when you want hard isolation. Use a fork when the tasks overlap and you want to protect the main context without re-paying the re-discovery tax.

How do you manage the context window in a long agent session?

Push noisy, output-heavy work out of the main session — but prefer a fork over fresh subagents when that work builds on context you've already established, so the orchestrating agent's window stays lean without losing continuity.