
Every tool call in your agent pipeline has a number attached to it. Search: 450ms. Fetch: 3 seconds. Browser navigation: 5 seconds. These numbers get measured, reported, optimized.
Nobody measures what happens between them.
At TinyFish, we started instrumenting the gaps. Not the tool calls themselves, but the transitions: the moment a search result comes back to the moment a fetch request goes out. The moment a fetched page is parsed to the moment a browser session opens. The moment a browser action completes to the moment the agent's next reasoning call begins.
What we found: in a typical multi-tool agent pipeline, these inter-call gaps account for 30 to 40 percent of total end-to-end latency. Not the tools. Not the LLM. The spaces in between.
This post walks through a real agent turn, gap by gap, and explains where that time goes.
Consider a straightforward competitive intelligence task: "Find the current enterprise pricing for [competitor], verify it on their actual pricing page, and return structured JSON."
An agent using a typical multi-tool stack executes this as a sequence of tool calls:

1. Search for the competitor's pricing page
2. Fetch the top result and render it to text
3. LLM reasoning over the fetched content
4. Open a browser session on the live pricing page
5. Extract data from the rendered page
6. LLM assembles the final structured JSON

Six steps. Each one has its own latency. But between each step there's a transition that most observability tools don't capture.
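To make the transitions concrete, here is the turn sketched as code with stubbed tools. Every function name and payload below is illustrative, not TinyFish's API or any provider's real schema — the point is that each call boundary in `run_turn` hides a gap.

```python
# Illustrative sketch of the six-step turn with stubbed tools.
# Function names and payloads are hypothetical, not a real provider API.
import json

def search(query):          # step 1: search API call
    return {"results": [{"url": "https://competitor.example/pricing"}]}

def fetch(url):             # step 2: fetch/render service call
    return "# Pricing\nEnterprise: $99/seat/month"

def llm(prompt):            # steps 3 and 6: reasoning / structuring calls
    return json.dumps({"plan": "Enterprise", "price": "$99/seat/month"})

def browser_extract(url):   # steps 4-5: browser session + extraction
    return "Enterprise: $99/seat/month"

def run_turn(competitor):
    top_url = search(f"{competitor} enterprise pricing")["results"][0]["url"]
    fetched = fetch(top_url)            # gap 1 sits between search and this call
    _plan = llm(fetched)                # gap 2: context assembly before this call
    on_page = browser_extract(top_url)  # gap 3: LLM decision -> browser session
    return json.loads(llm(on_page))     # gaps 5-6: final assembly and return

result = run_turn("competitor")
```

Each arrow in the gap analysis that follows corresponds to one of the transitions between these calls.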

We instrumented this exact workflow across 200 runs on a fragmented stack. These are the median gap times.
Gap 1: search result → fetch request

The search API returns JSON. Your agent code deserializes it, extracts the top URL, validates it, constructs a new HTTP request with different headers and auth credentials for the fetch service, serializes that request, and sends it over a new TCP connection.
Each individual operation here is fast. JSON parsing: 5-10ms. URL validation: negligible. But the fetch service requires its own authentication. That's a new TLS handshake (50-80ms in a cold connection), an API key validation on their end (20-40ms), and response header negotiation.
None of this shows up in either the search latency or the fetch latency. It lives in the gap.
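The agent-side half of gap 1 can be sketched in a few lines. The endpoint, header names, and response schema here are hypothetical; only the shape of the work — deserialize, extract, validate, rebuild, reserialize — is the point.

```python
import json
import time

def search_to_fetch(search_body: str) -> dict:
    """Everything the agent does inside gap 1 (schemas are illustrative)."""
    t0 = time.time()
    results = json.loads(search_body)          # deserialize search JSON (~5-10ms on large payloads)
    top_url = results["results"][0]["url"]     # extract the top result
    if not top_url.startswith("https://"):     # validate before fetching
        raise ValueError(f"unexpected URL: {top_url}")
    fetch_request = {                          # new request, new headers, new auth
        "url": "https://fetch.example/v1/scrape",  # hypothetical fetch endpoint
        "headers": {"Authorization": "Bearer FETCH_API_KEY"},
        "body": json.dumps({"url": top_url, "format": "markdown"}),
    }
    gap_ms = (time.time() - t0) * 1000         # excludes TLS + auth time on the wire
    return {"request": fetch_request, "gap_ms": gap_ms}
```

Note what `gap_ms` cannot see: the TLS handshake and API-key validation happen on the network after the request leaves, which is why the in-process timing understates the real gap.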
Gap 2: fetched content → LLM call

The fetch API returns rendered page content, usually as markdown or structured HTML. Your agent needs to prepare this for the LLM: trim it to fit context window limits, format it as a tool result, assemble it with the conversation history and system prompt, serialize the whole payload as JSON, and POST it to the LLM API.
Context assembly is the hidden cost here. If your agent maintains conversation history (which most do for multi-step tasks), the payload grows with each turn. A third-turn context assembly that includes two previous tool results can hit 30-50ms just in string concatenation and JSON serialization. Add network overhead to the LLM endpoint, and you're at 80-150ms before the model starts generating tokens.
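A minimal sketch of that assembly step, assuming a generic chat-style message schema (the field names are illustrative, not any specific LLM provider's API):

```python
import json

def assemble_payload(system_prompt, history, tool_result, budget_chars=80_000):
    """Sketch of per-turn context assembly; message schema is illustrative."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)                   # history grows with every turn
    messages.append({"role": "tool", "content": tool_result[:budget_chars]})  # trim to budget
    # The entire payload is reserialized from scratch on every turn, so
    # serialization cost scales with total context, not with new content.
    return json.dumps({"model": "some-model", "messages": messages})
```

The comment in the middle is the whole story: because each POST carries the full history, the string concatenation and JSON serialization cost compounds turn over turn.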
Gap 3: LLM decision → browser session

This is typically the largest gap. The LLM decides the agent needs to open a browser session. Your agent code parses the LLM's tool call response, validates the parameters, then constructs a CDP session request to a completely different service (Browserbase, Steel, etc.).
A new service means a new TCP connection, a new TLS handshake, a new API key authentication. If the browser service requires session configuration (proxy settings, stealth mode, viewport size), that's additional payload construction and validation. If the service is in a different geographic region than your search and fetch providers, add network latency accordingly.
And this is before the browser cold start. The 200-400ms gap happens before the 5-to-10-second cold start clock even begins.
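The session-request construction alone looks something like this. Every field name and the endpoint are placeholders — real browser providers each have their own schema — but the structure shows why this transition carries payload-building cost on top of the new TCP, TLS, and auth round trips.

```python
import json

# Sketch of the session request an agent builds before any browser work
# starts. Field names and the endpoint are illustrative placeholders.
def build_browser_session_request(api_key: str) -> dict:
    config = {
        "proxy": {"server": "http://proxy.example:8080"},
        "stealth": True,
        "viewport": {"width": 1280, "height": 800},
    }
    return {
        "url": "https://browser.example/v1/sessions",  # new host: new TCP + TLS + auth
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps(config),
    }
```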
Gap 4: browser action → extraction

After the browser navigates and interacts, the agent needs to extract data from the current page state. In a CDP-based setup, this means sending a JavaScript evaluation command over the WebSocket connection, waiting for execution, receiving the DOM snapshot, parsing it, and formatting it for the agent.
This gap is smaller than the others but repeats for every single browser action. In a five-step navigation, it adds 250-500ms total.
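The repeated round trip looks like this. `Runtime.evaluate` is a real CDP method; the WebSocket transport is omitted here, so this only sketches the message construction that happens once per action.

```python
import itertools
import json

# Sketch of the per-action extraction command over CDP. Runtime.evaluate is
# a real CDP method; the send/receive transport is stubbed for illustration.
_ids = itertools.count(1)

def build_extract_command() -> str:
    return json.dumps({
        "id": next(_ids),  # CDP messages carry an incrementing request id
        "method": "Runtime.evaluate",
        "params": {
            "expression": "document.documentElement.outerHTML",
            "returnByValue": True,
        },
    })

# In a five-step navigation this serialize -> send -> execute -> snapshot ->
# parse cycle runs five times; at ~75ms each, the repeats add ~375ms.
commands = [build_extract_command() for _ in range(5)]
```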
Gap 5: extraction → LLM re-reasoning

Same pattern as Gap 2: assembling the extraction result into the conversation context, serializing, sending to the LLM. But now the context is larger because it includes the entire conversation history plus multiple tool results. Serialization time grows.
Gap 6: final LLM output → return

The LLM produces the final structured JSON. Your agent parses the response, validates the schema, and returns it to the calling application. The smallest gap, but it's still there.
Adding up the median gap times from our 200-run sample:
Gap 1 (search→fetch): 150ms
Gap 2 (fetch→LLM): 110ms
Gap 3 (LLM→browser): 300ms
Gap 4 (browser action→extract): 75ms × 5 actions = 375ms
Gap 5 (extract→LLM): 120ms × 4 reasoning turns = 480ms
Gap 6 (final→return): 40ms
Total gap time: ~1,455ms
Total tool call time for the same workflow (search + fetch + browser cold start + 5 navigation steps + LLM reasoning): ~18 seconds.
Gap time as percentage of total: ~8% in absolute terms, but 30-40% of the compressible latency budget. The browser cold start (5-10s) dominates absolute time, but it's a single fixed cost. The gaps compound with every additional tool call and every additional reasoning step. A more complex workflow with 10 tool calls and 8 reasoning turns sees gap time climb to 3-4 seconds.
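The compounding can be captured in a back-of-the-envelope model. The averages below are rough medians pulled from the gap table above, not the article's exact per-gap methodology, so treat the output as a ballpark.

```python
# Back-of-the-envelope model: every tool transition and every reasoning turn
# carries one average-sized gap. Averages are rough medians from the table
# above, not a precise per-gap breakdown.
AVG_TOOL_GAP_MS = 150   # roughly covers gaps 1, 3, and 4
AVG_LLM_GAP_MS = 120    # context assembly before each reasoning call

def estimated_gap_ms(tool_calls: int, reasoning_turns: int) -> int:
    return tool_calls * AVG_TOOL_GAP_MS + reasoning_turns * AVG_LLM_GAP_MS

# 10 tool calls + 8 reasoning turns -> ~2.5s of pure gap time, the same
# ballpark as the 3-4 seconds observed for complex workflows.
print(estimated_gap_ms(10, 8))  # 2460
```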
For pipelines running thousands of tasks daily, those 1.5 to 4 seconds per task add up to hours of cumulative wait time.

These gaps aren't bugs. They're a structural consequence of assembling an agent pipeline from independent services.
Each service has its own network boundary. Every transition between tools requires leaving one service's infrastructure, traversing the public internet, and entering another service's infrastructure. Even with fast connections, each boundary crossing costs 50-100ms minimum.
Each service has its own authentication. Exa validates your API key one way, Firecrawl another, Browserbase a third. There's no shared auth context. Every transition re-proves identity.
Each service has its own serialization format. Exa returns JSON in one schema. Firecrawl returns markdown or a different JSON schema. Browserbase communicates over CDP WebSocket with its own message format. Your agent code is a translation layer between incompatible formats, and translation takes time.
Context doesn't persist across boundaries. After your search call, the search service forgets you exist. After your fetch call, the fetch service forgets you exist. Every service interaction is stateless, which means your agent code has to reconstruct context for every single call.
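In agent code, those four structural costs show up as one credential and one translation function per service boundary. The header names and schemas below are placeholders, not the real provider APIs:

```python
import json

# Sketch of what "no shared auth context" looks like in practice: one
# credential per service (header names and key values are placeholders).
SEARCH_AUTH = {"Authorization": "Bearer SEARCH_API_KEY"}
FETCH_AUTH = {"Authorization": "Bearer FETCH_API_KEY"}
BROWSER_AUTH = {"X-Api-Key": "BROWSER_API_KEY"}

def translate_search_to_fetch(search_json: str) -> str:
    """Translation layer: search provider's schema in, fetch schema out."""
    url = json.loads(search_json)["results"][0]["url"]
    return json.dumps({"url": url, "formats": ["markdown"]})
```

Every one of these translation functions is pure overhead: it exists only because the services on either side of the boundary don't share a format or a session.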
In TinyFish's architecture, search, fetch, browser, and agent execution run on the same infrastructure. When a search result needs to flow into a fetch operation, that transition is an internal function call, not a network request. No TCP connection. No TLS handshake. No separate authentication. No format translation.
The result: gaps that cost 150-400ms in a fragmented stack cost single-digit milliseconds in ours.
Specifically:
Gap 1 (search→fetch) drops from 150ms to ~2ms. Same infrastructure, same auth context, same process.
Gap 3 (LLM decision→browser) drops from 300ms to ~5ms. The browser is already warm (as we detailed in our latency tax analysis). No cold start, no new connection, no new authentication.
Gaps 4 and 5 (extraction and re-reasoning loops) drop proportionally because the browser and the agent share state directly rather than through serialized messages.
Total gap time for the same workflow: ~50ms instead of ~1,455ms.
This is the part of the unified infrastructure argument that gets overlooked. Most discussions about unified vs. fragmented stacks focus on billing simplification or developer experience. The latency compression from eliminating serialization boundaries is a quantifiable engineering advantage that shows up in every single agent turn.
Not every pipeline is gap-sensitive. A nightly batch job that runs 1,000 tasks over 8 hours can afford 1.5 extra seconds per task. The total added time is 25 minutes, probably not worth re-architecting for.
Gaps become the bottleneck in three scenarios:
Real-time agent interactions. When a user is waiting for an agent response (chat interfaces, voice assistants, interactive tools), every second of latency degrades the experience. Shaving 1.5 seconds off a 5-second response is the difference between "fast" and "tolerable."
High-frequency monitoring. Competitive intelligence pipelines that check 500 sites every hour need each task to complete quickly so the cycle can restart. 1.5 extra seconds per task across 500 tasks adds 12.5 minutes to each cycle. That's 12.5 minutes of stale data in a market that moves in real time.
Deep multi-step workflows. Workflows with 15 or 20 tool calls (authenticated portal navigation, multi-page form submission, complex data validation) see gap time scale linearly with step count. A 20-step workflow accumulates 5-8 seconds of pure gap time. Combined with the codified learning savings that compress per-step reasoning time, eliminating serialization gaps is the second largest lever for reducing total workflow time.
If you're running a multi-tool agent pipeline and want to know how much time you're losing to inter-call overhead, here's a simple instrumentation approach:
Wrap every tool call with timestamps before and after. But also timestamp the moment your agent code starts preparing the next call (after the previous result is received) and the moment the next call's request actually goes out. The difference between "result received" and "next request sent" is your gap.
In Python, it's roughly:

```python
import time

t1 = time.time()  # search result received
# ... your parsing, validation, request construction ...
t2 = time.time()  # fetch request about to send
gap_1 = t2 - t1   # agent-side gap for the search -> fetch transition
```

Do this for every transition. Sum the gaps. Compare to total end-to-end time. If gaps are under 10% of total time, your pipeline is either simple (few tool calls) or already well-optimized. If they're over 25%, serialization overhead is a meaningful contributor to your latency budget.
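The per-transition measurement generalizes to a small helper. This is a sketch, not a library API: a context manager that times only the agent-side work between "result received" and "next request sent", accumulating named gaps for later comparison against end-to-end time.

```python
import time
from contextlib import contextmanager

# Accumulates (transition name, milliseconds) pairs across a run.
gaps: list[tuple[str, float]] = []

@contextmanager
def gap(name: str):
    """Time the agent-side transition code wrapped by this block."""
    t0 = time.perf_counter()
    yield
    gaps.append((name, (time.perf_counter() - t0) * 1000))

# Usage: wrap only the transition code, never the tool call itself.
with gap("search->fetch"):
    pass  # parse search result, validate URL, build fetch request here

total_gap_ms = sum(ms for _, ms in gaps)
```

`time.perf_counter()` is preferable to `time.time()` here because it's monotonic and has higher resolution for short intervals.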
TinyFish's observability layer (available on Starter plans and above) surfaces this breakdown automatically in the run history, including per-step timing and inter-step overhead.
What is serialization overhead in AI agent pipelines?
Serialization overhead is the time spent converting data between formats and transmitting it across service boundaries during an agent's execution. When an agent calls a search API, receives JSON, parses it, constructs a new request for a fetch API, authenticates with that service, and deserializes the response, each of those steps adds latency that doesn't appear in any individual tool's reported performance. In multi-tool pipelines, this overhead accumulates across every transition and typically accounts for 30 to 40 percent of compressible latency.
Why don't standard observability tools catch this latency?
Most monitoring tools track individual API call durations. They measure how long the search call took, how long the fetch call took, how long the LLM inference took. The time between calls, spent on parsing, request construction, authentication handshakes, and network traversal, falls into a gap that no single tool's metrics capture. You need end-to-end instrumentation at the agent orchestration level, not just per-tool metrics, to see it.
How much latency do serialization gaps add to a typical agent turn?
In our measurements across 200 runs on a multi-tool stack, median total gap time was approximately 1.5 seconds for a six-step workflow. For more complex workflows with 15 to 20 steps, gap time scaled to 5 to 8 seconds. In TinyFish's unified infrastructure, the same transitions cost roughly 50ms total because tool calls route internally rather than across network boundaries.
Does this matter if I'm only making one or two tool calls per agent turn?
For simple pipelines with one search call and one LLM response, gaps are minimal (under 200ms) and unlikely to be your bottleneck. The impact scales with pipeline complexity. Once your agent makes four or more tool calls per turn, or operates in a multi-step workflow with reasoning loops, gap time becomes a significant fraction of total latency.
How does unified infrastructure eliminate serialization overhead?
When all web primitives (search, fetch, browser, agent execution) run on the same infrastructure, transitions between them are internal function calls rather than network requests. No TCP connections to establish, no TLS handshakes, no separate API authentication, no format translation between incompatible schemas. TinyFish's architecture routes all operations through shared infrastructure, reducing inter-call transitions from 150 to 400 milliseconds each to single-digit milliseconds.
Can I measure serialization gaps in my existing pipeline?
Yes. Timestamp the moment each tool call's result is received and the moment the next tool call's request is sent. The difference is your gap for that transition. Sum all gaps and compare to total end-to-end latency. If gaps exceed 25 percent of total time, serialization overhead is a meaningful contributor worth addressing.