Engineering

80% of your Web Fetch returns Junk

Matthew Sparr·ML Engineer·May 11, 2026

Your agent fetches a news article for you. HTTP 200, markdown back, looks fine. Then you realize 80% of what your agent brought back and your LLM processed was nav bars, trending headlines, and weather widgets.

That’s the actual state of web fetching for agents right now. Most fetchers return a successful response and still give the agent a bad version of the page. But not TinyFish Fetch. We pulled fifteen real articles from five different publishers last week and ran each one through TinyFish Fetch and two well-known competitors to see how wrong they got it.

Don't take our word for it.

Take these URLs, run them in our Fetch Playground, and through whatever you've got.

Or just keep scrolling - we'll show you exactly what we found!

HTTP 200 can still be bad input

Fifteen articles, five publishers, same five-minute window, live retrieval, markdown output. The table below shows the median total characters each service returned per publisher, not the article length.

Bigger numbers here generally mean more irrelevant content.

| Publisher | TinyFish Fetch | Service A | Service B |
|---|---|---|---|
| Daily Mail | 8,736 chars | 37,136 chars | 85,054 chars |
| Hindustan Times | 2,990 chars | 543 (headline only) | 30,571 chars |
| SCMP | 1,863 chars | 1,990 chars | 14,710 chars |
| The Guardian | 3,755 chars | empty (SOURCE_NOT_AVAILABLE) | 8,006 chars |
| New York Times | 2,136 chars (2/3) | empty (SOURCE_NOT_AVAILABLE) | empty (HTTP 403) |

Service B returned roughly 8–10x as many characters as TinyFish Fetch for three of the four publishers where it returned anything at all. Those extra characters were not deeper coverage. They were junk.

Service A was shorter on average, but it returned only a headline and a sign-in widget for Hindustan Times and nothing at all for The Guardian or the New York Times. Nothing relevant. Nothing useful.
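
If you want to check the multiples yourself, here is a minimal Python sketch that divides Service B's median by TinyFish Fetch's median for each publisher. The numbers are copied from the table above. The dictionary layout and the `size_ratio` helper are just illustrative names, not part of any TinyFish API.

```python
# Median characters returned per publisher, copied from the table above.
# The New York Times row is left out because Service B returned nothing (HTTP 403).
MEDIAN_CHARS = {
    "Daily Mail": {"tinyfish": 8_736, "service_b": 85_054},
    "Hindustan Times": {"tinyfish": 2_990, "service_b": 30_571},
    "SCMP": {"tinyfish": 1_863, "service_b": 14_710},
    "The Guardian": {"tinyfish": 3_755, "service_b": 8_006},
}


def size_ratio(baseline_chars: int, other_chars: int) -> float:
    """How many times larger `other_chars` is than `baseline_chars`."""
    return round(other_chars / baseline_chars, 1)


RATIOS = {
    publisher: size_ratio(row["tinyfish"], row["service_b"])
    for publisher, row in MEDIAN_CHARS.items()
}

for publisher, ratio in RATIOS.items():
    print(f"{publisher}: Service B returned {ratio}x the characters")
```

Three of the four ratios come out between roughly 8x and 10x. The Guardian is the exception at about 2x.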

What the agent actually sees

Take the Daily Mail article for instance.

The article body is about 4,300 characters. Here’s what each service fetched and fed to the model:

| Service | Total chars | Article content (% of total) | Non-article content (% of total) |
|---|---|---|---|
| TinyFish Fetch | 4,673 | ~92% | ~8% (a small DC Insider newsletter promo line) |
| Service A | 63,400 | ~7% | ~93% (200 lines of unrelated story headlines stacked at the top) |
| Service B | 164,986 | ~3% | ~97% (full site nav, weather widget, 60+ trending links, ad slots, runtime error text) |


Rough math on that Daily Mail article: at ~4 characters per token, a 4,673-character TinyFish Fetch result is ~1,170 input tokens. Service B's 164,986-character version of the same article is ~41,000 tokens.

35× the cost for the same article, plus slower inference, plus irrelevant facts competing for attention in the context window. For one page this is a small waste. At fifty pages, it compounds into real degradation that affects response times, overall accuracy, and bottom lines.
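
The same rough math as a small Python sketch, in case you want to plug in your own page sizes. It uses the ~4 characters per token rule of thumb from above rather than a real tokenizer, so treat the results as estimates. The function name and the fifty-page extrapolation are illustrative assumptions.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb, not a real tokenizer


def estimate_tokens(chars: int, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Approximate input tokens for a response of `chars` characters."""
    return chars // chars_per_token


tinyfish_tokens = estimate_tokens(4_673)     # TinyFish Fetch, Daily Mail article
service_b_tokens = estimate_tokens(164_986)  # Service B, same article

cost_multiple = round(service_b_tokens / tinyfish_tokens, 1)

# Extra input tokens if an agent reads fifty pages that look like this one.
extra_tokens_50_pages = 50 * (service_b_tokens - tinyfish_tokens)

print(f"TinyFish Fetch: ~{tinyfish_tokens:,} tokens")
print(f"Service B:      ~{service_b_tokens:,} tokens")
print(f"Multiple:       ~{cost_multiple}x")
print(f"Extra tokens over 50 pages: ~{extra_tokens_50_pages:,}")
```

Under these assumptions, fifty pages like this one add about two million input tokens that carry no article content.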

The same pattern persists across other articles.

Hindustan Times: Service B returned 28,470 characters where the article body itself was ~3,000. The rest was the full top-nav rendered as a markdown bullet list (every Indian city page included).

SCMP: Service B returned 14,710 characters for an article whose body is ~1,863 characters. Roughly 87% of the response was section nav, edition pickers, related rails, and footer chrome.
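
The percentages in this section are simple ratios of article body length to total characters returned. A minimal sketch of that calculation follows, using the approximate body lengths quoted above. The helper names are illustrative, and the body lengths are the rounded figures from the text, so the output is approximate too.

```python
def content_share(body_chars: int, total_chars: int) -> float:
    """Fraction of the response that is article body (capped at 1.0)."""
    if total_chars <= 0:
        return 0.0
    return min(body_chars / total_chars, 1.0)


def noise_share(body_chars: int, total_chars: int) -> float:
    """Fraction of the response that is everything except the article body."""
    return 1.0 - content_share(body_chars, total_chars)


DAILY_MAIL_BODY = 4_300  # approximate article body length from the text
DAILY_MAIL_TOTALS = {
    "TinyFish Fetch": 4_673,
    "Service A": 63_400,
    "Service B": 164_986,
}

for service, total in DAILY_MAIL_TOTALS.items():
    pct = round(content_share(DAILY_MAIL_BODY, total) * 100)
    print(f"Daily Mail via {service}: ~{pct}% article content")

# SCMP via Service B: ~1,863 characters of body inside 14,710 returned.
scmp_noise_pct = round(noise_share(1_863, 14_710) * 100)
print(f"SCMP via Service B: ~{scmp_noise_pct}% non-article content")
```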

What TinyFish Fetch does differently

Fetch is a browser-backed extraction service. The exact heuristics are proprietary, but the shape of the work is straightforward. We do a lot of small, site-specific things so the caller does not have to.

It also benefits from the same proprietary browser infrastructure behind our Browser API. Fetch does not need the full control surface of Browser, but using a browser we control gives us a better place to handle anti-bot systems: browser fingerprints, request behavior, proxy routing, and challenge pages.

Those details matter even when the only thing you want back is clean article text.

  • Load the page the way the page expects to be loaded. Some pages work with a normal HTTP fetch. Many do not. Modern sites often need browser rendering before the article is actually present.
  • Use the right wait strategy for the domain. Some pages are ready early. Some need a short idle window. Some get worse if you wait too long, because infinite-scroll modules and recommendation rails start filling the page. Fetch uses domain policies for those cases.
  • Retry suspicious results. If an extraction is very short, looks like a challenge page, or otherwise looks wrong, Fetch can try a fuller browser path instead of returning a technically valid but useless response.
  • Separate article text from site chrome. The extraction step drops navigation, related-story rails, comment widgets, ad slots, and other non-article content before the result goes to the model.
  • Return errors when the page is not really content. Bot challenges, empty pages, proxy failures, and other degraded states should not be passed along as source material.
  • Normalize the output. Markdown is the default, with HTML and JSON available when callers need them. The goal is to make the result usable in a prompt without another cleanup step.
  • Watch domains over time. Sites change constantly. We monitor output length, success rates, and degraded responses so new domain behavior becomes our problem, not every customer’s integration problem.

None of this is magic. It is just the unglamorous part of making web content usable for agents.
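
To make the "retry suspicious results" and "return errors" ideas concrete, here is a minimal Python sketch of that control flow. It is not TinyFish's implementation (those heuristics are proprietary). The length threshold, the challenge-page markers, and every name below are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical values for illustration only. Real thresholds would be tuned per domain.
MIN_BODY_CHARS = 500
CHALLENGE_MARKERS = ("verify you are human", "captcha", "access denied")


@dataclass
class FetchResult:
    status: int
    text: str


class DegradedPageError(Exception):
    """Raised instead of handing a challenge or empty page to the model."""


def is_suspicious(result: FetchResult) -> bool:
    """Crude check: non-200, very short, or challenge-like text near the top."""
    if result.status != 200:
        return True
    body = result.text.strip()
    if len(body) < MIN_BODY_CHARS:
        return True
    head = body[:2000].lower()  # only scan the start, where challenge text usually sits
    return any(marker in head for marker in CHALLENGE_MARKERS)


def fetch_with_escalation(
    url: str,
    fast_fetch: Callable[[str], FetchResult],
    browser_fetch: Callable[[str], FetchResult],
) -> FetchResult:
    """Try the cheap path first, then a fuller browser path, then fail loudly."""
    result = fast_fetch(url)
    if not is_suspicious(result):
        return result
    result = browser_fetch(url)
    if is_suspicious(result):
        raise DegradedPageError(f"no usable content for {url}")
    return result
```

The point of the sketch is the last branch: a page that still looks wrong after the fuller attempt becomes an error, not source material. `fast_fetch` and `browser_fetch` are whatever callables you supply, for example a plain HTTP client and a headless-browser wrapper.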

TinyFish Fetch isn't perfect either

It missed 1 of 3 NYT URLs, and we’re actively working to improve this. (Check back soon!)

However, our competitors miss whole publishers.

Where it fits

Use Fetch by itself when your agent already has URLs. Point it at a page and get clean content back.

Use Search first when your agent does not know where to look yet. Search finds candidate sources. Fetch turns those sources into usable evidence.

Any use case that reads a lot of public web pages, like news monitoring, financial research, brand intelligence, or regulatory tracking, lives and dies on fetch quality. Run one URL through TinyFish Fetch and you'll see it in the output: less noise, sharper answers, fewer tokens, fewer wasted dollars.

Try it

Search and Fetch are free. No credits, no credit card.

# Search

curl "https://api.search.tinyfish.ai?query=nvidia+earnings+2026" \
  -H "X-API-Key: $TINYFISH_API_KEY"

# Fetch

curl -X POST https://api.fetch.tinyfish.ai \
  -H "X-API-Key: $TINYFISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.theguardian.com/any-article", "format": "markdown"}'
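
If your agent is written in Python, the same two calls can be sketched with the standard library alone. The endpoints, the `X-API-Key` header, and the JSON body mirror the curl commands above. The function names are made up for this example, and the response format is not described here, so `send` just returns the raw response text. Check the docs for the actual response schema.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.search.tinyfish.ai"
FETCH_URL = "https://api.fetch.tinyfish.ai"


def build_search_request(query: str, api_key: str) -> urllib.request.Request:
    """GET request equivalent to the Search curl command."""
    url = SEARCH_URL + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


def build_fetch_request(
    page_url: str, api_key: str, fmt: str = "markdown"
) -> urllib.request.Request:
    """POST request equivalent to the Fetch curl command."""
    body = json.dumps({"url": page_url, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        FETCH_URL,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example usage (requires a real key in the TINYFISH_API_KEY environment variable):
#   import os
#   key = os.environ["TINYFISH_API_KEY"]
#   print(send(build_fetch_request("https://www.theguardian.com/any-article", key)))
```

Building the request separately from sending it keeps the sketch easy to inspect and test without touching the network.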

Grab your API Key: agent.tinyfish.ai/api-keys

Or try it out in the Playground first: agent.tinyfish.ai/playground/fetch
