Engineering

How to Choose a Web Automation Tool by Page Volume (With Real Cost Estimates)

TinyFishie·TinyFish Observer·May 6, 2026·Updated May 8, 2026·13 min read

Most web automation tool comparisons treat page volume as a footnote. It isn't.

The tool that handles 500 pages a day beautifully will silently degrade at 50,000. The infrastructure that's cost-effective at 10,000 pages becomes the most expensive option in the room at 500,000. And the free tier that feels like a reasonable starting point has a ceiling that catches most teams by surprise somewhere in the middle of a project.

Page volume and site access requirements are the two primary variables that determine your tool decision — more than AI capability, ease of use, or no-code vs. code. Get either wrong and you're either paying for infrastructure you don't need or running a pipeline that breaks under load exactly when it matters.

This guide maps each volume tier to the tools that actually work at that scale, with real cost estimates at each level so you can make the comparison with numbers rather than intuition.

How to Estimate Your Page Volume

Quick decision rules before the detail:

  1. Under 1K pages/day — free tiers work; pick for convenience, not capability.
  2. 1K–10K pages/day — managed tools beat self-hosted once you count setup time.
  3. 10K–100K pages/day — engineering maintenance cost exceeds tool subscription cost; factor both.
  4. 100K+ pages/day — you're buying infrastructure, not a tool; build vs. buy is the real decision.

Before matching tools to volume tiers, you need an accurate number. Teams consistently underestimate this, and the underestimate is what causes mid-project tool switches.

The formula:

Daily pages = (number of target URLs) × (crawl frequency per day) × (pages per URL path)

A few scenarios to calibrate against:

  • Competitor price monitoring across 50 e-commerce sites, updated daily: If each site has ~200 product pages, that's 10,000 pages/day. If you need hourly updates, that's 240,000 pages/day.
  • Lead enrichment from 2,000 company profile pages, run once a week: ~285 pages/day on average. Looks small — until you need it done in a 2-hour window, which effectively makes it ~1,000 pages/hour.
  • News monitoring across 30 publications, 4x daily: If each publication has ~50 new articles per cycle, that's 6,000 pages/day.

The number that matters for tool selection isn't the total — it's the peak load your pipeline needs to sustain, and whether you need it done in a tight time window or can spread it across the day.
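
If it helps to make the arithmetic concrete, here is a minimal Python sketch of the same estimate. The functions and example numbers are illustrative only; they are not part of any TinyFish SDK.

```python
# Back-of-the-envelope page volume estimate, using the formula above.

def daily_pages(target_urls: int, crawls_per_day: float, pages_per_url: float) -> float:
    """Daily pages = target URLs x crawl frequency per day x pages per URL path."""
    return target_urls * crawls_per_day * pages_per_url


def peak_pages_per_hour(total_pages: float, window_hours: float) -> float:
    """Peak load when the whole run must finish inside a fixed time window."""
    return total_pages / window_hours


# Competitor price monitoring: 50 sites x ~200 product pages, refreshed daily.
print(daily_pages(50, 1, 200))        # 10,000 pages/day
print(daily_pages(50, 24, 200))       # 240,000 pages/day if refreshed hourly

# Lead enrichment: 2,000 profile pages once a week, but inside a 2-hour window.
print(daily_pages(2_000, 1 / 7, 1))   # ~285 pages/day on average
print(peak_pages_per_hour(2_000, 2))  # ~1,000 pages/hour at peak
```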

Volume Tier 1: Under 1,000 Pages Per Day

What this looks like

One-off research pulls. Small recurring monitors. Proof-of-concept scrapes before committing to a larger pipeline. A freelancer pulling a client's competitor catalog. A researcher collecting data from academic directories.

What works

At this volume, almost any tool works. The decision is about convenience and your technical comfort level, not about infrastructure.

Free options that are genuinely capable here:

  • Web Scraper Chrome Extension (free): Works for public, unprotected pages. No scheduling, no parallelism, but for a one-time pull of a few hundred rows it's the fastest path to a CSV.
  • ParseHub free tier: 5 projects, up to 200 pages per run. If your target has fewer than 200 pages, this is a complete solution at zero cost.
  • Octoparse free tier: 2 concurrent scrapers, a 10-task limit, and up to 50K exported rows/month. Better for recurring small-volume scrapes than ParseHub, but verify the task and row limits against your actual target before committing.
  • TinyFish free tier: 500 steps, no credit card. The value here isn't the volume — it's that you get to test an AI agent against your actual target site, including any access restrictions it has in place.

What to watch for: Free tiers hide their ceilings. ParseHub's 200-page-per-run limit is the one most teams hit mid-project. If your target has 250 product pages, you're already over the limit. Verify the ceiling against your actual target page count before building a workflow around any free tier.

Real cost at this volume

Tool | Monthly cost at ~500 pages/day | Notes
Web Scraper Extension | $0 | No scheduling, uses your IP
ParseHub | $0 (free tier) | 200 pages/run limit
Octoparse | $0 (free tier) | Local runs only
TinyFish | $0 | Search and Fetch APIs free on all plans; 500-step Web Agent trial to test against your actual target
Scrapy (self-hosted) | $0 + server cost (~$5–10/mo VPS) | Requires Python setup

Volume Tier 2: 1,000 to 10,000 Pages Per Day

What this looks like

A small team's recurring data feed. Daily price monitoring across dozens of sites. A startup's competitive intelligence pipeline. Most "we scrape data to inform our product decisions" use cases live here.

What works

This is where free tiers run out and you start paying for infrastructure. The key trade-off at this volume is between simplicity (managed cloud tools) and cost efficiency (self-hosted frameworks).

Managed cloud tools (simpler, higher per-page cost):

  • Apify: Solid at this volume. A well-configured Actor running 5,000 pages/day typically costs $30–60/month in compute. The marketplace of pre-built Actors covers most common targets (Amazon, LinkedIn, Google Maps) and gets you to first data in under ten minutes without writing selectors. For targets outside the catalog, you're writing and maintaining custom Actors — factor that time in.
  • TinyFish Browser API: $15/month (Starter) for developers already using Playwright, Puppeteer, or Selenium. Connects via CDP over WebSocket — no SDK swap, you point your existing browser automation at TinyFish's endpoint instead (see the connection sketch after this list). Sub-250ms cold start means parallelism scales cleanly without queuing delays. Best fit at this tier: developers who want managed browser infrastructure without rebuilding their scraping stack.
  • TinyFish Web Agent (Starter, $15/month, 1,650 credits): Better fit when your target requires multi-step navigation or authentication rather than straightforward page extraction. A simple extraction runs 2–3 steps/page; an authenticated flow runs 8–10.

TinyFish's Browser API and Web Agent share the same credit pool, so you can mix both within one plan depending on what each target requires.
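
For the Browser API path, the "no SDK swap" model looks roughly like the sketch below. The endpoint URL and API-key parameter are placeholders assumed for illustration; check the TinyFish documentation for the actual connection string. The connect_over_cdp call itself is standard Playwright.

```python
from playwright.sync_api import sync_playwright

# Placeholder endpoint; the real connection string comes from the TinyFish docs.
CDP_ENDPOINT = "wss://browser.example-endpoint.tinyfish.ai?apiKey=YOUR_KEY"

with sync_playwright() as p:
    # connect_over_cdp() attaches Playwright to a Chromium instance that is
    # already running remotely, over a WebSocket, so the rest of the script
    # is the same code you would run against a local browser.
    browser = p.chromium.connect_over_cdp(CDP_ENDPOINT)
    context = browser.contexts[0] if browser.contexts else browser.new_context()
    page = context.new_page()
    page.goto("https://example.com/products")
    print(page.title())
    browser.close()
```

Puppeteer users would do the equivalent with puppeteer.connect({ browserWSEndpoint }); the point is that the scraping code itself doesn't change.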

Self-hosted frameworks (more work, lower marginal cost):

  • Scrapy: Free to run, but you're paying for a server and your own time. A $20/month cloud instance handles this volume easily. The real cost is the 4–8 hours of setup time and ongoing maintenance when target sites change. If your targets are static HTML with no access restrictions, this is the most cost-efficient option at this volume.
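
As a reference point for what "your own time" buys at this tier, a minimal Scrapy spider for a static catalog looks like the sketch below. The URLs and CSS selectors are placeholders, and they are exactly the part you maintain when the target's frontend changes.

```python
import scrapy


class PriceSpider(scrapy.Spider):
    name = "prices"
    start_urls = ["https://example.com/catalog"]  # placeholder target

    # Throttle politely; raise concurrency only once you know the target tolerates it.
    custom_settings = {
        "DOWNLOAD_DELAY": 1.0,
        "CONCURRENT_REQUESTS": 8,
    }

    def parse(self, response):
        # Placeholder selectors; every real target needs its own.
        for product in response.css("div.product"):
            yield {
                "name": product.css("h2::text").get(),
                "price": product.css("span.price::text").get(),
            }
        # Follow pagination until it runs out.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```

Run it with scrapy runspider spider.py -o prices.csv and the software cost really is zero; the server, scheduling, proxies, and monitoring are what you pay for as volume grows.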

What to watch for: At 1,000–10,000 pages/day, you're large enough that sites with strict access requirements start becoming a real cost factor. A managed tool that includes proxy rotation (like TinyFish) absorbs that cost into the subscription. A self-hosted Scrapy setup needs a separate proxy budget — residential proxies (e.g., Bright Data) run ~$8/GB PAYG at this tier, which adds $20–80/month depending on page weight.

Real cost at this volume

Estimated monthly cost at 5,000 pages/day:

Tool | Base cost | Proxy cost | Estimated total/mo
Scrapy (self-hosted) | ~$20 (server) | $30–80 (if needed) | $20–100
Apify (pay-as-you-go) | ~$40–60 (compute) | Separate | $40–140
TinyFish Starter | $15 | Included | $15
TinyFish Pro | $150 | Included | $150

Note: TinyFish pricing includes browsers, proxies, and AI inference. Apify and Scrapy costs are compute only — add proxy costs separately for protected sites.

Volume Tier 3: 10,000 to 100,000 Pages Per Day

What this looks like

A mid-size company's market intelligence operation. An e-commerce brand monitoring pricing across hundreds of competitor sites. A SaaS product that needs fresh web data as a core feature. This is where scraping stops being a side project and becomes infrastructure.

What works

At this volume, the hidden cost of scraping is no longer the tool subscription — it's engineering time. Selector-based scrapers break when target sites update. Proxy pools need management. Failure monitoring becomes a dedicated function. The teams that underestimate this end up with a part-time engineer whose primary job is keeping the scraping pipeline alive.

Managed infrastructure wins on total cost here:

  • TinyFish (multiple APIs — choose based on what each target actually requires):

Fetch API (JS-rendered pages, no authentication required): $0 — free on all plans, no credit consumption. For cooperative sites at this volume, this is the most cost-effective option available from any managed provider.

Browser API (sites with strict access requirements, or teams migrating from Playwright/Puppeteer): $150/month for 16,500 credits, 50 concurrent sessions; PAYG at $0.015/credit. Billed by time: 1 credit = 4 minutes, with a 1-minute minimum per page.

50,000 pages/day × 0.25 credits/page × 30 days = 375,000 credits/month

→ PAYG: ~$5,625/month | Pro plan (overage at $0.012/credit): ~$4,452/month

Web Agent (multi-step authenticated workflows): ~3 steps/page × 50,000 × 30 days = 4,500,000 steps/month → PAYG: ~$67,500/month. Not designed for bulk simple extraction at this volume.

TinyFish's Browser API and Web Agent share the same credit pool. Most bulk pipelines at this tier use the Fetch API (free) for cooperative targets and the Browser API only for sites that require it.

  • Apify: Plans start at $29/month (Starter); the Scale plan ($199/month) is typically required at 50,000 pages/day. At this volume, expect $200–500/month in compute, plus significant proxy costs for protected sites. Custom Actors require ongoing maintenance.
  • Bright Data: At this volume, Bright Data's Scraping Browser becomes relevant — a fully managed Chrome instance with built-in proxy rotation. Cost is primarily proxy bandwidth: residential proxies at ~$8/GB (PAYG; source: brightdata.com, April 2026). A 50,000-page/day operation scraping typical retail pages (~500KB each) uses roughly 25GB/day — approximately $6,000/month in proxy costs alone. Bright Data makes sense when geographic targeting or anti-detection reliability is the primary requirement, not as a general-purpose option.

Self-hosted at this volume:

  • Scrapy + infrastructure: Technically possible, but at 50,000 pages/day you need distributed infrastructure — multiple servers, a job queue (Redis or Celery), a monitoring stack, and proxy management. A realistic infrastructure budget is $200–500/month, plus 20+ hours/month of engineering maintenance. Justified if you have a dedicated data engineering team and highly customized requirements.

What to watch for: This is the volume tier where silent failure becomes a serious business problem. A pipeline that silently returns empty results for three days at 50,000 pages/day is a data quality incident, not a minor inconvenience. Factor monitoring and alerting into your tool evaluation — not just happy-path performance.

Real cost at this volume

Estimated monthly cost at 50,000 pages/day, assuming a mixed target set of simple and JS-heavy sites requiring managed browser infrastructure:

Tool | Estimated total/mo | Selector maintenance | Failure visibility
Scrapy + proxies | $2,000–2,300 ⁽¹⁾ | High (you own it) | Manual
Apify (custom Actors) | $500–900 | Medium (Actor updates) | Dashboard
Bright Data (proxy infra) | $4,500–6,000+ ⁽²⁾ | High (your scrapers) | Manual
TinyFish Browser API (PAYG) | ~$5,625 ⁽³⁾ | None | Built-in
TinyFish Browser API (Pro) | ~$4,452 ⁽³⁾ | None | Built-in

⁽¹⁾ Scrapy estimate: ~$200–500/month compute (industry estimate, no official source; based on 3–5 VPS instances + job queue) + ~$1,800/month residential proxy for ~30% protected pages (15,000 pages/day × 500KB × 30 days = 225GB × $8/GB). Compute only would be $200–500/month — proxy is the larger cost at this volume.

⁽²⁾ Bright Data: residential proxy at $8/GB PAYG (source: brightdata.com, April 2026). 750GB/month for a mixed site set × $8 = $6,000/month.

⁽³⁾ TinyFish Browser API: based on tinyfish.ai/pricing (April 2026). Browser API bills per time: 1 credit = 4 minutes, minimum 1 minute per page. Assuming 10 sec/page load → rounds up to 1 min minimum → 0.25 credits/page. At 50,000 pages/day × 0.25 credits × 30 days = 375,000 credits/month. PAYG: 375,000 × $0.015 = $5,625/month. Pro plan: $150 + (375,000 − 16,500 included) × $0.012 = $150 + $4,302 = $4,452/month. Actual costs vary with page load time — faster pages still bill at the 1-minute minimum; slower pages round up to the next minute.
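
The footnote arithmetic is easy to reproduce. Here is a quick sketch using the prices cited in this post (April 2026) as assumptions; re-check the vendor pricing pages before budgeting off these numbers.

```python
PAGES_PER_DAY = 50_000
DAYS = 30

# Proxy-based estimate (Bright Data style): bandwidth x price per GB.
page_weight_gb = 500 / 1_000_000                       # ~500KB per page
proxy_gb = PAGES_PER_DAY * page_weight_gb * DAYS       # ~750 GB/month
proxy_cost = proxy_gb * 8                              # $8/GB residential PAYG -> ~$6,000

# TinyFish Browser API: billed by time, 1 credit = 4 minutes, 1-minute minimum.
credits_per_page = 0.25                                # ~10s load rounds up to the 1-min minimum
credits = PAGES_PER_DAY * credits_per_page * DAYS      # 375,000 credits/month
payg = credits * 0.015                                 # ~$5,625
pro = 150 + (credits - 16_500) * 0.012                 # ~$4,452

print(round(proxy_gb), round(proxy_cost), round(payg), round(pro))
# 750 6000 5625 4452
```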

The TinyFish number looks higher than Scrapy until you add engineering time. At $150/hour for a developer, 20 hours/month of maintenance is $3,000 — it never appears in the tool budget, but it is real cost.

Volume Tier 4: 100,000+ Pages Per Day

What this looks like

Enterprise-scale data operations. Hyperscale price monitoring across millions of SKUs. A logistics company collecting daily rate variables across thousands of lanes. Financial services firms monitoring hundreds of regulatory portals in real time. This is not a side project.

What works

At this volume, you're buying infrastructure, not tools. The question is whether you build it or buy it.

Build: A custom distributed scraping stack — Scrapy or custom crawlers running on Kubernetes, Bright Data or a private proxy pool for IP management, a data pipeline for cleaning and delivery. Engineering cost to build: 3–6 months of a senior engineer's time. Ongoing maintenance: a dedicated team. Justified for organizations with highly specific data requirements, existing data engineering capacity, and volume that makes the economics work.

Buy: TinyFish's enterprise tier is designed for this scale, and the platform already runs production workflows at this volume across multiple enterprise customers. The economics shift from per-page cost to total infrastructure cost: you're buying a system that has already been hardened at that scale, with the reliability and compliance requirements enterprise operations need. Pricing is custom at this tier; contact sales for specifics.

What to watch for: At 100,000+ pages/day, the decision isn't really between tools — it's between building and buying. Both have merit depending on your engineering resources and how central web data collection is to your product. The right question isn't "which tool is cheapest per page?" It's "how much of our engineering capacity do we want this to consume?"

The Full Picture: Volume × Site Complexity

Volume alone doesn't determine your tool. Site complexity — how much infrastructure the target requires — is the other axis. This matrix combines both:

Volume | Static / simple pages | JS-heavy, requires managed browser | Authenticated access (your own accounts)
< 1K pages/day | Free tools (ParseHub, Octoparse) | TinyFish free tier | TinyFish free tier
1K–10K pages/day | Scrapy (self-hosted) or Apify | Apify or TinyFish Starter | TinyFish Starter/Pro
10K–100K pages/day | Scrapy + infra, Apify, or TinyFish Pro | Apify or TinyFish Pro | TinyFish Pro
100K+ pages/day | Custom stack or TinyFish Enterprise | TinyFish Enterprise | TinyFish Enterprise

The pattern: at low volume on simple sites, almost anything works and the cheapest option wins. As volume or site complexity increases, the tools that don't require ongoing maintenance become progressively more cost-effective when you count engineering time.

The Cost Calculation Most Teams Get Wrong

Every tool comparison in this category focuses on subscription price. The number that actually determines total cost is:

Total cost = tool subscription + proxy costs + (engineering hours × hourly rate)

Scrapy is free. But if a developer spends 15 hours/month keeping selectors current, that's $2,250/month at $150/hour — more expensive than any managed tool at comparable volume. The teams that make this mistake are the ones who calculate tool cost from the pricing page and engineering time from zero.

The inversion point — where managed infrastructure becomes cheaper than self-hosted — happens somewhere between 5,000 and 20,000 pages/day for most teams, depending on target site complexity and how often sites update their frontend.
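
As a sketch of that formula (the hours and rates here are illustrative assumptions, not benchmarks):

```python
def total_monthly_cost(subscription: float, proxies: float,
                       eng_hours: float, hourly_rate: float = 150.0) -> float:
    """Total cost = tool subscription + proxy costs + (engineering hours x hourly rate)."""
    return subscription + proxies + eng_hours * hourly_rate


# "Free" self-hosted Scrapy at ~5,000 pages/day vs. a managed plan:
self_hosted = total_monthly_cost(subscription=20, proxies=50, eng_hours=15)   # $2,320
managed = total_monthly_cost(subscription=150, proxies=0, eng_hours=1)        # $300
print(self_hosted, managed)
```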

How to Estimate Your Starting Point

If you're not sure where your project falls, start with the TinyFish free tier (500 steps, no credit card). Run it against your actual target site. The results tell you three things at once: whether AI-based extraction handles your target's structure, what your step-per-page ratio looks like for cost projection, and whether the site has access restrictions you didn't account for.

That's a better calibration than any estimate you can make from a pricing page.

Frequently Asked Questions

How much does web scraping cost?

It depends on volume and tool choice, but the honest answer is that the subscription price is rarely the whole number. At under 1,000 pages/day, free tiers from ParseHub, Octoparse, and TinyFish cover most use cases at zero cost. At 5,000 pages/day, expect $15–100/month depending on whether targets require managed access handling. At 50,000 pages/day, total cost including infrastructure and proxy fees typically runs $2,000–5,600/month depending on tool and proxy requirements — and if you're on a self-hosted setup, add engineering maintenance time on top of that. The full formula is: tool subscription + proxy costs + (engineering hours × hourly rate). Teams that only look at the subscription line consistently underestimate real cost by 2–3x.

What counts as a "page" for automation tool pricing?

It depends on the tool. For Scrapy and most traditional scrapers, a page is one HTTP request. For AI web agents like TinyFish, the unit is a "step" — a discrete action (navigate, click, extract). A single page extraction might require 2–5 steps; a multi-step authenticated workflow might require 10–15. Always ask vendors for step-to-page ratios for your specific use case before committing to a plan.

Is Scrapy actually free at high volume?

The software is open source, but the infrastructure isn't free. At 50,000 pages/day you need distributed computing, job queues, monitoring, and proxy pools. A realistic total infrastructure cost is $400–800/month, plus ongoing engineering time. Scrapy is the most cost-efficient option when you have the engineering capacity to run it — it's not free, it's a trade of money for engineering time.

What happens if I exceed my plan's page or step limit?

Most managed tools handle this differently. Apify charges compute unit overages at the pay-as-you-go rate. TinyFish offers pay-as-you-go at $0.015/credit as an alternative to the monthly plan (Pro plan overages bill at $0.012/credit); 1 credit covers 1 agent step or 4 minutes of browser session, and Search and Fetch are free on all plans, rate-limited by plan tier. TinyFish doesn't hard-stop workflows mid-execution when you hit a plan ceiling — overages bill automatically so your run completes. Scrapy has no limit — your infrastructure is the ceiling. Plan for overages before you hit them; discovering them during a critical run is a bad time to learn the policy.

How do I know if my volume estimate is accurate?

It usually isn't, in the direction of underestimation. The most common mistake: counting target URLs but not accounting for crawl frequency, or not including the pages you need to navigate through to reach the data (pagination, category pages, authentication flows). Add 30–50% to your estimate before selecting a plan tier.

Related reading:

  • The Best Web Scraping Tools in 2026 — Ranked and Reviewed →