
Most web automation tool comparisons treat page volume as a footnote. It isn't.
The tool that handles 500 pages a day beautifully will silently degrade at 50,000. The infrastructure that's cost-effective at 10,000 pages becomes the most expensive option in the room at 500,000. And the free tier that feels like a reasonable starting point has a ceiling that catches most teams by surprise somewhere in the middle of a project.
Page volume and site access requirements are the two primary variables that determine your tool decision — more than AI capability, ease of use, or no-code vs. code. Get either wrong and you're either paying for infrastructure you don't need or running a pipeline that breaks under load exactly when it matters.
This guide maps each volume tier to the tools that actually work at that scale, with real cost estimates at each level so you can make the comparison with numbers rather than intuition.
Before matching tools to volume tiers, you need an accurate number. Teams consistently underestimate this, and that underestimate is what causes mid-project tool switches.
The formula:
Daily pages = (number of target URLs) × (crawl frequency per day) × (pages per URL path)
The number that matters for tool selection isn't the total — it's the peak load your pipeline needs to sustain, and whether you need it done in a tight time window or can spread it across the day.
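As a quick sketch of that arithmetic (the helper function and its default 40% buffer are illustrative, echoing the 30–50% padding this guide recommends for crawl frequency, pagination, and auth flows):

```python
def estimate_daily_pages(target_urls: int,
                         crawls_per_day: float,
                         pages_per_url_path: float,
                         buffer: float = 0.4) -> int:
    """Apply the formula above, then pad the raw number by a buffer
    for the pages teams routinely forget: pagination, category pages,
    and authentication flows."""
    raw = target_urls * crawls_per_day * pages_per_url_path
    return round(raw * (1 + buffer))

# Example: 200 competitor URLs, crawled twice daily, ~5 pages deep each.
# Raw estimate is 2,000 pages/day; with a 40% buffer it lands at 2,800 --
# already in the 1K-10K tier, not the free-tier range.
print(estimate_daily_pages(200, 2, 5))  # 2800
```

The buffer matters more than the base formula: it is usually what pushes a project across a tier boundary, and with it across a pricing boundary.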
One-off research pulls. Small recurring monitors. Proof-of-concept scrapes before committing to a larger pipeline. A freelancer pulling a client's competitor catalog. A researcher collecting data from academic directories.
At this volume, almost any tool works. The decision is about convenience and your technical comfort level, not about infrastructure.
Free options that are genuinely capable here:
What to watch for: Free tiers hide their ceilings. ParseHub's 200-page-per-run limit is the one most teams hit mid-project. If your target has 250 product pages, you're already over the limit. Verify the ceiling against your actual target page count before building a workflow around any free tier.
| Tool | Monthly cost at ~500 pages/day | Notes |
|---|---|---|
| Web Scraper Extension | $0 | No scheduling, uses your IP |
| ParseHub | $0 (free tier) | 200 pages/run limit |
| Octoparse | $0 (free tier) | Local runs only |
| TinyFish | $0 | Search and Fetch APIs free on all plans. 500-step Web Agent trial to test against your actual target. |
| Scrapy (self-hosted) | $0 + server cost (~$5–10/mo VPS) | Requires Python setup |
A small team's recurring data feed. Daily price monitoring across dozens of sites. A startup's competitive intelligence pipeline. Most "we scrape data to inform our product decisions" use cases live here.
This is where free tiers run out and you start paying for infrastructure. The key trade-off at this volume is between simplicity (managed cloud tools) and cost efficiency (self-hosted frameworks).
Managed cloud tools (simpler, higher per-page cost):
TinyFish's Browser API and Web Agent share the same credit pool, so you can mix both within one plan depending on what each target requires.
Self-hosted frameworks (more work, lower marginal cost):
What to watch for: At 1,000–10,000 pages/day, you're large enough that sites with strict access requirements start becoming a real cost factor. A managed tool that includes proxy rotation (like TinyFish) absorbs that cost into the subscription. A self-hosted Scrapy setup needs a separate proxy budget — residential proxies (e.g., Bright Data) run ~$8/GB PAYG at this tier, which adds $20–80/month depending on page weight.
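That proxy budget is simple to sketch. The function below is illustrative; the $8/GB rate and 500KB average page weight are the same assumptions used in this guide's estimates, and only the protected share of traffic is assumed to route through the proxy pool:

```python
def monthly_proxy_cost(pages_per_day: int,
                       protected_share: float,
                       kb_per_page: float = 500,
                       usd_per_gb: float = 8.0,
                       days: int = 30) -> float:
    """Residential-proxy spend for the fraction of pages that sit
    behind strict access controls."""
    gb = pages_per_day * protected_share * kb_per_page * days / 1_000_000
    return gb * usd_per_gb

# 5,000 pages/day with ~10% of targets behind strict access controls:
# 7.5 GB/month of proxied traffic, ~$60/month at $8/GB.
print(monthly_proxy_cost(5_000, 0.10))
```

Page weight is the swing variable: image-heavy product pages can double the GB figure, which is why the range quoted above is $20–80 rather than a single number.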
Estimated monthly cost at 5,000 pages/day:
| Tool | Base cost | Proxy cost | Estimated total/mo |
|---|---|---|---|
| Scrapy (self-hosted) | ~$20 (server) | $30–80 (if needed) | $20–100 |
| Apify (pay-as-you-go) | ~$40–60 (compute) | Separate | $40–140 |
| TinyFish Starter | $15 | Included | $15 |
| TinyFish Pro | $150 | Included | $150 |
Note: TinyFish pricing includes browsers, proxies, and AI inference. Apify and Scrapy costs are compute only — add proxy costs separately for protected sites.
A mid-size company's market intelligence operation. An e-commerce brand monitoring pricing across hundreds of competitor sites. A SaaS product that needs fresh web data as a core feature. This is where scraping stops being a side project and becomes infrastructure.
At this volume, the hidden cost of scraping is no longer the tool subscription — it's engineering time. Selector-based scrapers break when target sites update. Proxy pools need management. Failure monitoring becomes a dedicated function. The teams that underestimate this end up with a part-time engineer whose primary job is keeping the scraping pipeline alive.
Managed infrastructure wins on total cost here:
Fetch API (JS-rendered pages, no authentication required): $0 — free on all plans, no credit consumption. For cooperative sites at this volume, this is the most cost-effective option available from any managed provider.
Browser API (sites with strict access requirements, or teams migrating from Playwright/Puppeteer): $150/month for 16,500 credits and 50 concurrent sessions; PAYG at $0.015/credit. Billed by time: 1 credit = 4 minutes, with a 1-minute minimum per page.
50,000 pages/day × 0.25 credits/page × 30 days = 375,000 credits/month
→ PAYG: ~$5,625/month | Pro plan (overage at $0.012/credit): ~$4,452/month
Web Agent (multi-step authenticated workflows): ~3 steps/page × 50,000 × 30 days = 4,500,000 steps/month → PAYG: ~$67,500/month. Not designed for bulk simple extraction at this volume.
TinyFish's Browser API and Web Agent share the same credit pool. Most bulk pipelines at this tier use the Fetch API (free) for cooperative targets and the Browser API only for sites that require it.
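Under the billing assumptions just described (1 credit = 4 minutes of browser session, 1-minute minimum per page, rounded up), the Browser API math can be sketched as follows; the function is illustrative, not an official calculator:

```python
import math

def browser_api_monthly_cost(pages_per_day: int,
                             minutes_per_page: float = 1.0,
                             days: int = 30,
                             plan_fee: float = 150.0,
                             included_credits: int = 16_500,
                             payg_rate: float = 0.015,
                             overage_rate: float = 0.012) -> dict:
    """Time-based credit billing: each page bills at least 1 minute,
    rounded up to the next whole minute; 1 credit covers 4 minutes."""
    billed_minutes = max(1, math.ceil(minutes_per_page))
    credits = pages_per_day * days * billed_minutes / 4
    payg = credits * payg_rate
    pro = plan_fee + max(0, credits - included_credits) * overage_rate
    return {"credits": credits, "payg_usd": payg, "pro_usd": pro}

# 50,000 pages/day at the 1-minute minimum -> 375,000 credits/month,
# ~$5,625 PAYG or ~$4,452 on the Pro plan, matching the worked numbers above.
print(browser_api_monthly_cost(50_000))
```

Note how the 1-minute minimum dominates: a page that loads in 10 seconds bills the same as one that takes 55, so per-page load time only changes the bill once it crosses a whole-minute boundary.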
Self-hosted at this volume:
What to watch for: This is the volume tier where silent failure becomes a serious business problem. A pipeline that silently returns empty results for three days at 50,000 pages/day is a data quality incident, not a minor inconvenience. Factor monitoring and alerting into your tool evaluation — not just happy-path performance.
Estimated monthly cost at 50,000 pages/day, assuming a mixed target set of simple and JS-heavy sites requiring managed browser infrastructure:
| Tool | Estimated total/mo | Selector maintenance | Failure visibility |
|---|---|---|---|
| Scrapy + proxies | $2,000–2,300 ⁽¹⁾ | High (you own it) | Manual |
| Apify (custom Actors) | $500–900 | Medium (Actor updates) | Dashboard |
| Bright Data (proxy infra) | $4,500–6,000+ ⁽²⁾ | High (your scrapers) | Manual |
| TinyFish Browser API (PAYG) | ~$5,625 ⁽³⁾ | None | Built-in |
| TinyFish Browser API (Pro) | ~$4,452 ⁽³⁾ | None | Built-in |
⁽¹⁾ Scrapy estimate: ~$200–500/month compute (industry estimate, no official source; based on 3–5 VPS instances + job queue) + ~$1,800/month residential proxy for ~30% protected pages (15,000 pages/day × 500KB × 30 days = 225GB × $8/GB). The proxy, not the compute, is the larger cost at this volume.
⁽²⁾ Bright Data: residential proxy at $8/GB PAYG (source: brightdata.com, April 2026). 750GB/month for a mixed site set × $8 = $6,000/month.
⁽³⁾ TinyFish Browser API: based on tinyfish.ai/pricing (April 2026). Browser API bills per time: 1 credit = 4 minutes, minimum 1 minute per page. Assuming 10 sec/page load → rounds up to 1 min minimum → 0.25 credits/page. At 50,000 pages/day × 0.25 credits × 30 days = 375,000 credits/month. PAYG: 375,000 × $0.015 = $5,625/month. Pro plan: $150 + (375,000 − 16,500 included) × $0.012 = $150 + $4,302 = $4,452/month. Actual costs vary with page load time — faster pages still bill at the 1-minute minimum; slower pages round up to the next minute.
The TinyFish number looks higher than Scrapy until you add engineering time. At $150/hour for a developer, 20 hours/month of maintenance is $3,000 — not in the tool budget, but real cost.
Enterprise-scale data operations. Hyperscale price monitoring across millions of SKUs. A logistics company collecting daily rate variables across thousands of lanes. Financial services firms monitoring hundreds of regulatory portals in real time. This is not a side project.
At this volume, you're buying infrastructure, not tools. The question is whether you build it or buy it.
Build: A custom distributed scraping stack — Scrapy or custom crawlers running on Kubernetes, Bright Data or a private proxy pool for IP management, a data pipeline for cleaning and delivery. Engineering cost to build: 3–6 months of a senior engineer's time. Ongoing maintenance: a dedicated team. Justified for organizations with highly specific data requirements, existing data engineering capacity, and volume that makes the economics work.
Buy: TinyFish's enterprise tier is designed for this. At this tier, the economics shift from per-page cost to total infrastructure cost — the platform is running production workflows at this scale across multiple enterprise customers. The value proposition at this tier isn't the per-page cost — it's that you're buying a system that's already been hardened at that scale, with the reliability and compliance requirements enterprise operations need. Custom pricing at this tier; contact sales for specifics.
What to watch for: At 100,000+ pages/day, the decision isn't really between tools — it's between building and buying. Both have merit depending on your engineering resources and how central web data collection is to your product. The right question isn't "which tool is cheapest per page?" It's "how much of our engineering capacity do we want this to consume?"
Volume alone doesn't determine your tool. Site complexity — how much infrastructure the target requires — is the other axis. This matrix combines both:
| Volume | Static / simple pages | JS-heavy, requires managed browser | Authenticated access (your own accounts) |
|---|---|---|---|
| < 1K pages/day | Free tools (ParseHub, Octoparse) | TinyFish free tier | TinyFish free tier |
| 1K–10K pages/day | Scrapy (self-hosted) or Apify | Apify or TinyFish Starter | TinyFish Starter/Pro |
| 10K–100K pages/day | Scrapy + infra, Apify, or TinyFish Pro | Apify or TinyFish Pro | TinyFish Pro |
| 100K+ pages/day | Custom stack or TinyFish Enterprise | TinyFish Enterprise | TinyFish Enterprise |
The pattern: at low volume on simple sites, almost anything works and the cheapest option wins. As volume or site complexity increases, the tools that don't require ongoing maintenance become progressively more cost-effective when you count engineering time.
Every tool comparison in this category focuses on subscription price. The number that actually determines total cost is:
Total cost = tool subscription + proxy costs + (engineering hours × hourly rate)
Scrapy is free. But if a developer spends 15 hours/month keeping selectors current, that's $2,250/month at $150/hour — more expensive than any managed tool at comparable volume. The teams that make this mistake are the ones who calculate tool cost from the pricing page and engineering time from zero.
The inversion point — where managed infrastructure becomes cheaper than self-hosted — happens somewhere between 5,000 and 20,000 pages/day for most teams, depending on target site complexity and how often sites update their frontend.
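That comparison can be made concrete with the formula above. The maintenance-hour figures below are illustrative assumptions, using the document's own $150/hour rate:

```python
def total_monthly_cost(subscription: float,
                       proxy: float,
                       eng_hours: float,
                       hourly_rate: float = 150.0) -> float:
    """Total cost = tool subscription + proxy costs
    + (engineering hours x hourly rate)."""
    return subscription + proxy + eng_hours * hourly_rate

# Self-hosted Scrapy: "free" software, but 15 hrs/month of selector
# maintenance at $150/hr dominates the bill.
scrapy = total_monthly_cost(subscription=20, proxy=60, eng_hours=15)

# Managed tool: higher subscription, near-zero maintenance
# (assume ~1 hr/month of oversight).
managed = total_monthly_cost(subscription=150, proxy=0, eng_hours=1)

print(scrapy, managed)  # 2330.0 300.0
```

Plug in your own hours estimate: the inversion point is simply where the maintenance-hours term for self-hosted exceeds the subscription gap.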
If you're not sure where your project falls, start with the TinyFish free tier (500 steps, no credit card). Run it against your actual target site. The results tell you three things at once: whether AI-based extraction handles your target's structure, what your step-per-page ratio looks like for cost projection, and whether the site has access restrictions you didn't account for.
That's a better calibration than any estimate you can make from a pricing page.
How much does web scraping cost?
It depends on volume and tool choice, but the honest answer is that the subscription price is rarely the whole number. At under 1,000 pages/day, free tiers from ParseHub, Octoparse, and TinyFish cover most use cases at zero cost. At 5,000 pages/day, expect $15–100/month depending on whether targets require managed access handling. At 50,000 pages/day, total cost including infrastructure and proxy fees typically runs $2,000–5,600/month depending on tool and proxy requirements — and if you're on a self-hosted setup, add engineering maintenance time on top of that. The full formula is: tool subscription + proxy costs + (engineering hours × hourly rate). Teams that only look at the subscription line consistently underestimate real cost by 2–3x.
What counts as a "page" for automation tool pricing?
It depends on the tool. For Scrapy and most traditional scrapers, a page is one HTTP request. For AI web agents like TinyFish, the unit is a "step" — a discrete action (navigate, click, extract). A single page extraction might require 2–5 steps; a multi-step authenticated workflow might require 10–15. Always ask vendors for step-to-page ratios for your specific use case before committing to a plan.
Is Scrapy actually free at high volume?
The software is open source, but the infrastructure isn't free. At 50,000 pages/day you need distributed computing, job queues, monitoring, and proxy pools. Compute alone runs roughly $200–500/month, and residential proxies for protected targets can push the total past $2,000/month, plus ongoing engineering time. Scrapy is the most cost-efficient option when you have the engineering capacity to run it: it's not free, it's a trade of money for engineering time.
What happens if I exceed my plan's page or step limit?
Most managed tools handle this differently. Apify charges compute unit overages at the pay-as-you-go rate. TinyFish offers pay-as-you-go at $0.015/credit as an alternative to the monthly plan (Pro plan overages bill at $0.012/credit); note that 1 credit covers 1 agent step or 4 minutes of browser session, and Search and Fetch are free on all plans, rate-limited by plan tier. TinyFish doesn't hard-stop workflows mid-execution when you hit a plan ceiling — overages bill automatically so your run completes. Scrapy has no limit — your infrastructure is the ceiling. Plan for overages before you hit them; discovering the policy during a critical run is a bad time to learn it.
How do I know if my volume estimate is accurate?
It usually isn't, in the direction of underestimation. The most common mistake: counting target URLs but not accounting for crawl frequency, or not including the pages you need to navigate through to reach the data (pagination, category pages, authentication flows). Add 30–50% to your estimate before selecting a plan tier.
No credit card. No setup. Run your first operation in under a minute.