
Playwright is the correct answer for scraping JavaScript-heavy websites. Until it isn't.
For single tasks, small volumes, and sites you control or monitor closely, Playwright handles dynamic content better than any alternative. It executes JavaScript, waits for network requests to complete, handles SPAs, and gives you CDP-level control over every interaction. The answer to "how do I scrape a dynamic website?" is almost always Playwright or a managed version of it.
But there's a point where Playwright stops being the right tool—not because something better exists for the same problem, but because the problem itself has changed.
Playwright is right when:
- The targets are a handful of sites you control or monitor closely, at small to moderate volume.
- The work is rendering and extraction, and you have time to fix selectors when they break.

The problem has changed when:
- You are maintaining sessions across many authenticated portals, not scripting one login.
- Selector breakage, anti-bot updates, and concurrency infrastructure consume more engineering time than the extraction itself.
The core issue with scraping dynamic websites is that HTTP requests don't get you the final DOM—they get you the HTML skeleton that JavaScript populates after load. Playwright launches a real Chromium instance, executes the JavaScript, and returns the fully-rendered DOM. This solves the fundamental problem.
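The pattern is small enough to sketch. A minimal version, assuming Playwright's Python bindings are installed (`pip install playwright`, then `playwright install chromium`); the `networkidle` wait is one reasonable heuristic for "JavaScript has finished populating the DOM," not the only one:

```python
def fetch_rendered_html(url: str, timeout_ms: int = 30_000) -> str:
    """Return the DOM after client-side JavaScript has run."""
    # Imported inside the function so the dependency fails loudly
    # only when the helper is actually used.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # 'networkidle' waits until the page has had no network
        # activity for 500 ms, i.e. the framework has (probably)
        # finished rendering.
        page.goto(url, wait_until="networkidle", timeout=timeout_ms)
        html = page.content()  # the fully-rendered DOM, not the skeleton
        browser.close()
        return html
```

An HTTP client fetching the same URL would return only the pre-JavaScript skeleton; this helper returns what a user's browser actually displays.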
For developer tools documentation, SaaS product pages, React-rendered content, and most modern sites that use frameworks to populate content client-side, Playwright is the right call. The API is clean, the tooling (Trace Viewer, codegen) is excellent, and Python and TypeScript bindings are both mature.
Crawl4AI, Selenium, and Splash all solve the same rendering problem with different trade-offs. Selenium has broader language support and legacy ecosystem depth. Crawl4AI outputs LLM-ready markdown natively. Splash integrates with Scrapy. Playwright tends to win on developer experience and execution speed for new projects.
Rendering JavaScript is step one. The harder problems show up at step two.
Authentication and session state. Scraping authenticated portals—supplier pricing pages, internal dashboards, gated data—requires maintaining session state across runs. Playwright supports this through persistent browser profiles and context storage, but the implementation isn't trivial: you need to handle login flows, session expiry, re-authentication on timeout, and cookie management. Multiply this across 50 different portals with different session architectures and it becomes significant engineering work.
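A sketch of the session-reuse loop described above, with the actual Playwright login flow stubbed out as `login_fn`; the TTL value and file layout are assumptions, not a portal standard:

```python
import json
import time
from pathlib import Path

SESSION_TTL_SECONDS = 8 * 3600  # assumed portal session lifetime

def session_is_fresh(state_path: Path, ttl: float = SESSION_TTL_SECONDS) -> bool:
    """True if a saved storage-state file exists and is younger than ttl."""
    if not state_path.exists():
        return False
    age = time.time() - state_path.stat().st_mtime
    return age < ttl

def load_or_login(state_path: Path, login_fn) -> dict:
    """Reuse saved session state, or call login_fn() and persist its result.

    login_fn stands in for a real Playwright login flow that would end
    by exporting context.storage_state().
    """
    if session_is_fresh(state_path):
        return json.loads(state_path.read_text())
    state = login_fn()  # drive the portal's login form, export the state
    state_path.write_text(json.dumps(state))
    return state
```

This is the easy part; the engineering cost mentioned above comes from running fifty variants of `login_fn`, each with its own expiry behavior and failure modes.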
Selector maintenance. Every page.locator() call you write is a dependency on the current DOM structure of the target site. When the site redesigns—which happens on a schedule you don't control—your selectors break. This is the largest hidden cost of Playwright-based scraping at scale: not the initial development, but the ongoing maintenance of selectors against sites that update without notice. A 50-site scraping operation at realistic update frequencies means debugging broken selectors several times per week.
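One common way to soften this dependency is a fallback chain per field. A sketch, where `query` stands in for something like `page.locator(sel).text_content()` and the selector strings are hypothetical:

```python
from typing import Callable, Optional, Sequence

# Ordered fallbacks per field: when a redesign breaks the first
# selector, the scraper degrades to the next instead of failing.
PRICE_SELECTORS: Sequence[str] = (
    "[data-testid='price']",   # most stable: an explicit test hook
    ".product-price .amount",  # the current DOM structure
    ".price",                  # last-resort guess
)

def first_match(query: Callable[[str], Optional[str]],
                selectors: Sequence[str]) -> Optional[str]:
    """Return the first selector's value that resolves, else None."""
    for sel in selectors:
        value = query(sel)
        if value is not None:
            return value
    return None  # every selector broke: time to alert a human
```

Fallback chains buy time, but they do not eliminate the maintenance; they turn a hard failure into a slower drift that still needs monitoring.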
Concurrency economics. Playwright instances consume significant memory: each headless Chromium process uses 100–300 MB, depending on the page. Running 10 concurrent instances on a standard server is manageable. Running 50 starts to require dedicated infrastructure, session pooling, and process management. Running 200 in parallel requires a distributed architecture. This overhead is real and scales linearly with the number of concurrent tasks.
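The arithmetic is worth making explicit. Using the upper end of the 100–300 MB range quoted above:

```python
def browser_memory_gb(concurrent: int, mb_per_instance: int = 300) -> float:
    """Worst-case resident memory for N concurrent headless Chromium
    instances, at the upper bound of the typical 100-300 MB range."""
    return concurrent * mb_per_instance / 1024
```

Ten instances fit in about 3 GB; fifty need roughly 15 GB before you account for the host OS and your own process, which is why that tier starts to demand dedicated infrastructure.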
Anti-bot evolution. Sites protected by DataDome, Kasada, or Cloudflare Bot Management update their detection logic on a schedule that isn't public. A Playwright script that worked last month may fail today not because you changed anything, but because the detection system got smarter. Managing this requires either ongoing tuning or a solution that maintains the counter-detection layer for you.

The threshold where Playwright stops being the right tool is roughly: when the engineering overhead of maintaining the scraping infrastructure costs more than the value of the data, or when you need concurrency and session management at a scale that requires dedicated infrastructure.
At that point, the options are:
Managed scraping services (Zyte, Scrapfly, Firecrawl): Handle browser rendering in the cloud. You still write extraction logic, but you don't manage browsers. Better for extraction at scale; still selector-dependent.
AI-powered extraction (Firecrawl /extract, Crawl4AI with AI models): Use LLMs to identify and extract data without explicit selectors. More resilient to layout changes. Works well when the target structure is consistent enough for a model to generalize.
AI web agents (TinyFish): You describe what you want in plain English; the agent handles navigation, rendering, authentication, and extraction. No selectors to maintain. Effective for multi-step workflows and authenticated portals where writing and maintaining explicit navigation logic is the primary cost. The trade-off is cost per task compared to optimized custom scripts.
The honest decision framework is about where your engineering time goes. If you're spending more time maintaining scrapers than using the data, the architecture has shifted.
Playwright stays the right tool when:
- You need fine-grained, scripted control over every interaction on a small set of sites.
- Maintenance is occasional and the data's value easily covers it.

The architecture shifts when:
- Authentication, multi-step workflows, and protected targets are the norm rather than the exception.
- The team spends more time keeping scrapers alive than using the data they produce.
TinyFish is designed for the second set of cases. Test it against your actual authenticated, protected, or multi-step targets: 500 free steps, no credit card, no setup.
---
Can you scrape dynamic websites without Playwright?
Yes, but the right alternative depends on what "dynamic" means for your target. If the site loads content via JavaScript after the initial page load, you need a tool that executes JavaScript—Playwright, Selenium, Puppeteer, or a managed service like Firecrawl. If the site just has complex DOM structure but renders server-side, you can use requests + BeautifulSoup without a full browser. The key question is whether the data you need exists in the initial HTML response or only after client-side JavaScript executes.
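A quick way to answer that key question for a specific target: fetch the initial response once (with `requests` or `urllib`) and check whether the value you need is already in it. A stdlib-only sketch, with two inline examples standing in for real responses:

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collects the visible text chunks from raw HTML."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def needs_browser(initial_html: str, expected_value: str) -> bool:
    """True if the value is absent from the initial response, i.e. it is
    rendered client-side and you need Playwright (or similar)."""
    parser = TextCollector()
    parser.feed(initial_html)
    return expected_value not in " ".join(parser.chunks)

# Server-rendered page: the price is already in the HTML.
ssr = "<html><body><span class='price'>$19.99</span></body></html>"
# SPA skeleton: JavaScript fills #root after load.
spa = "<html><body><div id='root'></div><script src='app.js'></script></body></html>"
```

For the `ssr` example `needs_browser` returns False (requests + BeautifulSoup suffices); for the `spa` skeleton it returns True.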
Why does a Playwright scraper work locally but fail in production?
Several common causes: headless Chrome behaves differently from headed Chrome in ways that some sites detect (missing GPU, different canvas fingerprint, different WebGL signatures). Production environments often have different IP ranges that are more likely to be flagged. Anti-bot systems that allow low-volume traffic from a single IP will block the same traffic at higher volume or from cloud IP ranges. Resource constraints in containerized environments also cause timing issues that don't appear locally.
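A hedged starting point for narrowing the local/production gap is to launch with Chromium's `--disable-blink-features=AutomationControlled` switch and give the context realistic fingerprint-adjacent settings. The switch is a real Chromium flag; whether any of this helps depends entirely on the target's detection stack, and the context values below are illustrative, not recommendations:

```python
# Launch and context settings as plain dicts, to be splatted into
# Playwright's sync API calls (see usage comment below).
HARDENED_LAUNCH = {
    "headless": True,
    "args": ["--disable-blink-features=AutomationControlled"],
}
REALISTIC_CONTEXT = {
    "viewport": {"width": 1366, "height": 768},  # a common laptop size
    "locale": "en-US",
    "timezone_id": "America/New_York",  # should match the egress IP's region
}
# Usage:
#   browser = p.chromium.launch(**HARDENED_LAUNCH)
#   context = browser.new_context(**REALISTIC_CONTEXT)
```

Mismatches between these values and your egress IP (a New York timezone on a Frankfurt datacenter IP, say) are exactly the kind of signal detection systems look for.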
Playwright or Selenium for scraping?
Both execute JavaScript in a real browser. Playwright is generally faster (async architecture, parallel contexts), has better developer tooling (Trace Viewer, codegen), and handles modern async patterns more cleanly. Selenium has broader language support (Java, C#, Ruby) and a larger ecosystem of existing test infrastructure. For new Python or TypeScript scraping projects, Playwright is usually the better choice. For Java enterprise teams or projects with existing Selenium investment, Selenium's advantages outweigh the migration cost.
How do you handle login and session state?
With Playwright: save cookies and session storage after logging in and restore them on subsequent runs. Playwright supports this through browserContext.storageState() (context.storage_state() in Python). The challenge at scale is managing session expiry, re-authentication, and session state across many targets simultaneously. For authenticated scraping across many portals, AI agents that handle login as part of the goal description are often more practical than maintaining per-site session management logic.
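Concretely, the reuse pattern looks like this. A sketch assuming a started `sync_playwright()` handle `p` and a site-specific `login(page)` callable; the `state.json` filename is an arbitrary choice:

```python
from pathlib import Path

STATE = Path("state.json")  # arbitrary location for the saved session

def open_authenticated_page(p, login):
    """p: a started sync_playwright() handle.
    login(page): site-specific flow (fill the form, submit, wait)."""
    browser = p.chromium.launch(headless=True)
    if STATE.exists():
        # Restore cookies + localStorage captured by a previous run.
        context = browser.new_context(storage_state=str(STATE))
        return context.new_page()
    context = browser.new_context()
    page = context.new_page()
    login(page)
    context.storage_state(path=str(STATE))  # persist for the next run
    return page
```

Expiry handling (detect a stale session, delete `state.json`, re-login) is left out for brevity; at one site it is a few lines, and at fifty sites it is the maintenance burden described earlier.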
When is Firecrawl better than Playwright?
Firecrawl is better when your goal is extracting clean, LLM-ready content from public pages—documentation, blog posts, product pages. It abstracts browser management and outputs clean markdown natively. Playwright is better when you need fine-grained control over interactions, authenticated access, or behavior that Firecrawl's extraction layer doesn't support. For anything requiring multi-step navigation or login, neither Firecrawl nor standard Playwright is purpose-built—that's where agents are more practical.
No credit card. No setup. Run your first operation in under a minute.