November 17, 2025
Technology

When Web Agents Need to Remember What They're Doing

Web agents operating at scale need different memory strategies—matching durability guarantees to workflow behavior determines what infrastructure you actually need to build.
Web Agents
AI at Scale

AI summary by TinyFish
  • How long a workflow needs to “remember” drives everything about web agent reliability and infrastructure design.
  • Stateless vs. stateful is a false binary; workflows exist on a spectrum of durability.
  • Short tasks can restart; longer, multi-step workflows need sessions, checkpoints, or full state management.
  • Overbuilding durability slows teams down as much as underbuilding it breaks them.
  • Map workflow duration to the right level of state persistence before you build.
When we're running web agents that monitor hotel inventory across thousands of properties—each with different authentication flows, regional variations, and rate limits—one question shapes everything else: how long does this workflow need to remember what it's doing?

That question drives infrastructure complexity, recovery mechanisms, and what workflows you can run reliably. State persistence exists on a spectrum, and understanding where your workflows fall tells you what you actually need to build.

How the Framework Emerged

Operating web agents at scale taught us that "stateless versus stateful" oversimplifies the problem. What matters is matching durability guarantees to workflow behavior.

A quick availability check that completes in seconds? If it fails halfway through, just restart. The workflow is fast enough that preserving intermediate state costs more than re-running it. But a multi-step verification flow that authenticates, navigates regional booking systems, extracts structured data, and validates against business rules? That workflow carries context across steps, potentially spans hours, and losing progress means wasted compute and delayed results.

The durability spectrum runs from fully stateless execution (no memory between attempts) through session-based state (temporary context during active work) and checkpointing (save progress at milestones) to comprehensive state management (full execution context survives anything).
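The spectrum can be sketched as code. This is a minimal illustration, not any framework's API: the `Durability` enum, the `durability_for` helper, and its duration thresholds are all assumptions chosen to make the mapping concrete.

```python
from enum import Enum

class Durability(Enum):
    """The four points on the durability spectrum."""
    STATELESS = "stateless"          # no memory between attempts
    SESSION = "session"              # temporary context during active work
    CHECKPOINT = "checkpoint"        # progress saved at milestones
    COMPREHENSIVE = "comprehensive"  # full execution context survives anything

def durability_for(expected_seconds: float, crosses_systems: bool = False) -> Durability:
    """Map expected workflow duration to a durability level.

    Thresholds are illustrative, not prescriptive.
    """
    if expected_seconds < 60:
        return Durability.STATELESS       # cheap to just re-run
    if expected_seconds < 3600:
        return Durability.SESSION         # keep context while actively working
    if crosses_systems or expected_seconds < 86400:
        return Durability.CHECKPOINT      # save progress at milestones
    return Durability.COMPREHENSIVE       # multi-day, must survive anything
```

The exact cutoffs matter less than forcing the question before building: a team that writes down "this workflow runs for ~20 minutes" has already chosen a durability level.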

Where your workflows fall tells you what infrastructure you need.

Applying the Framework

Ask yourself: what happens if this workflow gets interrupted?

Workflows measured in seconds can restart from scratch. Annoying, but acceptable. The infrastructure stays simple: no state storage, no recovery mechanisms. Microsoft's Agent Framework makes this explicit: agents are stateless and don't maintain state internally between calls.
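At this end of the spectrum, "recovery" is just retry-from-scratch. A sketch, assuming an idempotent task (the `run_stateless` helper is hypothetical, not from the frameworks mentioned here):

```python
import time

def run_stateless(task, attempts: int = 3, backoff: float = 1.0):
    """Run a short, idempotent task; on failure, re-run it from scratch.

    No intermediate state is saved -- a retry repeats all the work,
    which is acceptable when the task completes in seconds.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return task()
        except Exception as exc:
            last_error = exc
            time.sleep(backoff * (2 ** attempt))  # simple exponential backoff
    raise last_error
```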

Workflows measured in minutes that need context across steps require session-based state. The agent remembers login credentials and navigation state while actively working. If the session expires during a rate limit pause or when authentication tokens time out, you start over. For workflows that take 10-15 minutes, this is manageable. For workflows coordinating across multiple regional systems over an hour, it's a reliability problem.

Workflows measured in hours that coordinate across systems need checkpointing. When we're extracting data across thousands of properties with authentication labyrinths and regional variations, checkpointing means pauses for rate limits or transient failures don't force restarts from the beginning. The workflow saves progress at milestones and resumes from there.
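A minimal checkpointing loop looks like this. It is a sketch under stated assumptions: a JSON file stands in for whatever durable store you actually use, and the `run_with_checkpoints` helper and its schema are hypothetical.

```python
import json
from pathlib import Path

def run_with_checkpoints(items, process, checkpoint_path="progress.json"):
    """Process a long sequence of items, checkpointing after each one.

    On restart, items recorded in the checkpoint are skipped, so a
    rate-limit pause or transient failure never forces a restart
    from the beginning.
    """
    path = Path(checkpoint_path)
    done = set(json.loads(path.read_text())) if path.exists() else set()
    results = {}
    for item in items:
        if item in done:
            continue  # already processed in an earlier run
        results[item] = process(item)
        done.add(item)
        path.write_text(json.dumps(sorted(done)))  # milestone saved
    return results
```

In practice the milestone granularity is the design decision: per property, per region, or per batch, depending on how much re-work a restart may repeat.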

When state loss means business impact—multi-agent coordination, operations spanning days, scenarios where a failed run can't simply be repeated—comprehensive state management becomes necessary. Full execution context survives anything.
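One common way to get "survives anything" durability is an append-only event log that is replayed on startup. This is a hedged sketch of that pattern, not any specific framework's API; the `DurableWorkflow` class is hypothetical and a JSON-lines file stands in for a real durable store.

```python
import json
from pathlib import Path

class DurableWorkflow:
    """Minimal event-sourced state: every step appends to a durable log,
    and the full execution context is rebuilt by replaying that log.
    """
    def __init__(self, log_path):
        self.log = Path(log_path)
        self.state = {}
        if self.log.exists():  # recover full context after any crash
            for line in self.log.read_text().splitlines():
                self._apply(json.loads(line))

    def _apply(self, event):
        self.state[event["key"]] = event["value"]

    def record(self, key, value):
        event = {"key": key, "value": value}
        with self.log.open("a") as f:
            f.write(json.dumps(event) + "\n")  # persist before applying
        self._apply(event)
```

Because the log is written before the in-memory state changes, a crash at any point leaves a replayable record, which is the guarantee the simpler levels deliberately skip.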

What Teams Misjudge

The teams we see struggle most aren't the ones who choose stateless when they need durability. That breaks obviously during testing.

It's the teams who build comprehensive state management for workflows that don't need it, then discover they've committed to infrastructure complexity that slows everything down.

A single-shot price check doesn't need the same durability guarantees as a multi-day verification workflow coordinating across systems. Without this framework, teams either over-engineer simple workflows or under-engineer complex ones. Dapr's distinction between ephemeral agents and durable agents illustrates this fork: synchronous interaction versus asynchronous, autonomous execution with persistent state.

Map your workflow patterns to durability requirements before architectural decisions become commitments. The answer shapes infrastructure complexity, recovery mechanisms, and what workflows you can reliably run at scale. More importantly, it shows you what you can avoid building.

    Rina Takahashi