Engineering

Web Agents for Procurement: Multi-Vendor Portal Automation Without APIs

TinyFishie·TinyFish Observer·May 2, 2026·12 min read

Your ERP has 140 approved suppliers. 23 of them have vendor portals. 11 of those portals have APIs. The other 12 require a login, a click through three menus, and a manual download.

That's where procurement automation actually breaks down — not in the systems that have APIs, but in the portals that don't. And it's exactly where RPA projects stall: the portal works until the supplier redesigns their UI, the selector breaks, and nobody notices for three days.

This article covers three procurement workflows where web agents outperform RPA on technical grounds, with working code for each pattern.

Quick Reference: What Web Agents Solve in Procurement

  1. Supplier price discovery — pull quotes from 50+ portals in parallel, structured JSON output
  2. PO status tracking — daily automated check across all supplier portals, exception alerts
  3. Invoice retrieval — extract line items and download URLs from portals without EDI
  4. Architecture fit — augments your ERP, doesn't replace it; handles the portal gap only

Why Procurement Is Stuck on Manual Portal Work

The procurement technology stack has a structural gap. ERPs handle internal workflows well — approvals, routing, three-way matching. EDI handles structured data exchange with large, technically mature suppliers.

What neither handles: mid-tier and specialty suppliers with web portals but no API, no EDI, and no integration roadmap. This is not a niche problem. It is the default state of most supplier relationships outside the top tier.

RPA was the first attempt to bridge this gap, and it works well for genuinely automatable, rules-based workflows where the interface is stable.

The problem is that vendor portals are often among the least stable web interfaces in enterprise software. They are built by different vendors, updated on irregular schedules, and never designed with automation in mind. RPA bots follow fixed scripts — when the portal changes, the script breaks. The maintenance burden compounds with every supplier you add.

The technical failure modes are specific and predictable, making web agents a structurally better fit for this problem.


Use Case 1: Automated Supplier Price Discovery Across 50+ Portals

The manual version: A buyer logs into each supplier portal, navigates to pricing, finds the SKU, copies the price into a spreadsheet. For 50 suppliers, this takes half a day. It runs monthly because daily isn't feasible.

The agent version: Run all portals in parallel. Total wall-clock time equals the slowest portal, not the sum of all portals.

pip install aiohttp
import os, asyncio, aiohttp, json

TINYFISH_API_KEY = os.environ["TINYFISH_API_KEY"]

suppliers = [
    {
        "url": "https://portal.supplier-a.com",
        "goal": "Log in and return the current price for SKU-4821. Output JSON only: {\"sku\": \"\", \"price\": 0, \"currency\": \"\", \"effective_date\": \"\", \"minimum_order_qty\": 0}"
    },
    {
        "url": "https://portal.supplier-b.com",
        "goal": "Find pricing for part number 4821. Return JSON only: {\"sku\": \"\", \"price\": 0, \"currency\": \"\", \"effective_date\": \"\", \"minimum_order_qty\": 0}"
    },
    # ... remaining suppliers
]

async def run_agent(session, supplier):
    """Run agent and extract final JSON result from SSE stream."""
    result_json = None
    async with session.post(
        "https://agent.tinyfish.ai/v1/automation/run-sse",
        headers={"X-API-Key": TINYFISH_API_KEY, "Content-Type": "application/json"},
        json={
            "url": supplier["url"],
            "goal": supplier["goal"],
            "browser_profile": "stealth"
        },
        timeout=aiohttp.ClientTimeout(total=120),
    ) as resp:
        async for line in resp.content:
            decoded = line.decode("utf-8").strip()
            if decoded.startswith("data:"):
                payload = decoded[5:].strip()
                try:
                    event = json.loads(payload)
                    # "COMPLETE" event carries the final result
                    if event.get("type") == "COMPLETE":
                        result_json = event.get("result")
                except json.JSONDecodeError:
                    pass

    # None = infrastructure failure; {"status":"failure"} = goal failure
    goal_failed = isinstance(result_json, dict) and result_json.get("status") == "failure"
    return {
        "supplier_url": supplier["url"],
        "result": result_json,
        "success": result_json is not None and not goal_failed
    }

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [run_agent(session, s) for s in suppliers]
        return await asyncio.gather(*tasks, return_exceptions=True)

results = asyncio.run(main())

# Separate successes from failures
succeeded = [r for r in results if isinstance(r, dict) and r["success"]]
failed = [r for r in results if isinstance(r, dict) and not r["success"]]

print(f"Completed: {len(succeeded)}/{len(suppliers)} portals")
for f in failed:
    print(f"  Failed: {f['supplier_url']} — check streaming_url in dashboard for trace")

Expected output per supplier:

{
  "sku": "4821",
  "price": 14.75,
  "currency": "USD",
  "effective_date": "2026-03-01",
  "minimum_order_qty": 100
}

Key distinction: A COMPLETE event in the SSE stream means the agent session finished — not that the price was found. A portal login failure still yields a COMPLETE event, with a null or error result, which is why the code above explicitly checks result. For production use, log the streaming_url from the STREAMING_URL event — it gives you a live browser replay, available for 24 hours after run completion, which is essential for diagnosing portal-specific failures.
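
The event-handling loop above can be pulled out into a small, testable parser that captures both the replay link and the final result in one pass. The field name "streaming_url" on the STREAMING_URL event is an assumption here — verify it against the current API reference before relying on it.

```python
import json

def parse_sse_events(raw_lines):
    """Walk decoded SSE lines; return (streaming_url, result).

    Assumes STREAMING_URL events carry the replay link in a
    "streaming_url" field and COMPLETE events carry the payload in
    "result" -- check both field names against the API docs.
    """
    streaming_url, result = None, None
    for line in raw_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        try:
            event = json.loads(line[5:].strip())
        except json.JSONDecodeError:
            continue  # ignore keep-alive or malformed frames
        if event.get("type") == "STREAMING_URL":
            streaming_url = event.get("streaming_url")
        elif event.get("type") == "COMPLETE":
            result = event.get("result")
    return streaming_url, result

# Synthetic stream for illustration
sample = [
    'data: {"type": "STARTED"}',
    'data: {"type": "STREAMING_URL", "streaming_url": "https://example.com/replay/abc"}',
    'data: {"type": "COMPLETE", "result": {"sku": "4821", "price": 14.75}}',
]
url, result = parse_sse_events(sample)
```

Logging url next to result at the end of each run is what makes the failure triage described above possible.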

Use Case 2: Purchase Order Status Tracking

The manual version: Someone checks each portal every morning for PO delays, quantity changes, or delivery updates. With 20 active POs across 12 portals, this is 45–60 minutes of manual login-and-check daily.

The agent version: Scheduled daily run, structured output, exception-based alerting.

import os, asyncio, aiohttp, json

TINYFISH_API_KEY = os.environ["TINYFISH_API_KEY"]

po_portals = [
    {
        "url": "https://portal.supplier-c.com",
        "goal": (
            "Log in and return all open purchase orders as a JSON array. "
            "Each item: {\"po_number\": \"\", \"status\": \"\", "
            "\"expected_delivery\": \"\", \"quantity_confirmed\": 0, "
            "\"delay_flag\": false, \"notes\": \"\"}"
        )
    },
    {
        "url": "https://portal.supplier-d.com",
        "goal": (
            "Find all POs with status In Progress or Pending Shipment. "
            "Return JSON array: {\"po_number\": \"\", \"expected_ship_date\": \"\", "
            "\"delay_notification\": \"\"}. If no delays, return empty array."
        )
    },
]

async def run_agent(session, portal):
    result_json = None
    async with session.post(
        "https://agent.tinyfish.ai/v1/automation/run-sse",
        headers={"X-API-Key": TINYFISH_API_KEY, "Content-Type": "application/json"},
        json={"url": portal["url"], "goal": portal["goal"], "browser_profile": "stealth"},
        timeout=aiohttp.ClientTimeout(total=120),
    ) as resp:
        async for line in resp.content:
            decoded = line.decode("utf-8").strip()
            if decoded.startswith("data:"):
                try:
                    event = json.loads(decoded[5:].strip())
                    # SSE event types (all caps, per docs.tinyfish.ai/api-reference):
                    # STARTED → STREAMING_URL → PROGRESS → COMPLETE
                    # Final result field is "result" (not "resultJson") —
                    # confirmed against both /run-sse and /run (sync) API docs.
                    if event.get("type") == "COMPLETE":
                        result_json = event.get("result")
                except json.JSONDecodeError:
                    pass
    return {"portal": portal["url"], "result": result_json}

def extract_delays(results):
    """Parse results and return POs with delay_flag=True or delay notifications."""
    delayed = []
    for r in results:
        if not r["result"]:
            continue
        try:
            data = json.loads(r["result"]) if isinstance(r["result"], str) else r["result"]
            pos = data if isinstance(data, list) else [data]
            for po in pos:
                if po.get("delay_flag") or po.get("delay_notification"):
                    delayed.append({**po, "source_portal": r["portal"]})
        except (json.JSONDecodeError, AttributeError):
            pass
    return delayed

async def main():
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*[run_agent(session, p) for p in po_portals])
    delays = extract_delays(results)
    if delays:
        print(f"ACTION REQUIRED: {len(delays)} delayed POs detected")
        for d in delays:
            print(f"  PO {d.get('po_number')} at {d['source_portal']}: {d.get('notes') or d.get('delay_notification')}")
    return results

asyncio.run(main())
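
The print-based alert in main() can be swapped for a message posted to a chat webhook or email gateway. A minimal sketch of the formatting step, operating on the dicts extract_delays() produces (the webhook delivery itself is omitted and would be one aiohttp POST to your endpoint):

```python
def format_delay_alert(delays):
    """Build a plain-text alert message from delayed-PO dicts
    of the shape produced by extract_delays()."""
    if not delays:
        return None  # nothing to send on a clean run
    lines = [f"ACTION REQUIRED: {len(delays)} delayed POs detected"]
    for d in delays:
        reason = d.get("notes") or d.get("delay_notification") or "no detail"
        lines.append(f"- PO {d.get('po_number')} at {d['source_portal']}: {reason}")
    return "\n".join(lines)

delays = [{
    "po_number": "PO-1009",
    "source_portal": "https://portal.supplier-c.com",
    "delay_flag": True,
    "notes": "ship date moved to 2026-05-20",
}]
message = format_delay_alert(delays)
```

Returning None on an empty list keeps the workflow exception-based: no delays, no message.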

Use Case 3: Invoice and Document Retrieval

Invoice retrieval reuses the parallel execution pattern from Use Case 1 — identical code structure, different goal prompt and output schema.

Goal prompt variants by portal type:

# Portal that exposes PDF download links
goal_with_links = (
    "Log in and navigate to the invoices section. Return all invoices from the last 30 days "
    "as a JSON array: {\"invoice_number\": \"\", \"date\": \"\", \"amount\": 0, "
    "\"currency\": \"\", \"pdf_url\": \"\"}. If no PDF URL is available, set pdf_url to null."
)

# Portal that does not expose download links — extract line items directly
goal_line_items = (
    "Log in and find all invoices from the last 30 days. For each invoice extract the line items "
    "and return as JSON: {\"invoice_number\": \"\", \"date\": \"\", \"total\": 0, "
    "\"currency\": \"\", \"line_items\": [{\"description\": \"\", \"qty\": 0, \"unit_price\": 0, \"total\": 0}]}"
)

Expected output (portal with download links):

[
  {
    "invoice_number": "INV-2026-04821",
    "date": "2026-03-15",
    "amount": 4750.00,
    "currency": "USD",
    "pdf_url": "https://portal.supplier-e.com/invoices/download/04821"
  }
]

Two realistic failure modes to handle explicitly:

First, pdf_url may be a session-authenticated URL that expires after download. Retrieve and store the file immediately after the agent returns — do not cache the URL for later use.
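A hedged sketch of that immediate-retrieval step, assuming the pdf_url is directly fetchable within the agent's session window (portals that bind the URL to browser cookies will need the download performed inside the agent goal instead):

```python
import asyncio
import re

import aiohttp  # already a dependency of the examples above

def invoice_filename(inv):
    """Derive a filesystem-safe local filename from an invoice record."""
    safe = re.sub(r"[^A-Za-z0-9_-]", "_", inv["invoice_number"])
    return f"{safe}.pdf"

async def download_invoices(invoices, dest_dir="."):
    """Fetch each pdf_url right away; skip records without one."""
    async with aiohttp.ClientSession() as session:
        for inv in invoices:
            if not inv.get("pdf_url"):
                continue
            async with session.get(inv["pdf_url"]) as resp:
                resp.raise_for_status()
                data = await resp.read()
            with open(f"{dest_dir}/{invoice_filename(inv)}", "wb") as f:
                f.write(data)
```

Storing under a normalized filename keyed to the invoice number also makes deduplication across daily runs trivial.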

Second, some portals render invoice data in iframes or load it via AJAX after the main page. If the agent returns an empty array despite invoices being present, confirm the run used browser_profile: "stealth" — its slower, more thorough execution is better suited to JS-heavy portal UIs.

For ERP integration, the pdf_url feeds your existing document management pipeline unchanged. The line-item JSON feeds directly into three-way matching logic in your ERP. Neither requires changes to the ERP itself.
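To make the matching handoff concrete, here is a simplified pre-check you could run on the extracted line items before they reach the ERP. The dict shape and the 2% price tolerance are illustrative assumptions — real three-way matching keys on PO line numbers and uses your organization's tolerance policy:

```python
def three_way_precheck(po, receipt, invoice, price_tolerance=0.02):
    """Flag invoice lines that disagree with the PO or goods receipt.

    po / receipt / invoice are dicts keyed by line description:
    {"desc": {"qty": int, "unit_price": float}} -- a simplified
    shape for illustration only.
    """
    exceptions = []
    for desc, line in invoice.items():
        po_line = po.get(desc)
        rcv_line = receipt.get(desc)
        if po_line is None:
            exceptions.append((desc, "not on PO"))
        elif abs(line["unit_price"] - po_line["unit_price"]) > price_tolerance * po_line["unit_price"]:
            exceptions.append((desc, "price mismatch"))
        elif rcv_line is None or line["qty"] > rcv_line["qty"]:
            exceptions.append((desc, "billed qty exceeds received qty"))
    return exceptions

po = {"widget": {"qty": 100, "unit_price": 14.75}}
receipt = {"widget": {"qty": 100}}
invoice = {"widget": {"qty": 100, "unit_price": 14.75}}
issues = three_way_precheck(po, receipt, invoice)  # → []
```

Lines that pass the pre-check flow into the ERP; exceptions get routed to a buyer for review.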

RPA vs. Web Agent: The Three Technical Failure Modes on Procurement Portals


RPA works reliably when the interface is stable, the navigation path is deterministic, and the content is synchronously loaded. Procurement portals frequently violate all three.

Failure mode 1 — Selector fragility. A supplier updates their portal. The button with id="submit-po" is now class="btn-primary submit". The RPA bot fails silently. Someone notices three days later. An engineer fixes the selector. The supplier updates again in six weeks. This loop is not hypothetical — it is the standard maintenance cycle for any organization running RPA against vendor portals at scale.

Web agents navigate by understanding the goal, not matching a known element path. A layout change that breaks a selector does not break an agent.

Failure mode 2 — Authentication complexity. Many vendor portals require MFA, have aggressive session timeouts, or redirect through SSO flows that break fixed navigation paths. RPA requires exception handling code for each variation. Web agents handle authentication flows as part of normal execution — including TOTP-based 2FA and standard SSO redirects.

Failure mode 3 — Asynchronous content loading. Portals that load pricing or availability data via AJAX after the initial page render return empty data to RPA bots that don't wait for JS execution. The bot reports success; the extracted field is blank. Web agents wait for JS execution before attempting extraction.

The practical decision rule: if an RPA bot has broken more than twice in a year on a given portal, the maintenance cost has likely exceeded the time saved. That is the threshold for escalating to an agent.
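The break-even arithmetic behind that rule is easy to make explicit. The numbers below are illustrative assumptions, not benchmarks:

```python
def rpa_break_even(breaks_per_year, fix_hours_per_break,
                   manual_minutes_per_run, runs_per_year):
    """Compare annual RPA maintenance hours against the manual time
    the bot saves. Returns True when maintenance exceeds savings."""
    maintenance_hours = breaks_per_year * fix_hours_per_break
    saved_hours = manual_minutes_per_run * runs_per_year / 60
    return maintenance_hours > saved_hours

# Illustrative: 3 breaks/year at 6 engineer-hours each, versus a
# 30-minute manual check automated for a monthly run
over_budget = rpa_break_even(3, 6, 30, 12)  # 18h maintenance vs 6h saved → True
```

The same bot guarding a daily workflow, by contrast, usually stays well under the threshold — frequency of use matters as much as breakage rate.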

Integration Pattern: The Portal Gap Architecture

[Architecture diagram: web agents bridging the gap between ERP systems and supplier portals without APIs]

Web agents work as a data acquisition layer that feeds existing procurement infrastructure — not as a replacement for it.

Supplier portals (no API, no EDI)
          ↓
    TinyFish Web Agent
    (parallel execution)
          ↓
    Structured JSON
          ↓
  n8n / Zapier / custom middleware
          ↓
  ERP inbound interface / spend analytics / alerting

Your ERP handles approvals, three-way matching, payment processing, and audit trails. The agent layer handles the portal gap: extracting structured data from sources that have no API. The boundary between the two is clean JSON — no core ERP development required, provided your existing inbound interfaces are in place.

For SAP and Oracle environments, the agent output routes through your existing inbound interface. For spreadsheet-based tracking common in mid-market procurement, a lightweight Python or Node.js script writes agent output directly to Google Sheets or Excel via API.
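For the spreadsheet path, a hedged sketch using stdlib csv as a local stand-in for the Sheets or Excel API — the write target and column set are assumptions based on the Use Case 1 output shape:

```python
import csv

def write_price_rows(results, path="supplier_prices.csv"):
    """Flatten successful agent results (Use Case 1 shape) into a CSV
    that spreadsheet-based tracking can import directly."""
    fields = ["supplier_url", "sku", "price", "currency",
              "effective_date", "minimum_order_qty"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for r in results:
            row = {"supplier_url": r["supplier_url"], **(r["result"] or {})}
            writer.writerow({k: row.get(k, "") for k in fields})

results = [{
    "supplier_url": "https://portal.supplier-a.com",
    "result": {"sku": "4821", "price": 14.75, "currency": "USD",
               "effective_date": "2026-03-01", "minimum_order_qty": 100},
}]
write_price_rows(results, "/tmp/prices.csv")
```

Swapping the file write for a gspread append or an Excel API call changes one function, not the pipeline.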


FAQ

Q: Are there compliance concerns with automating portal access?

This depends on the supplier portal's terms of service and your contractual relationship. Automating access to retrieve your own PO status and invoices is generally unproblematic. For price discovery across competitor or third-party portals, review the ToS before automating. When in doubt, formal API agreements with key suppliers are the cleaner long-term path.

Q: How should I handle portals that time out or return errors?

Every agent run produces a streaming_url — a browser replay available for 24 hours after completion. For production workflows, log this URL alongside the run result so you can trace exactly what happened on the portal. Implement retry with exponential backoff for transient failures (429, 500); alert on persistent failures rather than silently dropping them.
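A minimal backoff wrapper that composes with the run_agent() coroutine from Use Case 1 — attempt counts and delays are illustrative defaults:

```python
import asyncio
import random

async def with_retries(coro_factory, max_attempts=3, base_delay=2.0):
    """Retry an async call with exponential backoff plus jitter.

    coro_factory is a zero-arg callable returning a fresh coroutine,
    since a coroutine object can only be awaited once.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await coro_factory()
        except Exception:
            if attempt == max_attempts:
                raise  # persistent failure: surface it, alert on it
            # 2s, 4s, 8s, ... with up to 1s of jitter
            await asyncio.sleep(base_delay * 2 ** (attempt - 1) + random.random())

# Usage with the Use Case 1 coroutine:
# result = await with_retries(lambda: run_agent(session, supplier))
```

Re-raising on the final attempt is deliberate: a persistent portal failure should reach your alerting, not be swallowed.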

Q: Is 500 free steps enough to test this?

A typical price check or PO status query runs 3–8 agent steps per portal. 500 steps covers roughly 60–150 individual portal interactions — enough to validate the approach against your actual target portals before committing to production volume.

Q: What about portals that require IP allowlisting?

Use proxy_config.type "custom" with your own static-IP proxy URL — TinyFish routes traffic through your specified endpoint. TinyFish's built-in proxy (type "tetra") supports country-code selection (US, GB, CA, DE, FR, JP, AU) but uses rotating residential IPs, which won't satisfy a fixed allowlist. Bring your own dedicated proxy for that scenario.
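A sketch of where proxy_config sits in the request body, assuming it is a sibling of url and goal as the FAQ answer implies — the endpoint URL is hypothetical, and the exact schema should be verified against the API reference:

```python
# Request body for a run routed through your own static-IP proxy.
payload = {
    "url": "https://portal.supplier-a.com",
    "goal": "Log in and return the current price for SKU-4821 as JSON.",
    "browser_profile": "stealth",
    "proxy_config": {
        "type": "custom",
        # Hypothetical static-IP endpoint you control:
        "url": "http://user:pass@static-proxy.example.com:8080",
    },
}
```

This payload replaces the json= argument in the earlier aiohttp POST examples unchanged.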

Related Reading

  • What Can AI Web Agents Actually Do? 10 Real-World Use Cases
  • How to Monitor 1,000 Websites in Parallel with the TinyFish API
  • Getting Started with TinyFish: Your First Web Agent in 10 Minutes