
ONLINE-MIND2WEB BENCHMARK

83.2%

The highest score ever recorded on the Online-Mind2Web benchmark.

300 tasks
136 websites

Benchmarks and Grit

AI brought benchmarks back with a vengeance.

The problem is that most of them are easy to game, and the gap between first place and fifth is often too small to matter. They've become little more than leaderboard theater.

And yet, when you're building a new category, you need a measuring stick. Something external. Something objective that you didn't design and didn't train for just to make yourself look good.

The Robotic Web

At TinyFish, we're building the robotic web. This is the infrastructure for AI agents and robots to operate the internet autonomously. It's not a browser agent clicking around on your behalf. It's a foundation for running your most critical web tasks.

You set a goal. It executes at scale. It returns an accurate result. It's that simple.

But measuring success isn't simple. It's not one metric. It's task completion, accuracy, scale, performance, and cost, all taken together.

Mind2Web

We're the only company building real web agents right now. It's a strange position to be in. We have no competitors to benchmark against, but we still need to prove our enterprise web agents work. So we started with Mind2Web.

Mind2Web is an open research dataset. The Online-Mind2Web benchmark we ran covers 300 tasks across 136 real websites, categorized into easy, medium, and hard. It was designed to test browser agents navigating and completing goals in the real world.

We took all 300 tasks, fed them to our platform, Mino, and ran them in parallel.
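For readers who want to reproduce this kind of run, here is a minimal sketch of a parallel evaluation harness. It is not Mino's actual harness: `Task`, `run_task`, and `evaluate` are hypothetical names, the stub runner simply pretends that hard tasks fail, and the "global" figure here is a task-weighted average, which is an assumption about how such a score could be computed.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    difficulty: str  # "easy", "medium", or "hard"
    goal: str


def run_task(task: Task) -> bool:
    """Hypothetical stand-in for submitting one task to an agent.

    A real harness would call the agent and judge the outcome; this stub
    pretends that every task except the hard ones succeeds.
    """
    return task.difficulty != "hard"


def evaluate(tasks, runner=run_task, max_workers=8):
    """Run all tasks concurrently and return the success rate per difficulty."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = list(pool.map(runner, tasks))  # order matches `tasks`

    passed, total = defaultdict(int), defaultdict(int)
    for task, ok in zip(tasks, outcomes):
        total[task.difficulty] += 1
        passed[task.difficulty] += int(ok)

    rates = {level: passed[level] / total[level] for level in total}
    # Assumption: the global score weights every task equally.
    rates["global"] = sum(passed.values()) / len(tasks)
    return rates


tasks = [
    Task("t1", "easy", "Find the store's opening hours"),
    Task("t2", "medium", "Compare two product prices"),
    Task("t3", "hard", "Book the cheapest refundable fare"),
    Task("t4", "easy", "Locate the returns policy"),
]
scores = evaluate(tasks)
print(scores)
```

Swapping the stub for a real agent call and a real success judge is where the actual work lies; the aggregation itself stays this small.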

Agent        Easy     Medium   Hard     Global
Mino         92.56%   83.2%    73.84%   83.2%
Gemini 2.5   77.1%    71.3%    55.4%    69%
Operator     83.1%    58%      43.2%    61.3%
Browser Use  55.4%    26.6%    8.1%     30%

"Here's what matters about Mino: it understands the goal, not just the steps. And it doesn't give up."

The Philosophy Behind Mino

Grit

In AI, everyone obsesses over intelligence. But the truly disruptive factor is resilience.

Most agents are "smart," but they are also fragile, breaking at the first sign of change. We didn't just build Mino to be intelligent and robust. We built it to be relentless.

When Mino hits a website and encounters bot detection, it doesn't quit. It retries with a proxy. If a dropdown menu blocks standard automation, it hunts for a deep link and goes direct.

It might take a few minutes. But it always returns the result.
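The fallback behavior described above can be sketched as a small retry loop. This is an illustration of the idea, not Mino's implementation: the exception types, `run_with_grit`, and the `attempt` and `find_deep_link` callables are all hypothetical, and unlike the claim above, this sketch gives up with an error once its options are exhausted.

```python
class BotDetected(Exception):
    """Hypothetical signal that the site challenged or blocked the session."""


class ControlBlocked(Exception):
    """Hypothetical signal that a UI control could not be automated."""


def run_with_grit(goal, attempt, proxies, find_deep_link, max_attempts=5):
    """Keep trying a goal, changing strategy after each kind of failure.

    attempt(goal, proxy=..., url=...) returns a result or raises one of the
    exceptions above. All names here are illustrative assumptions.
    """
    proxy_pool = iter(proxies)
    proxy, url, log = None, None, []
    for _ in range(max_attempts):
        log.append((proxy, url))  # record the strategy used for this try
        try:
            return attempt(goal, proxy=proxy, url=url), log
        except BotDetected:
            # Bot detection: rotate to the next proxy, if any remain.
            proxy = next(proxy_pool, None)
            if proxy is None:
                break
        except ControlBlocked:
            # Blocked control: bypass the UI by going to a deep link.
            url = find_deep_link(goal)
            if url is None:
                break
    raise RuntimeError(f"strategies exhausted for {goal!r} after {len(log)} tries")


def fake_attempt(goal, proxy=None, url=None):
    """Toy site: blocks direct traffic, and its dropdown defeats automation."""
    if proxy is None:
        raise BotDetected
    if url is None:
        raise ControlBlocked
    return f"{goal}: done via {proxy} at {url}"


result, log = run_with_grit(
    "check order status",
    fake_attempt,
    proxies=["proxy-a", "proxy-b"],
    find_deep_link=lambda goal: "https://example.com/orders/status",
)
print(result)
print(log)
```

In this toy run the first try is blocked, the second goes through a proxy but hits the dropdown, and the third succeeds through the deep link, so `log` records three strategies.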

Bot Detection: encounters defense systems
Proxy Retry: retries with proxy rotation
Deep Link: finds alternate route

The Product

We've packaged the agent, its grit, inference infrastructure, cloud browsing at scale, and proxy handling into a single product with simple pricing.

No hidden costs.
No optimization.
No tuning.
No gimmicks.

This was our first run against Mind2Web. It outperformed everything else we tested, including Browser Use, Operator, and Gemini 2.5.

Run it yourself.

We're publishing the methodology and results in full. Because Mino is real, our results are verifiable, and we're not hiding behind leaderboard theater.