ONLINE-MIND2WEB BENCHMARK
The highest score ever recorded on the Online-Mind2Web benchmark.
AI brought benchmarks back with a vengeance.
The problem is that most of them are easy to game, and the difference between first place and fifth is often so small it doesn't matter. They've become little more than leaderboard theater.
And yet, when you're building a new category, you need a measuring stick. Something external. Something objective that you didn't design and didn't train against just to make yourself look good.
At TinyFish, we're building the robotic web. This is the infrastructure for AI agents and robots to operate the internet autonomously. It's not a browser agent clicking around on your behalf. It's a foundation for running your most critical web tasks.
You set a goal. It executes at scale. It returns an accurate result. It's that simple.
But measuring success isn't simple. It's not one metric. It's task completion, accuracy, scale, performance, and cost, all together.
We're the only company building real web agents right now. It's a strange position to be in. We have no competitors to benchmark against, but we still need to prove our enterprise web agents work. So we started with Online-Mind2Web.
Online-Mind2Web is an open research dataset. It covers 300 tasks across real websites, categorized as easy, medium, and hard. It was designed to test browser agents navigating and completing goals in the real world.
We took all 300 tasks, fed them to our platform, Mino, and ran them in parallel.
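The harness itself is just a fan-out loop. Here's a minimal sketch of that shape, assuming an async call into the platform and a local JSON task file; the function name, field names, and concurrency cap below are placeholders for illustration, not the actual Mino SDK.

```python
# Minimal harness sketch: fan all benchmark tasks out in parallel,
# bounded by a semaphore. Everything named here is a placeholder.
import asyncio
import json

MAX_CONCURRENCY = 50  # assumed cap on simultaneous cloud browser sessions

async def run_agent(goal: str, url: str) -> dict:
    # Placeholder for the platform call that takes a natural-language goal,
    # drives a cloud browser, and returns the task outcome.
    raise NotImplementedError

async def run_one(task: dict, sem: asyncio.Semaphore) -> dict:
    async with sem:
        return await run_agent(task["goal"], task["website"])

async def main() -> None:
    with open("tasks.json") as f:
        tasks = json.load(f)  # the 300 benchmark tasks
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    results = await asyncio.gather(*(run_one(t, sem) for t in tasks))
    print(f"finished {len(results)} of {len(tasks)} tasks")

if __name__ == "__main__":
    asyncio.run(main())
```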
"Here's what matters about Mino: it understands the goal, not just the steps. And it doesn't give up."
The Philosophy Behind Mino
In AI, everyone obsesses over intelligence. But the truly disruptive factor is resilience.
Most agents are "smart," but they are also fragile, breaking at the first sign of change. We didn't build Mino just to be intelligent and robust. We built it to be relentless.
When Mino hits a website and encounters bot detection, it doesn't quit. It retries with a proxy. If a dropdown menu blocks standard automation, it hunts for a deep link and goes direct.
It might take a few minutes. But it always returns the result.
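To make that concrete, here is a schematic of the escalation pattern described above, not Mino's internals: the strategy functions are hypothetical stand-ins for "try directly," "retry through a proxy," and "bypass the blocked UI with a deep link."

```python
# Schematic escalation loop: try the cheapest strategy first, fall back to
# more aggressive ones, and back off between full passes. The strategy
# functions are placeholders, not the real implementation.
import time

def attempt_direct(goal: str, url: str):
    raise NotImplementedError  # plain cloud-browser session

def attempt_with_proxy(goal: str, url: str):
    raise NotImplementedError  # same session, routed through a proxy

def attempt_via_deep_link(goal: str, url: str):
    raise NotImplementedError  # skip the blocked UI, navigate to the target page directly

STRATEGIES = [attempt_direct, attempt_with_proxy, attempt_via_deep_link]

def run_until_done(goal: str, url: str, max_rounds: int = 3):
    for round_num in range(max_rounds):
        for strategy in STRATEGIES:
            try:
                result = strategy(goal, url)
                if result is not None:
                    return result
            except Exception:
                continue  # blocked or broken: escalate to the next strategy
        time.sleep(2 ** round_num)  # back off, then make another full pass
    raise RuntimeError("all strategies exhausted")
```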
We've packaged the agent, its grit, inference infrastructure, cloud browsing at scale, and proxy handling into a single product with simple pricing.
This was our first run against Online-Mind2Web, and Mino outperformed everything else we tested, including BrowserUse and Gemini.
We're publishing the methodology and results in full. Because Mino is real, our results are verifiable, and we're not hiding behind leaderboard theater.