ONLINE-MIND2WEB BENCHMARK
The highest score ever recorded on the Online-Mind2Web benchmark.
AI brought benchmarks back with a vengeance.
The problem is that most of them are easy to game, and the difference between first place and fifth is often so small it doesn't matter. They've become little more than leaderboard theater.
And yet, when you're building a new category, you need a measuring stick. Something external. Something objective that you didn't design and didn't train against just to make yourself look good.
At TinyFish, we're building the robotic web. This is the infrastructure for AI agents and robots to operate the internet autonomously. It's not a browser agent clicking around on your behalf. It's a foundation for running your most critical web tasks.
You set a goal. It executes at scale. It returns an accurate result. It's that simple.
But measuring success isn't simple. It's not one metric. It's task completion, accuracy, scale, performance, and cost, all together.
We're the only company building real web agents right now. It's a strange position to be in. We have no competitors to benchmark against, but we still need to prove our enterprise web agents work. So we started with Online-Mind2Web.
Online-Mind2Web is an open research dataset. It covers 300 tasks across real websites, categorized as easy, medium, and hard. It was designed to test browser agents navigating and completing goals in the real world.
We took all 300 tasks, fed them to our platform, Mino, and ran them in parallel.
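The harness itself is just a fan-out loop. Here's a minimal sketch of that shape, assuming an async call into the platform and a local JSON task file; the function name, field names, and concurrency cap below are placeholders for illustration, not the actual Mino SDK.

```python
# Minimal harness sketch: fan all benchmark tasks out in parallel,
# bounded by a semaphore. Everything named here is a placeholder.
import asyncio
import json

MAX_CONCURRENCY = 50  # assumed cap on simultaneous cloud browser sessions

async def run_agent(goal: str, url: str) -> dict:
    # Placeholder for the platform call that takes a natural-language goal,
    # drives a cloud browser, and returns the task outcome.
    raise NotImplementedError

async def run_one(task: dict, sem: asyncio.Semaphore) -> dict:
    async with sem:
        return await run_agent(task["goal"], task["website"])

async def main() -> None:
    with open("tasks.json") as f:
        tasks = json.load(f)  # the 300 benchmark tasks
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    results = await asyncio.gather(*(run_one(t, sem) for t in tasks))
    print(f"finished {len(results)} of {len(tasks)} tasks")

if __name__ == "__main__":
    asyncio.run(main())
```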
"Here's what matters about Mino: it understands the goal, not just the steps. And it doesn't give up."
The Philosophy Behind Mino
In AI, everyone obsesses over intelligence. But the truly disruptive factor is resilience.
Most agents are "smart," but they are also fragile, breaking at the first sign of change. We didn't build Mino just to be intelligent and robust. We built it to be relentless.
When Mino hits a website and encounters bot detection, it doesn't quit. It retries with a proxy. If a dropdown menu blocks standard automation, it hunts for a deep link and goes direct.
It might take a few minutes. But it always returns the result.
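To make that concrete, here is a schematic of the escalation pattern described above, not Mino's internals: the strategy functions are hypothetical stand-ins for "try directly," "retry through a proxy," and "bypass the blocked UI with a deep link."

```python
# Schematic escalation loop: try the cheapest strategy first, fall back to
# more aggressive ones, and back off between full passes. The strategy
# functions are placeholders, not the real implementation.
import time

def attempt_direct(goal: str, url: str):
    raise NotImplementedError  # plain cloud-browser session

def attempt_with_proxy(goal: str, url: str):
    raise NotImplementedError  # same session, routed through a proxy

def attempt_via_deep_link(goal: str, url: str):
    raise NotImplementedError  # skip the blocked UI, navigate to the target page directly

STRATEGIES = [attempt_direct, attempt_with_proxy, attempt_via_deep_link]

def run_until_done(goal: str, url: str, max_rounds: int = 3):
    for round_num in range(max_rounds):
        for strategy in STRATEGIES:
            try:
                result = strategy(goal, url)
                if result is not None:
                    return result
            except Exception:
                continue  # blocked or broken: escalate to the next strategy
        time.sleep(2 ** round_num)  # back off, then make another full pass
    raise RuntimeError("all strategies exhausted")
```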
We've packaged the agent, its grit, inference infrastructure, cloud browsing at scale, and proxy handling into a single product with simple pricing.
This was our first run against Online-Mind2Web, and Mino outperformed everything else we tested, including BrowserUse and Gemini.
We're publishing the methodology and results in full. Because Mino is real, our results are verifiable, and we're not hiding behind leaderboard theater.