H Company is shaking up GUI automation again: its new model, Holo2-235B-A22B Preview, sets a record on GUI grounding benchmarks. It is designed specifically to identify and locate UI elements on high-resolution screens.
What is Holo2-235B-A22B and why it matters
Holo2-235B-A22B Preview is a 235-billion-parameter model, published as a research preview on Hugging Face and focused on UI element localization. In public tests it reaches 78.5% on ScreenSpot-Pro and 79.0% on OSWorld-G, two key benchmarks for evaluating grounding in interfaces.
Why does this matter to you? Locating tiny buttons, icons, and text on 4K screens is hard: the target occupies very few pixels amid a lot of surrounding context. A jump in accuracy here directly benefits accessibility tools, automated testing, workflow automation, and visual assistants.
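To make the "few pixels" point concrete, here is a back-of-the-envelope calculation (an illustration, not a figure from the article): the fraction of a 4K screen that a typical small icon actually covers.

```python
def target_pixel_share(screen_w: int, screen_h: int,
                       target_w: int, target_h: int) -> float:
    """Fraction of total screen pixels covered by the target element."""
    return (target_w * target_h) / (screen_w * screen_h)

# A 24x24 px icon on a 3840x2160 (4K) screen: the model must pinpoint
# a region covering well under a hundredth of a percent of the image.
share = target_pixel_share(3840, 2160, 24, 24)
print(f"{share:.6%}")  # → 0.006944%
```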
Agentic localization: iterate to improve
The big technical novelty is the agentic localization mode. Instead of committing to a single prediction, the model can iterate: it refines its output step by step and corrects its own mistakes. On ScreenSpot-Pro, Holo2-235B-A22B reaches 70.6% in a single step, but in agent mode it climbs to 78.5% after 3 steps. In other words, the ability to deliberate and adjust delivers a substantial improvement.
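The predict-then-refine control flow can be sketched as follows. This is a hypothetical illustration, not H Company's implementation: `propose_click` is a stand-in for the real model call (which in practice would re-inspect the screenshot, e.g. cropped around its previous guess); only the loop structure mirrors the agentic mode described above.

```python
from typing import Tuple

def propose_click(current: Tuple[int, int], target: Tuple[int, int]) -> Tuple[int, int]:
    """Stand-in for the model: moves the guess halfway toward the target,
    mimicking a prediction that improves once the view is re-centered."""
    cx, cy = current
    tx, ty = target
    return (cx + (tx - cx) // 2, cy + (ty - cy) // 2)

def agentic_ground(start: Tuple[int, int], target: Tuple[int, int],
                   steps: int = 3) -> Tuple[int, int]:
    """Run `steps` rounds of predict -> re-center -> predict again."""
    guess = start
    for _ in range(steps):
        guess = propose_click(guess, target)
    return guess

first = propose_click((0, 0), (1000, 800))     # single-step guess
refined = agentic_ground((0, 0), (1000, 800))  # 3-step agentic guess
print(first, refined)  # → (500, 400) (875, 700)
```

Each extra round shrinks the remaining error, which is the intuition behind the single-step vs. 3-step gap reported on ScreenSpot-Pro.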
