H Company is shaking up GUI automation again: its new model, Holo2-235B-A22B Preview, sets a record on GUI grounding benchmarks. It is designed specifically to identify and locate UI elements on high-resolution screens.
What is Holo2-235B-A22B and why it matters
Holo2-235B-A22B Preview is a 235-billion-parameter model, published as a research preview on Hugging Face and focused on UI element localization. In public tests it reaches 78.5% on ScreenSpot-Pro and 79.0% on OSWorld-G, two key benchmarks for evaluating grounding in interfaces.
Why does this matter to you? Locating tiny buttons, icons, and text on 4K screens is hard: the target occupies very few pixels amid a lot of surrounding context. A jump in accuracy here directly benefits accessibility tools, automated testing, workflow automation, and visual assistants.
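To make the "few pixels" point concrete, here is a back-of-the-envelope calculation (an illustration, not a figure from the article): the fraction of a 4K screen that a typical small icon actually covers.

```python
def target_pixel_share(screen_w: int, screen_h: int,
                       target_w: int, target_h: int) -> float:
    """Fraction of total screen pixels covered by the target element."""
    return (target_w * target_h) / (screen_w * screen_h)

# A 24x24 px icon on a 3840x2160 (4K) screen: the model must pinpoint
# a region covering well under a hundredth of a percent of the image.
share = target_pixel_share(3840, 2160, 24, 24)
print(f"{share:.6%}")  # → 0.006944%
```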
Agentic localization: iterate to improve
The big technical novelty is the agentic localization mode. Instead of committing to a single prediction, the model can iterate: it refines its output step by step and corrects its own mistakes. On ScreenSpot-Pro, Holo2-235B-A22B reaches 70.6% in a single step, but in agent mode it climbs to 78.5% after 3 steps. In other words, the ability to deliberate and adjust delivers a substantial improvement.
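The predict-then-refine control flow can be sketched as follows. This is a hypothetical illustration, not H Company's implementation: `propose_click` is a stand-in for the real model call (which in practice would re-inspect the screenshot, e.g. cropped around its previous guess); only the loop structure mirrors the agentic mode described above.

```python
from typing import Tuple

def propose_click(current: Tuple[int, int], target: Tuple[int, int]) -> Tuple[int, int]:
    """Stand-in for the model: moves the guess halfway toward the target,
    mimicking a prediction that improves once the view is re-centered."""
    cx, cy = current
    tx, ty = target
    return (cx + (tx - cx) // 2, cy + (ty - cy) // 2)

def agentic_ground(start: Tuple[int, int], target: Tuple[int, int],
                   steps: int = 3) -> Tuple[int, int]:
    """Run `steps` rounds of predict -> re-center -> predict again."""
    guess = start
    for _ in range(steps):
        guess = propose_click(guess, target)
    return guess

first = propose_click((0, 0), (1000, 800))     # single-step guess
refined = agentic_ground((0, 0), (1000, 800))  # 3-step agentic guess
print(first, refined)  # → (500, 400) (875, 700)
```

Each extra round shrinks the remaining error, which is the intuition behind the single-step vs. 3-step gap reported on ScreenSpot-Pro.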
