Holo3: AI that masters using enterprise computers | Keryc
Holo3 is presented as a generation of models designed so that machines not only understand, but actually execute real tasks inside enterprise digital environments. What does that look like in practice? A system that sees, reasons and acts on interfaces like an expert user, but with consistency and traceability.
What Holo3 is and why it matters
Holo3 is the latest model generation aimed at the so-called Autonomous Enterprise. Practically speaking, it’s a model trained to navigate desktop and web applications, complete multi-step flows and keep context across complex tasks. On the OSWorld-Verified benchmark, the Holo3-122B-A10B variant reaches 78.85%, which is presented as a new reference point for computer-use tests.
Technically, the Holo3-122B-A10B variant totals 122 billion parameters, but only about 10 billion of them are "active" at inference. Why does that matter to you? Because it lowers run costs without giving up capability, which positions the model as an efficient alternative to the larger proprietary models the company mentions.
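To put the two parameter counts in proportion, here is a quick back-of-the-envelope calculation. The figures come straight from the variant name; the function name is just illustrative. Treat the result as a rough indication of per-inference compute, not of memory: in sparse architectures the full set of weights typically still has to be loaded.

```python
# Figures taken from the Holo3-122B-A10B variant described above.
TOTAL_PARAMS = 122e9   # total parameters
ACTIVE_PARAMS = 10e9   # parameters active at inference


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that take part in one inference pass."""
    if total <= 0 or active < 0 or active > total:
        raise ValueError("expected total > 0 and 0 <= active <= total")
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active share: {fraction:.1%}")  # Active share: 8.2%
```

In other words, roughly one parameter in twelve is doing work on any given step.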
How it was trained: the 'agentic flywheel'
What makes Holo3 special is not just the architecture, but the training pipeline they call the agentic flywheel. It’s a continuous feedback loop that sharpens two pillars: perception and decision-making. Here are the key steps:
Synthetic Navigation Data: they generate navigation examples from human and automated instructions, reproducing typical and atypical scenarios.
Out-of-Domain Augmentation: they programmatically expand situations so the model handles the unexpected, not just ideal cases.
Curated Reinforcement Learning: each sample is filtered and incorporated with reinforcement learning techniques to maximize effective behavior, not just statistical fit.
If you’re not familiar with the terms, think of this like teaching an assistant to use any program through hundreds of automated practice runs, with rewards for correct outcomes.
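The "filter, then learn" idea in the third step can be sketched in a few lines. This is a toy illustration, not the company's pipeline: the `Trajectory` class, the simulated `rollout` and the `curate` thresholds are all hypothetical, and success is decided at random where a real system would use a verifier.

```python
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    task: str
    actions: list[str]
    reward: float  # 1.0 if a (hypothetical) verifier judged the task solved


def rollout(task: str, rng: random.Random) -> Trajectory:
    """Stand-in for one agent attempt; success is simulated at random."""
    actions = [f"step-{i}" for i in range(rng.randint(2, 5))]
    reward = 1.0 if rng.random() < 0.5 else 0.0
    return Trajectory(task, actions, reward)


def curate(trajectories, min_reward=1.0, max_steps=4):
    """Keep only successful, reasonably short attempts for the next round."""
    return [
        t for t in trajectories
        if t.reward >= min_reward and len(t.actions) <= max_steps
    ]


rng = random.Random(0)  # fixed seed so the run is reproducible
tasks = ["open invoice", "update CRM record", "export report"]
attempts = [rollout(task, rng) for task in tasks for _ in range(4)]
kept = curate(attempts)
print(f"kept {len(kept)} of {len(attempts)} attempts")
```

The kept trajectories would then feed the next training round, which is what makes the loop a flywheel: better agents produce better samples, which in turn produce better agents.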
The Synthetic Environment Factory
To validate and speed up learning, they built a factory of synthetic environments. Coding agents generate sites and applications according to scenario specs, then verification scripts test end-to-end tasks. That lets them build reproducible tests and measure progress in conditions very similar to real enterprise systems.
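A verification script of this kind usually checks the final state of the application rather than the exact clicks that led there. The sketch below assumes a tiny in-memory ticketing app; `TicketApp` and `verify_close_task` are made-up names for illustration, not part of any published tooling.

```python
from dataclasses import dataclass, field


@dataclass
class TicketApp:
    """Tiny in-memory stand-in for a generated application."""
    tickets: dict[int, dict] = field(default_factory=dict)
    next_id: int = 1

    def create_ticket(self, title: str) -> int:
        ticket_id = self.next_id
        self.tickets[ticket_id] = {"title": title, "status": "open"}
        self.next_id += 1
        return ticket_id

    def close_ticket(self, ticket_id: int) -> None:
        self.tickets[ticket_id]["status"] = "closed"


def verify_close_task(app: TicketApp, title: str) -> bool:
    """End-to-end check: the named ticket must exist and be closed."""
    return any(
        t["title"] == title and t["status"] == "closed"
        for t in app.tickets.values()
    )


app = TicketApp()
ticket_id = app.create_ticket("Printer offline")
before = verify_close_task(app, "Printer offline")  # task not done yet
app.close_ticket(ticket_id)  # the action an agent would be expected to take
after = verify_close_task(app, "Printer offline")
print(before, after)  # False True
```

Because the check only looks at outcomes, the same script can score any agent, whatever path it took through the interface.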
Benchmarks and real-world tests
Beyond OSWorld, they designed the H Corporate Benchmarks: 486 real multi-step tasks in four categories: E-commerce, Business software, Collaboration and Multi-App. The suite measures everything from simple one-app tasks to long-range flows that need coordinating data across documents, PDFs, spreadsheets and different systems.
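A suite split into categories is typically reported as a success rate per category plus an overall figure. Here is a minimal sketch of that aggregation. The category names come from the benchmark description above, but the pass/fail values are invented placeholders, not actual Holo3 results.

```python
from collections import defaultdict

# Placeholder outcomes for illustration only: (category, task succeeded?)
results = [
    ("E-commerce", True), ("E-commerce", False),
    ("Business software", True),
    ("Collaboration", True), ("Collaboration", True),
    ("Multi-App", False), ("Multi-App", True), ("Multi-App", False),
]


def success_by_category(results):
    """Return {category: fraction of tasks that succeeded}."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for category, ok in results:
        totals[category] += 1
        passed[category] += int(ok)
    return {category: passed[category] / totals[category] for category in totals}


rates = success_by_category(results)
overall = sum(ok for _, ok in results) / len(results)

for category, rate in rates.items():
    print(f"{category}: {rate:.0%}")
print(f"Overall: {overall:.1%}")
```

Reporting per category matters because a strong overall number can hide weak performance on the long multi-app flows.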
A concrete example: the agent extracts equipment prices from a PDF, cross-checks that information with each employee’s remaining budget and sends personalized approval or rejection emails, all without losing state or the original intent. That requires document parsing, calculations and sustained reasoning.
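Stripped of the interface work, the decision logic in that example is simple, which is exactly why the hard part is carrying it across apps without losing state. A minimal sketch, assuming the prices have already been parsed out of the PDF; all names, amounts and the message wording are invented for illustration.

```python
from decimal import Decimal

# Hypothetical data, as if already extracted from the PDF and the budget sheet.
prices = {"laptop": Decimal("1200.00"), "monitor": Decimal("300.00")}
budgets = {"alice": Decimal("1500.00"), "bob": Decimal("250.00")}
requests = {"alice": ["laptop", "monitor"], "bob": ["monitor"]}


def decide(employee: str) -> tuple[bool, str]:
    """Compare the requested total with the remaining budget and draft an email."""
    total = sum((prices[item] for item in requests[employee]), Decimal("0"))
    remaining = budgets[employee]
    approved = total <= remaining
    verdict = "approved" if approved else "rejected"
    message = (
        f"Hi {employee.title()}, your request for {total} is {verdict} "
        f"(remaining budget: {remaining})."
    )
    return approved, message


for name in requests:
    approved, message = decide(name)
    print(message)
```

`Decimal` is used instead of floats so that currency comparisons at the budget boundary behave exactly.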
In comparisons, Holo3 outperforms base models like Qwen3.5 on single-application tasks, suggesting the agentic flywheel compensates for fewer active parameters with greater specialization.
Availability, licensing and cost
Holo3-35B-A3B: open weights published on Hugging Face under the Apache-2.0 license, which makes experimentation and local deployment easier.
Hosted models: accessible through the company’s Inference API, with a free tier for testing.
Efficiency: with only 10B active parameters in the 122B variant, the aim is a lower cost per inference than larger commercial models.
That means mid-sized companies can test or integrate agents with competitive performance without investing in massive infrastructure.
Current limits and next steps
Holo3 is a milestone, not a final solution. Typical concerns remain: robustness against adversarial data in production, security for automations that take actions in real systems, and governance to prevent errors at scale. The company acknowledges this and aims to evolve toward what they call Adaptive Agency: agents that not only use known tools, but learn to handle new software in real time.
My quick technical read: the approach mixes synthetic data, RL and automated verification effectively. That pushes domain transfer in ways traditional benchmarks don’t fully capture. However, the real test will be stability in enterprise environments with sensitive data and processes that cannot fail.
What should companies ask themselves today?
Do you have repetitive flows where an agent could reduce friction and mistakes? If yes, Holo3 and its open variants are interesting candidates.
How much control and auditability do you need to automate actions? Secure, traceable integration must be designed from the start.
Will you experiment with open weights or prefer the convenience of a managed API? Both paths are available from the company.
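On the control and auditability question, one common pattern is an append-only action log in which every entry includes the hash of the previous one, so later edits become detectable. The sketch below is a generic illustration of that pattern, not a Holo3 feature; the `AuditLog` class and the actor and action names are hypothetical.

```python
import hashlib
import json


class AuditLog:
    """Append-only log where each entry is chained to the previous one."""

    FIELDS = ("actor", "action", "target", "prev")

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body: dict) -> str:
        payload = json.dumps(body, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def record(self, actor: str, action: str, target: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"actor": actor, "action": action, "target": target, "prev": prev_hash}
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check that the chain is unbroken."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {key: entry[key] for key in self.FIELDS}
            if entry["prev"] != prev_hash or entry["hash"] != self._digest(body):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.record("agent", "click", "Approve button")
log.record("agent", "send_email", "alice@example.com")
print(log.verify())  # True: the chain is intact

log.entries[0]["target"] = "Reject button"  # simulate tampering
print(log.verify())  # False: the stored hash no longer matches
```

A production setup would also need protected storage and timestamps, but even this small chain shows what "traceable from the start" can mean in code.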
Holo3 shows that specializing training for interface tasks can be more valuable than inflating parameter counts. The near future of enterprise AI is models that don’t just reply with text, but navigate, calculate and execute real processes with responsibility and clear metrics.