Netomi lays out a concrete plan to bring AI agents into production inside large companies, using GPT-4.1 for fast responses and GPT-5.2 for deeper planning. The interesting part isn’t just that the models reason — it’s that Netomi puts them inside a governed execution layer that keeps actions predictable in real-world conditions.
What Netomi did and why it matters
Netomi’s bet isn’t exotic: combine models with systems engineering to solve real workflows that cross multiple systems. In practice, a single business request can touch booking engines, loyalty databases, CRM, payments and policy rules. Data is often incomplete or changes fast, and fragile integrations break under that pressure.
For that they designed their Agentic OS: an orchestration pipeline where GPT-4.1 provides low latency and trustworthy tool calls, and GPT-5.2 steps in when multi-step planning and deeper reasoning are needed. That way the models don’t just answer; they execute and coordinate complex tasks.
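As a rough illustration of that split, the sketch below routes a request to the fast model or the deep-reasoning model using the OpenAI Python SDK. The model identifiers follow the article; the routing heuristic and function names are assumptions made for illustration, not Netomi’s actual implementation.

    # Sketch: route simple requests to the fast model, complex ones to the
    # deep-reasoning model. Heuristic and names are illustrative only.
    from openai import OpenAI

    client = OpenAI()

    FAST_MODEL = "gpt-4.1"  # low latency, reliable tool calls
    DEEP_MODEL = "gpt-5.2"  # multi-step planning, deeper reasoning

    def needs_planning(request: str) -> bool:
        # Placeholder heuristic; a production system would use a trained
        # intent classifier rather than keyword matching.
        return any(kw in request.lower() for kw in ("rebook", "escalate", "multiple"))

    def answer(request: str) -> str:
        model = DEEP_MODEL if needs_planning(request) else FAST_MODEL
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": request}],
        )
        return response.choices[0].message.content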
“Our goal was to orchestrate the many systems a human agent would normally handle and do it safely at machine speed.”
Practical patterns they use to keep agents reliable
Netomi follows a battery of patterns so agents behave consistently across long, fragmented tasks:
Persistence of context: in-prompt reminders that help GPT-5.2 maintain its reasoning across long sequences of steps.
Explicit expectations for tool use: instruct GPT-4.1 to call authoritative tools rather than invent answers during transactional operations.
Structured planning: let GPT-5.2 sketch and execute multi-step tasks in a controlled sequence.
Agent-guided multimodal decisions: use GPT-5.2 to decide when to return images, videos, forms or other rich elements.
These patterns let them map unstructured requests to multi-step workflows and maintain state across discontinuous interactions; the sketch below illustrates the first two.
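To make the first two patterns concrete, here is a hedged sketch of how a system prompt and a tool contract might encode them. The prompt wording and the get_order_status tool are hypothetical illustrations, not Netomi’s actual prompts or schemas.

    # Pattern 1 (context persistence): the system prompt carries a standing
    # reminder to restate the remaining plan at every step.
    # Pattern 2 (explicit tool use): transactional facts must come from an
    # authoritative tool, never from the model's memory.
    SYSTEM_PROMPT = (
        "You are a customer-service agent executing a multi-step task.\n"
        "Before each action, restate the remaining steps of the plan.\n"
        "Never state order status, balances, or policy details from memory; "
        "always call the matching tool first."
    )

    # Tool contract in OpenAI function-calling format; get_order_status is
    # a hypothetical example of an authoritative tool.
    TOOLS = [{
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Authoritative order-status lookup. Use instead of guessing.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }]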
Lessons on latency and concurrency: why speed matters
A straight question: do you trust a system that hesitates exactly when you need it most? In cases like refunds during storms or traffic spikes during sporting events, latency defines trust.
Netomi breaks the traditional sequential flow (classify -> retrieve -> validate -> call tools -> generate). Instead, they design for concurrency, leveraging low-latency streaming and the stability of GPT-4.1 tool calls.
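A minimal sketch of that concurrency idea with Python’s asyncio, where placeholder workers stand in for the real model and database calls:

    import asyncio

    async def classify_intent(msg: str) -> str:
        # Placeholder for a low-latency GPT-4.1 classification call.
        await asyncio.sleep(0.05)
        return "refund_request"

    async def retrieve_context(msg: str) -> dict:
        # Placeholder for CRM / loyalty / booking lookups.
        await asyncio.sleep(0.12)
        return {"tier": "gold", "open_orders": 1}

    async def handle(msg: str):
        # Fan-out instead of a chain: total latency is roughly the max of
        # the two calls, not their sum as in the sequential pipeline.
        intent, context = await asyncio.gather(
            classify_intent(msg), retrieve_context(msg)
        )
        return intent, context

    print(asyncio.run(handle("I need a refund for the storm cancellation")))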
Concrete examples:
DraftKings pushes the platform to peaks exceeding 40,000 concurrent requests per second.
Under those conditions, Netomi reports responses below three seconds and 98% intent-classification accuracy.
The idea is clear: a good model is not enough; the whole architecture must stay within critical latency thresholds.
Integrated governance: security and compliance in real time
One key lesson: governance can’t be an add-on. It must sit inside the runtime so the agent knows to back off when uncertainty rises.
When intent confidence falls below a threshold, the system abandons free-form generation and activates controlled paths.
Technically, the governance layer handles the following (a hedged sketch appears after the list):
Schema validation: validate every tool call against OpenAPI contracts before executing.
Policy enforcement: topic filters, brand restrictions and compliance checks applied during reasoning.
PII protection: detect and mask sensitive data in preprocessing and in responses.
Deterministic fallback: revert to safe behaviors when intent or data are ambiguous.
Runtime observability: expose token traces, reasoning steps and tool-chain logs for real-time inspection.
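A sketch of such a gate is below: a confidence threshold triggering the deterministic fallback, schema validation of tool arguments, and simple PII masking. The threshold value, the refund schema, and the regex are illustrative assumptions; a real deployment would validate against full OpenAPI contracts and use a proper PII detector.

    import re
    from jsonschema import ValidationError, validate  # pip install jsonschema

    CONFIDENCE_THRESHOLD = 0.85  # assumed value, not from the article

    # Stand-in for an OpenAPI contract for a refund tool call.
    REFUND_SCHEMA = {
        "type": "object",
        "properties": {
            "order_id": {"type": "string"},
            "amount": {"type": "number", "minimum": 0},
        },
        "required": ["order_id", "amount"],
        "additionalProperties": False,
    }

    def mask_pii(text: str) -> str:
        # Toy masking: redact anything that looks like a card number.
        return re.sub(r"\b\d{13,16}\b", "[REDACTED]", text)

    def governed_call(intent_confidence: float, tool_args: dict) -> dict:
        # Deterministic fallback: below the threshold, abandon free-form
        # generation and hand off to a controlled path.
        if intent_confidence < CONFIDENCE_THRESHOLD:
            return {"action": "escalate_to_controlled_flow"}
        # Schema validation: reject malformed tool calls before execution.
        try:
            validate(instance=tool_args, schema=REFUND_SCHEMA)
        except ValidationError as err:
            return {"action": "reject_tool_call", "reason": mask_pii(err.message)}
        return {"action": "execute", "args": tool_args}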
In regulated domains like dental insurance, this isn’t optional. One customer processes nearly two million provider queries a year and, during open-enrollment peaks, needed exactly this level of control.
What you can take away if you build agents today
Three practical principles Netomi makes clear:
Design for complexity: enterprise flows cross many systems, so plan for incomplete data and layered decisions.
Parallelize for latency: avoid strictly sequential pipelines and use low-latency models for real-time-critical parts.
Integrate governance in the runtime: let the system know when to back off and how to audit each step.
OpenAI models form the backbone of reasoning, but it’s systems engineering and operational rules that make them safe and auditable in Fortune 500 environments.
Netomi offers a valuable blueprint: it’s not just having AI that reasons, but building the infrastructure that makes it reliable in the real world. If you’re thinking about taking agentic systems to production, start with these three priorities and avoid the trap of trusting good prompts alone.