IBM's CUGA: two dozen agentic apps and a lightweight harness | Keryc
CUGA puts in your hands what is usually the boring and fragile part of any agent: the pipeline, the state, the guardrails and the orchestration. Want to write the business logic and not rebuild the infrastructure every time? Then this is for you. Install it with pip install cuga and check the cuga-apps: 24 single-file examples that test the approach in practice.
What CUGA does and why it matters
CUGA is an open-source agent harness from IBM that takes orchestration off your plate. The planning, execution, tool calls, state handling and reflection steps are already solved. What’s left is the creative part: the list of tools the agent can use and the prompt.
Sounds like a big promise? The interesting bit isn’t any single piece, but that the pieces come preassembled. That means you configure instead of wiring everything yourself. If you have ever written a FastAPI route, you can read any of these examples line by line and understand it without learning a new framework.
Minimal architecture that actually saves work
The core idea is simple: the agent plans before acting, executes by mixing tool calls and generated code, stores variables and reflects to fix broken plans. That machinery lets smaller, open-weight models sustain long tasks, because the harness takes on work the model would otherwise have to shoulder.
An agent definition takes four arguments: a model, tools, special_instructions and cuga_folder. The model comes from a factory that reads an environment variable and can talk to OpenAI, Anthropic, watsonx, Ollama or a local model. tools and special_instructions are what you actually write. cuga_folder stores state and policies.
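The environment-driven factory can be sketched without the library. What follows is a minimal illustration, not CUGA's actual factory: ModelConfig, model_from_env and the lowercase provider keys are hypothetical, and the variable names LLM_PROVIDER and LLM_MODEL are borrowed from the .env step later in this article.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical stand-in for whatever a real model factory returns."""
    provider: str
    model: str


# Providers named in the article; the lowercase keys are an assumption.
SUPPORTED_PROVIDERS = {"openai", "anthropic", "watsonx", "ollama", "local"}


def model_from_env(env=None) -> ModelConfig:
    """Build a model config from LLM_PROVIDER and LLM_MODEL."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    model = env.get("LLM_MODEL", "").strip()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    if not model:
        raise ValueError("LLM_MODEL is not set")
    return ModelConfig(provider=provider, model=model)
```

The point of the pattern is that switching providers becomes an environment change, so the agent definition itself stays untouched.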
Tools, MCP and the convention that saves fragile runs
Apps mix two types of tools: inline, app-specific functions and generic tools provided by MCP servers. The practical win is simple: you write the function that’s yours and borrow web search, geocoding or knowledge libraries without standing them up yourself.
Inline tool pattern:
A normal Python function with a docstring that explains its use.
The harness decides when to call it.
Important convention about responses: every tool should return a small, consistent envelope. Success:
{"ok": true, "data": {...}}
Failure:
{"ok": false, "code": "...", "error": "..."}
That detail is key. If a tool raises a naked exception the planner can die. If it returns the failure envelope, CUGA knows how to handle it and either continue or replan.
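One way to follow the convention without repeating try/except in every tool is a small decorator. This is a sketch under assumptions: the enveloped helper and the convert_celsius tool are hypothetical, and using the exception class name as the "code" field is an illustrative choice, not something CUGA prescribes.

```python
import functools


def enveloped(func):
    """Wrap a tool so it always returns the ok/failure envelope."""
    @functools.wraps(func)  # keeps the docstring the planner reads
    def wrapper(*args, **kwargs):
        try:
            return {"ok": True, "data": func(*args, **kwargs)}
        except Exception as exc:  # deliberate catch-all at the tool boundary
            # Assumption: the exception class name serves as the error code.
            return {"ok": False, "code": type(exc).__name__, "error": str(exc)}
    return wrapper


@enveloped
def convert_celsius(value: float) -> dict:
    """Convert a temperature from Celsius to Fahrenheit."""
    if value < -273.15:
        raise ValueError("temperature below absolute zero")
    return {"fahrenheit": value * 9 / 5 + 32}
```

Calling convert_celsius(100) yields a success envelope, while convert_celsius(-300) yields a failure envelope instead of an exception, which is the shape the planner can act on.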
The public MCP servers hosted for demos contain 36 useful tools: web search, arXiv, geocoding, finance, etc. A simple bridge links them and the live gallery includes an MCP Tool Explorer so you can try each tool before integrating it into an agent.
Planning, modes and cost-latency
CUGA exposes three reasoning modes you can switch between without touching code: Fast, Balanced and Accurate. They trade cost and latency against accuracy. The practical idea is that you don’t always need the biggest model: with the harness, a model such as gpt-oss-120b can be enough for many apps.
It also includes CodeAct: a mix of tool calls and execution of generated code in a sandbox you control (local, Docker/Podman or cloud). That lets the same agent definition run quickly in a dev environment or be governed in production by changing only configuration.
Policies and built-in governance
CUGA doesn’t leave governance as an afterthought. The runtime ships a policy system versioned in the .cuga folder and applied at different moments in the flow.
Types of policy:
Intent Guard: can reject a request before the agent chooses tools.
Tool Approval: asks for human approval before running a risky tool.
Tool Guide: directs how a tool should be used without rewriting it.
Playbook: fixes an approved procedure for recurring tasks.
Output Formatter: forces the final response to a concrete shape.
CustomPolicy: an escape hatch for specific rules.
Timing matters: Intent Guard runs before tools are selected; Tool Approval runs after the agent generates its code and checks which tools that code uses; Output Formatter runs at the end. Triggers aren’t just keyword matches: they also use a semantic store (sqlite-vec) to match the real intent.
Quick example of intent blocking:
await agent.policies.add_intent_guard(
    name="Block force-push",
    keywords=["--force", "--no-verify"],
    response="Blocked: destructive git flags are not permitted.",
)
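To make the keyword half of such a guard concrete, here is a standalone sketch. It is not CUGA's implementation: KeywordGuard and its check method are hypothetical, and it does plain case-insensitive substring matching only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class KeywordGuard:
    """Hypothetical keyword-only guard; not CUGA's implementation."""
    name: str
    keywords: Tuple[str, ...]
    response: str

    def check(self, request: str) -> Optional[str]:
        """Return the block message if any keyword occurs, else None."""
        lowered = request.lower()
        if any(keyword.lower() in lowered for keyword in self.keywords):
            return self.response
        return None


guard = KeywordGuard(
    name="Block force-push",
    keywords=("--force", "--no-verify"),
    response="Blocked: destructive git flags are not permitted.",
)
```

With this guard, "git push --force origin main" is rejected and "git push origin main" passes. A paraphrase such as "overwrite the remote history" would slip through a keyword-only check, which is the gap the semantic matching described above is meant to close.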
Multi-agent, skills and learning on the job
When a task grows, instead of growing a single context, CUGA suggests splitting the work: a CugaSupervisor delegates to specialist CugaAgents, each with its own context and tools. The supervisor only decides delegation, keeping the planning surface manageable.
Skills are folders with a SKILL.md that the agent loads only when needed. With ALTK-Evolve, an agent can refine a skill from its own executions, so what the agent learns today improves future runs.
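The "load only when needed" idea for skills can be illustrated with a small lazy loader. This is a sketch, assuming one folder per skill under a common root; SkillLibrary and its methods are hypothetical names, not part of CUGA.

```python
from pathlib import Path


class SkillLibrary:
    """Hypothetical loader: indexes skill folders, reads SKILL.md on demand."""

    def __init__(self, root):
        self.root = Path(root)
        self._cache = {}

    def available(self):
        """List skill names (folders containing SKILL.md) without reading them."""
        return sorted(p.parent.name for p in self.root.glob("*/SKILL.md"))

    def load(self, name):
        """Read a skill's SKILL.md the first time it is requested."""
        if name not in self._cache:
            path = self.root / name / "SKILL.md"
            self._cache[name] = path.read_text(encoding="utf-8")
        return self._cache[name]
```

Listing names is cheap and keeps the context small; the full instructions enter the context only for the skill the current task actually calls for.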
Ouroboros, an example app, shows this in practice: a supervisor over seven specialists that generate lead-gen pipelines. Adding an eighth specialist is one line in the factory, not a rewrite of the coordinator.
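The supervisor-and-factory shape can be sketched in plain Python. Everything here is hypothetical (the specialist functions, build_specialists, the Supervisor class), and the role is passed in explicitly, whereas a real supervisor would let the model choose it.

```python
from typing import Callable, Dict


# Hypothetical specialists: each is a plain callable with one narrow job.
def research(task: str) -> str:
    return f"research notes for: {task}"


def draft_email(task: str) -> str:
    return f"email draft for: {task}"


def build_specialists() -> Dict[str, Callable[[str], str]]:
    """Factory: adding a specialist is one more entry in this dict."""
    return {
        "research": research,
        "email": draft_email,
    }


class Supervisor:
    """Only decides who handles a task; it does none of the work itself."""

    def __init__(self, specialists: Dict[str, Callable[[str], str]]):
        self.specialists = specialists

    def delegate(self, role: str, task: str) -> dict:
        specialist = self.specialists.get(role)
        if specialist is None:
            # Failure envelope rather than an exception, as for tools.
            return {"ok": False, "code": "unknown_specialist", "error": role}
        return {"ok": True, "data": specialist(task)}
```

Because the coordinator only looks roles up, registering another specialist changes the factory and nothing else.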
Sovereign production: Boundary Isolation and Sovereign Core
Governed deployment doesn’t force you to rewrite the agent. IBM took CUGA to what they call Sovereign Core, where each agent runs in transient, isolated containers inside the same logical customer boundary. Model, data and control plane stay inside the boundary.
By default the demos run air-gapped with gpt-oss-120b. Each reasoning step emits local OpenTelemetry traces to Grafana Tempo in-tenant. In other words: nothing leaves the boundary and the same agent definition redeploys as-is.
How to quickly get started with the cuga-apps
Basic steps:
git clone https://github.com/cuga-project/cuga-apps.git
cd build
cp .env.example .env # configure LLM_PROVIDER + LLM_MODEL and optional keys
docker compose up --build # the first build is large
# open http://localhost:8080
Or just install the runtime with pip install cuga and explore the live gallery. Open apps/ibm_cloud_advisor/main.py to see the full pattern: an inline tool that queries the IBM catalog, combined with MCP calls. Change the system prompt, add a tool and watch how the behavior changes.
Practical tips and common pitfalls
Follow the tool-envelope convention: avoid naked exceptions.
Start from apps labeled ship-ready, for example the Cloud Advisor or the Movie Recommender.
If a task is long, the reflection and variable tracking that CUGA provides are often the difference between success and failure.
For governance, consider controlling the few tools that touch the outside world instead of inventing an entire control layer from scratch.
The final lesson is simple: an agent can be a file you understand. Write the tools and the prompt, and let the harness carry the complexity. When the application scales, governance is already in the runtime and the same agent can be redeployed in closed environments. Ready to try it? Clone, install and play with an app; the toolbox is already assembled.