Analysis · June 24, 2026 · 3 min read

IBM Open-Sources CUGA: Build Agents in One File, Two Dozen Examples Included

CUGA is a lightweight agent harness that handles orchestration, tool binding, and state management so you write only a tool list and prompt. IBM shipped 24 working single-file apps to prove the pattern works.

Our Take

CUGA cuts the usual agent development tax by pre-assembling the plumbing (planning, reflection, tool calls, state tracking), so you configure instead of build. The real win, though, is the policy system, which governs production agents without a rewrite.

Why it matters

Agentic apps often lose a week or so to orchestration before touching the actual task. Teams deploying agents to production also need guardrails baked into the runtime, not bolted on afterward. CUGA ships both the orchestration and the guardrails.

Do this week

Clone cuga-apps, read the IBM Cloud advisor example end-to-end, then audit whether your current agent codebase could fit the same shape (tool list + prompt + policies in one file) before starting a rewrite.

CUGA bundles agent orchestration into a configurable harness

IBM Research released CUGA (Configurable Generalist Agent), an open-source Python harness that handles planning, execution loops, tool calls, state management, and reflection so developers only write a tool list and a prompt. The package installs with pip install cuga and comes with two dozen single-file example apps, from a movie recommender to an IBM Cloud architecture advisor.

The core API is minimal: instantiate a CugaAgent with a model, a tool list, and special_instructions, then await agent.invoke(...). Everything below that line is the harness. Tools bind uniformly across OpenAPI, MCP (Model Context Protocol), and LangChain functions; the agent automatically interleaves tool calls with generated code execution (CodeAct); state is held and versioned internally, not lost between steps; and a reflection stage catches bad calls and re-plans rather than barreling ahead.
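To make that shape concrete, here is a minimal sketch that runs without CUGA installed. SketchAgent is a hypothetical stand-in, not CUGA's CugaAgent: it only mirrors the three constructor arguments and the awaited invoke call named above, and does no planning, tool calling, or reflection. The model name, the search_catalog tool, and the question are likewise illustrative assumptions.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchAgent:
    """Stand-in for illustration only; NOT CUGA's class or behavior."""

    model: str
    tools: list[Callable] = field(default_factory=list)
    special_instructions: str = ""

    async def invoke(self, message: str) -> str:
        # A real harness would plan, call tools, and reflect at this point.
        # This stub only reports what it was configured with.
        tool_names = ", ".join(tool.__name__ for tool in self.tools)
        return f"[{self.model}] would answer {message!r} using: {tool_names}"


def search_catalog(query: str) -> dict:
    """Search a catalog for services matching the query (hypothetical tool)."""
    return {"ok": True, "data": {"matches": []}}


async def main() -> str:
    # Everything the developer writes: a model, a tool list, and a prompt.
    agent = SketchAgent(
        model="example-model",
        tools=[search_catalog],
        special_instructions="Never invent service names.",
    )
    return await agent.invoke("Which service stores objects?")


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a real app the stand-in would be replaced by the class from the cuga package; the point of the sketch is only how little sits above the invoke call.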

CUGA exposes the cost/latency tradeoff through environment variables (Fast, Balanced, or Accurate reasoning modes), so switching modes requires no code changes. The same agent definition runs on gpt-oss-120b, OpenAI, Anthropic, watsonx, LiteLLM, or Ollama depending on a single LLM_PROVIDER variable. Sandboxing for code execution is also swappable: local, Docker/Podman, or E2B cloud.

The example library matters as much as the harness itself. Every app follows the same skeleton (tool list, inline functions, prompt, FastAPI routes) so patterns are visible across domains. The cloud advisor verifies every service recommendation against the IBM catalog before naming it. Paper Scout ranks arXiv results by citation count. Ouroboros coordinates seven agents for lead generation. Meetup Finder automates browser-based event scraping through Playwright.

Production agents need governance built into the runtime

The harness includes a declarative policy system that controls agent behavior without rewrites. Six policy types answer different safety questions: Intent Guards refuse requests outright before tool selection; Tool Approval gates risky tool invocations after code generation; Tool Guides steer how specific tools are used; Playbooks pin known-good procedures for recurring tasks; Output Formatters enforce response shape; and CustomPolicy is the escape hatch.

Policies live in a .cuga folder versioned next to code, not drifting in separate config files. Matching is semantic, not just keyword-based; policies fire on user intent (via sqlite-vec similarity) and can also trigger on agent state or specific tool invocations. Timing matters: Intent Guards check before planning, Tool Approval fires after code generation, Output Formatter runs only once the final message exists.
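The intent-matching idea can be sketched as a similarity score compared against a threshold. This is not CUGA's implementation: the article says matching runs on sqlite-vec similarity, whereas the sketch below substitutes a toy bag-of-words cosine similarity in pure Python. That stand-in is really keyword overlap; a real embedding model is what would make the match semantic. The function names, guard name, and 0.5 threshold are all assumptions.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def matching_guard(
    request: str, guards: dict[str, str], threshold: float = 0.5
) -> Optional[str]:
    """Return the best-matching guard at or above the threshold, else None."""
    request_vec = embed(request)
    best_name, best_score = None, threshold
    for name, trigger in guards.items():
        score = cosine(request_vec, embed(trigger))
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical guard: a name mapped to a phrase describing the refused intent.
GUARDS = {"no_bulk_delete": "delete all customer records"}
```

Under these assumptions, matching_guard("please delete all customer records now", GUARDS) selects the guard, while an unrelated request such as "recommend a movie" matches nothing and would proceed to planning.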

The convention that makes this work is mechanical but load-bearing: every tool returns the same envelope ({ok: true, data: {...}} on success; {ok: false, code: "...", error: "..."} on failure). This envelope lets the harness distinguish declared failures (which the planner recovers from gracefully) from undeclared exceptions (which derail the run). Across the example apps, reliability correlated with strict adherence to this contract.
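One way to hold a tool to that contract is a small decorator. The sketch below is an assumption about how an app author might do it, not code from the cuga-apps repo: ToolError, enveloped, and the toy lookup_service tool are hypothetical names. Declared failures are raised as ToolError and converted into the failure envelope; any other exception is deliberately left to propagate, since that is the undeclared case the article describes.

```python
import functools
from typing import Any, Callable


class ToolError(Exception):
    """Hypothetical exception a tool raises for an expected, declared failure."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


def enveloped(func: Callable[..., Any]) -> Callable[..., dict]:
    """Wrap a tool so declared outcomes always come back in the envelope."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> dict:
        try:
            return {"ok": True, "data": func(*args, **kwargs)}
        except ToolError as exc:
            # Declared failure: report it in a form a planner can act on.
            return {"ok": False, "code": exc.code, "error": str(exc)}
        # Any other exception is undeclared and is not caught here.

    return wrapper


@enveloped
def lookup_service(name: str) -> dict:
    """Look up a service in a toy in-memory catalog (illustrative data only)."""
    catalog = {"object-storage": {"tier": "standard"}}
    if name not in catalog:
        raise ToolError("not_found", f"no service named {name!r}")
    return {"name": name, **catalog[name]}
```

Because functools.wraps preserves the docstring, a harness that reads tool docstrings would still see the original description on the wrapped function.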

Start by reading one app, then audit your tool interface

The entry point is the IBM Cloud advisor example. It's one file: a factory function that builds the agent with four arguments (model, tools, special_instructions, cuga_folder), a tool that searches the IBM catalog (one inline function with a docstring that the agent reads), borrowed web tools from MCP, and a prompt that enforces "never invent service names." The FastAPI routes are ordinary web code; state is a per-thread_id dict that only the agent writes to through tools.

If you're already running agents in production, check whether your tools consistently return a success/failure envelope. If they throw bare exceptions, the agent will stumble on undeclared errors. If they're scattered across multiple files or wrapped in inconsistent adapters, CUGA's uniform binding might cut down on plumbing. And if you're bolting governance on top after the fact (separate approval layers, external policy stores), the runtime policies baked into CUGA flatten that architecture.
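The first of those checks is easy to automate. Below is a minimal sketch of an envelope validator, assuming the success and failure shapes quoted earlier in the article ({ok: true, data: ...} and {ok: false, code: "...", error: "..."}). The function name envelope_problems and the wording of its messages are hypothetical.

```python
from typing import Any


def envelope_problems(result: Any) -> list[str]:
    """Return contract violations; an empty list means the result conforms."""
    if not isinstance(result, dict):
        return ["result is not a dict"]
    if not isinstance(result.get("ok"), bool):
        return ["'ok' must be present and a bool"]

    problems = []
    if result["ok"]:
        # Success envelopes must carry a payload under 'data'.
        if "data" not in result:
            problems.append("success envelope is missing 'data'")
    else:
        # Failure envelopes need a machine-readable code and a message.
        for key in ("code", "error"):
            if not isinstance(result.get(key), str):
                problems.append(f"failure envelope needs a string {key!r}")
    return problems
```

Calling each existing tool with one known-good and one known-bad input inside a test suite, then asserting that envelope_problems returns an empty list, gives a quick read on how far a codebase is from the contract. A tool that raises instead of returning will fail such a test before the validator even runs, which is itself the signal to look for.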

The cuga-apps repo includes a HOW_TO_BUILD_AN_APP_FAST.md guide and a tool explorer to test MCP tools from a web form before wiring them in. The live gallery tags apps as "showcase" or "additional" and defaults to showcase; start with the cloud advisor or movie recommender as a working baseline.
