
How to Build Single-File Agents With IBM's CUGA Framework

Learn how to manage execution loops, state tracking, and secure tool invocations using the CUGA agent harness and its new FastAPI application templates.

IBM Research’s June 2026 release of cuga-apps provides 24 single-file agentic applications built on the Configurable Generalist Agent (CUGA) framework. The framework operates as a lightweight agent harness for enterprise systems, handling the underlying infrastructure of planning, execution loops, tool invocation, and state management. You can build complex autonomous systems by supplying a tool list and a prompt while the framework manages the routing and error handling.

Framework Architecture and Installation

The base CUGA framework abstracts away the orchestration layer. Unlike heavier agent frameworks that require extensive custom classes for memory and routing, CUGA relies on a dynamic task ledger to handle that plumbing.

The framework requires Python and installs directly from PyPI. Run the following command to add it to your environment:

```bash
pip install cuga
```

The June 23 release emphasizes a read-and-copy design philosophy. Every application in the cuga-apps collection is a single FastAPI file that you can read end to end, drop into your repository, and deploy as-is.

Switching model providers is a configuration change rather than a code change. The framework reads the provider from environment variables, so you can route inference requests to OpenAI, watsonx, or Ollama by updating your local configuration, with no changes to the underlying agent logic.
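The following is a minimal sketch of environment-driven provider selection. The variable names `AGENT_LLM_PROVIDER` and `AGENT_LLM_MODEL` are hypothetical, as are the placeholder model names and the `load_provider_config` helper. CUGA's actual settings are listed in its documentation. The Ollama URL is that server's usual local default.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names, not CUGA's documented settings.
PROVIDER_VAR = "AGENT_LLM_PROVIDER"
MODEL_VAR = "AGENT_LLM_MODEL"


@dataclass(frozen=True)
class ProviderConfig:
    provider: str
    base_url: Optional[str]  # None: let the provider's client use its default endpoint
    model: str


# Placeholder model names; substitute the models you actually use.
DEFAULTS = {
    "openai": ProviderConfig("openai", None, "placeholder-openai-model"),
    "watsonx": ProviderConfig("watsonx", None, "placeholder-watsonx-model"),
    "ollama": ProviderConfig("ollama", "http://localhost:11434", "placeholder-ollama-model"),
}


def load_provider_config(env: Optional[Mapping[str, str]] = None) -> ProviderConfig:
    """Resolve the provider from the environment; the agent logic only sees the result."""
    env = os.environ if env is None else env
    name = env.get(PROVIDER_VAR, "openai").strip().lower()
    if name not in DEFAULTS:
        raise ValueError(f"unsupported provider {name!r}; expected one of {sorted(DEFAULTS)}")
    default = DEFAULTS[name]
    # Allow a per-deployment model override without editing code.
    return ProviderConfig(default.provider, default.base_url, env.get(MODEL_VAR, default.model))


cfg = load_provider_config({"AGENT_LLM_PROVIDER": "ollama"})
print(cfg.provider, cfg.base_url)  # ollama http://localhost:11434
```

The agent code only ever receives a `ProviderConfig`. Moving from OpenAI to Ollama is therefore an environment change and does not touch the agent itself.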

Reasoning Modes and Execution Loops

CUGA implements a built-in Plan-Execution-Reflection loop. This architecture allows the agent to recognize when an API call fails, analyze the error message, and refine its execution path. This self-correction is critical for long-horizon tasks where a single malformed JSON payload would otherwise crash a linear pipeline.
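The control flow of such a loop fits in a few lines. This toy version is not CUGA's implementation. It works as follows:

- `plan` and `reflect` are deterministic stand-ins for model calls, and `create_ticket` is a hypothetical tool.
- The planner emits JSON with a trailing comma, and the tool rejects it.
- The reflection step reads the error, repairs the payload, and retries.

```python
from __future__ import annotations

import json
import re
from typing import Callable, Optional


def create_ticket(payload: str) -> dict:
    """Hypothetical tool: rejects malformed JSON the way a strict API would."""
    data = json.loads(payload)
    return {"ticket_id": 101, "title": data["title"]}


TOOLS: dict[str, Callable[..., dict]] = {"create_ticket": create_ticket}


def plan(task: str) -> list[dict]:
    # Plan: stand-in for the planner model; note the trailing comma in the payload.
    return [{"tool": "create_ticket", "args": {"payload": '{"title": %s,}' % json.dumps(task)}}]


def reflect(step: dict, error: Exception) -> Optional[dict]:
    # Reflection: stand-in for the model reading the error and proposing a corrected step.
    if isinstance(error, json.JSONDecodeError):
        fixed = re.sub(r",\s*}", "}", step["args"]["payload"])
        return {"tool": step["tool"], "args": {"payload": fixed}}
    return None  # no known fix: give up on this step


def run(task: str, max_reflections: int = 2) -> list[dict]:
    results = []
    for step in plan(task):
        for attempt in range(max_reflections + 1):
            try:
                results.append(TOOLS[step["tool"]](**step["args"]))  # Execution
                break
            except Exception as err:
                revised = reflect(step, err)
                if revised is None or attempt == max_reflections:
                    raise  # out of retries: surface the original error
                step = revised
    return results


print(run("Reset VPN access"))  # [{'ticket_id': 101, 'title': 'Reset VPN access'}]
```

With `max_reflections=0` the same task fails on the first malformed payload. That mirrors the linear pipeline the paragraph above warns about.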

You configure the agent’s behavior by selecting one of three distinct reasoning modes. The selected mode dictates how the system balances token consumption against operational latency and task accuracy.

| Reasoning Mode | Primary Use Case | Execution Mechanism |
| --- | --- | --- |
| Fast | Simple operations and single-step tasks | Relies on lightweight heuristics with minimal planning overhead. |
| Balanced | Standard API orchestration and data retrieval | Employs medium-level reasoning suitable for common enterprise workflows. |
| Accurate | Complex, multi-step logic and critical systems | Executes deep planning sequences and multiple reflection loops for self-correction. |
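The article does not publish CUGA's internal budgets, so the numbers here are invented for illustration. `ModeProfile` and `worst_case_llm_calls` are hypothetical. The sketch only shows how extra planning passes and reflection retries multiply model calls, which is where the token and latency cost of the heavier modes comes from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeProfile:
    plan_passes: int      # planning calls made before any tool runs
    max_reflections: int  # extra reflect-and-retry calls allowed per failed step


# Invented budgets for illustration; not CUGA's real settings.
MODES = {
    "fast": ModeProfile(plan_passes=0, max_reflections=0),
    "balanced": ModeProfile(plan_passes=1, max_reflections=1),
    "accurate": ModeProfile(plan_passes=2, max_reflections=3),
}


def worst_case_llm_calls(mode: str, tool_steps: int) -> int:
    """Upper bound on model calls: planning, one call per step, plus every allowed reflection."""
    profile = MODES[mode]
    return profile.plan_passes + tool_steps * (1 + profile.max_reflections)


for mode in MODES:
    print(mode, worst_case_llm_calls(mode, tool_steps=5))
```

For a five-step task, this toy cost model gives 5, 11, and 22 worst-case calls for Fast, Balanced, and Accurate respectively.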

Benchmark results back up these execution capabilities. At the time of writing, the Accurate mode holds the number one rank on the AppWorld benchmark, a rigorous evaluation suite of 750 tasks spanning 457 distinct APIs. The framework also held the top position on the WebArena autonomous web agent leaderboard for most of 2025.

Tool Integration and Sandboxed Execution

Enterprise agents require reliable and secure pathways to interact with external systems. CUGA supports native tool binding for OpenAPI specifications, LangChain functions, and the Model Context Protocol.
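This framework-independent sketch makes "tool binding" concrete for an OpenAPI specification. It covers only the first half of the job: turning each operation into a named tool with a description and a parameter list. `tools_from_openapi` is a hypothetical helper, not CUGA's loader. It ignores request bodies and does not resolve `$ref` parameters.

```python
# Operation keys allowed in an OpenAPI 3 path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def tools_from_openapi(spec: dict) -> list:
    """Flatten an OpenAPI 3 document into simple tool descriptors an agent can pick from."""
    tools = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "summary" or shared "parameters"
            # Prefer the spec's operationId; otherwise derive a name from method and path.
            name = op.get("operationId") or f"{method}_{path.strip('/').replace('/', '_')}"
            tools.append({
                "name": name,
                "method": method.upper(),
                "path": path,
                "description": op.get("summary", ""),
                # $ref parameters are not resolved in this sketch.
                "params": [p["name"] for p in op.get("parameters", [])],
            })
    return tools


spec = {
    "openapi": "3.0.3",
    "paths": {
        "/contacts/{id}": {
            "get": {
                "operationId": "getContact",
                "summary": "Fetch one CRM contact",
                "parameters": [{"name": "id", "in": "path", "required": True}],
            }
        },
        "/health": {"get": {}},
    },
}
for t in tools_from_openapi(spec):
    print(t["name"], t["method"], t["path"], t["params"])
```

The other half of binding is invoking the chosen endpoint with the model's arguments. That step depends on your HTTP client and authentication, so it is left out here.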

The cuga-apps repository includes a CRM Integration template that demonstrates a multi-step workflow: the agent reads local contact files, filters the data against a live CRM API, and drafts personalized summary emails from the query results. Another template, the IBM Cloud Architecture Advisor, uses the same tool-binding mechanics to analyze system requirements and recommend architecture options.
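This is a stripped-down version of the CRM template's three-step flow, with every external piece replaced by a stand-in:

- The contact file is an inline CSV string.
- `fetch_open_deals` is a hypothetical, hard-coded substitute for the live CRM call.
- A string template writes the emails. In the real template, the model writes the summaries.

```python
from __future__ import annotations

import csv
import io

# Stand-in for the local contact file the template reads.
CONTACTS_CSV = """name,email,company
Ana Ruiz,ana@example.com,Acme
Ben Cole,ben@example.com,Globex
"""


def fetch_open_deals(company: str) -> list[dict]:
    """Hypothetical stand-in for the live CRM API call."""
    fake_crm = {"Acme": [{"deal": "Renewal", "stage": "negotiation"}]}
    return fake_crm.get(company, [])


def draft_email(contact: dict, deals: list[dict]) -> str:
    # A real agent would have the model write this; a template keeps the sketch deterministic.
    lines = [f"Hi {contact['name'].split()[0]},", "", "Quick update on your open deals:"]
    lines += [f"- {d['deal']} ({d['stage']})" for d in deals]
    return "\n".join(lines)


def run_crm_workflow(contacts_file) -> dict[str, str]:
    drafts = {}
    for contact in csv.DictReader(contacts_file):          # step 1: read local contacts
        deals = fetch_open_deals(contact["company"])       # step 2: filter against the CRM
        if deals:
            drafts[contact["email"]] = draft_email(contact, deals)  # step 3: draft the email
    return drafts


drafts = run_crm_workflow(io.StringIO(CONTACTS_CSV))
print(list(drafts))  # ['ana@example.com']
```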

Executing arbitrary LLM-generated code introduces severe security risks, so CUGA lets you choose where that code runs. During development, the harness can execute it locally; for production workloads, you can route it into a standard Docker container. For strict isolation, the framework integrates directly with E2B sandboxes. The official framework documentation covers the exact environment variables and configuration parameters required to bind E2B session tokens to the execution runtime.
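The backend choice can be pictured as a small dispatcher. `run_generated_code` and the backend names are hypothetical and are not CUGA's configuration interface. The Docker flags (`--rm`, `--network none`, `--memory`) are standard `docker run` options. The E2B branch is deliberately left unimplemented rather than guessing at that SDK's API.

```python
from __future__ import annotations

import subprocess
import sys


def docker_command(code: str, image: str = "python:3.12-slim") -> list[str]:
    """Build a docker run command: throwaway container, no network, capped memory."""
    return ["docker", "run", "--rm", "--network", "none", "--memory", "256m",
            image, "python", "-c", code]


def run_generated_code(code: str, backend: str = "local", timeout: float = 10.0) -> str:
    """Hypothetical dispatcher for model-generated code; returns captured stdout."""
    if backend == "local":
        # Development only: a separate process, but with the host user's full privileges.
        cmd = [sys.executable, "-c", code]
    elif backend == "docker":
        cmd = docker_command(code)
    elif backend == "e2b":
        # E2B sandboxes are driven through E2B's own SDK; see the CUGA docs for the wiring.
        raise NotImplementedError("configure E2B as described in the CUGA documentation")
    else:
        raise ValueError(f"unknown backend {backend!r}")
    # check=True raises CalledProcessError on a non-zero exit; timeout kills runaway code.
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout, check=True)
    return proc.stdout


print(run_generated_code("print(2 + 3)"), end="")  # 5
```

Keeping the backend a single parameter means the same generated snippet can move from a laptop to a container without changes. Only the isolation guarantees differ.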

State Management With the Task Ledger

State persistence remains a common failure point in long-running autonomous operations. Agents often lose track of previous steps or hallucinate data when their context windows fill up with redundant logs. CUGA mitigates this with its dynamic task ledger.

The task ledger handles variable management for the agent. Instead of continuously appending raw JSON responses and execution logs to the prompt context, the framework stores execution results and state transitions in a separate, structured record. The agent consults this ledger to verify earlier tool outputs before planning its next step. This keeps the active context window lean. It also makes it much harder for the model to fabricate API responses during extended operations, because the ground truth is maintained outside the conversational context.
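The article does not describe the ledger's internals, so this class is only a toy model of the idea. `TaskLedger` and its methods are hypothetical. Full tool outputs are stored under variable names. The prompt receives a truncated one-line preview per variable, and later steps read exact values from the ledger instead of from the transcript.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TaskLedger:
    """Toy ledger: full tool outputs live here; the prompt only ever sees short previews."""

    entries: dict[str, Any] = field(default_factory=dict)
    history: list[str] = field(default_factory=list)

    def record(self, var: str, step: str, value: Any) -> None:
        # Store the complete result once, under a name the planner can refer to.
        self.entries[var] = value
        self.history.append(f"{step} -> ${var}")

    def get(self, var: str) -> Any:
        # Later steps read ground truth from the ledger instead of from the transcript.
        return self.entries[var]

    def prompt_view(self, max_chars: int = 60) -> str:
        # Compact summary injected into the prompt in place of raw JSON.
        lines = []
        for var, value in self.entries.items():
            preview = json.dumps(value)
            if len(preview) > max_chars:
                preview = preview[: max_chars - 3] + "..."
            lines.append(f"${var}: {type(value).__name__} {preview}")
        return "\n".join(lines)


ledger = TaskLedger()
ledger.record("contacts", "list_contacts", [{"id": i, "name": f"User {i}"} for i in range(50)])
print(ledger.prompt_view())
print(ledger.get("contacts")[49]["name"])  # User 49
```

Fifty contacts cost the prompt one short line, while the exact record for any contact remains retrievable by name.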

Enterprise Auditing and Tradeoffs

For organizations operating under strict compliance requirements, CUGA serves as the execution layer for IBM’s Sovereign Core. This integration lets enterprises inspect, log, and audit the agent’s reasoning at every step of the Plan-Execution-Reflection loop.
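The article does not specify Sovereign Core's log format. As a generic illustration, one JSON Lines record per Plan, Execution, or Reflection phase is enough to reconstruct a run after the fact. `audit_event` and its record fields are hypothetical.

```python
import io
import json
import time

PHASES = ("plan", "execute", "reflect")


def audit_event(sink, run_id: str, phase: str, detail: dict) -> dict:
    """Write one JSON Lines record per loop phase so each step can be reviewed later."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase {phase!r}")
    record = {"ts": time.time(), "run_id": run_id, "phase": phase, **detail}
    sink.write(json.dumps(record, sort_keys=True) + "\n")
    return record


log = io.StringIO()  # stand-in for an append-only file or log shipper
audit_event(log, "run-1", "plan", {"steps": ["list_contacts", "draft_email"]})
audit_event(log, "run-1", "execute", {"tool": "list_contacts", "ok": True})
audit_event(log, "run-1", "reflect", {"note": "contact list complete"})
print([json.loads(line)["phase"] for line in log.getvalue().splitlines()])  # ['plan', 'execute', 'reflect']
```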

However, the single-file FastAPI approach introduces organizational constraints as your application grows. Storing routing logic, comprehensive tool definitions, and complex system prompts in a single file becomes difficult to maintain when scaling beyond the scope of the provided template applications. You will likely need to refactor the single-file structure into modular components if your tool registry exceeds a dozen endpoints.
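One common way to start that refactor is a decorator-based tool registry. Each tool module populates the registry, so the FastAPI file only has to import it and hand over the resulting tool list. This is a generic Python pattern, and the `tool` decorator and `REGISTRY` are hypothetical names. Whether CUGA accepts callables from such a registry directly is an assumption to check against its documentation.

```python
from __future__ import annotations

from typing import Callable

# In a larger app this would live in its own module, with each tool family in a separate
# file that imports `tool` and decorates its functions.
REGISTRY: dict[str, Callable[..., object]] = {}


def tool(name: str | None = None):
    """Register a function as an agent tool under a stable, unique name."""
    def decorator(fn):
        key = name or fn.__name__
        if key in REGISTRY:
            raise ValueError(f"duplicate tool name {key!r}")
        REGISTRY[key] = fn
        return fn
    return decorator


@tool()
def lookup_contact(email: str) -> dict:
    # Hypothetical tool body; a real one would call your contact store.
    return {"email": email, "found": False}


@tool("crm.search")
def search_crm(query: str) -> list:
    # Hypothetical tool body; a real one would call the CRM API.
    return []


def tool_list() -> list[str]:
    # What the single-file app would pass to the agent as its tool list.
    return sorted(REGISTRY)


print(tool_list())  # ['crm.search', 'lookup_contact']
```

The duplicate-name check matters more as the registry grows. Two modules silently shadowing each other's tools is the kind of bug a single file tends to hide.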

Furthermore, while the AppWorld benchmark results indicate strong orchestration capabilities, you must rigorously evaluate and test the system against your internal endpoints. The heuristics used in the Fast mode prioritize speed over validation and may fail silently if your APIs return non-standard error codes or unexpected payload structures.
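A cheap safeguard is a strict contract check between your endpoints and the agent, so that unexpected responses raise instead of slipping through. You can run it in tests or as a guard inside each tool. `check_response` and `ContractError` are hypothetical. Treating an `error` key in a 2xx body as a failure is an assumption about your API's conventions.

```python
from __future__ import annotations


class ContractError(Exception):
    """Raised when an endpoint's response does not match what the agent expects."""


def check_response(status: int, payload: object, required: dict[str, type]) -> dict:
    """Fail loudly on responses the agent might otherwise misread."""
    if not 200 <= status < 300:
        raise ContractError(f"non-2xx status {status}")
    if not isinstance(payload, dict):
        raise ContractError(f"expected a JSON object, got {type(payload).__name__}")
    # Some APIs report errors inside a 200 body; treat that as a failure too.
    if "error" in payload:
        raise ContractError(f"error in success body: {payload['error']!r}")
    for key, expected in required.items():
        if not isinstance(payload.get(key), expected):
            raise ContractError(f"field {key!r} missing or not {expected.__name__}")
    return payload


ok = check_response(200, {"id": 1, "name": "Acme"}, {"id": int, "name": str})
print(ok["name"])  # Acme
```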

Next Steps

If your deployment strategy favors visual orchestration over code-first development, CUGA is available as a native agent component in Langflow 1.10. This integration enables low-code visual building of agentic flows where you can connect CUGA nodes to other pipeline components. For developers building directly in Python, inspect the live gallery of the 24 reference applications on Hugging Face Spaces. The interactive playground allows you to test the Movie Recommender and CRM templates before writing any implementation code.
