Ai Agents 3 min read

Hugging Face Defines the Scaffold vs Harness Agent Architecture

Hugging Face has published a new technical glossary formalizing the structural differences between an AI agent's scaffolding and its execution harness.

On May 25, 2026, Hugging Face released a technical glossary titled Harness, Scaffold, and the AI Agent Terms Worth Getting Right to address terminology fragmentation in the agent community. The publication follows growing confusion observed at ICLR 2026, where researchers and practitioners reported using conflicting definitions for foundational architectural concepts. Co-authored by researcher Ari Goldberg and reviewed by Hugging Face teams, the document establishes a formal structural model separating the behavior-defining layers of an agent from its underlying execution runtime.

Architectural Boundaries

The glossary formalizes a structural mental model for building these systems: an agent consists of a base model plus a harness. Within this model, Hugging Face draws a strict technical line between scaffolding and the harness itself.

Scaffolding acts as the behavior-defining layer that dictates how a model perceives and interacts with its environment. This encompasses system prompts, tool descriptions, context management, and the specific output formats or schemas the model must follow.

The harness serves as the execution layer or agentic runtime. Its responsibilities are strictly operational. The harness manages the control loop that decides when to invoke the model and when to halt. It handles actual tool execution, triggering the requested APIs or code. It also manages error states, including retries, timeouts, and malformed outputs, while enforcing operational guardrails.

The Impact on Evaluation and Tooling

Isolating the scaffold from the harness has immediate implications for evaluating and testing AI agents. According to the publication, improvements made exclusively to the scaffold layer have yielded 10 to 20 point performance increases on SWE-bench (Verified) tasks, all without altering the underlying model weights.

Despite this technical distinction, commercial products frequently blur the lines between these layers. Hugging Face notes that tools like Anthropic’s Claude Code, OpenAI’s Codex, and the Antigravity CLI often use “harness” as a catch-all term for the entire stack surrounding the base model. Claude Code’s official documentation explicitly describes the software as the “agentic harness around Claude.”

The glossary also contextualizes the rise of reusable agent capabilities, referencing Anthropic Skills. Unlike basic function calls, agent skills are distributed as structured packages of knowledge via .SKILL.md folders, bundling complex instructions and scripts for specific goals.

Updated Terminology for 2026

The release standardizes several other concepts that evolved during the early 2026 surge in terminal and coding agents, such as Google’s Gemini Nano terminal agent and the IBM Open Agent Leaderboard.

Term2026 Definition
Sub-agentAn agent invoked by another agent for a specific subtask, maintaining its own independent model and scaffold.
PolicyThe specific behavior an agent executes, representing a combination of learned model weights and the surrounding harness and scaffold.
Tool SearchThe capability for an agent to search a repository for tools dynamically at runtime, rather than loading all tools into the system prompt upfront.

If you build multi-agent systems, explicitly separating your scaffolding logic from your execution harness allows you to version and test your prompts and schemas independently of your API retry logic. Decoupling the behavior definition from the runtime engine prevents tight coupling that complicates debugging when agents fail at complex tasks.

Get Insanely Good at AI

Get Insanely Good at AI

The book for developers who want to understand how AI actually works. LLMs, prompt engineering, RAG, AI agents, and production systems.

Keep Reading