Hugging Face Defines the Scaffold vs Harness Agent Architecture

On May 25, 2026, Hugging Face released a technical glossary titled Harness, Scaffold, and the AI Agent Terms Worth Getting Right to address terminology fragmentation in the agent community. The publication follows growing confusion observed at ICLR 2026, where researchers and practitioners reported using conflicting definitions for foundational architectural concepts. Co-authored by researcher Ari Goldberg and reviewed by Hugging Face teams, the document establishes a formal structural model separating the behavior-defining layers of an agent from its underlying execution runtime.

Architectural Boundaries

The glossary formalizes a structural mental model for building these systems: an agent consists of a base model plus a harness. Within this model, Hugging Face draws a strict technical line between scaffolding and the harness itself.

Scaffolding acts as the behavior-defining layer that dictates how a model perceives and interacts with its environment. This encompasses system prompts, tool descriptions, context management, and the specific output formats or schemas the model must follow.

The harness serves as the execution layer or agentic runtime. Its responsibilities are strictly operational. The harness manages the control loop that decides when to invoke the model and when to halt. It handles actual tool execution, triggering the requested APIs or code. It also manages error states, including retries, timeouts, and malformed outputs, while enforcing operational guardrails.

The Impact on Evaluation and Tooling

Isolating the scaffold from the harness has immediate implications for evaluating and testing AI agents. According to the publication, improvements made exclusively to the scaffold layer have yielded 10 to 20 point performance increases on SWE-bench (Verified) tasks, all without altering the underlying model weights.

Despite this technical distinction, commercial products frequently blur the lines between these layers. Hugging Face notes that tools like Anthropic’s Claude Code, OpenAI’s Codex, and the Antigravity CLI often use “harness” as a catch-all term for the entire stack surrounding the base model. Claude Code’s official documentation explicitly describes the software as the “agentic harness around Claude.”

The glossary also contextualizes the rise of reusable agent capabilities, referencing Anthropic Skills. Unlike basic function calls, agent skills are distributed as structured packages of knowledge via .SKILL.md folders, bundling complex instructions and scripts for specific goals.

Updated Terminology for 2026

The release standardizes several other concepts that evolved during the early 2026 surge in terminal and coding agents, such as Google’s Gemini Nano terminal agent and the IBM Open Agent Leaderboard.

Term	2026 Definition
Sub-agent	An agent invoked by another agent for a specific subtask, maintaining its own independent model and scaffold.
Policy	The specific behavior an agent executes, representing a combination of learned model weights and the surrounding harness and scaffold.
Tool Search	The capability for an agent to search a repository for tools dynamically at runtime, rather than loading all tools into the system prompt upfront.

If you build multi-agent systems, explicitly separating your scaffolding logic from your execution harness allows you to version and test your prompts and schemas independently of your API retry logic. Decoupling the behavior definition from the runtime engine prevents tight coupling that complicates debugging when agents fail at complex tasks.

Hugging Face Defines the Scaffold vs Harness Agent Architecture

Architectural Boundaries

The Impact on Evaluation and Tooling

Updated Terminology for 2026

Keep Reading

How to Build Advanced AI Agents with OpenClaw v2026

Open Agent Leaderboard Evaluates Full Scaffolding and Task Costs

Google's Agents CLI: A Terminal Path to Agent Platform

ServiceNow Ships a Benchmark for Testing Enterprise Voice Agents

NVIDIA Ships Nemotron 3 Content Safety 4B for On-Device Filtering