How to Chain Hugging Face Spaces Using the /agents.md Endpoint
You will learn how to orchestrate text-to-image and 3D modeling tools by chaining Hugging Face Spaces together using the universal markdown tool interface.
Hugging Face’s new /agents.md endpoint turns over 50,000 community-built Spaces into modular tools for autonomous agents. As demonstrated in a recent Hugging Face blog post, a coding agent can now chain multiple Spaces (for example, generating 2D gallery assets and passing them to a 3D generator) without hand-written wrappers for each tool. Here is how to configure your agents to read these markdown definitions, pass parameters between distinct models, and orchestrate multi-step workflows across the Hugging Face Hub.
Understanding the Universal Tool Interface
Historically, granting an agent access to a web-hosted AI tool required writing specific API wrappers or scraping HTML interfaces. The /agents.md endpoint changes this by providing a standardized machine-readable definition for every supported Space on the Hub.
When your agent appends /agents.md to a Hugging Face Space URL, it retrieves a markdown file containing the specific instructions, input parameters, and expected output formats for that tool. This functionality integrates cleanly with the Model Context Protocol (MCP), standardizing how your orchestrator discovers and executes remote functions.
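As a minimal sketch of that retrieval step, the snippet below builds the endpoint URL and downloads the markdown using only the Python standard library. The function names and the example Space URL are illustrative assumptions, not part of any Hugging Face library, and the structure of the returned markdown is not assumed here.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def agents_md_url(space_url: str) -> str:
    """Append /agents.md to a Space URL, tolerating a trailing slash."""
    parts = urlsplit(space_url)
    path = parts.path.rstrip("/") + "/agents.md"
    # Drop any query string or fragment; they belong to the web UI.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def fetch_agents_md(space_url: str, timeout: float = 30.0) -> str:
    """Download the markdown tool description as text (performs network I/O)."""
    request = Request(agents_md_url(space_url), headers={"Accept": "text/markdown, text/plain"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical Space URL, used only to show the URL construction.
print(agents_md_url("https://huggingface.co/spaces/some-user/some-space/"))
```

The returned text can then be placed in the agent's context so it can read the tool's inputs and outputs before calling it.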
Instead of building a monolithic architecture that tries to do everything internally, you can define a CodeAgent or ReactCodeAgent that dynamically loads agent skills at runtime. The agent reads the documentation, formats the inputs according to the markdown specification, and handles the resulting output artifacts.
Workflow: From Text to 3D Gallery
To build a virtual environment like the 3D Paris Gallery, your agent must coordinate two entirely different model architectures. The agent must generate flat textures and art pieces first, then convert those flat images into 3D objects, and finally write the code to place them in a 3D scene.
Step 1: Semantic Discovery
The process begins with the Hugging Face Hub’s semantic search API. If you instruct an agent to “build a 3D gallery,” it searches the Hub for relevant tools. It identifies Spaces capable of high-fidelity image generation and Spaces capable of 3D modeling.
Step 2: Generating Base Assets
The agent selects the first tool in the chain. In this workflow, it uses the Ideogram 4 Space. By parsing the /agents.md file for this specific Space, the agent learns that it needs to provide a text prompt parameter to generate an image.
The agent generates a series of prompts for “Parisian Gallery interior textures” and specific framed artworks. It calls the Space, waits for the inference to complete, and stores the resulting high-resolution image files in its local workspace.
Step 3: Converting to 3D Models
Next, the agent must convert the 2D outputs into 3D objects. It queries the Hub again and selects TripoSplat, a VAST-AI tool that generates 3D Gaussian Splatting models from single images.
The agent reads TripoSplat’s /agents.md documentation. It learns that TripoSplat requires an image file as input rather than a text prompt. The agent takes the image outputs generated by Ideogram 4 and maps them directly to the image parameter of the TripoSplat Space. The Space processes the images and returns 3D Gaussian Splatting assets.
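The hand-off between the two tools can be sketched as a small pipeline in which each Space is wrapped in a plain callable. Everything here is an assumption for illustration: the callable signatures, the stub implementations, and the `.ply` extension for the splat output are hypothetical, and real wrappers would call the Spaces according to their /agents.md descriptions.

```python
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical tool signatures: each callable wraps one remote Space.
TextToImage = Callable[[str], Path]  # prompt -> path of the generated image
ImageTo3D = Callable[[Path], Path]   # image path -> path of the 3D asset


def chain_text_to_3d(
    prompts: Iterable[str],
    text_to_image: TextToImage,
    image_to_3d: ImageTo3D,
) -> list[dict]:
    """Run every prompt through the image tool, then feed each image to the 3D tool."""
    results = []
    for prompt in prompts:
        image_path = text_to_image(prompt)    # output of the first tool
        asset_path = image_to_3d(image_path)  # becomes the input of the second tool
        results.append({"prompt": prompt, "image": image_path, "asset": asset_path})
    return results


# Stub tools standing in for the real Spaces so the sketch runs offline.
def fake_text_to_image(prompt: str) -> Path:
    return Path("img") / (prompt.replace(" ", "_") + ".png")


def fake_image_to_3d(image: Path) -> Path:
    return image.with_suffix(".ply")  # assumed output extension


assets = chain_text_to_3d(["framed artwork"], fake_text_to_image, fake_image_to_3d)
print(assets[0]["asset"])
```

Keeping the tools behind callables makes it possible to swap in a different image or 3D Space without changing the chaining logic.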
Step 4: Code Assembly
With all 3D assets generated and downloaded, the agent switches from external tool execution to local code generation. It writes the necessary HTML, CSS, and JavaScript (typically using a framework like Three.js or a WebGL wrapper) to instantiate a walkable environment. It maps the Gaussian splats to specific coordinates within the scene, creating the final cohesive gallery.
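One way to picture the coordinate-mapping step is a layout manifest that the generated front-end code could read. The sketch below spaces assets evenly along a single wall and emits JSON; the manifest shape, the function name, and the default dimensions are hypothetical choices, not something the original workflow specifies.

```python
import json


def wall_layout(asset_files, wall_length=10.0, height=1.5, wall_z=-3.0):
    """Place assets at equal intervals along one wall, centered on x = 0."""
    spacing = wall_length / (len(asset_files) + 1)
    placements = []
    for i, name in enumerate(asset_files, start=1):
        x = -wall_length / 2 + i * spacing
        # Position is [x, y, z] in scene units; y is the hanging height.
        placements.append({"file": name, "position": [round(x, 3), height, wall_z]})
    return placements


layout = wall_layout(["a.ply", "b.ply", "c.ply"], wall_length=8.0)
print(json.dumps(layout, indent=2))
```

A scene script written by the agent could iterate over this manifest and load each file at the listed position.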
Hardware and Performance Considerations
Chaining multiple remote tools introduces latency and potential rate-limiting bottlenecks. Hugging Face offsets part of the compute cost through its ZeroGPU infrastructure, which appears as “Running on Zero” on a Space, but it does not remove cold-start delays.
| Feature | Description | Impact on Agent Workflow |
|---|---|---|
| Availability | 50,000+ Spaces | Ensures agents have a massive fallback library for niche tasks. |
| Interface | /agents.md & llms.txt | Eliminates HTML parsing errors and UI-change breakages. |
| Compute Allocation | ZeroGPU integration | GPUs are allocated on demand. Usage is quota-based, with higher quotas for PRO accounts, so long chains should budget their calls. |
| Latency | Cold start vs Warm | Agents must handle asynchronous wait states while Spaces boot from a cold start. |
When designing your agent instructions, explicitly define timeout tolerances. A complex chain involving Ideogram 4 and TripoSplat will take time to run, particularly if the 3D generation Space is waking from a dormant state. Ensure your orchestrator handles long-polling gracefully.
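A timeout tolerance of this kind can be sketched as a polling helper with a hard deadline and exponential backoff. The `poll` callable is a hypothetical stand-in for whatever status check your orchestrator uses; it is assumed to return `None` while the Space is still booting or running. The default durations are arbitrary.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for_result(
    poll: Callable[[], Optional[T]],
    timeout_s: float = 600.0,
    initial_delay_s: float = 2.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call poll() until it returns a value or the deadline passes."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        result = poll()
        if result is not None:
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"no result after {timeout_s} seconds")
        sleep(min(delay, remaining))         # never sleep past the deadline
        delay = min(delay * 2, max_delay_s)  # back off, up to a ceiling
```

The `sleep` and `clock` parameters exist so the helper can be exercised in tests without real waiting.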
Limitations and Tradeoffs
While the /agents.md endpoint massively expands what your models can do, relying on community Spaces carries inherent risks for production environments.
First, Spaces are often experimental or subject to unannounced updates. An agent that has cached or hard-coded a specific parameter structure might fail if the Space author modifies the underlying application, even if the markdown description updates to match.
Second, handling errors across chained external tools requires robust validation. If Ideogram 4 returns an image containing visual artifacts, or a file in a format that TripoSplat cannot parse, the agent must be able to detect the failure, adjust its initial prompt, and retry the generation sequence. Without tight error-handling loops, a multi-step chain is likely to fail midway through execution.
Finally, passing data between disparate endpoints requires network I/O. For massive batches of assets, a dedicated self-hosted pipeline is usually faster and more reliable than orchestrating requests across public Hugging Face Spaces.
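The detect, adjust, and retry loop described for chained tools above could look roughly like the following. This sketch only checks the file signature of the returned bytes; it does not detect visual artifacts, which would need a separate check. The `generate` callable, the accepted formats, and the wording of the retry prompt are all hypothetical.

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def detect_image_format(data: bytes):
    """Identify PNG or JPEG data from its leading bytes; return None otherwise."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return None


def generate_valid_image(prompt, generate, accepted=("png", "jpeg"), max_attempts=3):
    """Call generate(prompt) until it returns bytes in an accepted image format."""
    attempt_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        data = generate(attempt_prompt)
        if detect_image_format(data) in accepted:
            return data
        # Adjust the prompt before retrying (illustrative wording only).
        attempt_prompt = f"{prompt} (retry {attempt}: single subject, plain background)"
    raise RuntimeError(f"no valid image after {max_attempts} attempts")
```

Validating before the hand-off keeps a bad first-stage output from consuming a call to the slower 3D stage.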
Next Steps
To begin orchestrating these tools, install the Hugging Face agents library and configure a ReactCodeAgent. Start by building a simple two-tool chain, for example an OCR Space that reads a document followed by an LLM Space that summarizes the text, before advancing to multimodal generative tasks. Review the specific /agents.md file of any Space you intend to use to understand the precise data structures your agent must pass.