
Thousand Token Wood Runs a 5-Agent Economy on Qwen2.5-3B

Developed for Hugging Face's Build Small Hackathon, the Thousand Token Wood simulation uses a 3-billion-parameter model to drive a real-time agent economy.

On June 5, 2026, researcher “AdmiralTaco” released Thousand Token Wood as part of the Hugging Face Build Small Hackathon. The project is a multi-agent economic simulation powered entirely by a 3-billion-parameter language model. It serves as a technical field report showing how developers can run complex, real-time agentic systems on small, self-hosted models rather than large frontier APIs.

Infrastructure and Serving Scale

The simulation relies on Qwen2.5-3B, chosen specifically for its low latency and reliable JSON formatting. The model runs via vLLM on Modal compute instances. A Gradio application provides the visual interface for the simulation.

The environment consists of five woodland creature agents trading five unique goods using a currency called pebbles. To maintain real-time performance, the system processes all agent decisions for a simulation turn in a single batched GPU call. This design avoids the queuing delays typical in multi-agent systems that rely on sequential API requests.
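The batching idea can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the prompt wording, the state shape, the `forager` agent, and the `generate_batch` callable are all assumptions made for the example, and a stub stands in for the model server so the sketch runs without a GPU.

```python
import json
from typing import Callable, Dict, List

# Hypothetical state shape: each agent has pebbles and an inventory of goods.
AgentState = Dict[str, object]


def build_prompt(name: str, state: AgentState) -> str:
    """Render one agent's view of the world as a prompt (illustrative wording)."""
    return (
        f"You are {name}. Your state: {json.dumps(state, sort_keys=True)}. "
        'Reply with JSON: {"action": "buy" | "sell" | "hold", "good": str, "price": int}'
    )


def run_turn(
    agents: Dict[str, AgentState],
    generate_batch: Callable[[List[str]], List[str]],
) -> Dict[str, dict]:
    """Collect every agent's decision for one turn with a single backend call."""
    names = list(agents)
    prompts = [build_prompt(n, agents[n]) for n in names]
    outputs = generate_batch(prompts)  # one batched call, not one call per agent
    if len(outputs) != len(prompts):
        raise ValueError("backend must return one completion per prompt")
    return {n: json.loads(text) for n, text in zip(names, outputs)}


class StubBackend:
    """Stand-in for the model server; counts how many calls a turn costs."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompts: List[str]) -> List[str]:
        self.calls += 1
        return ['{"action": "hold", "good": "none", "price": 0}' for _ in prompts]


# Hypothetical two-agent roster; the real simulation uses five creatures.
agents = {
    "woodcutter": {"pebbles": 10, "inventory": {"firewood": 4}},
    "forager": {"pebbles": 8, "inventory": {"berries": 5}},
}
backend = StubBackend()
decisions = run_turn(agents, backend)
```

With a real server, `generate_batch` could plausibly wrap vLLM's `LLM.generate`, which accepts a list of prompts and returns one result per prompt. The point of the design is that the number of backend calls per turn stays at one regardless of how many agents are in the wood.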

Forced Scarcity and Market Dynamics

Multi-agent setups often default to self-sufficiency, halting trade. To force market activity, the developer engineered three specific scarcity constraints into the environment prompts.

| Economic Mechanic | Agent Constraint | Market Result |
| --- | --- | --- |
| Diet Variety | Creatures can only eat one unit of a specific food per meal. | Forces agents to buy diverse food sources they do not produce. |
| Spoilage | Perishable goods rot over time if hoarded. | Incentivizes quick sales of surplus inventory. |
| Winter Fuel Crisis | All agents must burn firewood each turn. | Creates competitive bidding and a wealth gap, as only the woodcutter produces wood. |

These constraints prevented static interactions and triggered competitive pricing behavior among the 3B-parameter agents.
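The article describes these rules as prompt-level constraints and does not show how the environment tracks them. The sketch below is one hypothetical way to encode the three mechanics as state updates. The goods `berries` and `fish`, the three-turn shelf life, the one-log-per-turn fuel cost, and all function names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

SHELF_LIFE = 3  # hypothetical: a perishable unit rots when its age reaches 3 turns
PERISHABLE: Set[str] = {"berries", "fish"}  # hypothetical perishable goods
FUEL = "firewood"


@dataclass
class Creature:
    name: str
    # good -> list of unit ages in turns, one entry per unit held
    stock: Dict[str, List[int]] = field(default_factory=dict)
    cold: bool = False


def eat_meal(creature: Creature, foods: List[str]) -> List[str]:
    """Diet variety: consume at most one unit of each requested food."""
    eaten = []
    for food in dict.fromkeys(foods):  # drop repeated requests, keep order
        units = creature.stock.get(food, [])
        if units:
            units.pop(0)
            eaten.append(food)
    return eaten


def end_of_turn(creature: Creature) -> None:
    """Apply the winter fuel cost, then age and spoil perishables."""
    wood = creature.stock.get(FUEL, [])
    if wood:
        wood.pop()
        creature.cold = False
    else:
        creature.cold = True  # no firewood left to burn this turn
    for good in PERISHABLE:
        if good in creature.stock:
            aged = [age + 1 for age in creature.stock[good]]
            creature.stock[good] = [age for age in aged if age < SHELF_LIFE]


hare = Creature("hare", {"berries": [0, 0, 2], "firewood": [0]})
eaten = eat_meal(hare, ["berries", "berries"])  # second request is ignored
end_of_turn(hare)
berries_after_first_turn = list(hare.stock["berries"])
cold_after_first_turn = hare.cold
end_of_turn(hare)  # no firewood remains, so the hare goes cold
```

In this toy run the hare eats one berry despite asking for two, loses its oldest berry to spoilage, and runs out of firewood on the second turn, which is the kind of pressure that pushes an agent toward the woodcutter's market.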

Small Model Capabilities and Limits

The deployment highlighted a sharp difference between formatting capability and reasoning capacity in small models. Throughout the simulation traces, Qwen2.5-3B generated 100% valid JSON output. It produced structured output without requiring retry logic or external parsing corrections. This suggests that smaller models can operate reliably as formatting engines in software pipelines.
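A validity figure like this can be measured by parsing each raw completion and counting the ones that pass. The sketch below assumes a hypothetical decision schema with `action`, `good`, and `price` keys; the article does not publish the actual schema, and the sample strings are invented for the example.

```python
import json
from typing import Iterable, Optional

REQUIRED_KEYS = {"action", "good", "price"}  # hypothetical schema, not from the project


def parse_decision(raw: str) -> Optional[dict]:
    """Return the parsed decision, or None if it is not a valid JSON object with the required keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


def validity_rate(outputs: Iterable[str]) -> float:
    """Fraction of completions that parse and match the schema."""
    outputs = list(outputs)
    if not outputs:
        raise ValueError("need at least one output to compute a rate")
    valid = sum(parse_decision(text) is not None for text in outputs)
    return valid / len(outputs)


# Invented samples: two valid, one missing a key, one free text.
samples = [
    '{"action": "sell", "good": "firewood", "price": 3}',
    '{"action": "buy", "good": "berries"}',
    "I think I will sell some wood.",
    '{"action": "hold", "good": "none", "price": 0}',
]
rate = validity_rate(samples)
```

A pipeline without retry logic is only safe when this rate stays at 1.0 across long runs, so a check like this is worth keeping in the loop even when the model has behaved well so far.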

The reasoning limits appeared in agent strategy. The agents showed weaker judgment than larger models would. They occasionally panicked under resource constraints or hoarded goods irrationally, lacking the complex multi-step planning seen in 70B+ models. The developer published the raw agent traces on the Hugging Face Hub, so other developers can audit the decision logs of these smaller models.

The Build Small Ecosystem

The project emerged from the Hugging Face Build Small Hackathon, an event structured to incentivize applications that do not depend on expensive cloud APIs. Participants competed for a $15,000 cash prize pool and hardware rewards including two RTX 5080 GPUs. The event provided $250 in Modal credits and $20 in Hugging Face credits to participants, encouraging them to optimize inference costs.

If you build autonomous environments, Thousand Token Wood provides a working example of batching agent prompts into a single compute operation. The open-sourced traces offer a baseline for evaluating how 3B models handle long-running state and strict formatting rules under pressure.
