How to Build Multi-Agent CNC Workflows on AMD MI300X
Learn how to coordinate LangChain agents and Qwen 2.5 7B on the AMD MI300X to reduce CNC manufacturability analysis time from hours to seconds.
MachinaCheck is a multi-agent AI system that automates CNC manufacturability analysis, highlighted in the recent AMD Developer Hackathon wrap-up on May 10, 2026. The platform reduces manual Design for Manufacturability (DFM) evaluation time from up to 60 minutes down to roughly 30 seconds. You can replicate this architecture to build hybrid manufacturing workflows that combine deterministic CAD parsing with large language model reasoning.
System Architecture
Building an automated DFM pipeline requires segmenting tasks strictly between predictable code and generative models. Relying on an LLM for exact geometric math will result in failed parts.
The MachinaCheck architecture uses LangChain to orchestrate a multi-agent architecture consisting of five distinct components:
- STEP File Parser: A non-LLM, pure Python component that extracts raw geometric data from standard 3D CAD files.
- Operations Classifier: An instance of Qwen 2.5 7B that analyzes the extracted geometry to identify necessary machining operations, such as differentiating between drilling and milling.
- Tool Matcher: A deterministic Python script that queries a workshop database to find available tools matching the required specifications.
- Feasibility Decision Agent: A second Qwen 2.5 7B call that reasons over the combined geometric data and available tooling to determine if the part can be manufactured within the specified tolerances.
- Report Generator: A final Qwen 2.5 7B pass that produces a structured manufacturing report, complete with tool lists and risk assessments.
Hardware and Model Serving
Running multiple simultaneous agent calls requires significant memory bandwidth and capacity. The system runs on the AMD Instinct MI300X platform via the AMD Developer Cloud.
The hardware provides 192GB of HBM3 memory, which allows the pipeline to load and run Qwen 2.5 7B without relying on quantization. The model is served using the vLLM stack compiled for ROCm 7.
Because the workflow splits tasks across discrete agents, inference latency defines the total pipeline execution time. The vLLM configuration on the MI300X achieves an average response time of under 3 seconds per agent call.
Designing the Hybrid Workflow
The most critical design decision in this pipeline is the hybrid approach to data processing. The system offloads reasoning to Qwen 2.5 7B while actively preventing the LLM from handling deterministic lookups.
The Tool Matcher component avoids the LLM entirely. When matching a required 4mm hole to an available 4mm drill bit, standard database queries provide 100% accuracy with zero hallucination risk. You must structure your LangChain tools to enforce this separation, passing only the final structured outputs from the Python scripts into the Feasibility Decision Agent’s context window.
To begin testing this architecture, provision an MI300X instance on the AMD Developer Cloud and deploy the ROCm 7 compatible vLLM container.
Get Insanely Good at AI
The book for developers who want to understand how AI actually works. LLMs, prompt engineering, RAG, AI agents, and production systems.
Keep Reading
Thousand Token Wood Runs a 5-Agent Economy on Qwen2.5-3B
Developed for Hugging Face's Build Small Hackathon, the Thousand Token Wood simulation uses a 3-billion-parameter model to drive a real-time agent economy.
Zealot, the Multi-Agent Cloud Attack Framework
Palo Alto Networks has demonstrated Zealot, an autonomous multi-agent AI system capable of executing end-to-end cloud infrastructure exploits in minutes.
How to Refactor Monolithic Agents with Google ADK
Learn how to transition monolithic prompt scripts into production-ready multi-agent pipelines using Google's Agent Development Kit and the Agent2Agent protocol.
Cursor Agents Boost CUDA Kernel Speed by 38% on NVIDIA Blackwell
A new multi-agent system from Cursor achieves massive performance gains on NVIDIA Blackwell GPUs by autonomously optimizing complex CUDA kernels.
Claude Cowork Reimagines the Enterprise as an Agentic Workspace
Anthropic debuts Claude Cowork, introducing multi-agent coordination, persistent team memory, and VPC deployment options for secure corporate collaboration.