Ai Agents 3 min read

How to Build Multi-Agent CNC Workflows on AMD MI300X

Learn how to coordinate LangChain agents and Qwen 2.5 7B on the AMD MI300X to reduce CNC manufacturability analysis time from hours to seconds.

MachinaCheck is a multi-agent AI system that automates CNC manufacturability analysis, highlighted in the recent AMD Developer Hackathon wrap-up on May 10, 2026. The platform reduces manual Design for Manufacturability (DFM) evaluation time from up to 60 minutes down to roughly 30 seconds. You can replicate this architecture to build hybrid manufacturing workflows that combine deterministic CAD parsing with large language model reasoning.

System Architecture

Building an automated DFM pipeline requires segmenting tasks strictly between predictable code and generative models. Relying on an LLM for exact geometric math will result in failed parts.

The MachinaCheck architecture uses LangChain to orchestrate a multi-agent architecture consisting of five distinct components:

  1. STEP File Parser: A non-LLM, pure Python component that extracts raw geometric data from standard 3D CAD files.
  2. Operations Classifier: An instance of Qwen 2.5 7B that analyzes the extracted geometry to identify necessary machining operations, such as differentiating between drilling and milling.
  3. Tool Matcher: A deterministic Python script that queries a workshop database to find available tools matching the required specifications.
  4. Feasibility Decision Agent: A second Qwen 2.5 7B call that reasons over the combined geometric data and available tooling to determine if the part can be manufactured within the specified tolerances.
  5. Report Generator: A final Qwen 2.5 7B pass that produces a structured manufacturing report, complete with tool lists and risk assessments.

Hardware and Model Serving

Running multiple simultaneous agent calls requires significant memory bandwidth and capacity. The system runs on the AMD Instinct MI300X platform via the AMD Developer Cloud.

The hardware provides 192GB of HBM3 memory, which allows the pipeline to load and run Qwen 2.5 7B without relying on quantization. The model is served using the vLLM stack compiled for ROCm 7.

Because the workflow splits tasks across discrete agents, inference latency defines the total pipeline execution time. The vLLM configuration on the MI300X achieves an average response time of under 3 seconds per agent call.

Designing the Hybrid Workflow

The most critical design decision in this pipeline is the hybrid approach to data processing. The system offloads reasoning to Qwen 2.5 7B while actively preventing the LLM from handling deterministic lookups.

The Tool Matcher component avoids the LLM entirely. When matching a required 4mm hole to an available 4mm drill bit, standard database queries provide 100% accuracy with zero hallucination risk. You must structure your LangChain tools to enforce this separation, passing only the final structured outputs from the Python scripts into the Feasibility Decision Agent’s context window.

To begin testing this architecture, provision an MI300X instance on the AMD Developer Cloud and deploy the ROCm 7 compatible vLLM container.

Get Insanely Good at AI

Get Insanely Good at AI

The book for developers who want to understand how AI actually works. LLMs, prompt engineering, RAG, AI agents, and production systems.

Keep Reading