Ai Agents 6 min read

Local LLMs Hit 88% Accuracy Triaging OpenClaw Pull Requests

Learn how to build a local AI triage system for GitHub repositories using Hugging Face Text Generation Inference, smolagents, and Command R.

Hugging Face recently detailed how they automated issue and pull request management for the OpenClaw repository using local hardware. Their local PR triage workflow replaces external API calls with a self-hosted pipeline using Text Generation Inference and the smolagents library. This architecture achieved 88 percent labeling accuracy while reducing operating costs to cents per pull request. You can replicate this setup to handle conflict detection, issue categorization, and thread summarization entirely on your own infrastructure.

Running repository automation locally solves two major bottlenecks for open-source projects. It eliminates the recurring per-token costs associated with high-velocity repositories, and it prevents unreleased code or sensitive discussions from being transmitted to third-party model providers. The workflow relies on self-hosted GitHub Actions runners interfacing directly with a local inference server.

Hardware and Model Selection

The triage pipeline requires significant GPU VRAM to maintain high throughput for agentic reasoning. The OpenClaw experiment utilized a workstation equipped with two NVIDIA RTX 6000 Ada Generation GPUs. This provides 96GB of total VRAM, which is sufficient for running optimized 70B parameter models or handling multiple concurrent requests on smaller models.

The primary model driving the workflow is Command R (v01). Command R handles the core Retrieval-Augmented Generation and tool-use tasks. Its architecture specifically targets complex reasoning over external tools, making it well-suited for navigating GitHub repository contexts.

For the heaviest logic tasks, the system also supports Llama-3-70B-Instruct. Running a 70B model on 96GB of VRAM requires strict memory management. The workflow utilizes AutoAWQ to compress the model weights. If you are configuring this on your own hardware, understanding what quantization is in AI is critical. AutoAWQ reduces the memory footprint enough to fit the 70B model across the dual GPUs while preserving the reasoning capabilities required for evaluating code diffs.

Configuring Local Inference

The foundation of the triage system is Hugging Face Text Generation Inference. TGI serves the selected models via an API that mimics standard cloud provider endpoints. This allows the agent framework to interact with the local models using standard network requests.

You must deploy TGI on the machine hosting the GPUs. The TGI server handles continuous batching, tensor parallelism across the two RTX 6000s, and request queuing. When setting up TGI for repository triage, configure the server to prioritize long-context windows. Pull requests often contain thousands of lines of diffs and extensive comment histories. TGI will automatically map the model layers across both GPUs when started with the appropriate tensor parallelism flags.

Orchestrating Tasks with smolagents

The logic layer of the triage system is built using the Hugging Face smolagents library. This library differs from standard agent frameworks by prioritizing Python execution over static JSON formatting. Instead of returning structured JSON that the application must parse and execute, the agent writes and executes small Python snippets to accomplish its goals.

The core component used in this workflow is the CodeAgent class. When a pull request triggers the system, the CodeAgent is instantiated with access to the GitHub API. Rather than relying on traditional function calling, the agent generates a Python script to fetch the pull request details, read the diffs, and post comments.

This execution model significantly reduces the complexity of the agent loop. The agent can handle pagination, error handling, and data transformation natively within the generated Python snippet before returning the final text output.

Core Triage Capabilities

The OpenClaw implementation targets three specific repository maintenance tasks. You can configure your local deployment to handle any combination of these functions based on your repository needs.

Automated Issue Labeling

The most frequent task is categorizing incoming issues and pull requests. The agent reads the title, description, and any provided code snippets, then applies the appropriate tags like bug, enhancement, or documentation. In the OpenClaw deployment, this automated labeling achieved an 88 percent accuracy rate when compared to manual human categorization.

Merge Conflict Detection and Resolution

Handling stale pull requests requires identifying merge conflicts and suggesting fixes. The smolagents framework tackles this using a “Plan-and-Execute” pattern. The system first identifies that a conflict exists between the base branch and the proposed changes.

The agent then outlines a step-by-step plan to resolve the conflict. It executes this plan by generating Python code to fetch the conflicting files, analyze the overlapping changes, and formulate a resolution. During the OpenClaw experiment, the agent successfully suggested the correct code resolution for merge conflicts in 40 percent of the identified cases. For complex repositories, implementing specific multi-agent coordination patterns can further improve this resolution rate by separating the analysis and coding steps into distinct agent roles.

Thread Summarization

Long, stalled pull requests often suffer from a loss of context. Maintainers returning to a PR after several weeks must read through dozens of comments to understand the current blocker. The local agent parses the entire comment thread and generates a concise summary of the discussion, the agreed-upon technical direction, and the specific reasons the PR is currently blocked.

Integration via GitHub Actions

The bridge between the repository and the local AI system is a self-hosted GitHub Actions runner. You must install the runner agent on the same local network as the TGI server.

When a contributor opens a pull request, a standard GitHub Action workflow triggers. The workflow specifies runs-on: self-hosted. The local runner picks up the job, initializes the smolagents CodeAgent, and points it at the local TGI endpoint. The agent performs its analysis, pushes the required labels or comments back up to GitHub via the API, and terminates the job. This architecture ensures that the intensive compute tasks remain entirely on the local hardware.

Cost Efficiency and Tradeoffs

Shifting from proprietary APIs to local inference fundamentally changes the economics of repository automation. Hugging Face calculated that running these triage tasks via the GPT-4o API costs project maintainers significant amounts during high-traffic periods.

By moving the workloads to the local dual RTX 6000 Ada setup, the operating cost drops to approximately $0.12 to $0.45 per pull request. This cost strictly represents the electricity and the amortized cost of the hardware itself. There are no per-token API fees. The “FREE” designation in the architecture refers entirely to the elimination of these recurring software costs.

This approach does require an upfront investment in capable hardware. Consumer-grade GPUs like the RTX 4090 can handle smaller quantized models, but reliable execution of the 70B parameter models required for deep code reasoning necessitates professional-grade hardware with ample VRAM.

Deploying a local triage agent shifts repository maintenance from a manual chore to a predictable, automated pipeline. Start by configuring a self-hosted GitHub Actions runner on a machine with at least 24GB of VRAM, deploy a quantized version of Command R via TGI, and use the smolagents library to handle basic issue labeling before expanding into complex conflict resolution.

Get Insanely Good at AI

Get Insanely Good at AI

The book for developers who want to understand how AI actually works. LLMs, prompt engineering, RAG, AI agents, and production systems.

Keep Reading