How to Route GPU GitHub Actions to Hugging Face Jobs

Offload your training and GPU-heavy CI workloads to Hugging Face Jobs using their new ephemeral GitHub runners and action integrations.

Hugging Face’s new GitHub CI integration allows you to execute compute-intensive workflows directly on their serverless hardware fleet while maintaining GitHub Actions as your primary orchestrator. Released alongside the redesigned hf CLI, this integration lets you run GPU-dependent tests and training jobs on demand, avoiding the high idle costs of traditional self-hosted EC2 runners.

There are two primary integration paths: direct job submission via a custom action, or ephemeral self-hosted runners. Both require initial authentication setup but serve different workflow patterns.

Authentication and Identity Setup

Before modifying your workflow files, you must configure authentication between GitHub and Hugging Face. The standard method requires generating a Hugging Face access token with job.write permissions.

Add this token to your GitHub repository as a repository secret named HF_TOKEN. For advanced enterprise deployments, the new workflow identity federation allows CI jobs to publish models and read gated repositories without managing static secrets, mirroring the trusted publishing mechanics used in npm and PyPI.
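As a minimal sketch of the token-based setup, the job below exposes the secret as an environment variable and fails fast when it is missing. The job name `check-hf-auth` is hypothetical, and the comment assumes the GitHub CLI (`gh`) is installed on your machine.

```yaml
# One-time setup from a local clone (assumes the GitHub CLI is installed):
#   gh secret set HF_TOKEN
jobs:
  check-hf-auth:            # hypothetical job name
    runs-on: ubuntu-latest
    env:
      # An unset secret expands to an empty string, which the check below catches.
      HF_TOKEN: ${{ secrets.HF_TOKEN }}
    steps:
      - name: Fail fast if the secret is missing
        run: |
          if [ -z "$HF_TOKEN" ]; then
            echo "HF_TOKEN is not set for this repository" >&2
            exit 1
          fi
```

Note that GitHub does not pass repository secrets to workflows triggered by pull requests from forks, so a check like this will fail on those runs by design.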

Direct Job Submission

The hf-jobs-action is a standard GitHub Action that submits a specific script or command to Hugging Face Jobs. This is ideal when you want to isolate a single heavy task, like running LLM evaluation pipelines, without moving the entire job execution out of GitHub’s environment.

To use it, add the following step to your workflow:

```yaml
steps:
  - uses: actions/checkout@v4
  - uses: huggingface/hf-jobs-action@main
    with:
      command: python run_evaluation.py
    env:
      HF_TOKEN: ${{ secrets.HF_TOKEN }}
```

This action automatically handles real-time log streaming back to the GitHub Actions console and supports direct file mounting from your repository to the Hugging Face container.
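For context, a complete workflow file wrapping that step might look like the following sketch. The workflow name, triggers, job name, and timeout are illustrative assumptions; only the `command` input and the `HF_TOKEN` variable come from the step shown above.

```yaml
# .github/workflows/llm-evaluation.yml (hypothetical file name)
name: llm-evaluation
on:
  workflow_dispatch:        # allow manual runs while validating the setup
  push:
    branches: [main]
jobs:
  evaluate:                 # hypothetical job name
    runs-on: ubuntu-latest  # the job itself stays on a GitHub-hosted runner
    timeout-minutes: 60     # assumed cap so a stuck remote job cannot run indefinitely
    steps:
      - uses: actions/checkout@v4
      - uses: huggingface/hf-jobs-action@main
        with:
          command: python run_evaluation.py
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
```

Pinning the action to a release tag or commit SHA instead of `@main` is a common hardening step, provided the action publishes tagged releases.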

Ephemeral Self-Hosted Runners

The beta jobs-actions integration allows Hugging Face Jobs to act as a complete self-hosted GitHub runner. Instead of running a single script remotely, the entire GitHub Actions job executes on Hugging Face hardware.

When you change a single line in your workflow configuration (the runs-on value), Hugging Face automatically provisions the hardware, registers it as a runner, executes your workflow, and terminates the instance immediately afterward.

```yaml
jobs:
  gpu-tests:
    runs-on: hf-jobs-l4x1
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests/gpu/
```
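One way to keep GPU time for commits that already pass the cheap checks is to gate the GPU job behind a CPU job with `needs`. This is a sketch, not a prescribed layout: the `unit-tests` job and the `tests/unit/` path are hypothetical, while `hf-jobs-l4x1` and `tests/gpu/` are taken from the example above.

```yaml
jobs:
  unit-tests:                 # hypothetical fast job on a GitHub-hosted runner
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests/unit/
  gpu-tests:
    needs: unit-tests         # start the GPU runner only after unit tests succeed
    runs-on: hf-jobs-l4x1
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests/gpu/
```

With this layout a failing unit test stops the workflow before any Hugging Face hardware is provisioned.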

Hardware Flavors and Pricing

Billing operates on a per-second basis using standard Hub subscription credits. You can target specific hardware by changing the runs-on value to match the required flavor.

| Runner Tag | Hardware Type | Target Workload |
| --- | --- | --- |
| hf-jobs-cpu | High-performance CPU | Accelerated unit tests |
| hf-jobs-t4-small | NVIDIA T4 | Basic inference and small model tests |
| hf-jobs-l4x1 | NVIDIA L4 | Medium model inference and fine-tuning |
| hf-jobs-a10g-small | NVIDIA A10G | Standard ML testing pipelines |
| hf-jobs-a100 | NVIDIA A100 | Heavy fine-tuning and training |
| hf-jobs-h200 | NVIDIA H200 | Frontier model training and large scale inference |
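Because `runs-on` accepts expressions, a matrix can run the same suite on several flavors to compare them. The sketch below uses two tags from the table; the `nvidia-smi` step assumes the runner image ships the NVIDIA driver utilities, which the source does not confirm.

```yaml
jobs:
  gpu-tests:
    strategy:
      fail-fast: false        # let each flavor finish even if another one fails
      matrix:
        flavor: [hf-jobs-t4-small, hf-jobs-l4x1]
    runs-on: ${{ matrix.flavor }}
    steps:
      - uses: actions/checkout@v4
      - name: Show the attached GPU
        run: nvidia-smi       # assumption: available on the runner image
      - run: pytest tests/gpu/
```

Each matrix entry provisions its own runner, so it is billed separately. Keeping the flavor list short avoids paying for hardware you do not intend to use long term.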

Tradeoffs and Limitations

While moving computation to Hugging Face provides significant cost savings by eliminating idle runner fleets, it does introduce a minor latency penalty. The ephemeral runner mode (jobs-actions) typically experiences a cold start delay of 30 to 90 seconds while the hardware provisions and registers with GitHub.

For comparison, standard GitHub-hosted ubuntu-latest runners typically start in 5 to 15 seconds, so the ephemeral runner adds roughly 15 to 85 seconds of startup time per job. If your repository primarily runs lightweight CPU tests that finish quickly, that overhead can cancel out the 30% execution speed improvement seen on Hugging Face’s CPU instances. Reserve this integration for tests that genuinely benefit from hardware acceleration.
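A rough break-even estimate follows from the figures above. It assumes that a 30% speed improvement means the job takes 30% less time on Hugging Face hardware, and it ignores queueing and pricing. Here T is the job's execution time on a GitHub-hosted runner and Δ is the extra cold start delay.

```latex
\[
\Delta = t_{\text{HF start}} - t_{\text{GitHub start}}, \qquad
15\,\text{s} \le \Delta \le 85\,\text{s}
\]
\[
\Delta + 0.7\,T < T
\;\Longleftrightarrow\;
T > \frac{\Delta}{0.3}
\]
\[
T^{*}_{\min} = \frac{15}{0.3} = 50\,\text{s}, \qquad
T^{*}_{\max} = \frac{85}{0.3} \approx 283\,\text{s}
\]
```

Under these assumptions, a CPU job needs to run for somewhere between about 50 seconds and nearly 5 minutes on GitHub's runner before moving it finishes sooner end to end.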

Review your current GitHub Actions usage logs to identify the longest-running GPU tasks, and migrate those specific workflows to hf-jobs-l4x1 to validate the setup before rolling it out across your entire test suite.
