Blog

AI engineering insights, practical advice, and things I'm learning.

AI Engineering

How to Run TPU Workloads on Google Cloud with Ray 2.55

Learn how to provision Google Cloud TPUs, handle slice topologies, and deploy machine learning models using Ray 2.55 and the KubeRay Operator.

Google Cloud TPU · Ray Framework · Machine Learning Infrastructure · KubeRay Operator

July 20, 2026

AI Engineering

How to Scale Diffusers Training With NeMo Automodel

Learn how to fine-tune large diffusion models like FLUX.1-dev and Wan 2.1 across multiple GPUs using the NVIDIA NeMo Automodel library.

Diffusion Models · GPU Acceleration · Model Fine-Tuning

July 17, 2026

AI Engineering

How to Profile PyTorch Attention Kernels on A100 GPUs

Learn how to use the PyTorch profiler to identify memory and compute bottlenecks in attention mechanisms using Hugging Face's tracing methodology.

PyTorch Profiler · Attention Mechanism · GPU Optimization

July 10, 2026

AI Engineering

How to Cut CPU Costs with Cloudflare Workers Cache

You will learn how to configure Cloudflare Workers Cache to serve responses directly from entrypoints, handle invalidations, and partition cache keys.

Cloudflare Workers · Edge Computing · Serverless Optimization

July 8, 2026

AI Engineering

How to Launch Hugging Face Models in SageMaker Studio

You will learn how to use the new Hugging Face integration to automatically provision and deploy open-source models directly into Amazon SageMaker Studio.

Hugging Face · Amazon SageMaker · Model Deployment

July 8, 2026

AI Engineering

SkyPilot Drops Cross-Cloud Egress Fees With Hugging Face Storage

Configure SkyPilot to mount Hugging Face Storage natively and eliminate cross-cloud egress fees for multi-cloud AI workloads.

Multi-Cloud · SkyPilot · Hugging Face

July 7, 2026

AI Engineering

How to Configure Elastic Training in MaxText on TPUs

Learn how to enable elastic training in MaxText to survive hardware failures and resume distributed AI workloads in seconds.

TPU Acceleration · Distributed Training · Fault Tolerance

July 7, 2026

AI Engineering

How to Expose Ephemeral vLLM Endpoints on Hugging Face Jobs

Learn how to spin up temporary, OpenAI-compatible vLLM inference endpoints on Hugging Face serverless infrastructure using a single CLI command.

vLLM · Hugging Face · Serverless Inference

June 26, 2026

AI Engineering

How to Implement Saga Rollbacks in Cloudflare Workflows

Learn how to manage distributed transactions and write compensating actions using the saga rollback feature in Cloudflare Workflows.

Distributed Transactions · Cloudflare Workflows · State Management

June 25, 2026

AI Engineering

How to Speed Up MoE Fine-Tuning With NeMo AutoModel

Learn how to configure NVIDIA NeMo AutoModel in Transformers v5 to increase MoE training throughput and reduce GPU memory usage.

Mixture of Experts · NVIDIA NeMo · Model Fine-Tuning

June 24, 2026

AI Engineering

How to Secure Claude API Workloads With Identity Federation

You will learn how to configure Workload Identity Federation to authenticate non-human Claude API requests and eliminate static access keys.

Claude API · Workload Identity Federation · Cloud Security

June 19, 2026

AI Engineering

How to Configure Sparse-LoRA and DoRA With Hugging Face PEFT

Learn how to use PEFT 0.18.0 to configure Sparse-LoRA, DoRA, LoRA-XS, and rsLoRA for more efficient fine-tuning on single-GPU hardware.

Parameter-Efficient Fine-Tuning · Hugging Face PEFT · Large Language Models

June 19, 2026

AI Engineering

How to Run In-Loop Model Evaluations With olmo-eval

Learn how to set up olmo-eval to test large language model checkpoints during the training process using vLLM, LiteLLM, and Docker-based agent sandboxes.

LLM Evaluation · Model Training · vLLM

June 12, 2026

AI Engineering

How to Fuse PyTorch MLP Kernels for a 30% Inference Speedup

Learn how to analyze PyTorch profiler traces and implement Liger kernel fusion to significantly reduce memory bandwidth bottlenecks in transformer models.

PyTorch · Kernel Fusion · Inference Optimization

June 12, 2026

AI Engineering

How to Serve DiffusionGemma Locally With vLLM

Learn how to deploy Google's 26B text diffusion model on local hardware to achieve massive parallel generation speeds using vLLM and Hugging Face.

Diffusion Models · Local Deployment · vLLM Inference

June 10, 2026

AI Engineering

How to Route GPU GitHub Actions to Hugging Face Jobs

Offload your training and GPU-heavy CI workloads to Hugging Face Jobs using their new ephemeral GitHub runners and action integrations.

GitHub Actions · Hugging Face · GPU Computing

June 10, 2026

AI Engineering

How to Call Claude 4.5 via Apple Foundation Models Framework

Learn how to integrate Claude 4.5 into your Swift applications using Apple's new Foundation Models framework for hybrid on-device and cloud processing.

Claude 4.5 · Apple Foundation Models · Swift Programming

June 9, 2026

AI Engineering

How to Provision Google Colab GPUs From the Command Line

Learn how to install the Google Colab CLI, provision high-performance remote GPUs from your local terminal, and execute headless machine learning workflows.

Google Colab · GPU Provisioning · Command Line Interface

June 5, 2026

AI Engineering

How to Stop OCR Degeneration With DharmaOCR Lite 3B

Dharma-AI's new DharmaOCR models apply DPO to eliminate autoregressive looping. Learn how to configure the 3B parameter model for structured JSON extraction.

Optical Character Recognition · Direct Preference Optimization · Structured Data Extraction

June 3, 2026

AI Engineering

How to Find GPU Gaps in PyTorch 2.12 With torch.profiler

Learn how to identify performance bottlenecks and idle GPU lanes using the native torch.profiler in PyTorch 2.12 across Blackwell and AMD hardware.

PyTorch · GPU Optimization · Performance Profiling

May 29, 2026