Blog
AI engineering insights, practical advice, and things I'm learning.
AI Engineering
How to Run In-Loop Model Evaluations With olmo-eval
Learn how to set up olmo-eval to test large language model checkpoints during the training process using vLLM, LiteLLM, and Docker-based agent sandboxes.
LLM Evaluation · Model Training · vLLM · LiteLLM
AI Engineering
How to Fuse PyTorch MLP Kernels for a 30% Inference Speedup
Learn how to analyze PyTorch profiler traces and implement Liger kernel fusion to significantly reduce memory bandwidth bottlenecks in transformer models.
PyTorch · Kernel Fusion · Inference Optimization
AI Engineering
How to Serve DiffusionGemma Locally With vLLM
Learn how to deploy Google's 26B text diffusion model on local hardware to achieve massive parallel generation speeds using vLLM and Hugging Face.
Diffusion Models · Local Deployment · vLLM Inference
AI Engineering
How to Route GPU GitHub Actions to Hugging Face Jobs
Offload your training and GPU-heavy CI workloads to Hugging Face Jobs using their new ephemeral GitHub runners and action integrations.
GitHub Actions · Hugging Face · GPU Computing
AI Agents
How to Chain Hugging Face Spaces Using the /agents.md Endpoint
Learn how to orchestrate text-to-image and 3D modeling tools by chaining Hugging Face Spaces together using the universal Markdown tool interface.
Hugging Face · Agentic Workflows · API Orchestration
AI Engineering
How to Call Claude 4.5 via Apple Foundation Models Framework
Learn how to integrate Claude 4.5 into your Swift applications using Apple's new Foundation Models framework for hybrid on-device and cloud processing.
Claude 4.5 · Apple Foundation Models · Swift Programming
AI Engineering
How to Provision Google Colab GPUs From the Command Line
Learn how to install the Google Colab CLI, provision high-performance remote GPUs from your local terminal, and execute headless machine learning workflows.
Google Colab · GPU Provisioning · Command Line Interface
AI Agents
How to Expose the Hugging Face Hub to Coding Agents via hf CLI
Learn how to use the newly redesigned hf CLI to provide coding agents like Claude Code and Cursor with direct access to Hugging Face models and datasets.
Hugging Face · CLI Tools · Agentic Workflows
AI Agents
How to Automate Desktop Workflows With Claude Cowork
Learn how to configure Claude Cowork to execute multi-step desktop tasks using local file access, markdown skills, and built-in workspace connectors.
Desktop Automation · Workflow Optimization · Claude Cowork
AI Agents
How to Extend Reachy Mini Capabilities With Remote MCP Tools
Learn how to extend the Reachy Mini robot using remote Model Context Protocol tools hosted on Hugging Face Spaces without modifying local application code.
Robotics · MCP · Hugging Face Spaces
AI Engineering
How to Stop OCR Degeneration With DharmaOCR Lite 3B
Dharma-AI's new DharmaOCR models apply DPO to eliminate autoregressive looping. Learn how to configure the 3B parameter model for structured JSON extraction.
Optical Character Recognition · Direct Preference Optimization · Structured Data Extraction
AI Engineering
How to Find GPU Gaps in PyTorch 2.12 With torch.profiler
Learn how to identify performance bottlenecks and idle GPU lanes using the native torch.profiler in PyTorch 2.12 across Blackwell and AMD hardware.
PyTorch · GPU Optimization · Performance Profiling
AI Engineering
How to Automate Google Pay Integrations With MCP
Connect your AI development environment to real-time merchant data and documentation using the new Google Pay and Wallet Developer MCP server.
MCP Server · Google Pay · Workflow Automation
AI Agents
How to Orchestrate Parallel Subagents in Claude Code
Learn how to use dynamic workflows in Claude Code to manage up to 1,000 parallel subagents, handle resumable state, and optimize your Opus 4.8 API costs.
Claude Code · Parallel Orchestration · Dynamic Workflows
AI Engineering
How to Cut Checkpoint Time by 85% With TRL Delta Weight Sync
Learn how to configure TRL Delta Weight Sync to reduce trillion-parameter model checkpointing times by 85 percent using Hugging Face Hub Buckets.
Hugging Face · Checkpointing · TRL Library
AI Engineering
How to Run Gemma 4 On-Device With LiteRT-LM
Learn how to configure LiteRT-LM to deploy the Gemma 4 model family locally across mobile, desktop, and edge environments with constrained JSON decoding.
Gemma 4 · LiteRT-LM · On-Device AI
AI Agents
How to Run Claude Managed Agents in Self-Hosted Sandboxes
Learn how to deploy Claude Managed Agents using self-hosted sandboxes and MCP tunnels to securely execute tools and access private data.
Claude Managed Agents · Self-Hosted Sandboxes · MCP Tunnels
AI Engineering
How to Fine-Tune Cosmos Predict 2.5 for Robotics With LoRA
Learn how to adapt NVIDIA's 2B and 14B Cosmos Predict 2.5 world foundation models using parameter-efficient fine-tuning methods like LoRA and DoRA.
Fine-Tuning · LoRA and DoRA · World Models
AI Coding
How to Scale Claude Code Across Enterprise Monorepos
Learn how to deploy Claude Code in multi-million line monorepos using hierarchical context, language server protocol integration, and on-demand skills.
Claude Code · Monorepo Scaling · Enterprise Software
AI Agents
How to Control Agent Tool Execution via Genkit Middleware
Learn how to use Google's new Genkit Middleware to intercept model calls, implement human-in-the-loop tool approvals, and handle transient API failures.
Genkit Middleware · Agentic Workflows · Human-in-the-Loop