Blog
AI engineering insights, practical advice, and things I'm learning.
AI Engineering
Google Graduates LiteRT NPU Acceleration to Production
Learn how to configure LiteRT for hardware-accelerated on-device AI inference using Google's production-ready NPU capabilities.
LiteRT · On-Device AI · NPU Acceleration · Hardware Inference
AI Engineering
Build Real-Time Voice Agents with Cloudflare Agents SDK
Learn how to integrate low-latency voice interactions into your AI agents using Cloudflare's new @cloudflare/voice package and Durable Objects.
Cloudflare Workers · Voice AI · STT
AI Engineering
Build a Fast Multilingual OCR with Nemotron-OCR-v2
Learn how to deploy NVIDIA Nemotron-OCR-v2 for high-speed document extraction across six languages using synthetic data and GPU acceleration.
NVIDIA Nemotron · Multilingual OCR · Synthetic Data
AI Engineering
Train Multimodal Sentence Transformers for Visual Retrieval
Learn how to fine-tune multimodal embedding and reranker models for text, image, and audio using the updated Sentence Transformers library.
Sentence Transformers · Multimodal AI · Embedding Models
AI Engineering
How to Deploy Enterprise MCP with Cloudflare Workers
Learn to secure and scale Model Context Protocol deployments using Cloudflare’s reference architecture for remote MCP servers and centralized portals.
Model Context Protocol · Cloudflare Workers · AI Agents
AI Engineering
How to Use Subagents in Gemini CLI
Learn how to build and orchestrate specialized AI subagents in Gemini CLI to prevent context rot and improve development speed using isolated expert loops.
Gemini CLI · AI Agents · Google Gemini
AI Engineering
How to Automate Workflows with Claude Code Routines
Learn how to use Claude Code's new routines to schedule tasks, trigger API workflows, and automate GitHub PR reviews on cloud infrastructure.
Claude Code · Anthropic · AI Automation
AI Engineering
How to Create and Use One-Click Skills in Google Chrome
Convert your favorite Gemini AI prompts into automated browser macros with Google's new Skills feature for one-click productivity on any webpage.
Google Chrome · Gemini AI · Browser Automation
AI Engineering
How to Use the New Unified Cloudflare CLI and Local Explorer
Learn how to use Cloudflare's new cf CLI and Local Explorer to streamline cross-product development and debug local data for AI agents and human developers.
Cloudflare CLI · Local Explorer · AI Agents
AI Engineering
How to Implement Multi-Agent Coordination Patterns
Learn five production-grade architectural patterns for multi-agent systems to optimize performance, hierarchy, and context management in AI engineering.
Multi-Agent Systems · Claude Code · Orchestration
AI Engineering
How to Use Symbolic Execution for Automated BPF Analysis
Learn how Cloudflare uses the Z3 theorem prover to instantly generate magic packets and reverse-engineer BPF bytecode for security research.
Symbolic Execution · Z3 Solver · BPF Bytecode
AI Engineering
How to Implement the Advisor Strategy with Claude
Optimize AI agents by pairing high-intelligence advisor models with cost-effective executors using Anthropic's native advisor tool API.
Anthropic Claude · AI Agents · Model Routing
AI Engineering
How to Use Multimodal Sentence Transformers v5.4
Learn to implement multimodal embedding and reranker models using Sentence Transformers for advanced search across text, images, audio, and video.
Hugging Face · Sentence Transformers · Multimodal RAG
AI Engineering
How to Use Subagents in Claude Code
Learn how to use modular subagents in Claude Code to isolate context, delegate specialized tasks, and optimize costs with custom AI personas.
Claude Code · Anthropic · AI Agents
AI Engineering
How to Optimize MoE Inference with Warp Decode
Learn how Cursor's warp decode technique uses GPU kernel optimizations and warp-level primitives to achieve 300+ tokens per second on Blackwell hardware.
MoE Models · GPU Optimization · CUDA Kernels
AI Engineering
How to Use Amazon Polly's Bidirectional Streaming API
Learn how to use Amazon Polly’s new HTTP/2 bidirectional streaming to reduce latency in real-time conversational AI by streaming text and audio simultaneously.
Amazon Polly · Text-to-Speech · AWS Cloud
AI Engineering
What Are Parameters in AI Models?
Parameters are the numbers that make AI models work. Here's what they are, why models have billions of them, and what the count actually tells you about capability.
Parameters · LLM · AI Engineering
AI Engineering
What Is Quantization in AI?
Quantization shrinks AI models by reducing numerical precision. Here's how it works, what formats exist, and how to choose the right tradeoff between size, speed, and quality.
Quantization · LLM · Inference
AI Engineering
What Is AI Inference and How Does It Work?
Inference is where AI models do their actual work. Here's what happens during inference, why it's the bottleneck, and what determines speed and cost.
Inference · LLM · AI Engineering
AI Engineering
How to Build a Domain-Specific Embedding Model
Learn NVIDIA's recipe for fine-tuning a domain-specific embedding model in hours using synthetic data, hard negatives, BEIR, and NIM.
Embeddings · Retrieval · Fine-Tuning