Blog
AI engineering insights, practical advice, and things I'm learning.
AI Engineering
How to Deploy Enterprise MCP with Cloudflare Workers
Learn to secure and scale Model Context Protocol deployments using Cloudflare’s reference architecture for remote MCP servers and centralized portals.
Model Context Protocol · Cloudflare Workers · AI Agents
AI Engineering
How to Use Subagents in Gemini CLI
Learn how to build and orchestrate specialized AI subagents in Gemini CLI to prevent context rot and improve development speed using isolated expert loops.
Gemini CLI · AI Agents · Google Gemini
AI Engineering
How to Automate Workflows with Claude Code Routines
Learn how to use Claude Code's new routines to schedule tasks, trigger API workflows, and automate GitHub PR reviews on cloud infrastructure.
Claude Code · Anthropic · AI Automation
AI Engineering
How to Create and Use One-Click Skills in Google Chrome
Convert your favorite Gemini AI prompts into automated browser macros with Google's new Skills feature for one-click productivity on any webpage.
Google Chrome · Gemini AI · Browser Automation
AI Engineering
How to Use the New Unified Cloudflare CLI and Local Explorer
Learn how to use Cloudflare's new cf CLI and Local Explorer to streamline cross-product development and debug local data for AI agents and human developers.
Cloudflare CLI · Local Explorer · AI Agents
AI Engineering
How to Implement Multi-Agent Coordination Patterns
Learn five production-grade architectural patterns for multi-agent systems to optimize performance, hierarchy, and context management in AI engineering.
Multi Agent Systems · Claude Code · Orchestration
AI Engineering
How to Use Symbolic Execution for Automated BPF Analysis
Learn how Cloudflare uses the Z3 theorem prover to instantly generate magic packets and reverse-engineer BPF bytecode for security research.
Symbolic Execution · Z3 Solver · BPF Bytecode
AI Engineering
How to Implement the Advisor Strategy with Claude
Optimize AI agents by pairing high-intelligence advisor models with cost-effective executors using Anthropic's native advisor tool API.
Anthropic Claude · AI Agents · Model Routing
AI Engineering
How to Use Multimodal Sentence Transformers v5.4
Learn to implement multimodal embedding and reranker models using Sentence Transformers for advanced search across text, images, audio, and video.
Hugging Face · Sentence Transformers · Multimodal RAG
AI Engineering
How to Use Subagents in Claude Code
Learn how to use modular subagents in Claude Code to isolate context, delegate specialized tasks, and optimize costs with custom AI personas.
Claude Code · Anthropic · AI Agents
AI Engineering
How to Optimize MoE Inference with Warp Decode
Learn how Cursor's warp decode technique uses GPU kernel optimizations and warp-level primitives to achieve 300+ tokens per second on Blackwell hardware.
MoE Models · GPU Optimization · CUDA Kernels
AI Engineering
How to Use Amazon Polly's Bidirectional Streaming API
Learn how to use Amazon Polly’s new HTTP/2 bidirectional streaming to reduce latency in real-time conversational AI by streaming text and audio simultaneously.
Amazon Polly · Text to Speech · AWS Cloud
AI Engineering
What Are Parameters in AI Models?
Parameters are the numbers that make AI models work. Here's what they are, why models have billions of them, and what the count actually tells you about capability.
Parameters · LLM · AI Engineering
AI Engineering
What Is Quantization in AI?
Quantization shrinks AI models by reducing numerical precision. Here's how it works, what formats exist, and how to choose the right tradeoff between size, speed, and quality.
Quantization · LLM · Inference
AI Engineering
What Is AI Inference and How Does It Work?
Inference is where AI models do their actual work. Here's what happens during inference, why it's often the bottleneck in production, and what determines speed and cost.
Inference · LLM · AI Engineering
AI Engineering
How to Build a Domain-Specific Embedding Model
Learn NVIDIA's recipe for fine-tuning a domain-specific embedding model in hours using synthetic data, hard negatives, BEIR, and NIM.
Embeddings · Retrieval · Fine Tuning
AI Engineering
How Cursor Built Composer 2 on Top of Kimi K2.5
Cursor's Composer 2 is built on Kimi K2.5. What continued pretraining, reinforcement learning, and self-summarization mean, and how they work.
Cursor · Kimi K2.5 · Reinforcement Learning
AI Engineering
What Is Mixture-of-Experts (MoE) in AI?
The largest MoE models have around a trillion parameters but activate only a fraction of them per token. How expert routing works, why it matters for cost, and which major models use it.
Mixture of Experts · MoE · LLM Architecture
AI Engineering
What Is Continued Pretraining in AI?
Continued pretraining adapts a general LLM to a specific domain using large unlabeled data. How it works, how it differs from fine-tuning, and real examples.
Continued Pretraining · LLM Training · Domain Adaptation
AI Engineering
Continued Pretraining vs RAG: Two Ways to Add Knowledge
Continued pretraining bakes knowledge into model weights. RAG injects it at query time. When to use each, where each breaks down, and why you often need both.
Continued Pretraining · RAG · Retrieval-Augmented Generation