Blog

AI engineering insights, practical advice, and things I'm learning.

AI Engineering

Google Graduates LiteRT NPU Acceleration to Production

Learn how to configure LiteRT for hardware-accelerated on-device AI inference using Google's production-ready NPU capabilities.

LiteRT · On-Device AI · NPU Acceleration · Hardware Inference

AI Engineering

Build Real-Time Voice Agents with Cloudflare Agents SDK

Learn how to integrate low-latency voice interactions into your AI agents using Cloudflare's new @cloudflare/voice package and Durable Objects.

Cloudflare Workers · Voice AI · STT

AI Engineering

Build Fast Multilingual OCR with Nemotron-OCR-v2

Learn how to deploy NVIDIA Nemotron-OCR-v2 for high-speed document extraction across six languages using synthetic data and GPU acceleration.

NVIDIA Nemotron · Multilingual OCR · Synthetic Data

AI Engineering

Train Multimodal Sentence Transformers for Visual Retrieval

Learn how to fine-tune multimodal embedding and reranker models for text, image, and audio using the updated Sentence Transformers library.

Sentence Transformers · Multimodal AI · Embedding Models

AI Engineering

How to Deploy Enterprise MCP with Cloudflare Workers

Learn to secure and scale Model Context Protocol deployments using Cloudflare’s reference architecture for remote MCP servers and centralized portals.

Model Context Protocol · Cloudflare Workers · AI Agents

AI Engineering

How to Use Subagents in Gemini CLI

Learn how to build and orchestrate specialized AI subagents in Gemini CLI to prevent context rot and improve development speed using isolated expert loops.

Gemini CLI · AI Agents · Google Gemini

AI Engineering

How to Automate Workflows with Claude Code Routines

Learn how to use Claude Code's new routines to schedule tasks, trigger API workflows, and automate GitHub PR reviews on cloud infrastructure.

Claude Code · Anthropic · AI Automation

AI Engineering

How to Create and Use One-Click Skills in Google Chrome

Convert your favorite Gemini AI prompts into automated browser macros with Google's new Skills feature for one-click productivity on any webpage.

Google Chrome · Gemini AI · Browser Automation

AI Engineering

How to Use the New Unified Cloudflare CLI and Local Explorer

Learn how to use Cloudflare's new cf CLI and Local Explorer to streamline cross-product development and debug local data, whether you are a human developer or an AI agent.

Cloudflare CLI · Local Explorer · AI Agents

AI Engineering

How to Implement Multi-Agent Coordination Patterns

Learn five production-grade architectural patterns for multi-agent systems to optimize performance, hierarchy, and context management in AI engineering.

Multi-Agent Systems · Claude Code · Orchestration

AI Engineering

How to Use Symbolic Execution for Automated BPF Analysis

Learn how Cloudflare uses the Z3 theorem prover to instantly generate magic packets and reverse-engineer BPF bytecode for security research.

Symbolic Execution · Z3 Solver · BPF Bytecode

AI Engineering

How to Implement the Advisor Strategy with Claude

Optimize AI agents by pairing high-intelligence advisor models with cost-effective executors using Anthropic's native advisor tool API.

Anthropic Claude · AI Agents · Model Routing

AI Engineering

How to Use Multimodal Sentence Transformers v5.4

Learn to implement multimodal embedding and reranker models using Sentence Transformers for advanced search across text, images, audio, and video.

Hugging Face · Sentence Transformers · Multimodal RAG

AI Engineering

How to Use Subagents in Claude Code

Learn how to use modular subagents in Claude Code to isolate context, delegate specialized tasks, and optimize costs with custom AI personas.

Claude Code · Anthropic · AI Agents

AI Engineering

How to Optimize MoE Inference with Warp Decode

Learn how Cursor's warp decode technique uses GPU kernel optimizations and warp-level primitives to achieve 300+ tokens per second on Blackwell hardware.

MoE Models · GPU Optimization · CUDA Kernels

AI Engineering

How to Use Amazon Polly's Bidirectional Streaming API

Learn how to use Amazon Polly’s new HTTP/2 bidirectional streaming to reduce latency in real-time conversational AI by streaming text and audio simultaneously.

Amazon Polly · Text-to-Speech · AWS Cloud

AI Engineering

What Are Parameters in AI Models?

Parameters are the numbers that make AI models work. Here's what they are, why models have billions of them, and what the count actually tells you about capability.

Parameters · LLM · AI Engineering

AI Engineering

What Is Quantization in AI?

Quantization shrinks AI models by reducing numerical precision. Here's how it works, what formats exist, and how to choose the right tradeoff between size, speed, and quality.

Quantization · LLM · Inference

AI Engineering

What Is AI Inference and How Does It Work?

Inference is where AI models do their actual work. Here's what happens during inference, why it's the bottleneck, and what determines speed and cost.

Inference · LLM · AI Engineering

AI Engineering

How to Build a Domain-Specific Embedding Model

Learn NVIDIA's recipe for fine-tuning a domain-specific embedding model in hours using synthetic data, hard negatives, BEIR, and NIM.

Embeddings · Retrieval · Fine-Tuning