Blog
AI engineering insights, practical advice, and things I'm learning.
AI Engineering
How to Optimize MoE Inference with Warp Decode
Learn how Cursor's warp decode technique uses GPU kernel optimizations and warp-level primitives to achieve 300+ tokens per second on Blackwell hardware.
MoE Models · GPU Optimization · CUDA Kernels
AI Agents
How to Build Advanced AI Agents with OpenClaw v2026
Learn to master OpenClaw v2026.3.22 by configuring reasoning files, integrating ClawHub skills, and deploying secure agent sandboxes.
OpenClaw · AI Agents · ClawHub
AI Engineering
How to Use Amazon Polly's Bidirectional Streaming API
Learn how to use Amazon Polly’s new HTTP/2 bidirectional streaming to reduce latency in real-time conversational AI by streaming text and audio simultaneously.
Amazon Polly · Text-to-Speech · AWS Cloud
AI Agents
How to Speed Up Regex Search for AI Agents
Learn how Cursor uses local sparse n-gram indexes to make regex search fast enough for interactive AI agent workflows.
Regex Search · Local Indexing · Sparse N-Grams
AI Engineering
What Are Parameters in AI Models?
Parameters are the numbers that make AI models work. Here's what they are, why models have billions of them, and what the count actually tells you about capability.
Parameters · LLM · AI Engineering
AI Engineering
What Is Quantization in AI?
Quantization shrinks AI models by reducing numerical precision. Here's how it works, what formats exist, and how to choose the right tradeoff between size, speed, and quality.
Quantization · LLM · Inference
AI Engineering
What Is AI Inference and How Does It Work?
Inference is where AI models do their actual work. Here's what happens during inference, why it's the bottleneck, and what determines speed and cost.
Inference · LLM · AI Engineering
AI Engineering
How to Build a Domain-Specific Embedding Model
Learn NVIDIA's recipe for fine-tuning a domain-specific embedding model in hours using synthetic data, hard negatives, BEIR, and NIM.
Embeddings · Retrieval · Fine-Tuning
AI Coding
How to Set Up Claude Code Channels
Connect Claude Code to Telegram and Discord so you can message your coding session from your phone.
Claude Code · Telegram · Discord
AI Engineering
How Cursor Built Composer 2 on Top of Kimi K2.5
Cursor's Composer 2 is built on Kimi K2.5. What continued pretraining, reinforcement learning, and self-summarization mean, and how they work.
Cursor · Kimi K2.5 · Reinforcement Learning
AI Engineering
What Is Mixture-of-Experts (MoE) in AI?
MoE models have a trillion parameters but only activate a fraction per token. How expert routing works, why it matters for cost, and which major models use it.
Mixture of Experts · MoE · LLM Architecture
AI Engineering
What Is Continued Pretraining in AI?
Continued pretraining adapts a general LLM to a specific domain using large unlabeled data. How it works, how it differs from fine-tuning, and real examples.
Continued Pretraining · LLM Training · Domain Adaptation
AI Engineering
Continued Pretraining vs RAG: Two Ways to Add Knowledge
Continued pretraining bakes knowledge into model weights. RAG injects it at query time. When to use each, where each breaks down, and why you often need both.
Continued Pretraining · RAG · Retrieval-Augmented Generation
AI Engineering
How to Build Enterprise AI with Mistral Forge on Your Own Data
Learn how Mistral Forge helps enterprises build custom AI models with private data, synthetic data, evals, and flexible deployment.
Mistral Forge · Enterprise AI · Custom Models
AI Engineering
How to Deploy NVIDIA Dynamo 1.0 for Production AI Inference Across GPU Clusters
Learn how to use NVIDIA Dynamo 1.0 to orchestrate scalable AI inference with KV routing, multimodal support, and Kubernetes scheduling.
NVIDIA Dynamo · Inference Optimization · GPU Clusters
AI Engineering
How to Run NVIDIA Nemotron 3 Nano 4B Locally on Jetson and RTX
Learn to deploy NVIDIA's Nemotron 3 Nano 4B locally with BF16, FP8, or GGUF on Jetson, RTX, vLLM, TensorRT-LLM, and llama.cpp.
Local LLMs · Edge AI · NVIDIA
Career
AI Engineer Career Path: Skills, Salary, and How to Get Started
AI engineering is a distinct role from ML engineering or data science. Here's what AI engineers do, what skills you need, what the pay looks like, and how to break in from other software roles.
Career · AI Engineer · Skills
AI Agents
How to Choose Between GPT-5.4 Mini and Nano for Coding Agents and High-Volume API Tasks
Learn when to use GPT-5.4 mini vs nano for coding, tool use, subagents, and cost-sensitive API workflows.
OpenAI · GPT-5.4 Mini · GPT-5.4 Nano
AI Engineering
How to Deploy Mistral Small 4 for Multimodal Reasoning and Coding
Learn how to deploy Mistral Small 4 with reasoning controls, multimodal input, and optimized serving on API, Hugging Face, or NVIDIA.
Mistral Small 4 · Mistral AI · Multimodal Models
AI Engineering
How to Get Started with Open-H, GR00T-H, and Cosmos-H for Healthcare Robotics Research
Learn how to use NVIDIA's new Open-H dataset and GR00T-H and Cosmos-H models to build and evaluate healthcare robotics systems.
Healthcare Robotics · Physical AI · Robotics Datasets