Blog
AI engineering insights, practical advice, and things I'm learning.
AI Engineering
How Cursor Built Composer 2 on Top of Kimi K2.5
Cursor's Composer 2 is built on Kimi K2.5. What continued pretraining, reinforcement learning, and self-summarization mean, and how they work.
Cursor · Kimi K2.5 · Reinforcement Learning
AI Engineering
What Is Mixture-of-Experts (MoE) in AI?
MoE models can reach a trillion parameters yet activate only a fraction per token. How expert routing works, why it matters for cost, and which major models use it.
Mixture of Experts · MoE · LLM Architecture
AI Engineering
What Is Continued Pretraining in AI?
Continued pretraining adapts a general LLM to a specific domain using large unlabeled data. How it works, how it differs from fine-tuning, and real examples.
Continued Pretraining · LLM Training · Domain Adaptation
AI Engineering
Continued Pretraining vs RAG: Two Ways to Add Knowledge
Continued pretraining bakes knowledge into model weights. RAG injects it at query time. When to use each, where each breaks down, and why you often need both.
Continued Pretraining · RAG · Retrieval-Augmented Generation
AI Engineering
How to Build Enterprise AI with Mistral Forge on Your Own Data
Learn how Mistral Forge helps enterprises build custom AI models with private data, synthetic data, evals, and flexible deployment.
Mistral Forge · Enterprise AI · Custom Models
AI Engineering
How to Deploy NVIDIA Dynamo 1.0 for Production AI Inference Across GPU Clusters
Learn how to use NVIDIA Dynamo 1.0 to orchestrate scalable AI inference with KV routing, multimodal support, and Kubernetes scheduling.
NVIDIA Dynamo · Inference Optimization · GPU Clusters
AI Engineering
How to Run NVIDIA Nemotron 3 Nano 4B Locally on Jetson and RTX
Learn to deploy NVIDIA's Nemotron 3 Nano 4B locally with BF16, FP8, or GGUF on Jetson, RTX, vLLM, TensorRT-LLM, and llama.cpp.
Local LLMs · Edge AI · NVIDIA
AI Engineering
How to Deploy Mistral Small 4 for Multimodal Reasoning and Coding
Learn how to deploy Mistral Small 4 with reasoning controls, multimodal input, and optimized serving on API, Hugging Face, or NVIDIA.
Mistral Small 4 · Mistral AI · Multimodal Models
AI Engineering
How to Get Started with Open-H, GR00T-H, and Cosmos-H for Healthcare Robotics Research
Learn how to use NVIDIA's new Open-H dataset and GR00T-H and Cosmos-H models to build and evaluate healthcare robotics systems.
Healthcare Robotics · Physical AI · Robotics Datasets
AI Engineering
How to Use Claude Across Excel and PowerPoint with Shared Context and Skills
Learn how to use Claude's shared Excel and PowerPoint context, Skills, and enterprise gateways for faster analyst workflows.
Anthropic · Claude · Excel
AI Engineering
How to Reduce LLM API Costs in Production
LLM API costs add up fast in production. Here are the practical strategies that work: prompt caching, model routing, batching, output limits, and cost-per-task tracking.
LLM Costs · Prompt Caching · AI Engineering
AI Engineering
LLM Observability: How to Monitor AI Applications
Traditional monitoring doesn't cover LLM applications. Here's what to log, how to trace multi-step chains, and how to detect quality regressions before users do.
Observability · Monitoring · LLM Ops
AI Engineering
How Function Calling Works in LLMs
Function calling lets LLMs interact with external systems by requesting structured tool executions. Here's how the loop works, how to define tools, and what to watch for across providers.
Function Calling · Tool Use · LLMs
AI Engineering
How to Stream LLM Responses in Your Application
Streaming LLM responses reduces perceived latency and improves UX. Here's how server-sent events work, how to implement streaming with OpenAI and Anthropic, and what to watch for in production.
Streaming · LLMs · Server-Sent Events
AI Engineering
How to Evaluate AI Output (LLM-as-Judge Explained)
Traditional tests don't work for AI output. Here's how to evaluate quality using LLM-as-judge, automated checks, human review, and continuous evaluation frameworks.
Evaluation · LLM-as-Judge · AI Engineering
AI Engineering
How to Run IBM Granite 4.0 1B Speech for Multilingual Edge ASR and Translation
Learn how to deploy IBM Granite 4.0 1B Speech for fast multilingual ASR and translation on edge devices.
Speech Models · Edge AI · Multilingual ASR
AI Engineering
Context Engineering: The Most Important AI Skill in 2026
Context engineering is replacing prompt engineering as the critical AI skill. Learn what it is, why it matters more than prompting, and how to manage state, memory, and information flow in AI systems.
Context Engineering · Prompt Engineering · RAG
AI Engineering
How to Choose a Vector Database in 2026
Pinecone, Weaviate, Qdrant, pgvector, or Chroma? Here's how to pick the right vector database for your AI application based on scale, infrastructure, and actual needs.
Vector Database · Embeddings · RAG
AI Engineering
GPT vs Claude vs Gemini: Which AI Model Should You Use?
A practical comparison of GPT, Claude, and Gemini. Their real strengths, pricing, context windows, and which model fits which task in 2026.
GPT · Claude · Gemini
AI Engineering
AI Agent Frameworks Compared: LangChain vs CrewAI vs LlamaIndex
A practical comparison of the top AI agent frameworks in 2026. When to use LangChain, CrewAI, or LlamaIndex, their strengths, tradeoffs, and what actually works in production.
LangChain · CrewAI · LlamaIndex