Blog
AI engineering insights, practical advice, and things I'm learning.
AI Engineering
How to Build Enterprise AI with Mistral Forge on Your Own Data
Learn how Mistral Forge helps enterprises build custom AI models with private data, synthetic data, evals, and flexible deployment.
Mistral Forge · Enterprise AI · Custom Models
AI Engineering
How to Deploy NVIDIA Dynamo 1.0 for Production AI Inference Across GPU Clusters
Learn how to use NVIDIA Dynamo 1.0 to orchestrate scalable AI inference with KV routing, multimodal support, and Kubernetes scheduling.
NVIDIA Dynamo · Inference Optimization · GPU Clusters
AI Engineering
How to Run NVIDIA Nemotron 3 Nano 4B Locally on Jetson and RTX
Learn to deploy NVIDIA's Nemotron 3 Nano 4B locally with BF16, FP8, or GGUF on Jetson, RTX, vLLM, TensorRT-LLM, and llama.cpp.
Local LLMs · Edge AI · NVIDIA
AI Engineering
How to Deploy Mistral Small 4 for Multimodal Reasoning and Coding
Learn how to deploy Mistral Small 4 with reasoning controls, multimodal input, and optimized serving on API, Hugging Face, or NVIDIA.
Mistral Small 4 · Mistral AI · Multimodal Models
AI Engineering
How to Get Started with Open-H, GR00T-H, and Cosmos-H for Healthcare Robotics Research
Learn how to use NVIDIA's new Open-H dataset and GR00T-H and Cosmos-H models to build and evaluate healthcare robotics systems.
Healthcare Robotics · Physical AI · Robotics Datasets
AI Engineering
How to Use Claude Across Excel and PowerPoint with Shared Context and Skills
Learn how to use Claude's shared Excel and PowerPoint context, Skills, and enterprise gateways for faster analyst workflows.
Anthropic · Claude · Excel
AI Engineering
How to Reduce LLM API Costs in Production
LLM API costs add up fast in production. Here are the practical strategies that work: prompt caching, model routing, batching, output limits, and cost-per-task tracking.
LLM Costs · Prompt Caching · AI Engineering
AI Engineering
LLM Observability: How to Monitor AI Applications
Traditional monitoring doesn't cover LLM applications. Here's what to log, how to trace multi-step chains, and how to detect quality regressions before users do.
Observability · Monitoring · LLM Ops
AI Engineering
How Function Calling Works in LLMs
Function calling lets LLMs interact with external systems by requesting structured tool executions. Here's how the loop works, how to define tools, and what to watch for across providers.
Function Calling · Tool Use · LLMs
AI Engineering
How to Stream LLM Responses in Your Application
Streaming LLM responses reduces perceived latency and improves UX. Here's how server-sent events work, how to implement streaming with OpenAI and Anthropic, and what to watch for in production.
Streaming · LLMs · Server-Sent Events
AI Engineering
How to Evaluate AI Output (LLM-as-Judge Explained)
Traditional tests don't work for AI output. Here's how to evaluate quality using LLM-as-judge, automated checks, human review, and continuous evaluation frameworks.
Evaluation · LLM-as-Judge · AI Engineering
AI Engineering
How to Run IBM Granite 4.0 1B Speech for Multilingual Edge ASR and Translation
Learn how to deploy IBM Granite 4.0 1B Speech for fast multilingual ASR and translation on edge devices.
Speech Models · Edge AI · Multilingual ASR
AI Engineering
Context Engineering: The Most Important AI Skill in 2026
Context engineering is replacing prompt engineering as the critical AI skill. Learn what it is, why it matters more than prompting, and how to manage state, memory, and information flow in AI systems.
Context Engineering · Prompt Engineering · RAG
AI Engineering
How to Choose a Vector Database in 2026
Pinecone, Weaviate, Qdrant, pgvector, or Chroma? Here's how to pick the right vector database for your AI application based on scale, infrastructure, and actual needs.
Vector Database · Embeddings · RAG
AI Engineering
GPT vs Claude vs Gemini: Which AI Model Should You Use?
A practical comparison of GPT, Claude, and Gemini: their real strengths, pricing, context windows, and which model fits which task in 2026.
GPT · Claude · Gemini
AI Engineering
AI Agent Frameworks Compared: LangChain vs CrewAI vs LlamaIndex
A practical comparison of the top AI agent frameworks in 2026: when to use LangChain, CrewAI, or LlamaIndex, their strengths and tradeoffs, and what actually works in production.
LangChain · CrewAI · LlamaIndex
AI Engineering
How to Build a RAG Application (Step by Step)
A practical walkthrough of building a RAG pipeline from scratch: chunking documents, generating embeddings, storing vectors, retrieving context, and generating grounded answers.
RAG · Retrieval-Augmented Generation · Embeddings
AI Engineering
How to Run LLMs Locally on Your Machine
Running AI models locally gives you privacy, speed, and zero API costs. Here's what hardware you need, which tools to use, and how to choose the right model.
Local LLMs · Ollama · Llama
AI Engineering
Structured Output from LLMs: JSON Mode Explained
LLMs generate text, but applications need structured data. Here's how JSON mode, function calling, and schema enforcement turn free-form AI output into reliable, typed data.
Structured Output · JSON Mode · Function Calling
AI Engineering
Fine-Tuning vs RAG: When to Use Each Approach
RAG changes what the model knows. Fine-tuning changes how it behaves. Here's when to use each approach, their real tradeoffs, and why the answer is usually both.
Fine-Tuning · RAG · LLM