Blog
AI engineering insights, practical advice, and things I'm learning.
AI Engineering
How to Build a RAG Application (Step by Step)
A practical walkthrough of building a RAG pipeline from scratch: chunking documents, generating embeddings, storing vectors, retrieving context, and generating grounded answers.
RAG · Retrieval-Augmented Generation · Embeddings
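As a taste of what the full post covers, here's a minimal sketch of the whole retrieve-then-generate loop. Everything in it is illustrative: `chunk`, `embed`, and the in-memory "vector store" are toy stand-ins for real text splitters, embedding models, and vector databases, and the final LLM call is left as a comment.

```python
# Minimal RAG sketch: chunk -> embed -> store -> retrieve -> generate.
# embed() is a toy stand-in for a real embedding model; swap in your own.
import numpy as np

def chunk(text: str, size: int = 200) -> list[str]:
    """Naive fixed-size chunking by characters; real pipelines split on structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(texts: list[str]) -> np.ndarray:
    """Hypothetical embedding function: hashed bag-of-words, for illustration only."""
    vecs = np.zeros((len(texts), 256))
    for i, t in enumerate(texts):
        for word in t.lower().split():
            vecs[i, hash(word) % 256] += 1.0
    return vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-9)

def retrieve(query: str, chunks: list[str], index: np.ndarray, k: int = 3) -> list[str]:
    """Cosine similarity over the stored vectors (rows are already normalized)."""
    scores = index @ embed([query])[0]
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

docs = "RAG retrieves relevant context before the model answers. " * 20
chunks = chunk(docs)
index = embed(chunks)          # the "vector store" is just a matrix here
context = retrieve("How does RAG answer questions?", chunks, index)
prompt = "Answer using only this context:\n" + "\n".join(context)
# generate(prompt) would call your LLM; we print the grounded prompt instead.
print(prompt)
```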
AI Engineering
How to Run LLMs Locally on Your Machine
Running AI models locally gives you privacy, speed, and zero API costs. Here's what hardware you need, which tools to use, and how to choose the right model.
Local LLMs · Ollama · Llama
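To show how little code a local setup needs, here's a hedged sketch that queries a local Ollama server over its HTTP API. It assumes Ollama is installed and serving on its default port (11434) and that the model tag below has already been pulled; any pulled model works.

```python
# Sketch: query a local Ollama server over its HTTP API.
# Assumes `ollama serve` is running and `ollama pull llama3.2` was done first.
import json
import urllib.request

payload = {
    "model": "llama3.2",          # any locally pulled model tag works here
    "prompt": "Explain RAG in one sentence.",
    "stream": False,              # return one JSON object instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```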
AI Engineering
Structured Output from LLMs: JSON Mode Explained
LLMs generate text, but applications need structured data. Here's how JSON mode, function calling, and schema enforcement turn free-form AI output into reliable, typed data.
Structured Output · JSON Mode · Function Calling
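Whichever mechanism you use, the safety net is the same: parse the model's text and validate it against a schema before the rest of your code touches it. A minimal sketch, with a hypothetical `Invoice` schema and a hard-coded string standing in for the model's raw output:

```python
# Sketch of the parse-and-validate step behind JSON mode / structured output.
# `raw` stands in for whatever the model returned; real code would get it
# from an API call made with the provider's JSON mode or a schema prompt.
import json
from dataclasses import dataclass

@dataclass
class Invoice:            # hypothetical target schema, for illustration
    vendor: str
    total: float

raw = '{"vendor": "Acme", "total": 42.5}'   # stand-in for model output

def parse_invoice(text: str) -> Invoice:
    data = json.loads(text)                  # raises on malformed JSON
    return Invoice(vendor=str(data["vendor"]), total=float(data["total"]))

print(parse_invoice(raw))  # typed data the rest of the app can rely on
```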
AI Engineering
Fine-Tuning vs RAG: When to Use Each Approach
RAG changes what the model knows. Fine-tuning changes how it behaves. Here's when to use each approach, their real tradeoffs, and why the answer is usually both.
Fine-Tuning · RAG · LLM
AI Engineering
What Is the Model Context Protocol (MCP)?
MCP standardizes how AI models connect to tools and data. Here's what the Model Context Protocol is, how it works, and why it matters for developers building AI applications.
MCP · Model Context Protocol · AI Agents
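Under the hood MCP speaks JSON-RPC 2.0. As a rough illustration, this is the shape of a `tools/call` request a client sends to an MCP server; the tool name and arguments below are made up for the example.

```python
# Sketch of an MCP message on the wire. MCP is JSON-RPC 2.0 underneath;
# the method and parameter layout follow the spec's tools/call shape.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_weather",              # hypothetical tool name
        "arguments": {"city": "Berlin"},    # hypothetical arguments
    },
}
# A client would send this over stdio or HTTP to an MCP server and match
# the response by id; here we just show the shape.
print(json.dumps(request, indent=2))
```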
AI Engineering
What Is RAG? Retrieval-Augmented Generation Explained
RAG lets AI models pull in real data before generating a response. Here's how retrieval-augmented generation works, why it matters, and where it breaks down.
RAG · Retrieval-Augmented Generation · LLMs
AI Engineering
What Are Embeddings in AI? A Technical Explanation
Embeddings turn text into numbers that capture meaning. Here's how they work, why they matter for search and RAG, and how to choose the right model for your use case.
Embeddings · Vector Search · AI Architecture
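The key intuition is that similarity becomes geometry: related texts get nearby vectors. A toy sketch with hand-made 3-dimensional vectors; real embedding models emit hundreds or thousands of dimensions.

```python
# Sketch: "meaning as geometry". Cosine similarity between embedding
# vectors; the 3-d vectors below are invented purely for illustration.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cat     = np.array([0.90, 0.10, 0.20])   # hypothetical embedding for "cat"
kitten  = np.array([0.85, 0.15, 0.25])   # hypothetical embedding for "kitten"
invoice = np.array([0.10, 0.90, 0.70])   # hypothetical embedding for "invoice"

print(cosine(cat, kitten))    # high: related meanings sit close together
print(cosine(cat, invoice))   # low: unrelated meanings sit far apart
```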
AI Engineering
Why AI Hallucinates and How to Reduce It
AI hallucination isn't a bug you can patch. It's a consequence of how language models work. Here's what causes it, how to measure it, and what actually reduces it.
Hallucination · LLMs · AI Safety
AI Engineering
What Is AI Temperature and How Does It Affect Output?
Temperature controls how random or deterministic an AI model's output is. Here's what it does technically, how it relates to top-p and top-k, and when to adjust it.
Temperature · LLM · AI Engineering
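Mechanically, temperature just divides the model's logits before the softmax. This small sketch shows how the same (made-up) logits turn into a sharp distribution at low temperature and a flat one at high temperature:

```python
# Sketch of what temperature actually does: rescale logits before softmax,
# sharpening (T < 1) or flattening (T > 1) the next-token distribution.
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    scaled = logits / temperature
    scaled -= scaled.max()              # shift for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = np.array([4.0, 2.0, 1.0, 0.5])   # made-up logits for four tokens
for t in (0.2, 1.0, 2.0):
    print(t, softmax_with_temperature(logits, t).round(3))
# Low T concentrates probability on the top token (near-deterministic);
# high T spreads it across the vocabulary (more random sampling).
```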
AI Engineering
Context Windows Explained: Why Your AI Forgets
Context windows determine how much an AI model can 'see' at once. Here's what they are technically, how attention scales, and practical strategies for working within their limits.
Context Windows · LLMs · Prompt Engineering
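The most common workaround is simple truncation: keep the most recent turns that fit. A sketch using the rough 4-characters-per-token rule of thumb; real code would count with the model's actual tokenizer.

```python
# Sketch of fitting a conversation into a finite context window by
# dropping the oldest turns. Token counts are approximated as
# len(text) // 4, a rough heuristic rather than a real tokenizer.
def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    def approx_tokens(text: str) -> int:
        return max(1, len(text) // 4)   # rule of thumb, not exact

    kept: list[str] = []
    budget = max_tokens
    for msg in reversed(messages):      # walk backward: keep recent turns
        cost = approx_tokens(msg)
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))         # restore chronological order

history = ["turn %d: some earlier context" % i for i in range(50)]
print(fit_to_window(history, max_tokens=60))
```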
AI Engineering
What Is an LLM? How Large Language Models Actually Work
LLMs predict text; they don't understand it. Here's how large language models work under the hood, from training to transformers to next-token prediction, and why it matters for how you use them.
LLM · Large Language Models · AI Engineering
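Stripped of everything else, next-token prediction is: map the context to a probability distribution over the next token, sample one, append it, repeat. This toy sketch fakes the "model" with a hand-written bigram table, purely for illustration:

```python
# Toy sketch of next-token prediction. An LLM is (very loosely) a function
# from context to a distribution over the next token, applied repeatedly;
# the bigram table below is a made-up stand-in for a trained model.
import random

random.seed(0)
model = {  # hypothetical "learned" next-token probabilities
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

tokens = ["the"]
while tokens[-1] in model:
    dist = model[tokens[-1]]
    nxt = random.choices(list(dist), weights=list(dist.values()))[0]
    tokens.append(nxt)
print(" ".join(tokens))   # e.g. "the cat sat down"
```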
AI Engineering
What Tokenization Means for Your Prompts
Tokenization isn't just a technical detail. It shapes how LLMs process your input. Understanding it changes the way you write prompts.
Tokenization · LLMs · Prompt Engineering
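To see tokenization for yourself, here's a short sketch using the `tiktoken` package (assumed installed); `cl100k_base` is the encoding used by several OpenAI models.

```python
# Sketch: inspect how a tokenizer actually splits a prompt.
# Assumes `pip install tiktoken` has been run.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
prompt = "Tokenization shapes how LLMs read your prompt."
ids = enc.encode(prompt)
print(len(ids), "tokens")
print([enc.decode([i]) for i in ids])   # the actual pieces the model sees
```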