System Prompts: How to Write Effective LLM Instructions
System prompts define how your LLM behaves. Here's how to structure them, what mistakes to avoid, and how provider-specific behavior affects your prompt strategy.
The system prompt is the instruction layer between you and the model. It defines the persona, constraints, output format, and behavioral boundaries for every response. A well-written system prompt is the difference between a model that does what you need and one that technically works but constantly requires correction.
Every production LLM application has a system prompt. If you’re not writing one, you’re relying on the model’s defaults, which are optimized for general helpfulness, not for your specific use case.
What System Prompts Do
The system prompt sits at the beginning of the message array, before any user messages. The model treats it as persistent context that applies to the entire conversation:
messages = [
    {"role": "system", "content": "You are a customer support agent for Acme Corp..."},
    {"role": "user", "content": "I need to return my order."},
]
The system message shapes model behavior in several ways:
- Persona: Who the model is, what expertise it has, how it communicates
- Constraints: What the model should and shouldn’t do
- Format: How responses should be structured
- Knowledge boundaries: What the model knows about (your product, your domain) and what it should redirect or decline
The model doesn’t become the persona. It adjusts its probability distribution over outputs to be more consistent with the system prompt instructions. A system prompt that says “respond only in JSON” dramatically increases the probability of JSON output, but it’s not a hard constraint unless you combine it with structured output mode.
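As a sketch of that pairing, here is what combining a format instruction with the API's JSON mode might look like. The model name, prompt wording, and helper function are illustrative, and the payload shape follows OpenAI's chat completions API; no request is actually sent:

```python
# Sketch: pair a JSON instruction in the system prompt with the API's
# structured-output (JSON) mode, so the format is enforced rather than
# merely made more probable. Model name is illustrative.

def build_request(user_message: str) -> dict:
    """Build a chat request whose output format is stated twice:
    once in the system prompt, once via the API's JSON mode."""
    return {
        "model": "gpt-4o-mini",  # illustrative
        "messages": [
            {"role": "system",
             "content": "You are a support triage bot. Respond only in "
                        'JSON with keys "category" and "urgency".'},
            {"role": "user", "content": user_message},
        ],
        # JSON mode is the hard constraint; the instruction alone only
        # shifts the output distribution toward JSON.
        "response_format": {"type": "json_object"},
    }

request = build_request("My deployment is stuck in a crash loop.")
```

The belt-and-suspenders approach is deliberate: the instruction tells the model *what* JSON to produce, while the mode guarantees the output *is* JSON.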
Structuring a System Prompt
The most effective system prompts follow a predictable structure:
Role
Start with who the model is. Be specific. “You are a helpful assistant” is too vague to be useful. “You are a senior Python developer who reviews code for security vulnerabilities” gives the model a clear frame for its responses.
You are a technical support agent for CloudDeploy, a container orchestration platform.
You help developers troubleshoot deployment issues, explain error messages,
and guide them through configuration changes.
Constraints
Define what the model should avoid. Constraints are more reliable than open-ended permissions because they narrow the output space:
Rules:
- Only answer questions related to CloudDeploy. For unrelated questions,
politely redirect to the appropriate resource.
- Never suggest workarounds that bypass security controls.
- If you don't know the answer, say so. Do not guess.
- Do not make up features or API endpoints that don't exist.
Constraints should be specific and testable. “Be helpful” is not a constraint. “Do not provide pricing information; direct pricing questions to sales@company.com” is a constraint you can evaluate.
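"Testable" can be taken literally: a specific constraint can be checked in code. A minimal sketch for the pricing rule above, with a hypothetical checker function (the detection heuristic is deliberately simple):

```python
# Sketch: a hypothetical automated check for the constraint
# "do not provide pricing information; direct pricing questions
# to sales@company.com". A dollar-amount regex is a crude but
# useful first-pass detector.
import re

def violates_pricing_rule(response: str) -> bool:
    """Flag responses that quote a price instead of redirecting to sales."""
    mentions_price = bool(re.search(r"\$\d", response))
    redirects = "sales@company.com" in response
    return mentions_price and not redirects
```

A vague constraint like "be helpful" has no equivalent check, which is exactly why it isn't a constraint.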
Output Format
If you need a consistent output structure, define it explicitly:
Response format:
1. Acknowledge the issue in one sentence.
2. Provide the likely cause.
3. Give step-by-step resolution instructions.
4. If the issue might recur, explain how to prevent it.
For programmatic consumption, specify the exact schema. Pair this with structured output enforcement from the API for guaranteed compliance.
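As a sketch of what "exact schema" means in practice, here is a schema in the shape OpenAI's structured-output mode expects, with illustrative field names for a support-triage response:

```python
# Sketch: an exact response schema for structured-output enforcement.
# The field names (category, urgency) are illustrative; the outer shape
# follows OpenAI's json_schema response_format.
triage_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "triage_response",
        "strict": True,  # reject any output that deviates from the schema
        "schema": {
            "type": "object",
            "properties": {
                "category": {"type": "string",
                             "enum": ["billing", "deployment", "other"]},
                "urgency": {"type": "integer", "minimum": 1, "maximum": 5},
            },
            "required": ["category", "urgency"],
            "additionalProperties": False,
        },
    },
}
```

With `strict` enabled, the API guarantees schema compliance; the system prompt then only needs to explain what the fields mean.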
Context and Knowledge
Include any domain-specific information the model needs. This could be product documentation snippets, terminology definitions, or business rules:
Product context:
- CloudDeploy supports Kubernetes 1.28+ only.
- The CLI tool is called "cddeploy" (not "clouddeploy" or "cd").
- Free tier is limited to 3 clusters and 10 deployments per cluster.
This section tends to grow over time. Review it periodically and remove anything outdated. Every token here is included in every request, which affects both cost and the model’s attention budget.
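A lightweight guardrail against that growth is a budget check in CI. The sketch below uses the common four-characters-per-token approximation, which is rough; use your provider's tokenizer for real numbers. The budget value is illustrative:

```python
# Sketch: flag the context section for review once it exceeds a token
# budget. The 4-chars-per-token figure is an approximation, not exact.
def approx_tokens(text: str) -> int:
    return len(text) // 4

PRODUCT_CONTEXT = """\
Product context:
- CloudDeploy supports Kubernetes 1.28+ only.
- The CLI tool is called "cddeploy" (not "clouddeploy" or "cd").
- Free tier is limited to 3 clusters and 10 deployments per cluster.
"""

TOKEN_BUDGET = 500  # illustrative; pick a number that fits your cost model
over_budget = approx_tokens(PRODUCT_CONTEXT) > TOKEN_BUDGET
```

Because this section ships with every request, even a rough check catches the slow drift toward a bloated prompt.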
Common Mistakes
Being too vague. “Be professional and helpful” tells the model nothing it doesn’t already default to. Every instruction should change behavior from what the model would do without it.
Contradictory instructions. "Be concise" and "provide comprehensive, detailed answers" in the same prompt create ambiguity. The model resolves contradictions unpredictably. Pick a direction and commit.
Instruction overload. A 3,000-token system prompt with 40 rules is hard for the model to follow consistently. Prioritize. The most important constraints should come first, and there shouldn’t be more than 10-15 rules. If you need more, you may need to route to different prompts for different tasks.
Negative-only instructions. A prompt that’s all “don’t do this, don’t do that” without positive guidance leaves the model guessing what it should do. Balance constraints with clear direction.
Assuming persistence across requests. Each API call is stateless. The system prompt must be included in every request. If you update your system prompt, the change takes effect immediately on the next request. There’s no deployment step, which is powerful but also means accidental changes propagate instantly.
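Statelessness is easiest to handle with a thin wrapper that makes forgetting the system prompt impossible. A minimal sketch (the prompt text is the truncated example from earlier):

```python
# Sketch: each API call is stateless, so the system prompt must be
# prepended to every request. A helper centralizes that.
SYSTEM_PROMPT = "You are a customer support agent for Acme Corp..."

def build_messages(history: list[dict], user_message: str) -> list[dict]:
    """Full message array: system prompt first, then prior turns,
    then the new user turn."""
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + history
        + [{"role": "user", "content": user_message}]
    )

msgs = build_messages([], "I need to return my order.")
```

Because `SYSTEM_PROMPT` is read on every call, editing it changes behavior on the very next request, with no deployment step.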
Provider-Specific Behavior
Different providers handle system prompts differently:
OpenAI treats the system message as a privileged instruction that the model should follow above user messages. With GPT-5.4, system prompt adherence is strong, especially for format and constraint instructions.
Anthropic uses a system parameter separate from the message array. Claude gives heavy weight to system instructions and is particularly responsive to persona and constraint definitions. Anthropic recommends putting examples and long context in the system prompt for better prompt caching efficiency.
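The structural difference is easy to see in the request payload. A sketch of the Anthropic Messages API shape, with an illustrative model name and no request actually sent:

```python
# Sketch: Anthropic takes the system prompt as a top-level `system`
# parameter, not as a message with role "system" inside the array.
# Model name is illustrative; payload shape only.
anthropic_request = {
    "model": "claude-sonnet-4-20250514",  # illustrative
    "max_tokens": 1024,
    "system": "You are a customer support agent for Acme Corp...",
    "messages": [
        {"role": "user", "content": "I need to return my order."},
    ],
}
```

If you abstract over providers, this is one of the seams your abstraction has to paper over: the same logical system prompt lives in a different place per API.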
Google (Gemini) supports system instructions as a dedicated field. Behavior is similar to other providers, with system instructions influencing response style and content boundaries.
The core principles work across providers, but test your prompts with the model you’ll deploy. A system prompt optimized for GPT-5.4 might need adjustments for Claude, especially around format compliance and constraint interpretation.
Testing and Versioning
System prompts are code. Treat them accordingly:
Version control. Store prompts in your repository, not in a dashboard. Track changes with git. When behavior regresses, you can diff the prompt to find what changed.
Eval sets. Maintain a set of test inputs with expected outputs. Run them after every prompt change. This is the same principle as agent evaluation applied to the prompt layer. A change that improves one behavior might break another.
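Even a tiny harness beats eyeballing. The sketch below runs substring checks against canned responses; `run_model` is a stand-in for your real API call so the harness works offline, and the questions and expected substrings are illustrative:

```python
# Sketch: a minimal eval set for prompt changes. Substring checks are
# the simplest useful assertion; swap run_model for a real API call.
def run_model(user_message: str) -> str:
    """Stand-in for an API call, so this harness runs offline."""
    canned = {
        "How much does the Pro plan cost?":
            "For pricing, contact sales@company.com.",
        "My deploy failed with ImagePullBackOff":
            "ImagePullBackOff usually means the image name or "
            "registry credentials are wrong.",
    }
    return canned[user_message]

EVAL_SET = [
    ("How much does the Pro plan cost?", "sales@company.com"),
    ("My deploy failed with ImagePullBackOff", "ImagePullBackOff"),
]

failures = [(q, want) for q, want in EVAL_SET if want not in run_model(q)]
```

Run this after every prompt edit; a non-empty `failures` list is your regression signal.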
A/B testing. For user-facing applications, test prompt changes on a subset of traffic before rolling out. Measure the metrics that matter: task completion, user satisfaction, error rate.
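One common mechanic for that subset is deterministic bucketing: hash the user ID so each user stays in the same variant across requests. A sketch, with an illustrative default rollout percentage:

```python
# Sketch: deterministic traffic bucketing for a prompt A/B test.
# Hashing the user ID keeps variant assignment stable per user.
import hashlib

def prompt_variant(user_id: str, rollout_pct: int = 10) -> str:
    """Return "B" (new prompt) for roughly rollout_pct% of users."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "B" if bucket < rollout_pct else "A"
```

Determinism matters here: a user who flips between variants mid-conversation would pollute both measurement groups.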
Prompt as config. In production, load system prompts from configuration (environment variables, config files, feature flags) rather than hardcoding them. This lets you update prompts without deploying code.
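A minimal loader sketch, assuming a fallback chain of env var, then config file, then hardcoded default. The env var and path names are illustrative:

```python
# Sketch: load the system prompt from configuration rather than code.
# SYSTEM_PROMPT, SYSTEM_PROMPT_FILE, and the default path are illustrative.
import os
from pathlib import Path

def load_system_prompt(default: str = "You are a helpful assistant.") -> str:
    """Prefer an env var override, then a config file, then the default."""
    if "SYSTEM_PROMPT" in os.environ:
        return os.environ["SYSTEM_PROMPT"]
    path = Path(os.environ.get("SYSTEM_PROMPT_FILE", "prompts/system.txt"))
    if path.exists():
        return path.read_text()
    return default
```

The env var override is handy for staging experiments; the file is what you version-control and diff.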
A Template to Start From
You are [specific role] for [product/company].
Your job is to [primary task]. You communicate in a [tone] style.
Rules:
- [Constraint 1: most important boundary]
- [Constraint 2: second most important]
- [Constraint 3: format or content rule]
- If you don't know, say so. Do not guess.
Response format:
[Define structure if needed]
Context:
[Product-specific information the model needs]
Fill this in for your use case, test it against your eval set, and iterate. A system prompt is never finished: it evolves as you discover new edge cases and as user behavior shifts, and the investment in writing a good one pays off on every request. For a broader introduction to prompt engineering, including how system prompts interact with few-shot examples and chain of thought, see the companion guides on those topics.