What Are Parameters in AI Models?
Parameters are the numbers that make AI models work. Here's what they are, why models have billions of them, and what the count actually tells you about capability.
An AI model is a network of connections, and every connection has a number attached to it. That number is a parameter. It controls how strongly one part of the network influences another. A model with 7 billion parameters has 7 billion of these numbers, each one tuned during training to make the model produce useful output. After training, the parameters are fixed. They are the model.
What Parameters Do
Think of a neural network as layers of nodes connected by wires. Each wire has a dial on it. Turn the dial up, and the signal passing through that wire gets amplified. Turn it down, and the signal gets dampened. Parameters are those dials.
During training, the model processes enormous amounts of text and adjusts every dial, billions of times, until the network reliably produces good output for a given input. The final settings of all those dials, the trained parameters, represent everything the model “learned.” They encode grammar patterns, factual associations, reasoning shortcuts, and the structure of language itself.
Most parameters are weights, the values on those connections between neurons. There are also biases (small offsets that shift a neuron’s output) and embedding parameters (numbers that convert input tokens into vectors the network can process). When a model is described as “7B,” that’s the total count of all these numbers.
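To make that breakdown concrete, here is a rough parameter tally for a hypothetical transformer-style model. The dimensions and the per-layer formula are illustrative simplifications, not the layout of any real model, but they show how weights, biases, and embeddings add up to a headline number like "7B":

```python
# Tally the parameters of a simplified, hypothetical transformer stack.
# All dimensions here are illustrative, not taken from any real model.

def count_params(vocab_size, d_model, d_ff, n_layers):
    embedding = vocab_size * d_model          # token -> vector lookup table
    per_layer = (
        4 * d_model * d_model                 # attention projections (Q, K, V, output)
        + 2 * d_model * d_ff                  # feed-forward up- and down-projections
        + 9 * d_model                         # biases and normalization offsets (approx.)
    )
    return embedding + n_layers * per_layer

total = count_params(vocab_size=50_000, d_model=4096, d_ff=16_384, n_layers=32)
print(f"{total / 1e9:.1f}B parameters")      # prints "6.6B parameters"
```

Note where the bulk sits: the weight matrices dominate, while biases contribute a rounding error. That is why "parameters" and "weights" are often used interchangeably.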
From Parameter Count to Memory
Every parameter is a number that takes up space in memory. How much space depends on the precision used to store it.
| Precision | Bits per parameter | Memory for 7B | Memory for 70B |
|---|---|---|---|
| FP32 (full) | 32 | ~28 GB | ~280 GB |
| FP16 / BF16 | 16 | ~14 GB | ~140 GB |
| INT8 | 8 | ~7 GB | ~70 GB |
| INT4 | 4 | ~3.5 GB | ~35 GB |
The math is simple: parameter count multiplied by bytes per parameter. A 7B model at 16-bit precision uses 7 billion × 2 bytes = 14 GB. At 4-bit quantization, the same model fits in about 3.5 GB.
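The table above can be reproduced with that one formula. A minimal sketch (using 1 GB = 10^9 bytes, as the table does):

```python
def weight_memory_gb(params_billions, bits_per_param):
    """Memory for the weights alone: parameter count x bytes per parameter."""
    bytes_total = params_billions * 1e9 * (bits_per_param / 8)
    return bytes_total / 1e9  # 1 GB = 10^9 bytes, matching the table

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: 7B -> {weight_memory_gb(7, bits):.1f} GB, "
          f"70B -> {weight_memory_gb(70, bits):.1f} GB")
```

Running this reproduces every cell in the table, e.g. `weight_memory_gb(70, 4)` gives 35.0 GB.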
This is why parameter count is the first thing to check when deciding whether a model will run on your hardware. A 70B model at full precision needs 140 GB, far more than any single consumer GPU. At 4-bit quantization, it drops to 35 GB, which fits on a high-end GPU.
These figures cover only the weights. During inference, the model also needs memory for the KV cache, activations, and runtime overhead, typically adding 10-30% on top.
More Parameters, More Capacity
Language is complex. Capturing patterns in human text, from basic grammar to subtle reasoning, requires a network with enough parameters to encode those patterns. Each parameter stores a small piece. More parameters means more capacity to represent more distinctions.
A 1B model can handle simple extraction and classification. A 7B model handles summarization, code generation, and general conversation. A 70B model handles nuanced multi-step reasoning and performs well across a broader range of domains. The improvement at each tier is real but subject to diminishing returns. Going from 7B to 70B (10x) produces a clear quality jump. Going from 70B to 700B produces a smaller one relative to the cost.
Practical tiers:
- 1B-3B: Run on phones and constrained devices. Best for structured tasks with predictable output formats.
- 7B-8B: The sweet spot for running models locally. Fit on a laptop at 4-bit quantization. Capable enough for many production tasks.
- 13B-34B: Stronger at multi-step reasoning and coherent long-form output. Need 32GB+ RAM or a 12GB+ GPU.
- 70B: Approach frontier quality on many tasks. Need 64GB+ RAM or a high-end GPU.
- 400B+: Datacenter-scale. Multi-GPU setups. Where the largest open and proprietary models sit.
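The tiers above follow directly from the memory math. As a rough sizing sketch, you can invert the formula to ask what the largest model is that fits your hardware; the 20% overhead figure is an assumption drawn from the 10-30% range mentioned earlier, not a fixed rule:

```python
def max_model_size_b(memory_gb, bits_per_param, overhead=0.2):
    """Largest parameter count (in billions) whose weights plus an assumed
    ~20% runtime overhead fit in the given memory. Illustrative only."""
    usable = memory_gb / (1 + overhead)
    bytes_per_param = bits_per_param / 8
    return usable * 1e9 / bytes_per_param / 1e9

# A 24 GB GPU at 4-bit quantization:
print(f"~{max_model_size_b(24, 4):.0f}B")   # enough headroom for a 34B-class model
```

Real-world fit also depends on context length (the KV cache grows with it), so treat this as a first-pass filter, not a guarantee.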
Total Parameters vs. Active Parameters
Some models don’t use all their parameters on every input. Mixture-of-Experts (MoE) architectures have a large total parameter count but route each token to only a subset of the network. An MoE model with 47 billion total parameters might activate only 13 billion per token.
Total parameters determine download size and storage. Active parameters determine inference speed and compute cost. An MoE model with 47B total but 13B active runs at a speed similar to a dense 13B model, while potentially matching a larger dense model on quality. When comparing inference costs, the active-parameter count per token matters more than the headline number.
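The total/active split can be sketched with a toy accounting function. The expert and shared-layer sizes below are made-up values chosen to land near the 47B/13B example in the text, not the architecture of any real model:

```python
def moe_params_b(n_experts, experts_per_token, expert_size_b, shared_b):
    """Total vs. active parameters (in billions) for a simple MoE:
    shared layers (attention, embeddings) are always used, but the
    router selects only a few experts per token. Illustrative numbers."""
    total = shared_b + n_experts * expert_size_b
    active = shared_b + experts_per_token * expert_size_b
    return total, active

total, active = moe_params_b(n_experts=8, experts_per_token=2,
                             expert_size_b=5.6, shared_b=2.2)
print(f"total {total:.0f}B, active {active:.0f}B per token")
# prints "total 47B, active 13B per token"
```

You download and store all 47B, but each token's forward pass only touches about 13B of them, which is what drives latency and per-token compute.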
Beyond Parameter Count
More parameters doesn’t automatically mean a better model. Several factors matter as much or more.
Training data quality. A 7B model trained on carefully curated data can outperform a 70B model trained on noisy web scrapes. Data quality is why some small models consistently punch above their parameter class.
Architecture choices. Attention variants, normalization methods, and positional encoding all affect what a model can do at a given scale. Two 7B models with different architectures can have very different capabilities.
Fine-tuning and alignment. A base model and its instruction-tuned variant have identical parameter counts but entirely different behavior. The base model generates plausible text. The tuned model follows instructions.
Context window length. Two models of identical size but different context lengths have different practical capabilities. A 128K-context model can process large codebases. A 4K-context model cannot.
Parameter count tells you the scale, the approximate memory footprint, and the general capability tier. Everything beyond that requires benchmarks and testing on your actual task. For a deeper treatment of model selection, hardware tradeoffs, and when bigger is actually worth the cost, Get Insanely Good at AI covers the full picture at getaibook.com/book.