
What Are Parameters in AI Models?

Parameters are the numbers that make AI models work. Here's what they are, why models have billions of them, and what the count actually tells you about capability.

An AI model is a network of connections, and every connection has a number attached to it. That number is a parameter. It controls how strongly one part of the network influences another. A model with 7 billion parameters has 7 billion of these numbers, each one tuned during training to make the model produce useful output. After training, the parameters are fixed. They are the model.

What Parameters Do

Think of a neural network as layers of nodes connected by wires. Each wire has a dial on it. Turn the dial up, and the signal passing through that wire gets amplified. Turn it down, and the signal gets dampened. Parameters are those dials.

During training, the model processes enormous amounts of text and adjusts every dial, billions of times, until the network reliably produces good output for a given input. The final settings of all those dials, the trained parameters, represent everything the model “learned.” They encode grammar patterns, factual associations, reasoning shortcuts, and the structure of language itself.
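The dial-turning loop can be sketched for a single parameter. This is a toy illustration of gradient descent on one dial, not the actual training procedure of any real model:

```python
# Toy illustration: "turning a dial" with gradient descent.
# One parameter w is nudged until the output y = w * x matches a
# target. Real training adjusts billions of parameters at once,
# but the principle per dial is the same.

def train_one_dial(x, target, lr=0.1, steps=100):
    w = 0.0  # the dial starts at an arbitrary setting
    for _ in range(steps):
        y = w * x                # forward pass: signal through the wire
        error = y - target       # how far off the output is
        grad = 2 * error * x     # gradient of squared error w.r.t. w
        w -= lr * grad           # turn the dial slightly downhill
    return w

w = train_one_dial(x=2.0, target=6.0)
print(round(w, 3))  # converges toward 3.0, since 3.0 * 2.0 = 6.0
```

After training, the dial stays at its final setting, which is exactly what "the parameters are fixed" means at model scale.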

Most parameters are weights, the values on those connections between neurons. There are also biases (small offsets that shift a neuron’s output) and embedding parameters (numbers that convert input tokens into vectors the network can process). When a model is described as “7B,” that’s the total count of all these numbers.
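As a rough sketch, here is how weights, biases, and embeddings add up to a headline count for a simplified stack of feed-forward layers. The layer sizes below are invented for illustration, and attention and normalization parameters are omitted, so this undercounts a real transformer:

```python
# Simplified parameter count: embedding table plus a stack of
# two-matrix feed-forward layers (weights + biases each).
# Attention blocks and layer norms are deliberately left out.

def count_params(vocab_size, d_model, d_hidden, n_layers):
    embedding = vocab_size * d_model              # token -> vector table
    per_layer = (
        d_model * d_hidden + d_hidden             # up-projection weights + biases
        + d_hidden * d_model + d_model            # down-projection weights + biases
    )
    return embedding + n_layers * per_layer

total = count_params(vocab_size=32_000, d_model=4096,
                     d_hidden=11_008, n_layers=32)
print(f"{total / 1e9:.2f}B parameters")  # ~3.02B for these made-up sizes
```

Even this stripped-down sketch lands in the billions, which is why real models, with attention layers included, reach 7B and beyond.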

From Parameter Count to Memory

Every parameter is a number that takes up space in memory. How much space depends on the precision used to store it.

Precision   | Bits per parameter | Memory for 7B | Memory for 70B
FP32 (full) | 32                 | ~28 GB        | ~280 GB
FP16 / BF16 | 16                 | ~14 GB        | ~140 GB
INT8        | 8                  | ~7 GB         | ~70 GB
INT4        | 4                  | ~3.5 GB       | ~35 GB

The math is simple: parameter count multiplied by bytes per parameter. A 7B model at 16-bit precision uses 7 billion × 2 bytes = 14 GB. At 4-bit quantization, the same model fits in about 3.5 GB.

This is why parameter count is the first thing to check when deciding whether a model will run on your hardware. A 70B model at full precision needs 140 GB, far more than any single consumer GPU. At 4-bit quantization, it drops to 35 GB, which fits on a high-end GPU.

These figures cover only the weights. During inference, the model also needs memory for the KV cache, activations, and runtime overhead, typically adding 10-30% on top.
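The arithmetic above fits in a few lines. The 20% overhead used here is a midpoint assumption from the 10-30% range mentioned, not a measured figure:

```python
# Memory estimate: parameter count times bytes per parameter,
# plus a rough allowance for KV cache, activations, and runtime
# overhead (assumed 20% here; real overhead varies by workload).

def memory_gb(params_billion, bits, overhead=0.2):
    bytes_per_param = bits / 8
    weights_gb = params_billion * bytes_per_param  # 1B params at 8-bit = 1 GB
    return weights_gb * (1 + overhead)

for bits in (32, 16, 8, 4):
    print(f"7B @ {bits:>2}-bit: ~{memory_gb(7, bits):.1f} GB with overhead")
```

Running this reproduces the table's weight figures plus headroom, which is the number that actually has to fit in VRAM or RAM.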

More Parameters, More Capacity

Language is complex. Capturing patterns in human text, from basic grammar to subtle reasoning, requires a network with enough parameters to encode those patterns. Each parameter stores a small piece. More parameters means more capacity to represent more distinctions.

A 1B model can handle simple extraction and classification. A 7B model handles summarization, code generation, and general conversation. A 70B model handles nuanced multi-step reasoning and performs well across a broader range of domains. The improvement at each tier is real but subject to diminishing returns. Going from 7B to 70B (10x) produces a clear quality jump. Going from 70B to 700B produces a smaller one relative to the cost.

Practical tiers:

  • 1B-3B: Run on phones and constrained devices. Best for structured tasks with predictable output formats.
  • 7B-8B: The sweet spot for running models locally. Fit on a laptop at 4-bit quantization. Capable enough for many production tasks.
  • 13B-34B: Stronger at multi-step reasoning and coherent long-form output. Need 32GB+ RAM or a 12GB+ GPU.
  • 70B: Approach frontier quality on many tasks. Need 64GB+ RAM or a high-end GPU.
  • 400B+: Datacenter-scale. Multi-GPU setups. Where the largest open and proprietary models sit.

Total Parameters vs. Active Parameters

Some models don’t use all their parameters on every input. Mixture-of-Experts (MoE) architectures have a large total parameter count but route each token to only a subset of the network. An MoE model with 47 billion total parameters might activate only 13 billion per token.

Total parameters determine download size and storage. Active parameters determine inference speed and compute cost. An MoE model with 47B total but 13B active runs at roughly the speed of a dense 13B model, while potentially matching a larger dense model on quality. When comparing inference costs, the number of active parameters per token matters more than the headline count.
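A back-of-the-envelope way to see this uses the common approximation of about 2 FLOPs per active parameter per token. The 2-FLOP figure and the 47B/13B split are approximations for illustration, not exact for any specific architecture:

```python
# Per-token compute comparison: dense model vs. MoE, using the
# rough ~2 FLOPs per active parameter per token approximation.
# Only active parameters contribute to per-token compute.

def flops_per_token(active_params_billion):
    return 2 * active_params_billion * 1e9

dense_70b = flops_per_token(70)
moe_13b_active = flops_per_token(13)  # 47B total, but only 13B active

print(f"dense 70B:      {dense_70b:.1e} FLOPs/token")
print(f"MoE 13B-active: {moe_13b_active:.1e} FLOPs/token")
print(f"MoE is ~{dense_70b / moe_13b_active:.1f}x cheaper per token")
```

The MoE model still needs storage and memory for all 47B parameters, which is the total-vs-active tradeoff in one picture.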

Beyond Parameter Count

More parameters doesn’t automatically mean a better model. Several factors matter as much or more.

Training data quality. A 7B model trained on carefully curated data can outperform a 70B model trained on noisy web scrapes. Data quality is why some small models consistently punch above their parameter class.

Architecture choices. Attention variants, normalization methods, and positional encoding all affect what a model can do at a given scale. Two 7B models with different architectures can have very different capabilities.

Fine-tuning and alignment. A base model and its instruction-tuned variant have identical parameter counts but entirely different behavior. The base model generates plausible text. The tuned model follows instructions.

Context window length. Two models of identical size but different context lengths have different practical capabilities. A 128K-context model can process large codebases. A 4K-context model cannot.

Parameter count tells you the scale, the approximate memory footprint, and the general capability tier. Everything beyond that requires benchmarks and testing on your actual task. For a deeper treatment of model selection, hardware tradeoffs, and when bigger is actually worth the cost, Get Insanely Good at AI covers the full picture at getaibook.com/book.
