xAI Launches Grok 4.20 and $10 SuperGrok Lite Subscription
xAI unveils Grok 4.20 with a 65% reduction in hallucinations and launches SuperGrok Lite, a $10/month tier featuring multimodal AI video and image tools.
On March 26, 2026, xAI released Grok 4.20 and a new $10 SuperGrok Lite subscription. The release introduces a specialized routing architecture designed to lower error rates while pushing API costs down to $2.00 per million tokens. For developers building systems that require verifiable citations, the model’s new internal consensus approach changes how reasoning steps are handled in production.
Adversarial Consensus Architecture
Grok 4.20 achieves a 65% reduction in hallucinations compared to Grok 4.1. The model’s baseline error rate drops from 12% to 4.2%. This accuracy improvement relies on an internal system called Adversarial Consensus.
Every query automatically triggers four specialized internal agents before generating an output. The agents include Grok acting as captain, Harper checking facts, Benjamin handling math and logic, and Lucas providing contrarian or creative angles. These nodes debate and cross-check each other to finalize a response.
If you design multi-agent systems, this native routing shifts the debate mechanism from your application layer directly into the model’s inference process. The model supplements this reasoning with direct access to X Search and general Web Search. This produces inline citations linking to specific platform posts and external URLs.
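To see what moves into the model, here is a minimal sketch of the kind of application-layer debate loop that native Adversarial Consensus would subsume. Everything here is illustrative: the role names mirror the article's agent descriptions, and `call_model` is a hypothetical stand-in for a real LLM call, not xAI's actual internals.

```python
def call_model(role: str, prompt: str) -> str:
    # Placeholder for a real LLM API call; returns a canned string here.
    return f"[{role}] response to: {prompt[:60]}"

def debate(prompt: str,
           roles=("captain", "fact-checker", "math-logic", "contrarian")) -> str:
    # Round 1: each role drafts an answer independently.
    drafts = {r: call_model(r, prompt) for r in roles}
    # Round 2: each role critiques the other roles' drafts.
    critiques = {
        r: call_model(r, "Critique these drafts:\n" + "\n".join(
            d for other, d in drafts.items() if other != r))
        for r in roles
    }
    # Final round: the captain synthesizes drafts and critiques.
    synthesis = (f"Question: {prompt}\n"
                 + "\n".join(drafts.values())
                 + "\n" + "\n".join(critiques.values()))
    return call_model("captain", synthesis)
```

With native consensus, the three rounds above collapse into a single API call, which is where the latency and prompt-engineering savings would come from.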
Context Window and API Economics
The model supports a 2.0 million token context window and targets a latency of approximately 250 milliseconds per request.
The API pricing sits at $2.00 per million tokens. xAI positions this as up to 64x cheaper than previous frontier reasoning models. At this price point, feeding large codebases or extensive document libraries into the prompt becomes viable for routine processing tasks.
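Back-of-envelope costs at the quoted rate are easy to check; the sketch below assumes the $2.00-per-million figure applies uniformly to input tokens (the article does not break out input vs. output pricing).

```python
PRICE_PER_MILLION = 2.00  # USD per million tokens, as quoted

def prompt_cost(tokens: int) -> float:
    """Cost in USD of sending `tokens` input tokens at the quoted rate."""
    return tokens / 1_000_000 * PRICE_PER_MILLION

# Filling the entire 2.0M-token context window once:
full_window = prompt_cost(2_000_000)   # 4.00 USD

# A 500k-token codebase processed 100 times per day:
daily = 100 * prompt_cost(500_000)     # 100.00 USD per day
```

At roughly four dollars per full-window request, repeatedly stuffing whole repositories into the prompt is cheap enough for batch jobs, though caching shared prefixes would still matter at scale.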
SuperGrok Subscription Tiers
The consumer rollout introduces SuperGrok Lite at $10 per month. The tier targets casual creators who need multimodal tools but do not require enterprise compute. Users gain access to basic AI image generation through the Aurora model and video generation via Grok Imagine.
Video outputs on the Lite tier are restricted to 480p resolution and a maximum duration of six seconds per clip, with an undisclosed daily generation cap. The plan also doubles the chat session limits compared to the free tier and allows users to configure one custom agent in Expert mode.
| Subscription Tier | Monthly Price | Video Capability | Agent Access |
|---|---|---|---|
| SuperGrok Lite | $10 | 480p, 6 seconds | 1 custom agent |
| SuperGrok Heavy | $300 | 720p, 30 seconds | 16-agent reasoning |
Infrastructure and Production Context
Grok 4.20 was trained on Colossus, xAI’s 200,000 GPU cluster in Memphis. The release follows rapid structural changes at the company. In January 2026, xAI raised $20 billion at a $230 billion valuation. Weeks later, SpaceX acquired the company to integrate AI models across its aerospace and Starlink operations. The aggressive release cadence reflects this newly consolidated compute footprint.
Evaluate Grok 4.20’s API if your current stack relies on external orchestration frameworks to force model consensus. You can likely strip out custom debate prompts and reduce your tool-calling latency by letting the native Adversarial Consensus handle fact-checking internally. Test the inline citation feature against your existing retrieval systems to see if the integrated X Search provides better source attribution for real-time queries.
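If xAI keeps an OpenAI-compatible chat endpoint, a migration test could start with a deliberately bare request like the one below. The model identifier `grok-4.20`, and the premise that consensus and citations happen model-side without extra scaffolding, are assumptions for illustration, not confirmed API details.

```python
import json

def build_request(query: str, model: str = "grok-4.20") -> dict:
    # A single plain request: no debate prompts, no tool-calling
    # scaffold. The test is whether the model's native consensus and
    # inline citations make that scaffolding unnecessary.
    return {
        "model": model,
        "messages": [{"role": "user", "content": query}],
    }

payload = json.dumps(build_request("Who won yesterday's match?"))
# POST this body to the chat-completions endpoint with your API key,
# then diff the returned citations against your retrieval pipeline's
# source attributions for the same real-time queries.
```

Comparing the bare request's answers and citations against your orchestrated pipeline's output is the cheapest way to decide whether the custom debate layer can actually be removed.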