Google Launches Veo 3.1 Lite for Cheap AI Video Generation
Google's new Veo 3.1 Lite model offers cost-effective 720p and 1080p video generation with native audio via the Gemini API and Google AI Studio.
On March 31, 2026, Google released Veo 3.1 Lite, an entry-level video generation model optimized for high-volume developer applications. The model matches the processing speed of Veo 3.1 Fast while cutting inference costs by more than half. It is available immediately in paid preview through the Gemini API and for testing in Google AI Studio.
Capabilities and Formatting
Veo 3.1 Lite supports both text-to-video and image-to-video pipelines. Developers can specify output durations of 4, 6, or 8 seconds per API call. The model renders video natively at 720p and 1080p resolutions.
Two aspect ratios are supported: 16:9 landscape and 9:16 portrait for mobile surfaces. The model also generates audio natively. Sound effects and ambient noise are synthesized and synchronized with the visual content during the same generation pass. If you build AI agents to automate content creation, this synchronized audio removes the need for a secondary sound-design step.
| Feature | Specification |
|---|---|
| Resolutions | 720p, 1080p |
| Aspect Ratios | 16:9 (Landscape), 9:16 (Portrait) |
| Durations | 4, 6, or 8 seconds |
| Audio | Native synchronized sound effects |
| Base Cost | $0.05 per second (720p) |
| API Model ID | veo-3.1-lite-generate-preview |
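The constraints in the table can be checked on the client before a request is sent, so an unsupported combination fails fast instead of costing a round trip. The following is a minimal sketch, not the Gemini API's own request type: `ClipRequest` and its field names are hypothetical, and only the allowed values and the model ID come from the specification above.

```python
from dataclasses import dataclass

# Allowed values taken from the specification table above.
ALLOWED_DURATIONS = (4, 6, 8)
ALLOWED_RESOLUTIONS = ("720p", "1080p")
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")
MODEL_ID = "veo-3.1-lite-generate-preview"


@dataclass(frozen=True)
class ClipRequest:
    """Hypothetical container for one generation request (not an SDK type)."""

    prompt: str
    duration_seconds: int = 8
    resolution: str = "720p"
    aspect_ratio: str = "16:9"

    def __post_init__(self) -> None:
        # Reject anything outside the documented options before calling the API.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.duration_seconds not in ALLOWED_DURATIONS:
            raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
        if self.resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
        if self.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            raise ValueError(f"aspect ratio must be one of {ALLOWED_ASPECT_RATIOS}")


if __name__ == "__main__":
    request = ClipRequest("a kite over a beach", duration_seconds=6, aspect_ratio="9:16")
    print(MODEL_ID, request)
```

Validating up front keeps malformed jobs out of a high-volume queue; the validated fields would then be mapped onto whatever request shape your Gemini API client expects.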
API Integration and Cost Structure
Google positions the Lite tier for rapid iteration and production scale. The base price begins at $0.05 per second for 720p generation. This prices the model at less than 50% of the cost of Veo 3.1 Fast.
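To make the per-second pricing concrete, here is a rough budgeting sketch. It covers only the 720p base rate of $0.05 per second quoted above; the article gives no 1080p price, so none is assumed. The function name `estimate_720p_cost` is illustrative.

```python
from decimal import Decimal

# Base rate quoted in the article for 720p output.
PRICE_PER_SECOND_720P = Decimal("0.05")
ALLOWED_DURATIONS = (4, 6, 8)


def estimate_720p_cost(clips: int, seconds_per_clip: int) -> Decimal:
    """Estimate spend in USD for a batch of 720p clips at the base rate."""
    if clips < 0:
        raise ValueError("clips must be non-negative")
    if seconds_per_clip not in ALLOWED_DURATIONS:
        raise ValueError(f"seconds_per_clip must be one of {ALLOWED_DURATIONS}")
    # Decimal avoids float rounding drift when summing many small charges.
    return PRICE_PER_SECOND_720P * seconds_per_clip * clips


if __name__ == "__main__":
    print(estimate_720p_cost(clips=1000, seconds_per_clip=8))
```

At this rate a single 8-second clip costs $0.40, and a batch of 1,000 such clips comes to $400.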
You can access the model using the veo-3.1-lite-generate-preview ID in the Gemini API. To support the new pricing structure, Google will also reduce the cost of the mid-tier Veo 3.1 Fast model on April 7, 2026. If you manage high-throughput media pipelines, this tier structure lets you reduce video generation costs by routing draft generations to the Lite tier and reserving the higher tiers for final outputs.
Commercial Video Generation Market
The launch completes the Veo 3.1 model lineup. It arrives shortly after OpenAI’s decision to shut down Sora and pivot its research toward world simulation and robotics. Google is now competing directly with providers such as ByteDance and its Seedance 2.0 model for commercial video generation.
The product launch was spearheaded by Alisa Fortin, Product Manager at Google DeepMind, and Guillaume Vernade, Gemini Developer Advocate. Their strategy establishes a clear entry point for developers building automated media platforms.
If you are building video workflows, map your resolution and duration requirements to the new tiering. Point your staging environments and early user previews to Veo 3.1 Lite to control spend, and reserve the heavier Veo 3.1 model for high-fidelity final renders.
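One way to express this routing is a small lookup keyed on pipeline stage. This is a sketch under assumptions: the `Stage` names and the `pick_model` helper are hypothetical, and the model ID for final renders is passed in by the caller because the article does not list it.

```python
from enum import Enum

# Lite model ID as given in the article.
LITE_MODEL_ID = "veo-3.1-lite-generate-preview"


class Stage(Enum):
    """Hypothetical pipeline stages; rename to match your own workflow."""

    DRAFT = "draft"
    PREVIEW = "preview"
    FINAL = "final"


def pick_model(stage: Stage, final_model_id: str) -> str:
    """Send everything except final renders to the Lite tier."""
    if stage is Stage.FINAL:
        # The caller supplies the higher-tier ID; it is not taken from the article.
        return final_model_id
    return LITE_MODEL_ID


if __name__ == "__main__":
    for stage in Stage:
        print(stage.value, "->", pick_model(stage, final_model_id="your-final-model-id"))
```

Keeping the decision in one function means a later price change or a new tier only requires touching a single place in the pipeline.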