Google Releases Veo 3.1 Lite for Low-Cost Video Generation via Gemini API
Google's new Veo 3.1 Lite model offers cost-effective 720p and 1080p video generation with native audio via the Gemini API and Google AI Studio.
On March 31, 2026, Google released Veo 3.1 Lite, an entry-level video generation model optimized for high-volume developer applications. The model matches the processing speed of Veo 3.1 Fast while cutting inference costs by more than half. It is available immediately in paid preview through the Gemini API and for testing in Google AI Studio.
Capabilities and Formatting
Veo 3.1 Lite supports both text-to-video and image-to-video pipelines. Developers can specify output durations of 4, 6, or 8 seconds per API call. The model renders video natively at 720p and 1080p resolutions.
The model supports two framings: landscape video at a 16:9 ratio and portrait video at 9:16 for mobile surfaces. It also generates audio natively. Sound effects and ambient noise are synthesized and synchronized with the visual content during the same generation pass. If you build AI agents to automate content creation, this synchronized audio removes the need for a separate sound-design step.
| Feature | Specification |
|---|---|
| Resolutions | 720p, 1080p |
| Aspect Ratios | 16:9 (Landscape), 9:16 (Portrait) |
| Durations | 4, 6, or 8 seconds |
| Audio | Native, synchronized sound effects and ambient noise |
| Base Cost | $0.05 per second (720p) |
| API Model ID | veo-3.1-lite-generate-preview |
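The constraints in the table can be checked before a request is ever sent. The following is a minimal sketch, assuming a hypothetical `VideoRequest` container; it is not an SDK type, and the field names returned by `to_payload` are illustrative rather than taken from the Gemini API reference. Only the model ID and the allowed values come from the table above.

```python
from dataclasses import dataclass

# Values copied from the specification table above.
MODEL_ID = "veo-3.1-lite-generate-preview"
ALLOWED_RESOLUTIONS = ("720p", "1080p")
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")
ALLOWED_DURATIONS = (4, 6, 8)  # seconds per API call


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical container for one generation request (not an SDK type)."""

    prompt: str
    resolution: str = "720p"
    aspect_ratio: str = "16:9"
    duration_seconds: int = 8

    def __post_init__(self) -> None:
        # Reject unsupported settings locally instead of paying for a failed call.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution!r}")
        if self.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio!r}")
        if self.duration_seconds not in ALLOWED_DURATIONS:
            raise ValueError(f"unsupported duration: {self.duration_seconds!r}")

    def to_payload(self) -> dict:
        # Illustrative field names; map them to the real API fields yourself.
        return {
            "model": MODEL_ID,
            "prompt": self.prompt,
            "resolution": self.resolution,
            "aspect_ratio": self.aspect_ratio,
            "duration_seconds": self.duration_seconds,
        }
```

For example, `VideoRequest(prompt="a paper boat on a pond", aspect_ratio="9:16", duration_seconds=6)` passes validation, while a `duration_seconds` of 5 raises `ValueError`.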
API Integration and Cost Structure
Google positions the Lite tier for rapid iteration and production scale. Pricing starts at $0.05 per second for 720p generation, which is less than half the price of Veo 3.1 Fast.
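To make the per-second pricing concrete, here is a rough budgeting sketch. It assumes billing is simply the rate multiplied by the seconds generated, and it only knows the 720p rate quoted above; the 1080p rate is not stated in this article, so it must be passed in explicitly. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# 720p base rate quoted above, in USD per generated second.
RATE_720P = Decimal("0.05")
ALLOWED_DURATIONS = (4, 6, 8)


def estimate_cost(
    duration_seconds: int,
    clips: int = 1,
    rate_per_second: Decimal = RATE_720P,
) -> Decimal:
    """Estimate spend in USD, assuming cost = rate * seconds * clips."""
    if duration_seconds not in ALLOWED_DURATIONS:
        raise ValueError("duration must be 4, 6, or 8 seconds")
    if clips < 0:
        raise ValueError("clips must be non-negative")
    return rate_per_second * duration_seconds * clips


if __name__ == "__main__":
    print(estimate_cost(8))              # one 8-second 720p clip
    print(estimate_cost(8, clips=1000))  # a batch of 1,000 such clips
```

Under these assumptions, a single 8-second 720p clip costs $0.40 and a batch of 1,000 costs $400. `Decimal` is used so that cent-level amounts do not pick up binary floating-point rounding.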
You can access the model in the Gemini API with the ID veo-3.1-lite-generate-preview. To support the new pricing structure, Google will also reduce the price of the mid-tier Veo 3.1 Fast model on April 7, 2026. If you manage high-throughput media pipelines, this tiering lets you cut API spend by routing draft generations to the Lite tier before rendering final outputs.
Commercial Video Generation Market
The launch completes the Veo 3.1 family. It arrives shortly after OpenAI’s decision to shut down Sora and pivot its research toward world simulation and robotics. Google now competes directly with providers such as ByteDance, with its Seedance 2.0 model, in commercial video generation.
The product launch was spearheaded by Alisa Fortin, Product Manager at Google DeepMind, and Guillaume Vernade, Gemini Developer Advocate. Their strategy establishes a clear entry point for developers building automated media platforms.
If you are building video workflows, map your resolution and duration requirements to the new tiering. Point your staging environments and early user previews to Veo 3.1 Lite to control spend, and reserve the heavier Veo 3.1 model for high-fidelity final renders.
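One way to express that routing rule in code is a small lookup keyed on pipeline stage. This is only a sketch: the `Stage` enum and `pick_model` function are hypothetical names, and the model ID for the full Veo 3.1 tier is not given in this article, so the caller has to supply it.

```python
from enum import Enum

# Lite model ID as given earlier in this article.
LITE_MODEL_ID = "veo-3.1-lite-generate-preview"


class Stage(Enum):
    """Hypothetical pipeline stages for a video workflow."""

    STAGING = "staging"
    USER_PREVIEW = "user_preview"
    FINAL_RENDER = "final_render"


def pick_model(stage: Stage, final_model_id: str) -> str:
    """Send everything except final renders to the Lite tier.

    final_model_id is the ID of the higher-fidelity model; it is left to the
    caller because this article does not state it.
    """
    if stage is Stage.FINAL_RENDER:
        return final_model_id
    return LITE_MODEL_ID
```

Keeping the decision in one function means a later pricing change only requires editing a single place in the pipeline.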