Vertical 60-Second Video Summaries Arrive in Google NotebookLM
Google's NotebookLM now uses the Gemini 3.1 Flash-Lite Image model to generate source-conditioned, 60-second vertical video summaries of uploaded documents.
Google has introduced Short Video Overviews to NotebookLM, allowing you to convert uploaded documents and research notes into 60-second vertical videos. Designed for a mobile-first 9:16 format, the feature synthesizes dense PDFs and web links into narrated slideshows using AI-generated art and kinetic typography. This expands the tool’s utility beyond standard text summaries, bridging the gap between deep research and quick, digestible media.
Rendering Pipeline and Models
The generation engine relies on a multi-model architecture anchored by Nano Banana 2 Lite, an internal designation for the Gemini 3.1 Flash-Lite Image model. This variant prioritizes rendering speed and can generate an individual stylized frame in approximately four seconds.
Despite the fast per-frame generation, the full pipeline involves significant orchestration. The NotebookLM “Studio” process must parse the source text, generate a narrative script, synthesize the voiceover, and composite the kinetic typography over the AI art. Because of this multi-step pipeline, a single 60-second clip can currently take up to 30 minutes to render completely.
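The four stages described above can be pictured as an ordered pipeline in which each step consumes the output of the previous one. The following is a minimal sketch for illustration only: every class, function, and field name is hypothetical, the stage bodies are placeholders, and nothing here reflects Google's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ClipJob:
    """State handed from stage to stage (all field names are hypothetical)."""
    source_text: str
    script: list[str] = field(default_factory=list)
    audio_segments: list[str] = field(default_factory=list)
    frames: list[str] = field(default_factory=list)
    completed: list[str] = field(default_factory=list)


def parse_sources(job: ClipJob) -> ClipJob:
    # Placeholder parse step: collapse runs of whitespace in the source text.
    job.source_text = " ".join(job.source_text.split())
    return job


def write_script(job: ClipJob) -> ClipJob:
    # Placeholder script step: one narration line per source sentence.
    job.script = [s.strip() for s in job.source_text.split(".") if s.strip()]
    return job


def synthesize_voiceover(job: ClipJob) -> ClipJob:
    # Placeholder voiceover step: one audio segment label per script line.
    job.audio_segments = [f"audio<{line}>" for line in job.script]
    return job


def composite_frames(job: ClipJob) -> ClipJob:
    # Placeholder compositing step: one frame label per script line.
    job.frames = [f"frame<{line}>" for line in job.script]
    return job


# Stage order follows the article: parse, script, voiceover, composite.
STAGES = [
    ("parse", parse_sources),
    ("script", write_script),
    ("voiceover", synthesize_voiceover),
    ("composite", composite_frames),
]


def run_pipeline(source_text: str) -> ClipJob:
    job = ClipJob(source_text=source_text)
    for name, stage in STAGES:
        job = stage(job)
        job.completed.append(name)
    return job
```

Because each stage depends on the one before it, the stages run in sequence. That is one plausible reason the end-to-end time is far longer than the four-second per-frame figure.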
Unlike standard open-ended video generators, these clips are strictly source-conditioned. The model architecture restricts the visual script and narration to the facts present in your uploaded materials. This grounding mechanism prevents the model from injecting outside context, maintaining accuracy for technical or academic subjects.
Formatting and Early Feedback
The visual engine supports multiple artistic directions. You can select from Classic, Whiteboard, and Retro Print styles, which determine how the paper cutout-style AI art is rendered. In a demonstration covering historical events, the system used stylized cutout animations to match the narration. The initial rollout is restricted to English.
Early adopters in research and crypto analysis have used the tool to convert dense tokenomics papers into fast team briefings. However, because the pipeline operates without a human review layer, some users have encountered minor audio glitches and visual inaccuracies. If technical precision matters for your use case, you still need to verify the generated clips against the original source documents before distribution.
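One crude first pass at that verification is to check that every figure spoken in the narration also appears in the source. The sketch below assumes you have the narration as plain text, for example from your own transcription. The article does not say whether NotebookLM exports a transcript, and the function name is hypothetical. It only flags numeric mismatches and cannot catch visual inaccuracies or paraphrased errors.

```python
import re

# Matches integers, decimals, thousands-separated numbers, and percentages.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")


def unsupported_numbers(transcript: str, source: str) -> list[str]:
    """Return numbers that appear in the transcript but not in the source."""
    source_numbers = set(NUMBER.findall(source))
    return sorted({n for n in NUMBER.findall(transcript) if n not in source_numbers})


source = "Total supply is 1,000,000 tokens with 2.5% annual inflation."
transcript = "The supply is 1,000,000 tokens and inflation is 3.5% per year."
print(unsupported_numbers(transcript, source))  # prints ['3.5%']
```

Anything the check flags still needs a human to compare the clip against the source; an empty result does not mean the clip is accurate.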
Pricing and Availability
Google is rolling the feature out through a phased, tiered approach across both the web interface and the NotebookLM mobile app.
| Subscription Tier | Monthly Cost | Daily Video Limit | Availability |
|---|---|---|---|
| Google AI Ultra | $99.99 | 200 | Immediate |
| Google AI Pro | $19.99 | 20 | Immediate |
| Free Tier | $0.00 | 0 | Coming Soon |
The heavy compute requirements of the Studio rendering pipeline are reflected in the pricing structure. The feature requires a paid tier for immediate access, though Google has indicated that free users will receive a lower-tier allocation in the near future.
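For a rough comparison of the two paid tiers, the arithmetic below divides each monthly price by the maximum number of videos the daily limit allows. It assumes a 30-day month, full use of the daily quota, and that the whole subscription price is attributed to video generation. None of these will hold exactly in practice, so read the result as a lower bound on effective cost per clip.

```python
# Prices and limits taken from the table above.
TIERS = {
    "Google AI Ultra": {"monthly_cost": 99.99, "daily_limit": 200},
    "Google AI Pro": {"monthly_cost": 19.99, "daily_limit": 20},
}


def cost_per_video(monthly_cost: float, daily_limit: int, days_in_month: int = 30) -> float:
    """Lowest possible cost per video, assuming the daily quota is used in full."""
    return monthly_cost / (daily_limit * days_in_month)


for name, tier in TIERS.items():
    print(f"{name}: ${cost_per_video(**tier):.4f} per video at full quota")
# Google AI Ultra: $0.0167 per video at full quota
# Google AI Pro: $0.0333 per video at full quota
```

Under these assumptions Ultra works out to roughly half the per-video cost of Pro, but only for teams that actually approach 200 videos a day.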
If you manage dense internal documentation or lengthy research reports, Short Video Overviews offer an automated way to format that information for mobile consumption. You should account for the processing time of up to 30 minutes by treating generation as an asynchronous batch task rather than a real-time retrieval tool.
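Treating generation as a batch task mostly means planning around two constraints from this article: the daily video limit of your tier and the render time of up to 30 minutes. The sketch below is a planning helper only. The function names are hypothetical, and it does not call NotebookLM, since the article describes no public API for this feature.

```python
from datetime import datetime, timedelta

RENDER_MINUTES = 30  # upper bound on render time reported in the article


def plan_batches(documents: list[str], daily_limit: int) -> list[list[str]]:
    """Split documents into per-day batches that respect the daily video limit."""
    if daily_limit <= 0:
        raise ValueError("daily_limit must be positive; this tier cannot generate videos")
    return [documents[i:i + daily_limit] for i in range(0, len(documents), daily_limit)]


def check_back_at(submitted_at: datetime, render_minutes: int = RENDER_MINUTES) -> datetime:
    """Time by which a clip submitted at `submitted_at` should be done, per the upper bound."""
    return submitted_at + timedelta(minutes=render_minutes)


# Example: 45 reports on the Pro tier (20 videos per day) need three days.
reports = [f"report-{n:02d}.pdf" for n in range(1, 46)]
batches = plan_batches(reports, daily_limit=20)
print([len(batch) for batch in batches])  # prints [20, 20, 5]
print(check_back_at(datetime(2025, 1, 6, 9, 0)))  # prints 2025-01-06 09:30:00
```

Submitting a day's batch in one go and checking back after the render window avoids waiting on each clip individually.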