Lyria 3 Now Lets Developers Generate Full Songs
Google added Lyria 3 to the Gemini API and AI Studio, letting developers generate songs with lyrics, structure controls, and image input.
Google has opened Lyria 3 to developers in public preview through the Gemini API and Google AI Studio. The release adds two preview models, lyria-3-pro-preview for full songs up to about three minutes and lyria-3-clip-preview for 30-second generations. If you build music tools, creator workflows, or multimodal media apps, Google just moved Lyria from limited product surface area into a real programmable interface.
Model scope
The split between Lyria 3 Pro and Lyria 3 Clip is straightforward. Pro targets longer-form generation, Clip targets shorter outputs and lower per-request cost.
| Model | API ID | Output length | Price |
|---|---|---|---|
| Lyria 3 Pro Preview | lyria-3-pro-preview | up to ~3 minutes | $0.08 per song |
| Lyria 3 Clip Preview | lyria-3-clip-preview | 30 seconds | $0.04 per song |
This matters because Google is pricing music generation per request, not per token. If you are used to text model budgeting, the operational model is closer to image or video generation than standard LLM metering. Cost estimation becomes simpler, and product design starts with output length tiers instead of token ceilings. Teams already thinking about API cost control should treat song length as the primary pricing lever.
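Budgeting under per-request pricing reduces to a multiplication. A minimal sketch using the preview prices from the table above; the helper itself is illustrative and not part of any SDK:

```python
# Illustrative cost estimator for Lyria 3 preview pricing (per-request, not per-token).
# Prices come from the table above; this helper is not part of any Google SDK.
PRICE_PER_SONG = {
    "lyria-3-pro-preview": 0.08,   # up to ~3 minutes
    "lyria-3-clip-preview": 0.04,  # 30 seconds
}

def estimate_monthly_cost(requests_per_day: int, model: str, days: int = 30) -> float:
    """Cost scales with request count and model tier, not output tokens."""
    return requests_per_day * days * PRICE_PER_SONG[model]

# 500 clips a day for a month:
print(f"${estimate_monthly_cost(500, 'lyria-3-clip-preview'):.2f}")  # prints "$600.00"
```

Because the only variables are request volume and tier, forecasting stays linear even as prompt complexity grows.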
Prompt control
Google is exposing more than plain text-to-music prompting. Lyria 3 supports natural-language musical direction, tempo conditioning, time-aligned lyrics, and section-aware prompting across structures such as intro, verse, chorus, and bridge.
For developers, this is the real upgrade. Full-song generation is useful, but structural control is what makes it productizable. A songwriting assistant, brand music generator, or short-form video soundtrack tool needs controllable composition, not just one-shot audio synthesis. The same prompting discipline that matters in text systems still applies here, especially if your app needs repeatable outputs and style constraints. If your team already works on prompt engineering or system prompts, the pattern carries over cleanly.
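One way to make structural control repeatable is to assemble prompts from a section map rather than free text. The phrasing below is an assumption, not Lyria's documented syntax; adapt it to the patterns in Google's music generation docs:

```python
# Illustrative section-aware prompt builder. The section/tempo phrasing is an
# assumption, not Lyria's documented syntax; check Google's docs for the real patterns.
def build_song_prompt(style: str, bpm: int, sections: dict[str, str]) -> str:
    lines = [f"{style}, around {bpm} BPM."]
    for name, direction in sections.items():
        lines.append(f"{name.capitalize()}: {direction}")
    return "\n".join(lines)

prompt = build_song_prompt(
    "Warm indie pop with acoustic guitar",
    96,
    {
        "intro": "sparse fingerpicked guitar, no vocals",
        "verse": "lyrics about leaving a small town, soft vocals",
        "chorus": "full band, layered harmonies, lyrics repeat the hook",
        "bridge": "strip back to piano, then build into the final chorus",
    },
)
```

Keeping sections as data makes style constraints testable and lets users edit one section without regenerating the whole prompt.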
Multimodal input and output
Lyria 3 also accepts multimodal input. The Gemini API supports up to 10 images alongside the text prompt to influence composition.
The response can include both audio and text, with responseModalities: ["AUDIO", "TEXT"]. In practice, that means you can generate a track and receive lyrics in the same response flow. Google also states that lyrics are generated in the language of the prompt, and the docs include multilingual prompting examples.
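A request combining text, images, and both response modalities can be sketched as below. The body follows the standard Gemini generateContent shape (text and inline_data parts, responseModalities under generationConfig); verify the exact field names against the music generation docs before shipping:

```python
import base64

# Illustrative generateContent request body, following the standard Gemini API
# shape. Field names should be verified against Google's music generation docs.
def make_request_body(prompt: str, images: list[bytes]) -> dict:
    parts = [{"text": prompt}]
    for img in images[:10]:  # the API accepts up to 10 images per request
        parts.append({
            "inline_data": {
                "mime_type": "image/png",
                "data": base64.b64encode(img).decode("ascii"),
            }
        })
    return {
        "contents": [{"parts": parts}],
        "generationConfig": {"responseModalities": ["AUDIO", "TEXT"]},
    }

body = make_request_body("Dreamy synthwave to match this moodboard", [b"fake-png-bytes"])
```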
For app builders, this opens a useful design path: image-conditioned music generation for ads, moodboards, scene planning, or social content pipelines. If you are already building multimodal applications on Gemini, this release extends the platform into music instead of requiring a separate specialist stack. It fits the broader pattern in GPT vs Claude vs Gemini, where model platforms are becoming media platforms.
API surface
The API is live enough to matter. Google provides endpoint patterns for v1beta/models/lyria-3-pro-preview:generateContent and v1beta/models/lyria-3-clip-preview:generateContent, with examples across Python, JavaScript, Go, Java, C#, and REST in the music generation docs.
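The endpoint pattern above can be exercised without an SDK. A minimal plain-HTTP sketch (the official client libraries wrap this; the header and URL structure follow the public Gemini API conventions):

```python
import json
import urllib.request

BASE = "https://generativelanguage.googleapis.com/v1beta/models"

def lyria_endpoint(model: str) -> str:
    # Pattern from the docs: v1beta/models/<model>:generateContent
    return f"{BASE}/{model}:generateContent"

def generate(model: str, body: dict, api_key: str) -> dict:
    # Illustrative raw HTTP call; the official SDKs handle this for you.
    req = urllib.request.Request(
        lyria_endpoint(model),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

url = lyria_endpoint("lyria-3-pro-preview")
```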
Google AI Studio also now includes a dedicated music workspace, but it requires a paid API key. This is a developer release, not a free playground launch.
Safety and product constraints
Every generated audio output includes a SynthID watermark. Prompts are checked by safety filters, and requests for specific artist voices or copyrighted lyrics are blocked.
Those constraints shape actual product design. If your feature depends on artist-style mimicry, it will not survive policy enforcement. If your workflow expects unrestricted lyric completion from protected works, you need a different experience design. Safety filters are not an edge case here; they are part of the API contract. The same operational mindset used in evaluating AI output applies to music generation, especially for prompt rejection handling and user-facing fallbacks.
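Rejection handling can be treated as a first-class response path. The Gemini API reports blocked prompts via promptFeedback.blockReason; whether Lyria responses use exactly this shape should be verified, and the fallback message below is illustrative:

```python
# Illustrative rejection handling. The Gemini API reports blocked prompts via
# promptFeedback.blockReason; verify the exact shape for Lyria responses.
def handle_music_response(response: dict) -> dict:
    block = response.get("promptFeedback", {}).get("blockReason")
    if block:
        # Surface a user-facing fallback instead of a raw API error.
        return {
            "ok": False,
            "reason": block,
            "message": "That prompt was rejected. Try describing a genre or "
                       "mood instead of naming a specific artist or song.",
        }
    return {"ok": True, "candidates": response.get("candidates", [])}

blocked = handle_music_response({"promptFeedback": {"blockReason": "SAFETY"}})
```

Routing rejections into a suggestion flow keeps the user in the product instead of ending the session on an opaque error.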
Platform position
This launch is specifically about developer access through Gemini and AI Studio. Google had already brought earlier Lyria versions to Vertex AI, first in preview and later with Lyria 2 generally available. The March 25 release changes the distribution model by making the newer generation available directly in the Gemini developer ecosystem, while Lyria 3 Pro also expands across Vertex AI and Google products.
Under the hood, Lyria 3 is a latent diffusion system over temporal audio latents, trained on Google TPUs with JAX and ML Pathways. Those details matter less for app integration than for expectations: this is a generative media model with structured controls and safety layers, not a conversational model with hidden reasoning features. Google explicitly notes it reasons over musical structure internally but does not expose thought blocks or thought signatures.
If you plan to ship with Lyria 3, prototype around the two hard constraints first: output length tier and safety filtering. Then design your prompts around song structure, lyrics alignment, and image conditioning, because those are the controls that turn a music demo into a usable product.