AI Engineering · 4 min read

Lyria 3 Now Lets Developers Generate Full Songs

Google added Lyria 3 to the Gemini API and AI Studio, letting developers generate songs with lyrics, structure controls, and image input.

Google has opened Lyria 3 to developers in public preview through the Gemini API and Google AI Studio. The release adds two preview models: lyria-3-pro-preview for full songs up to about three minutes, and lyria-3-clip-preview for 30-second generations. If you build music tools, creator workflows, or multimodal media apps, Google just moved Lyria from a limited product surface into a real programmable interface.

Model scope

The split between Lyria 3 Pro and Lyria 3 Clip is straightforward. Pro targets longer-form generation, Clip targets shorter outputs and lower per-request cost.

| Model | API ID | Output length | Price |
| --- | --- | --- | --- |
| Lyria 3 Pro Preview | lyria-3-pro-preview | up to ~3 minutes | $0.08 per song |
| Lyria 3 Clip Preview | lyria-3-clip-preview | 30 seconds | $0.04 per song |

This matters because Google is pricing music generation per request, not per token. If you are used to text model budgeting, the operational model is closer to image or video generation than standard LLM metering. Cost estimation becomes simpler, and product design starts with output length tiers instead of token ceilings. Teams already thinking about API cost control should treat song length as the primary pricing lever.
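Per-request pricing makes batch budgeting simple enough to sketch in a few lines. This is a minimal illustration using the preview prices from the table above; the dictionary and function names are this sketch's own, not part of any SDK.

```python
# Hedged sketch: with per-request pricing, budget by song count and length
# tier rather than by tokens. Prices taken from the preview pricing table.
PRICE_PER_SONG = {
    "lyria-3-pro-preview": 0.08,   # full songs, up to ~3 minutes
    "lyria-3-clip-preview": 0.04,  # 30-second clips
}

def estimate_cost(model: str, num_songs: int) -> float:
    """Total spend for a batch: flat per-song price, independent of prompt size."""
    return round(PRICE_PER_SONG[model] * num_songs, 2)

# 1,000 clips cost the same whether each prompt is ten words or five hundred.
batch_cost = estimate_cost("lyria-3-clip-preview", 1000)  # 40.0
```

Note that prompt length never appears in the calculation: the only levers are model tier and request count.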

Prompt control

Google is exposing more than plain text-to-music prompting. Lyria 3 supports natural-language musical direction, tempo conditioning, time-aligned lyrics, and section-aware prompting across structures such as intro, verse, chorus, and bridge.

For developers, this is the real upgrade. Full-song generation is useful, but structural control is what makes it productizable. A songwriting assistant, brand music generator, or short-form video soundtrack tool needs controllable composition, not just one-shot audio synthesis. The same prompting discipline that matters in text systems still applies here, especially if your app needs repeatable outputs and style constraints. If your team already works on prompt engineering or system prompts, the pattern carries over cleanly.
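One way to keep structural prompts repeatable is to build them programmatically rather than hand-writing each one. The sketch below is illustrative only: the bracketed section tags and the "Style:"/"Tempo:" fields are conventions invented for this example, not syntax documented by Google.

```python
# Hedged sketch of a prompt builder for section-aware generation. The
# bracketed section tags and "Style:"/"Tempo:" fields are illustrative
# conventions for this example, not syntax documented by Google.
def build_song_prompt(style: str, bpm: int, sections: dict[str, str]) -> str:
    """Assemble a structured prompt with tempo and per-section lyrics."""
    lines = [f"Style: {style}", f"Tempo: {bpm} BPM"]
    for name, lyrics in sections.items():
        lines.append(f"[{name}]\n{lyrics}")
    return "\n".join(lines)

prompt = build_song_prompt(
    "upbeat indie pop", 120,
    {"verse": "City lights are calling out my name",
     "chorus": "We run until the morning comes"},
)
```

Centralizing prompt assembly like this is what makes style constraints enforceable: the app controls the template, and user input only fills the slots.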

Multimodal input and output

Lyria 3 also accepts multimodal input. The Gemini API supports up to 10 images alongside the text prompt to influence composition.

The response can include both audio and text, with responseModalities: ["AUDIO", "TEXT"]. In practice, that means you can generate a track and receive lyrics in the same response flow. Google also states that lyrics are generated in the language of the prompt, and the docs include multilingual prompting examples.
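A request combining a text prompt, image conditioning, and dual-modality output might be assembled like this. The field names follow the Gemini API's general REST shape; the exact Lyria-specific fields in the music generation docs may differ, so treat this as a sketch rather than a verified payload.

```python
# Hedged sketch of a generateContent request body combining a text prompt,
# image conditioning, and audio+text output. Field names follow the Gemini
# API's general REST shape; Lyria-specific fields may differ in the docs.
import base64

def build_request(prompt: str, images: list[bytes]) -> dict:
    parts = [{"text": prompt}]
    for img in images[:10]:  # the API accepts up to 10 images per request
        parts.append({
            "inline_data": {
                "mime_type": "image/png",  # assumed format for this sketch
                "data": base64.b64encode(img).decode("ascii"),
            }
        })
    return {
        "contents": [{"parts": parts}],
        "generationConfig": {"responseModalities": ["AUDIO", "TEXT"]},
    }
```

The key design point is that lyrics arrive in the same response as the audio, so the client does not need a second round trip to display them.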

For app builders, this opens a useful design path: image-conditioned music generation for ads, moodboards, scene planning, or social content pipelines. If you are already building multimodal applications on Gemini, this release extends the platform into music instead of requiring a separate specialist stack. It fits the broader pattern in GPT vs Claude vs Gemini, where model platforms are becoming media platforms.

API surface

The API is live enough to matter. Google provides endpoint patterns for v1beta/models/lyria-3-pro-preview:generateContent and v1beta/models/lyria-3-clip-preview:generateContent, with examples across Python, JavaScript, Go, Java, C#, and REST in the music generation docs.
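The two endpoint patterns differ only in the model segment, so switching tiers can be a single parameter. In the sketch below, the path pattern comes from the docs quoted above, while the generativelanguage.googleapis.com host is the standard Gemini API host and is assumed here.

```python
# Hedged sketch: building the REST endpoint for either preview model. The
# path pattern is from the docs; the generativelanguage.googleapis.com host
# is the standard Gemini API host and is assumed for this example.
BASE = "https://generativelanguage.googleapis.com/v1beta"

def endpoint(model: str) -> str:
    return f"{BASE}/models/{model}:generateContent"

pro_url = endpoint("lyria-3-pro-preview")
clip_url = endpoint("lyria-3-clip-preview")
```

Keeping the model ID as the only variable makes it easy to A/B the cheaper Clip tier against Pro without touching request-building code.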

Google AI Studio also now includes a dedicated music workspace, but it requires a paid API key. This is a developer release, not a free playground launch.

Safety and product constraints

Every generated audio output includes a SynthID watermark. Prompts are checked by safety filters, and requests for specific artist voices or copyrighted lyrics are blocked.

Those constraints shape actual product design. If your feature depends on artist-style mimicry, it will not survive policy enforcement. If your workflow expects unrestricted lyric completion from protected works, you need a different experience design. Safety filters are not an edge case here; they are part of the API contract. The same operational mindset used in evaluating AI output applies to music generation, especially for prompt rejection handling and user-facing fallbacks.
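Treating a safety rejection as a normal control path, rather than an exception, can look like the following. Everything here is illustrative: the `{"blocked": ...}` response shape and the stub API are stand-ins, not the real Gemini response schema.

```python
# Hedged sketch: handling safety rejections as a normal control path. The
# response shape ({"blocked": ...}) and the stub API are illustrative
# stand-ins, not the real Gemini response schema.
def generate_with_fallback(call_api, prompt: str, fallback_prompt: str) -> dict:
    """Try the user's prompt; on a safety block, retry with a sanitized one."""
    resp = call_api(prompt)
    if resp.get("blocked"):
        resp = call_api(fallback_prompt)
        resp["used_fallback"] = True
    return resp

# Stub standing in for the real API: rejects artist-voice requests.
def fake_api(prompt: str) -> dict:
    if "in the voice of" in prompt:
        return {"blocked": True}
    return {"blocked": False, "audio": b"..."}

result = generate_with_fallback(
    fake_api,
    "a ballad in the voice of a famous singer",
    "a ballad with original vocals, pop style",
)
```

The user-facing win is that a blocked prompt degrades to a usable result instead of an error screen, which is exactly the fallback design the filters force you to plan for.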

Platform position

This launch is specifically about developer access through Gemini and AI Studio. Google had already brought earlier Lyria versions to Vertex AI, first in preview and then with Lyria 2 generally available. The March 25 release changes the distribution model by making the newer generation available directly in the Gemini developer ecosystem, while Lyria 3 Pro also expands across Vertex AI and Google products.

Under the hood, Lyria 3 is a latent diffusion system over temporal audio latents, trained on Google TPUs with JAX and ML Pathways. Those details matter less for app integration than for expectations: this is a generative media model with structured controls and safety layers, not a conversational model with hidden reasoning features. Google explicitly notes it reasons over musical structure internally but does not expose thought blocks or thought signatures.

If you plan to ship with Lyria 3, prototype around the two hard constraints first: output length tier and safety filtering. Then design your prompts around song structure, lyrics alignment, and image conditioning, because those are the controls that turn a music demo into a usable product.
