Google Photos Adds 3D-Aware Re-composition via Auto Frame
Google has updated the Auto frame feature in Google Photos with a 3D-aware generative diffusion model, enabling post-capture adjustments to the camera angle.
On April 22, 2026, Google Research published the technical architecture behind a new 3D-aware generative AI technique for post-capture photo framing. Authored by Marcos Seefelder and Pedro Velez, the system is now live in Google Photos as a major update to the Auto frame feature. The approach interprets standard 2D photos as 3D scenes, allowing the camera position to be moved automatically within a virtual space.
Spatial Understanding and Generative Filling
The system relies on machine learning models to detect face positions and the 3D orientations of subjects. It constructs a 3D point map to infer the original camera’s spatial layout and parameters. When the virtual camera angle shifts, previously hidden areas of the scene are exposed.
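The article does not disclose how Google constructs its 3D point map, but the standard approach is to lift a per-pixel depth estimate into camera space with a pinhole model. The sketch below illustrates that back-projection step; the function name, the assumption of a dense depth map, and the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) are all illustrative, not Google's actual pipeline.

```python
import numpy as np

def backproject_point_map(depth, fx, fy, cx, cy):
    """Lift a per-pixel depth map (H, W) into a 3D point map (H, W, 3)
    using a pinhole camera model: x = (u - cx) * z / fx, y = (v - cy) * z / fy."""
    h, w = depth.shape
    # Pixel coordinate grids: u varies along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)
```

A point map like this encodes the scene's spatial layout per pixel, which is what lets a virtual camera be moved through it rather than merely cropping the image.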
To fill these exposed gaps, Google uses a generative latent diffusion model trained on an internal dataset of image pairs with known camera parameters. The model learns to re-render a scene from one view in another by estimating the 3D point map of the first view and projecting it into the second. This preserves visual consistency and parallax in ways that standard cropping cannot.
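The projection step described above can be sketched geometrically: transform each 3D point into the new camera's frame, project it back to pixels, and mark every pixel the original image does not cover as a hole for the generative model to fill. This is a minimal illustration assuming a pinhole model and a simple rigid transform (rotation `R`, translation `t`); the diffusion inpainting itself is not shown, and none of these names come from Google's published architecture.

```python
import numpy as np

def disocclusion_mask(points, R, t, fx, fy, cx, cy, out_shape):
    """Project a 3D point map (H, W, 3) into a second camera pose and
    return a boolean mask that is True where no source pixel lands,
    i.e. the disoccluded regions a generative model would inpaint."""
    h, w = out_shape
    pts = points.reshape(-1, 3) @ R.T + t        # first view -> second view
    z = pts[:, 2]
    valid = z > 1e-6                             # keep points in front of camera
    u = np.round(fx * pts[valid, 0] / z[valid] + cx).astype(int)
    v = np.round(fy * pts[valid, 1] / z[valid] + cy).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    covered = np.zeros((h, w), dtype=bool)
    covered[v[inside], u[inside]] = True
    return ~covered
```

With an identity pose the mask is empty; sliding the virtual camera sideways uncovers pixels the original photo never captured, and those are exactly the regions handed to the diffusion model.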
Implementation in Google Photos
The technology operates inside Google Photos as an additional suggested rendition when a user applies the Auto frame feature to eligible portrait images. It specifically corrects the distortion that wide-angle lenses introduce in selfies, where features closest to the lens appear unnaturally large.
The tool also enables post-capture adjustments to the height or lateral angle of the camera to improve framing, such as centering a face or capturing more of the background. Google presents this as a single-action improvement. Users receive the re-composed image without needing to engage in manual prompt engineering.
Moving From 2D to Spatial Generation
Treating photos as 3D environments represents a shift in consumer image editing workflows. If you build visual editing software or multi-agent systems for media processing, automating the AI inference behind spatial reconstruction removes significant friction. You should evaluate how generative latent models can power digital re-shoots rather than relying solely on static pixel manipulation.