AI Engineering · 2 min read

Google Photos Adds 3D-Aware Re-composition via Auto Frame

Google has updated the Auto frame feature in Google Photos with a 3D-aware generative diffusion model that allows camera-angle adjustments after capture.

On April 22, 2026, Google Research published the technical architecture behind a new 3D-aware generative AI technique for post-capture photo framing. Authored by Marcos Seefelder and Pedro Velez, the system is now live in Google Photos as a major update to the Auto frame feature. The approach interprets standard 2D photos as 3D scenes, allowing the camera position to be moved automatically within a virtual space.

Spatial Understanding and Generative Filling

The system relies on machine learning models to detect face positions and the 3D orientations of subjects. It constructs a 3D point map to infer the original camera’s spatial layout and parameters. When the virtual camera angle shifts, previously hidden areas of the scene are exposed.
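The geometry behind this step can be illustrated with a minimal pinhole-camera sketch. The code below is not Google's implementation; it assumes a simple intrinsics matrix and two scene points purely to show why moving a virtual camera exposes hidden areas: nearby points shift more than distant ones (parallax), opening gaps behind the subject.

```python
import numpy as np

def project(points, K):
    """Project Nx3 camera-space points through intrinsics K into Nx2 pixel coords."""
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Illustrative pinhole intrinsics: 500 px focal length, 640x480 principal point.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Two scene points at different depths (meters).
points = np.array([[0.0, 0.0,  2.0],    # near subject
                   [0.5, 0.0, 10.0]])   # far background

# Shift the virtual camera 0.2 m to the right; equivalently, translate the
# world points 0.2 m to the left relative to the camera.
t = np.array([0.2, 0.0, 0.0])
orig = project(points, K)
moved = project(points - t, K)

# Parallax: the near point moves 5x farther in the image than the far one,
# so regions behind the near subject become newly visible.
shift = orig[:, 0] - moved[:, 0]
print(shift)  # [50. 10.]
```

The disparity between those two shifts is exactly what a flat 2D warp cannot reproduce, which is why the system needs a per-pixel 3D point map rather than a simple crop or skew.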

To fill these empty spaces, Google uses a generative latent diffusion model. The model was trained on an internal dataset of image pairs with known camera parameters. It learns to re-render a scene from one view to another by estimating the 3D point map of the first view and projecting it into the second. This preserves visual consistency and parallax in ways that standard cropping cannot.
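Conceptually, that projection step is a forward warp: pixels from the first view are splatted into the second view's frame, and the pixels the new viewpoint can see but the original never captured remain empty, defining the holes the generative model must fill. The sketch below is a heavily simplified illustration of that idea; the function name, the 4x4 relative-pose convention, and the zero-fill hole marker are all assumptions, not details of Google's system.

```python
import numpy as np

def forward_warp(image_a, point_map_a, K, T_ab):
    """Splat view A into view B's image plane.

    image_a:      HxWxC source image
    point_map_a:  HxWx3 per-pixel 3D positions in camera A's frame
    K:            3x3 camera intrinsics
    T_ab:         4x4 rigid transform taking camera-A coords to camera-B coords
    Returns an HxWxC image; zero pixels mark holes a generative model would fill.
    """
    h, w = image_a.shape[:2]
    pts = point_map_a.reshape(-1, 3)
    pts_b = pts @ T_ab[:3, :3].T + T_ab[:3, 3]        # move points into B's frame
    uvw = pts_b @ K.T                                  # pinhole projection
    uv = np.rint(uvw[:, :2] / uvw[:, 2:3]).astype(int)
    out = np.zeros_like(image_a)                       # zeros = disoccluded holes
    mask = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    out[uv[mask, 1], uv[mask, 0]] = image_a.reshape(-1, image_a.shape[-1])[mask]
    return out
```

Training on image pairs with known camera parameters gives the model ground truth for exactly these holes: it sees the warped first view as input and the real second view as the target.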

Implementation in Google Photos

This technology operates inside Google Photos as a secondary rendition option when a user applies the Auto frame feature to eligible portrait images. It specifically corrects distortion in selfies caused by wide-angle lenses, where features closest to the lens appear unnaturally large.
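The distortion claim follows from basic pinhole arithmetic: apparent size scales with 1/distance, so when a phone is held at arm's reach the nose sits proportionally much closer to the lens than the ears do. The numbers below are illustrative assumptions, not measurements from Google's system, but they show why pulling the virtual camera back flattens the effect.

```python
def relative_size(nose_z, ear_z):
    """Apparent nose-to-ear size ratio under a pinhole model (size ~ 1/distance)."""
    return ear_z / nose_z

# Assumed geometry: nose ~8 cm in front of the ears.
close = relative_size(0.30, 0.38)   # phone held ~30 cm from the face
far = relative_size(1.00, 1.08)     # virtual camera pulled back to ~1 m

print(round(close, 2), round(far, 2))  # 1.27 1.08
```

Moving the virtual camera back therefore shrinks the exaggerated near features, which is the correction the re-composed rendition applies.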

The tool also enables post-capture adjustments to the height or lateral angle of the camera to improve framing, such as centering a face or capturing more of the background. Google presents this as a single-action improvement. Users receive the re-composed image without needing to engage in manual prompt engineering.

Moving From 2D to Spatial Generation

Treating photos as 3D environments represents a shift in consumer image-editing workflows. If you build visual editing software or multi-agent systems for media processing, automating the inference behind spatial reconstruction removes significant friction for users. It is worth evaluating how generative latent diffusion models can power digital re-shoots rather than relying solely on static pixel manipulation.

Get Insanely Good at AI

The book for developers who want to understand how AI actually works. LLMs, prompt engineering, RAG, AI agents, and production systems.
