Cosmos 3 Open Omnimodel Merges World Simulation and Action
NVIDIA released Cosmos 3, an open-weight omnimodel that unifies vision reasoning, world simulation, and action prediction for physical AI applications.
At GTC Taipei, NVIDIA released Cosmos 3, an open omnimodel built specifically for physical AI workloads. It unifies vision reasoning, world simulation, and action prediction in a single architecture. NVIDIA says this combined pipeline compresses training and evaluation cycles for robotics and autonomous systems from months to days.
Mixture-of-Transformers Architecture
Cosmos 3 abandons standard single-model designs in favor of a Mixture-of-Transformers (MoT) architecture. The system uses a dual-tower approach to process multimodal inputs simultaneously.
The Reasoner Tower acts as an autoregressive vision-language model. It interprets text, images, and video to extract motion, spatio-temporal relationships, and object interactions. The Generation Tower is a diffusion-based block that receives context from the Reasoner Tower and outputs physically grounded predictions, including predictive video sequences and robot-task trajectories.
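To make the hand-off between the two towers concrete, here is a minimal sketch of the data flow only. Everything in it is hypothetical: `ReasonerContext`, `toy_reasoner`, and `toy_generator` are illustrative names, not Cosmos 3 APIs. The "reasoner" emits tokens one step at a time, and the "generator" starts from noise and iteratively refines toward a target derived from that context, standing in for diffusion sampling. It does not model the Mixture-of-Transformers internals.

```python
import random
from dataclasses import dataclass


@dataclass
class ReasonerContext:
    """Hypothetical hand-off from the reasoner tower to the generator tower."""
    tokens: list[str]
    embedding: list[float]


def toy_reasoner(prompt: str, max_tokens: int = 4) -> ReasonerContext:
    """Stand-in for the autoregressive reasoner: emits one token per step,
    each step conditioned on the prompt and the prefix produced so far."""
    words = prompt.lower().split()
    tokens: list[str] = []
    while len(tokens) < min(max_tokens, len(words)):
        # Toy "next-token" rule: the prompt word that follows the current prefix.
        tokens.append(words[len(tokens)])
    # Toy embedding: one scalar in [0, 1) per emitted token.
    embedding = [(sum(map(ord, t)) % 100) / 100 for t in tokens]
    return ReasonerContext(tokens, embedding)


def toy_generator(ctx: ReasonerContext, steps: int = 50, seed: int = 0) -> list[float]:
    """Stand-in for diffusion sampling: start from Gaussian noise and
    iteratively refine toward a target conditioned on the reasoner context."""
    rng = random.Random(seed)
    target = ctx.embedding
    x = [rng.gauss(0.0, 1.0) for _ in target]
    for t in range(steps, 0, -1):
        # Remove a 1/t fraction of the remaining residual; the final step (t=1)
        # lands on the target.
        alpha = 1.0 / t
        x = [xi + alpha * (ti - xi) for xi, ti in zip(x, target)]
    return x


ctx = toy_reasoner("robot arm pushes cup left")
sample = toy_generator(ctx)
print(ctx.tokens)  # → ['robot', 'arm', 'pushes', 'cup']
```

The point of the split is that the generator never parses the raw prompt; it only consumes the structured context the reasoner hands it.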
This split architecture lets the model generate up to 30 seconds of predictive video from text or visual inputs, so autonomous systems can evaluate the simulated physical consequences of an action before executing it in the real world.
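The "simulate before you act" pattern can be sketched as a simple planning loop: roll out each candidate action in a world model, reject any whose predicted trajectory is unsafe, and execute only the best survivor. This is a minimal illustration under loud assumptions: `toy_world_model` is a hypothetical 1-D friction model standing in for Cosmos 3, and `choose_action`, `is_safe`, and `score` are illustrative names, not part of any NVIDIA API.

```python
from typing import Callable, Optional, Sequence

State = float  # toy state: cup position along a 1-D table, in meters


def toy_world_model(state: State, push: float, horizon: int = 30) -> list[State]:
    """Hypothetical stand-in for a world-model rollout: a constant initial
    push, with friction shrinking the velocity by 10% each step."""
    traj: list[State] = []
    pos, vel = state, push
    for _ in range(horizon):
        pos += vel
        vel *= 0.9
        traj.append(pos)
    return traj


def choose_action(
    state: State,
    candidates: Sequence[float],
    is_safe: Callable[[State], bool],
    score: Callable[[list[State]], float],
    model: Callable[[State, float], list[State]] = toy_world_model,
) -> Optional[float]:
    """Simulate every candidate, discard any whose predicted rollout is
    unsafe, and return the best-scoring survivor (None if none survive)."""
    best, best_score = None, float("-inf")
    for action in candidates:
        traj = model(state, action)
        if not all(is_safe(s) for s in traj):
            continue  # predicted failure: rejected in simulation, never executed
        s = score(traj)
        if s > best_score:
            best, best_score = action, s
    return best


# Cup starts at 0.2 m on a 1.0 m table; goal is 0.8 m without falling off.
action = choose_action(
    state=0.2,
    candidates=[0.02, 0.05, 0.07, 0.10],
    is_safe=lambda pos: 0.0 <= pos <= 1.0,
    score=lambda traj: -abs(traj[-1] - 0.8),
)
print(action)  # → 0.07
```

Here the strongest push (0.10) would land closest to the far end of the table, but its simulated rollout slides the cup off the edge, so it is discarded before any real motion happens.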
Model Variants and Hardware Targets
NVIDIA announced three versions of the model, each targeting a different stage of the robotics development lifecycle. The Edge variant has not shipped yet.
| Model | Parameters (Reasoner + Generator) | Target Hardware | Primary Use Case |
|---|---|---|---|
| Cosmos 3 Nano | 8B + 8B | NVIDIA RTX PRO 6000 | Efficient workstation inference |
| Cosmos 3 Super | 32B + 32B | NVIDIA Hopper and Blackwell | High-fidelity synthetic data generation |
| Cosmos 3 Edge | Not specified | Edge deployment hardware | Real-time on-device inference |
The larger Super variant is built for research and synthetic data generation, letting developers create training material for smaller models. The upcoming Edge variant will target local robotics deployment environments, similar to the hardware targets for Nemotron 3 Nano 4B.
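A quick back-of-the-envelope calculation helps explain the hardware split. The sketch below estimates weights-only memory for each variant. It assumes the table's figures are per-tower parameter counts, that both towers are loaded at once, and that weights are stored at one of a few common precisions; NVIDIA has not stated these deployment details, and the estimate ignores activations, caches, and runtime overhead.

```python
# Tower sizes in billions of parameters, taken from the variants table
# (assumed to be per-tower counts: reasoner, generator).
VARIANTS = {
    "Cosmos 3 Nano": (8, 8),
    "Cosmos 3 Super": (32, 32),
}

# Assumed bytes per parameter for common inference precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weight_memory_gb(reasoner_b: float, generator_b: float, precision: str) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes), assuming both towers
    are resident at once. Ignores activations, caches, and framework overhead."""
    return (reasoner_b + generator_b) * BYTES_PER_PARAM[precision]


for name, (r, g) in VARIANTS.items():
    sizes = ", ".join(f"{p}: {weight_memory_gb(r, g, p):.0f} GB" for p in BYTES_PER_PARAM)
    print(f"{name}: {sizes}")
```

Under these assumptions it prints `Cosmos 3 Nano: bf16: 32 GB, fp8: 16 GB, fp4: 8 GB` and `Cosmos 3 Super: bf16: 128 GB, fp8: 64 GB, fp4: 32 GB`, so Super's weights alone are four times Nano's at any given precision.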
Benchmark Performance
At launch, Cosmos 3 established new baselines across physical AI and multimodal leaderboards.
The model ranked first among open models on VANTAGE-Bench for vision-language reasoning on real-world fixed-camera footage. It also took the top spot on the Artificial Analysis Text-to-Image and Image-to-Video (without audio) leaderboards. NVIDIA reports additional category leads on PAI-Bench, R-Bench, Physics-IQ, and RoboLab.
Ecosystem Support and Datasets
Alongside the model checkpoints on Hugging Face, NVIDIA released open code, post-training recipes, and six synthetic datasets. These datasets provide immediate training foundations for embodied robot scenes, autonomous driving, warehouse operations, and human motion simulations.
If you previously built workflows around Cosmos Predict 2.5, the new architecture requires updating your inference pipelines to handle the dual-tower MoT outputs.
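One defensive way to adapt a pipeline that expected a single output is to give each tower's output its own optional field and route each one explicitly, failing loudly when an output arrives that nothing downstream handles. The sketch below is a hypothetical pattern: `DualTowerResult`, its field names, and `route` are illustrative and are not the actual Cosmos 3 output schema.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DualTowerResult:
    """Hypothetical result shape for a dual-tower model. These field names
    are illustrative, not the actual Cosmos 3 output schema."""
    reasoning_text: Optional[str] = None                              # reasoner tower
    video_frames: Optional[list] = None                               # generation tower
    trajectory: Optional[list[tuple[float, float, float]]] = None     # generation tower


def route(result: DualTowerResult, handlers: dict[str, Callable[[object], None]]) -> list[str]:
    """Send each populated output to its handler; raise if an output arrives
    with no handler registered, so no stream is silently dropped."""
    routed = []
    for name in ("reasoning_text", "video_frames", "trajectory"):
        value = getattr(result, name)
        if value is None:
            continue  # this tower produced nothing for this request
        if name not in handlers:
            raise KeyError(f"no handler registered for {name!r}")
        handlers[name](value)
        routed.append(name)
    return routed


log = []
result = DualTowerResult(
    reasoning_text="cup is near the table edge",
    trajectory=[(0.2, 0.0, 0.1), (0.5, 0.0, 0.1)],
)
print(route(result, {
    "reasoning_text": log.append,
    "trajectory": lambda wp: log.append(len(wp)),
}))  # → ['reasoning_text', 'trajectory']
```

Raising on an unhandled output is a deliberate choice: during a migration, a silently ignored trajectory or video stream is harder to catch than an immediate error.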
NVIDIA also formed the NVIDIA Cosmos Coalition to establish deployment standards for physical AI. Launch partners include Agile Robots, Black Forest Labs, Runway, Skild AI, LTX, and Generalist. Early industry adopters actively deploying the model include Samsung, LG Electronics, Li Auto, and Doosan Robotics.
For production environments, Cosmos 3 is packaged as NVIDIA NIM microservices. Development teams can pull the optimized containers to deploy the model immediately across local GPU clusters or cloud infrastructure.