OlmoEarth v1.1 Tops DINOv3 in Remote Sensing Benchmarks
Ai2 updated its multimodal Earth observation models with OlmoEarth v1.1, bringing enhanced training efficiency and state-of-the-art benchmark performance.
The Allen Institute for AI has released OlmoEarth v1.1, an updated family of multimodal foundation models designed specifically for Earth observation tasks. Building on the initial v1.0 release from late 2025, version 1.1 focuses on improved training efficiency and better handling of the varied temporal resolutions inherent to raw satellite imagery.
Refined SLIM Architecture
The core technical shift in v1.1 is a refined “Stable Latent Image Modeling” (SLIM) objective. The change reduces the compute needed for downstream fine-tuning and lets the model cope better with missing timesteps in monthly time series data. The architecture family is Pareto-optimized, trading multiply-accumulate operations (MACs) against task performance, so developers can match model size to their compute budget across four official variants:
- OlmoEarth-v1.1-Nano: ~1.4M parameters
- OlmoEarth-v1.1-Tiny: ~6.2M parameters
- OlmoEarth-v1.1-Base: ~90M parameters
- OlmoEarth-v1.1-Large: ~300M parameters
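As a rough illustration of matching a variant to a budget, the sketch below picks the largest listed variant that fits under a parameter cap and estimates its fp32 weight size. The parameter counts are the approximate figures from the list above; `pick_variant` and `fp32_weight_megabytes` are hypothetical helpers written for this example, not part of any OlmoEarth tooling, and the size estimate assumes 4 bytes per parameter with no activations or optimizer state.

```python
from typing import Optional

# Approximate parameter counts as listed in the article.
VARIANTS = {
    "OlmoEarth-v1.1-Nano": 1_400_000,
    "OlmoEarth-v1.1-Tiny": 6_200_000,
    "OlmoEarth-v1.1-Base": 90_000_000,
    "OlmoEarth-v1.1-Large": 300_000_000,
}


def pick_variant(max_params: int) -> Optional[str]:
    """Return the largest listed variant with at most max_params parameters.

    Hypothetical helper for illustration; returns None if nothing fits.
    """
    fitting = {name: n for name, n in VARIANTS.items() if n <= max_params}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def fp32_weight_megabytes(n_params: int) -> float:
    """Estimate weight storage in MB, assuming 4 bytes per parameter (fp32)."""
    return n_params * 4 / 1e6


if __name__ == "__main__":
    choice = pick_variant(10_000_000)  # e.g. a 10M-parameter cap for an edge device
    print(choice, fp32_weight_megabytes(VARIANTS[choice]), "MB")
```

A parameter cap is only a proxy: the article frames the trade-off in terms of MACs, which also depend on input size and sequence length, so real deployments would want to measure latency on the target hardware.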
Modalities and Benchmark Performance
OlmoEarth v1.1 is trained on a combination of Sentinel-1, Sentinel-2, and Landsat modalities. This raw data is enriched by six derived mapping layers, including OpenStreetMap, WorldCover, and the USDA Cropland Data Layer.
In internal testing against 12 competing foundation models, including Meta’s DINOv3 and IBM/NASA’s Prithvi, the Ai2 models maintained state-of-the-art performance across diverse remote sensing tasks. For teams working with vector embeddings, OlmoEarth achieved the best performance on 15 out of 24 tasks evaluated using k-nearest neighbors (kNN) and linear probing. Under full end-to-end fine-tuning conditions, it secured the highest scores on 19 out of 29 tasks.
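To make the kNN evaluation setting concrete, here is a minimal sketch of a k-nearest-neighbors probe over frozen embeddings: each test embedding is labeled by majority vote among its most similar training embeddings, and accuracy is the fraction of correct votes. The cosine-similarity metric, `k=3`, the function names, and the toy two-dimensional vectors are all assumptions made for illustration; the article does not describe Ai2's exact evaluation protocol.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train_emb, train_labels, query, k=3):
    """Label a query embedding by majority vote among its k most similar training embeddings."""
    ranked = sorted(
        range(len(train_emb)),
        key=lambda i: cosine(train_emb[i], query),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_emb, train_labels, test_emb, test_labels, k=3):
    """Fraction of test embeddings whose kNN prediction matches the true label."""
    correct = sum(
        knn_predict(train_emb, train_labels, q, k) == y
        for q, y in zip(test_emb, test_labels)
    )
    return correct / len(test_labels)


# Toy stand-ins for embeddings exported from a frozen encoder (made-up values).
TRAIN_EMB = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
TRAIN_LABELS = ["water", "water", "forest", "forest"]
TEST_EMB = [[0.8, 0.2], [0.2, 0.8]]
TEST_LABELS = ["water", "forest"]

if __name__ == "__main__":
    print(knn_accuracy(TRAIN_EMB, TRAIN_LABELS, TEST_EMB, TEST_LABELS))
```

Because the encoder stays frozen, this kind of probe measures how separable the embeddings already are, which is why it is reported separately from end-to-end fine-tuning results.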
Platform Integrations and Custom Exports
Ai2 simultaneously updated the OlmoEarth Platform, a no-code environment used heavily by NGOs and government agencies. The platform now supports custom embedding exports as Cloud-Optimized GeoTIFFs (COGs) directly from OlmoEarth Studio, so the outputs can be loaded straight into external GIS software.
The platform also handles automated data acquisition and preparation for the v1.1 models. This hides much of the usual complexity of fine-tuning pipelines and lets users adapt models to specific geographic regions quickly. The launch also included specialized, task-specific versions of the model pre-tuned for mangrove classification, crop-type mapping, and live fuel moisture content (LFMC) prediction.
The models, weights, and training code are hosted on Hugging Face under the OlmoEarth Artifact License, while the pretraining stack is available in the allenai/olmoearth_pretrain GitHub repository. If you build spatial analysis or environmental monitoring systems, the Nano and Tiny variants offer a practical starting baseline for classification on edge hardware with strict compute constraints.