OlmoEarth arrives as a family of open foundation models designed to turn raw satellite signals into useful, scalable operational intelligence.
Want to go from images and radar to crop maps, fire alerts, or ship detection with models you can fine-tune without being an AI expert? That is what OlmoEarth offers, and why it matters.
What OlmoEarth is and why it stands out
OlmoEarth is a family of multimodal foundation models for Earth observation, trained on large volumes of satellite data. It comes in four sizes that share the same architecture and training approach:
- OlmoEarth-v1-Nano (~1.4M parameters) and OlmoEarth-v1-Tiny (~6.2M): fast, cheap inference at scale.
- OlmoEarth-v1-Base (~90M): a balance between accuracy and speed for most use cases.
- OlmoEarth-v1-Large (~300M): better performance on complex tasks.
In addition to code and weights, Ai2 publishes the research, training, and evaluation stack as open artifacts, and offers the OlmoEarth Platform to simplify deployment and customization.
Architecture and technical approach
OlmoEarth is essentially a vision transformer adapted to time series and multimodal Earth data. Instead of processing only static images, the model works with monthly sequences of satellite patches and converts each patch into a token that carries location, time, and sensor type.
- Modalities: optical images, radar, and contextual maps (for example OpenStreetMap, land cover, canopy height).
- Pretraining: masked modeling of image parts so the model learns to reconstruct or predict hidden areas from visible context. This is self-supervised learning combined with weak supervision using the map layers.
- Tokenization: each token represents a patch with time and sensor metadata, which lets the model reason over space, time, and modality simultaneously.
Put less technically: the model learns to fill in missing image pieces and use auxiliary maps as clues. That produces robust representations you can reuse for many tasks without training from scratch every time.
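As a toy illustration of that tokenization idea, each monthly frame can be cut into patches that carry their position, timestep, and sensor id. The function name, shapes, and metadata fields below are assumptions for illustration, not OlmoEarth's actual interface:

```python
# Illustrative sketch (not OlmoEarth's real code): turn a monthly stack of
# satellite observations into tokens tagged with space, time, and sensor.
import numpy as np

def tokenize(stack, patch=4, sensor_id=0):
    """stack: (T, H, W, C) monthly observations -> list of token dicts."""
    T, H, W, C = stack.shape
    tokens = []
    for t in range(T):
        for i in range(0, H, patch):
            for j in range(0, W, patch):
                tokens.append({
                    "pixels": stack[t, i:i+patch, j:j+patch],  # raw patch values
                    "time": t,             # month index within the sequence
                    "row": i, "col": j,    # spatial position of the patch
                    "sensor": sensor_id,   # which modality produced it
                })
    return tokens

# 2 months of an 8x8 single-band image -> 2 * (8/4)^2 = 8 tokens
toks = tokenize(np.zeros((2, 8, 8, 1)))
print(len(toks))  # 8
```

A transformer can then attend across all of these tokens at once, which is what lets the model reason jointly over space, time, and modality.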
Data and scale
- Pretraining dataset size: roughly 10 terabytes (millions of samples at a uniform resolution of 10 meters per pixel).
- Temporality: up to 12 monthly timestamps per sample, though the system tolerates missing timesteps and modalities (very common in real satellite data).
This flexibility is key: OlmoEarth handles incomplete data and real-world conditions without breaking the pipeline.
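A minimal sketch of how such gaps might be handled: pad the available observations into a fixed-length sequence and keep a validity mask that downstream layers can use to ignore missing months. The function and its signature are illustrative assumptions, not OlmoEarth's implementation:

```python
# Sketch: pad an incomplete monthly time series to a fixed length with a
# validity mask, so attention can skip months that were never observed.
import numpy as np

def pad_timeseries(frames, months, T=12):
    """frames: list of (H, W, C) arrays; months: which of the T slots each fills."""
    H, W, C = frames[0].shape
    out = np.zeros((T, H, W, C), dtype=frames[0].dtype)
    mask = np.zeros(T, dtype=bool)      # True where a real observation exists
    for frame, m in zip(frames, months):
        out[m] = frame
        mask[m] = True
    return out, mask

# Only months 0, 3, and 7 were observed (cloud cover, revisit gaps, ...)
stack, valid = pad_timeseries([np.ones((4, 4, 3))] * 3, [0, 3, 7])
print(valid.sum())  # 3
```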
Evaluation and comparisons
OlmoEarth was tested with standard evaluation methods: kNN, linear probing (LP), and supervised fine-tuning (SFT). Results show leading performance on:
- Scene and patch classification
- Semantic segmentation
- Object and change detection
- Regression over time series (for example live fuel moisture)
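To make the cheapest of these protocols concrete, here is a minimal kNN probe over frozen embeddings. The random vectors stand in for real OlmoEarth features; in practice they would come from the frozen encoder:

```python
# Minimal kNN probe: classify each query by majority vote among its k
# nearest training embeddings. Embeddings here are synthetic stand-ins.
import numpy as np

def knn_predict(train_x, train_y, query_x, k=3):
    preds = []
    for q in query_x:
        dists = np.linalg.norm(train_x - q, axis=1)   # Euclidean distance
        nearest = train_y[np.argsort(dists)[:k]]      # labels of k closest
        preds.append(np.bincount(nearest).argmax())   # majority vote
    return np.array(preds)

rng = np.random.default_rng(0)
train_x = np.concatenate([rng.normal(0, 0.1, (20, 8)),   # class-0 cluster
                          rng.normal(1, 0.1, (20, 8))])  # class-1 cluster
train_y = np.array([0] * 20 + [1] * 20)
query = np.array([[0.0] * 8, [1.0] * 8])  # one query at each cluster center
print(knn_predict(train_x, train_y, query))  # [0 1]
```

Because no gradients flow through the encoder, this protocol directly measures how useful the pretrained representations are on their own.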
Compared to recent industrial and academic research models — Meta DINOv3, IBM/NASA Prithvi, IBM Terramind, CROMA, Panopticon, as well as earlier Ai2 models like Satlas and Galileo — OlmoEarth achieves better performance on many benchmarks. Against Google DeepMind AlphaEarth Foundations (AEF), OlmoEarth produced competitive embeddings and, after fine-tuning, substantially outperformed AEF on the tasks tested.
This underscores the advantage of having a platform that makes fine-tuning and model personalization accessible.
Practical applications and deployment
OlmoEarth is not just a paper: it’s already used in real challenges.
- Agriculture and food security: crop-type mapping for smallholders in sub-Saharan Africa, supporting more precise interventions.
- Fire risk assessment: better estimates of live fuel moisture, which improves risk maps and operational decisions.
- Object and vessel detection: lower false positive and false negative rates in optical and radar, useful for maritime surveillance and illegal fishing.
- Coverage and ecosystem mapping: helps prioritize field surveys and allocate restoration funds with more confidence.
Why does this matter operationally? Because OlmoEarth offers competitive accuracy without requiring giant models: lower inference costs, the ability to rerun analyses frequently, and room for continuous improvement.
How it’s used in practice (common workflows)
- Embedding extraction: for fast search and detection pipelines with kNN.
- Linear probing: quickly evaluate the usefulness of representations for a task with limited labels.
- Supervised fine-tuning: adapt the model to a region, sensor, or specific task to maximize performance.
- OlmoEarth Platform: integrates data acquisition, labeling, fine-tuning, and production deployment so you don’t have to build the whole stack from scratch.
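The embedding-extraction workflow above can be sketched end to end: encode scenes with a frozen model, index the embeddings, and retrieve by similarity. The `encode` function is a trivial placeholder, not a real OlmoEarth checkpoint:

```python
# Sketch of an embedding search pipeline. `encode` stands in for a frozen
# OlmoEarth forward pass; the real encoder would produce learned features.
import numpy as np

def encode(patch):
    # Placeholder encoder: flatten and unit-normalize the patch.
    v = patch.ravel().astype(float)
    return v / (np.linalg.norm(v) + 1e-9)

def search(index, query_emb, top_k=2):
    # Cosine similarity, since embeddings are unit-normalized.
    sims = index @ query_emb
    return np.argsort(sims)[::-1][:top_k]   # indices of most similar scenes

# Index three toy "scenes" and retrieve the ones most similar to a query.
scenes = [np.ones((4, 4)), np.eye(4), np.arange(16.0).reshape(4, 4)]
index = np.stack([encode(s) for s in scenes])
query = encode(np.ones((4, 4)) * 2)   # same pattern as scene 0, rescaled
print(search(index, query)[0])  # 0
```

With a real encoder, the same loop scales to millions of scenes by swapping the brute-force similarity for an approximate nearest-neighbor index.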
Additionally, fine-tuned models are already released for tasks like mangrove classification, crop mapping, and forest fuel classification, developed with regional partners to ease adaptation to new areas.
What’s next and collaborators
The next generation of OlmoEarth will expand modalities (for example weather data) and sectors like humanitarian response. Ai2 thanks key collaborators: Amazon Conservation, African Wildlife Foundation, CGIAR/IFPRI, Global Mangrove Watch, Global Ecosystem Atlas, ITC University of Twente, NASA JPL and NASA Harvest.
Want to use it tomorrow? You can sign up for updates, request partner access, and explore the Viewer with sample model outputs.
Final reflection
OlmoEarth shows a clear trend: foundation models for Earth observation are moving from lab experiments to operational tools. What’s the most important change? The combination of multimodal pretraining, accessible fine-tuning, and efficient model sizes lets you apply AI to real problems without a prohibitive resource burden.
