ChronoEarth-492K: A Large Scale and Long Horizon
Spatiotemporal Hyperspectral Earth Observation
Dataset and Benchmark

Haozhe Si¹, Yuxuan Wan², Yuqing Wang², Minh Do¹, Han Zhao²
¹Dept. of Electrical and Computer Engineering, ²Siebel School of Computing and Data Science

University of Illinois Urbana-Champaign

Abstract

Hyperspectral imaging (HSI) provides dense spectral information for the Earth's surface, enabling material-level understanding of land cover and ecosystem dynamics. Despite recent progress in hyperspectral self-supervised learning (SSL), existing datasets remain temporally shallow, limiting the development of long-horizon spatiotemporal modeling. To address this gap, we introduce ChronoEarth-492K, the first large-scale, temporally calibrated hyperspectral SSL dataset built upon NASA's EO-1 Hyperion mission — the world's longest continuous hyperspectral archive (2001–2017). ChronoEarth-492K comprises 492,354 radiometrically harmonized patches across 185,398 global locations over 17 years, with 28,786 sites containing multi-temporal sequences (≥3 observations) that enable both short- and long-horizon temporal analysis. Building on this foundation, we establish the ChronoEarth-Benchmark, a unified evaluation suite spanning static, short-horizon, and long-horizon temporal tasks, constructed from six open-source geospatial products covering land cover, crop type, forest dynamics, and soil properties. We further introduce a standardized evaluation protocol and report extensive baseline results across state-of-the-art hyperspectral foundation models.

Key Contributions

  1. ChronoEarth-492K dataset: the first large-scale hyperspectral SSL dataset offering 17 years of temporally calibrated global observations from NASA EO-1 Hyperion (2001–2017), with 492,354 patches at 30 m resolution across 9 global regions.
  2. ChronoEarth-Benchmark: a unified evaluation suite with static, short-horizon, and long-horizon temporal tasks across six geospatial products, with leak-proof spatial splits designed for realistic OOD evaluation.
  3. Extensive baselines: comprehensive empirical evaluation of state-of-the-art hyperspectral foundation models (SpectralViT, LESSViT, DOFA, HyperSigma, SatMAE, DINOv3) under static, temporal, and cross-satellite transfer settings.

Data Collection & Processing Pipeline

ChronoEarth-492K is built from the complete NASA EO-1 Hyperion Level-1T archive (May 2001 – March 2017), the longest-running spaceborne hyperspectral mission to date. The pipeline consists of four stages:

1. Data Acquisition
All Level-1T terrain-corrected Hyperion scenes with <10% cloud cover are queried from the USGS EROS archive via the M2M API. Nine continental regions are defined to ensure geographic balance and limit oceanic coverage.

2. Spectral Harmonization
Each scene contains 242 raw bands. Following established protocols, unstable and low-signal bands (water absorption at ~1400 nm and ~1900 nm, instrument noise bands) are removed, retaining 155 spectrally consistent bands across all scenes.

3. Global Patching & Indexing
Scenes are resampled to 30 m and divided into non-overlapping 128×128 pixel patches aligned to a fixed UTM-zone global grid. Each patch receives a deterministic UID encoding its UTM zone, column, and row, enabling consistent cross-timestamp alignment.

4. Temporal Sequence Construction
Patches sharing the same UID but different timestamps are grouped into spatiotemporal sequences. This yields 56,491 locations with ≥2 observations and 28,786 locations with ≥3 timestamps, supporting both short- and long-horizon modeling.

Figure: Global Patching and Spatial Alignment. (a) Raw EO-1 Hyperion observation. (b) Global patching using a UTM-zone–specific grid, with each patch assigned a unique spatial UID. (c) Spatial alignment of patches sharing the same UID across multiple observations. (d) Formation of spatiotemporal sequences by aggregating aligned samples from different timestamps.
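
The patching and grouping steps can be illustrated with a minimal sketch. It assumes each patch is located by a UTM easting/northing, and it uses a hypothetical `zone_col_row` UID layout; the source only states that the UID encodes UTM zone, column, and row. The function names `patch_uid` and `build_sequences` are illustrative and are not part of any released tooling.

```python
from collections import defaultdict

PATCH_PX = 128               # patch side length in pixels (from the text)
GSD_M = 30                   # ground sampling distance in metres (from the text)
PATCH_M = PATCH_PX * GSD_M   # 3,840 m per patch side


def patch_uid(utm_zone: str, easting_m: float, northing_m: float) -> str:
    """Return the UID of the fixed-grid cell containing a UTM coordinate.

    The 'zone_col_row' string layout is a hypothetical encoding.
    """
    col = int(easting_m // PATCH_M)
    row = int(northing_m // PATCH_M)
    return f"{utm_zone}_{col:05d}_{row:05d}"


def build_sequences(records):
    """Group (uid, timestamp) records into time-sorted per-location sequences."""
    by_uid = defaultdict(set)
    for uid, timestamp in records:
        by_uid[uid].add(timestamp)   # a set drops duplicate timestamps
    return {uid: sorted(stamps) for uid, stamps in by_uid.items()}


records = [
    (patch_uid("16N", 400_100.0, 4_440_500.0), "2003-06-01"),
    (patch_uid("16N", 401_900.0, 4_441_000.0), "2009-07-15"),  # same grid cell
    (patch_uid("16N", 401_900.0, 4_441_000.0), "2005-08-20"),
    (patch_uid("16N", 410_000.0, 4_440_500.0), "2004-05-02"),  # different cell
]
sequences = build_sequences(records)

# Keep only locations with at least three observations.
multi_temporal = {u: s for u, s in sequences.items() if len(s) >= 3}
```

Because the UID depends only on the grid cell, two acquisitions of the same location map to the same key regardless of when they were captured, which is what makes the later grouping a plain dictionary operation.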

Benchmark Label Processing

To construct the ChronoEarth-Benchmark, we align six geospatial label products with the hyperspectral imagery using the UID-based spatial index. Each label layer is projected to the corresponding UTM grid, and all valid location–label pairs are retained as supervision targets — enabling multi-year temporal task construction. Two quality filters are applied:

  • Class filtering: Labels with frequency below 1% are grouped into a background class to reduce noise from rare categories.
  • Entropy filtering: Patches with normalized Shannon entropy below τ = 0.1 are discarded, retaining only semantically informative samples.
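
A minimal sketch of the two filters follows. It makes several assumptions that the text does not confirm: class frequency is measured over all labeled pixels, the entropy is computed on each patch's class histogram and normalized by the log of the number of classes, and class 0 serves as background. The helper names are illustrative.

```python
import numpy as np


def merge_rare_classes(label_maps, num_classes, min_freq=0.01, background=0):
    """Remap classes whose global pixel frequency is below min_freq to background."""
    all_pixels = np.concatenate([m.ravel() for m in label_maps])
    counts = np.bincount(all_pixels, minlength=num_classes)
    freq = counts / counts.sum()
    lookup = np.arange(num_classes)
    lookup[freq < min_freq] = background
    return [lookup[m] for m in label_maps]


def normalized_entropy(label_map, num_classes):
    """Shannon entropy of the class histogram divided by log(num_classes)."""
    counts = np.bincount(label_map.ravel(), minlength=num_classes)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(num_classes))


TAU = 0.1  # threshold from the text

uniform = np.zeros((128, 128), dtype=np.int64)   # one class only
mixed = np.zeros((128, 128), dtype=np.int64)
mixed[:, 64:] = 1                                 # two classes, half each

# Entropy filter: the single-class patch is discarded, the mixed one is kept.
kept = [m for m in (uniform, mixed) if normalized_entropy(m, 4) >= TAU]

# Class filter: class 3 covers about 0.3% of pixels and is merged into background.
rare = mixed.copy()
rare[0, :50] = 3
merged = merge_rare_classes([rare], num_classes=4)
```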

Leak-proof train/val/test splits are generated by a distance-aware grouping strategy based on EO-1 orbital swaths. Patches from the same acquisition are grouped into spatial units, and overlapping units are merged into connected components. Splits are assigned at the component level, inducing a spatial OOD setting where test samples come from geographically distinct regions.
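
The component-level split can be sketched with a small union-find. This is an illustration under stated assumptions, not the authors' implementation: each spatial unit is represented by an axis-aligned bounding box, `buffer_m` is a hypothetical distance margin, and the 70/15/15 split fractions are placeholders.

```python
import random


def boxes_overlap(a, b, buffer_m=0.0):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap or lie within buffer_m."""
    return (a[0] - buffer_m <= b[2] and b[0] - buffer_m <= a[2] and
            a[1] - buffer_m <= b[3] and b[1] - buffer_m <= a[3])


def connected_components(boxes, buffer_m=0.0):
    """Label each box with the root id of its connected component."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes_overlap(boxes[i], boxes[j], buffer_m):
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(boxes))]


def assign_splits(component_ids, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole components to train/val/test (fractions are hypothetical)."""
    comps = sorted(set(component_ids))
    random.Random(seed).shuffle(comps)
    n_train = round(fractions[0] * len(comps))
    n_val = round(fractions[1] * len(comps))
    split_of = {}
    for k, comp in enumerate(comps):
        if k < n_train:
            split_of[comp] = "train"
        elif k < n_train + n_val:
            split_of[comp] = "val"
        else:
            split_of[comp] = "test"
    return [split_of[c] for c in component_ids]


# Units 0-2 form an overlapping chain; unit 3 is isolated.
boxes = [(0, 0, 10, 10), (8, 8, 20, 20), (18, 0, 30, 9), (100, 100, 110, 110)]
components = connected_components(boxes)
splits = assign_splits(components)
```

Because splits are assigned per component, two units that overlap even transitively (unit 0 and unit 2 here) can never end up on opposite sides of a train/test boundary.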

Dataset Statistics

  • Total patches: 492K
  • Unique locations: 185K
  • Multi-temporal sites (≥3 obs.): 28.8K
  • Temporal span: 17 yrs (2001–2017)
  • Spectral bands: 155
  • Spatial resolution: 30 m
  • Global regions: 9
  • Patch size: 128² pixels

Global Coverage


Figure: Global spatial coverage of ChronoEarth-492K. Red markers indicate hyperspectral observation locations distributed across Africa (AF), Arctic (AC), East Asia (EA), Europe (EU), Latin America (LA), North America (NA), Oceania (OC), Southeast Asia (SEA), and Southwest Asia (SWA).

Temporal Distribution


Figure: Temporal distribution of ChronoEarth-492K. (a) 69.5% of locations have a single observation; 28,786 locations have ≥3 timestamps enabling long-horizon modeling. (b) Time gaps between consecutive acquisitions are broadly distributed, from days to years. (c) Temporal coverage span per location — over one-third of multi-temporal sites span more than 2 years.

Spectral Groups

| Group | Band IDs | Wavelength Range | # Channels |
|---|---|---|---|
| VNIR | B010 – B057 | 447 – 925 nm | 48 |
| SWIR1 | B081 – B097 | 952 – 1114 nm | 17 |
| SWIR2 | B101 – B119 | 1154 – 1336 nm | 19 |
| SWIR3 | B134 – B164 | 1487 – 1790 nm | 31 |
| SWIR4 | B182 – B221 | 1971 – 2365 nm | 40 |
| All | | 447 – 2365 nm | 155 |
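
As a quick consistency check, the band ranges listed for the five groups can be expanded into an index list and applied to a raw 242-band cube. The sketch assumes a band-first array layout and that band B001 is stored at index 0; both are assumptions about storage, not statements from the source.

```python
import numpy as np

# Inclusive Hyperion band ranges per retained group, taken from the listing above.
BAND_GROUPS = {
    "VNIR": (10, 57),
    "SWIR1": (81, 97),
    "SWIR2": (101, 119),
    "SWIR3": (134, 164),
    "SWIR4": (182, 221),
}


def retained_band_ids(groups=BAND_GROUPS):
    """Expand the inclusive ranges into a flat list of 1-based band numbers."""
    ids = []
    for first, last in groups.values():
        ids.extend(range(first, last + 1))
    return ids


band_ids = retained_band_ids()
zero_based = [b - 1 for b in band_ids]   # assumes B001 sits at index 0

raw_cube = np.zeros((242, 4, 4), dtype=np.float32)   # dummy (bands, H, W) cube
harmonized = raw_cube[zero_based]                     # shape (155, 4, 4)
```

The group sizes 48 + 17 + 19 + 31 + 40 sum to the 155 retained bands reported for the dataset.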

Comparison with Existing Hyperspectral SSL Datasets

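| Dataset | # Images | Patch Size | Sensor | Bands | GSD | Temporal Coverage |
|---|---|---|---|---|---|---|
| HySpecNet-11k | ~11k | 128×128 | EnMAP | 202 | 30 m | |
| MSST | ~20k | 64×64 | EnMAP | 200 | 30 m | |
| HSIHybrid | ~4M | 9×9 | Multiple | Var. | | |
| HyperGlobal-450K | ~447k | 64×64 | EO-1 & GF-5B | 242/330 | 30 m | |
| SpectralEarth | ~538k | 128×128 | EnMAP | 202 | 30 m | 2 Years |
| ChronoEarth-492K (ours) | ~492k | 128×128 | EO-1 | 155 | 30 m | 17 Years |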

ChronoEarth-Benchmark

ChronoEarth-Benchmark integrates six open-source geospatial annotation products, covering land cover, crop type, forest dynamics, and soil properties across multiple continents. Three task types are defined:

Static

Single hyperspectral observation paired with the same-year label. Segmentation and multi-label classification.

Short-Horizon (SH)

Multiple temporally adjacent observations for the same label (T ≤ 4). Models aggregate short-term context under full supervision.

Long-Horizon (LH)

Future prediction: observations from 2001 to year n predict labels at year n+1. Tests temporal forecasting under partial supervision.
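
The three task types differ only in how observations are paired with labels. The sketch below works at yearly granularity and is illustrative: the rule for choosing "temporally adjacent" frames (nearest in time to the label year) is an assumption, and the function names are hypothetical.

```python
def static_samples(obs_years, label_years):
    """Pair each observation with the label from the same year."""
    return [(year, year) for year in obs_years if year in label_years]


def short_horizon_sample(obs_years, label_year, max_frames=4):
    """Pick up to max_frames observations closest in time to the label year."""
    nearest = sorted(obs_years, key=lambda y: (abs(y - label_year), y))
    return sorted(nearest[:max_frames]), label_year


def long_horizon_samples(obs_years, label_years, max_frames=8, start=2001):
    """For each target year n+1, use observations from start..n as input."""
    samples = []
    for target in sorted(label_years):
        history = [y for y in sorted(obs_years) if start <= y < target]
        if history:
            # Keep only the most recent max_frames observations.
            samples.append((history[-max_frames:], target))
    return samples


obs_years = [2003, 2005, 2006, 2009, 2012]
label_years = {2006, 2010, 2013}

static = static_samples(obs_years, label_years)
short = short_horizon_sample(obs_years, label_year=2006)
long_ = long_horizon_samples(obs_years, label_years, max_frames=2)
```

In the long-horizon case no input frame shares a year with its target, so the model never sees an observation from the year it is asked to predict.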

| Dataset | Region | Classes | Task Type | Static samples | SH samples | LH samples |
|---|---|---|---|---|---|---|
| GFC | Global | 1 | Forest Change Detection | 5,509 | | |
| ISDASoil | Africa | 8 | Soil Classification | 17,808 | 1,549 | |
| CDL | USA | 12 | Crop Type Segmentation | 10,162 | 1,762 | 538 |
| CORINE | Europe | 19 | Land Cover Classification | 19,774 | | |
| NLCD-S | USA | 16 | Land Cover Segmentation | 55,476 | 4,077 | 1,812 |
| CLCD | China | 7 | Land Cover Segmentation | 17,614 | 1,069 | 460 |

Benchmark Dataset Illustrations


Figure: Example patches from the six benchmark datasets. (a) CDL crop type with soybean, wheat, and corn labels. (b) CLCD Chinese land cover including forest, grassland, and shrubland. (c) NLCD-S US land cover across 16 classes. (d) CORINE European land cover. (e) ISDASoil Africa soil texture classification. (f) GFC forest change detection (black = no change, white = change).

Generalization Settings

Spatial-Temporal Generalization (CORINE). Data is split into temporal ID (2001–2012) and temporal OOD (2013–2017), then further divided by distance-aware spatial grouping. This yields four evaluation conditions: in-distribution (ID), temporal OOD (T-OOD), spatial OOD (S-OOD), and joint spatial-temporal OOD (ST-OOD).

Continental Generalization (GFC). Models are trained on data-rich regions (Europe, North America, East Asia) and evaluated on underrepresented continents (Africa, Latin America, Oceania, Southwest Asia), simulating real-world annotation imbalance.
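
Both settings reduce to simple assignment rules, sketched below. The `spatial_group` flag stands in for the output of the distance-aware grouping, which is not reproduced here. The region codes follow the global coverage figure, and treating the regions not named in the text (Arctic, Southeast Asia) as unassigned is an assumption.

```python
TEMPORAL_ID_YEARS = range(2001, 2013)   # 2001-2012 inclusive, from the text


def corine_condition(year, spatial_group):
    """Return ID, T-OOD, S-OOD, or ST-OOD for a CORINE sample.

    spatial_group is 'id' or 'ood' (hypothetical flag from the spatial grouping).
    """
    temporal_ood = year not in TEMPORAL_ID_YEARS
    spatial_ood = spatial_group == "ood"
    if temporal_ood and spatial_ood:
        return "ST-OOD"
    if spatial_ood:
        return "S-OOD"
    if temporal_ood:
        return "T-OOD"
    return "ID"


GFC_TRAIN_REGIONS = {"EU", "NA", "EA"}          # data-rich regions
GFC_OOD_REGIONS = {"AF", "LA", "OC", "SWA"}     # underrepresented regions


def gfc_role(region):
    """Return 'train', 'ood', or None for regions the text does not assign."""
    if region in GFC_TRAIN_REGIONS:
        return "train"
    if region in GFC_OOD_REGIONS:
        return "ood"
    return None
```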

Benchmark Results

We evaluate six hyperspectral foundation models: SpectralViT, LESSViT, DOFA, HyperSigma, SatMAE, and DINOv3 (adapted), all at ViT-Base scale. Sup. denotes supervised training from random initialization without self-supervised pretraining. In the tables, ↓ marks a score that degrades when more frames are added.

Static Tasks

Static Task Results

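Self-supervised pretraining on ChronoEarth improves over the supervised counterparts on all three segmentation tasks (CLCD, CDL, NLCD-S). On ISDASoil, pretraining helps LESSViT (56.02 vs. 50.15 mAP) but leaves SpectralViT essentially unchanged (57.70 vs. 57.79 mAP). LESSViT achieves the best mIoU on all three segmentation tasks, which points to the benefit of architectures that explicitly model spatial–spectral interactions.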

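| Method | CLCD mIoU ↑ | CDL mIoU ↑ | NLCD-S mIoU ↑ | ISDASoil mAP ↑ |
|---|---|---|---|---|
| DOFA | 47.77 | 12.60 | 24.08 | 51.20 |
| HyperSigma | 41.53 | 11.45 | 23.05 | 54.50 |
| DINOv3 | 47.10 | 11.65 | 24.46 | 50.90 |
| SatMAE | 41.25 | 11.94 | 22.23 | 49.80 |
| ChronoEarth Pretrained | | | | |
| Sup. SpectralViT | 48.01 | 15.50 | 29.10 | 57.79 |
| SpectralViT | 53.29 | 20.87 | 30.80 | 57.70 |
| Sup. LESSViT | 37.66 | 3.49 | 19.05 | 50.15 |
| LESSViT | 54.84 | 23.91 | 33.59 | 56.02 |

Generalization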

CORINE: Spatial-Temporal Generalization (mAP ↑)

Performance degrades progressively from ID → T-OOD → S-OOD → ST-OOD for every method, with spatial shift causing a larger drop than temporal shift. SpectralViT and LESSViT pretrained on ChronoEarth achieve the highest scores in all three OOD settings.

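| Method | ID | T-OOD | S-OOD | ST-OOD |
|---|---|---|---|---|
| DOFA | 70.08 | 62.77 | 57.85 | 51.80 |
| HyperSigma | 59.99 | 58.09 | 54.75 | 49.18 |
| DINOv3 | 58.93 | 54.88 | 50.80 | 46.00 |
| SatMAE | 67.26 | 58.69 | 55.77 | 47.20 |
| ChronoEarth Pretrained | | | | |
| Sup. SpectralViT | 74.14 | 63.30 | 60.48 | 55.21 |
| SpectralViT | 83.98 | 68.52 | 64.15 | 56.19 |
| Sup. LESSViT | 55.47 | 51.34 | 49.01 | 47.20 |
| LESSViT | 84.07 | 68.22 | 62.52 | 58.56 |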

GFC: Cross-Continental Generalization

All methods that train successfully degrade from ID to OOD, reflecting the difficulty of cross-continental transfer under annotation imbalance; LESSViT trained from random initialization (Sup. LESSViT) scores 0.00 on every metric. SpectralViT outperforms LESSViT in this sparse-supervision setting, which suggests that strong spatial aggregation matters more here than fine-grained spectral modeling.

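| Method | ID mIoU ↑ | ID F1 ↑ | OOD (Cross-Continental) mIoU ↑ | OOD (Cross-Continental) F1 ↑ |
|---|---|---|---|---|
| DOFA | 16.99 | 29.05 | 12.77 | 22.65 |
| DINOv3 | 15.91 | 27.45 | 11.51 | 20.64 |
| SatMAE | 9.56 | 17.45 | 2.96 | 5.74 |
| ChronoEarth Pretrained | | | | |
| Sup. SpectralViT | 23.26 | 37.74 | 16.04 | 27.65 |
| SpectralViT | 29.90 | 46.04 | 19.38 | 32.47 |
| Sup. LESSViT | 0.00 | 0.00 | 0.00 | 0.00 |
| LESSViT | 19.29 | 32.34 | 16.13 | 27.78 |

Cross-Satellite Transfer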

Cross-Satellite Generalization (SpectralViT backbone)

Models pretrained on ChronoEarth (EO-1 Hyperion) are fine-tuned and evaluated on EnMAP-based downstream tasks from SpectralEarth. Although ChronoEarth originates from a different, decommissioned sensor, its representations match or exceed those pretrained on in-domain EnMAP data on three of the five tasks (BDFORET, EuroCrops, CORINE) and trail them on BNETD and CDL.

| Pretraining Data | BDFORET mIoU ↑ | BNETD mIoU ↑ | EuroCrops mIoU ↑ | CORINE mAP ↑ | CDL mIoU ↑ |
|---|---|---|---|---|---|
| SpectralEarth (EnMAP) | 76.30 | 49.46 | 69.34 | 75.33 | 77.44 |
| ChronoEarth (ours) | 76.39 | 44.34 | 69.82 | 79.02 | 73.66 |

Temporal Tasks

Short-Horizon Tasks (mIoU / mAP ↑)

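The table compares T=1 (static) with T≤4 (multi-frame) inputs. Adding temporal context generally improves performance. The temporally pretrained SpectralViT outperforms both max-pooling and supervised attention aggregation on the three segmentation tasks; on ISDASoil it beats attention aggregation (54.14 vs. 48.98 mAP) but trails max-pooling (56.72 mAP). These results indicate the value of temporal SSL for segmentation.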

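| Method | CLCD mIoU ↑ (T=1) | CLCD mIoU ↑ (T≤4) | CDL mIoU ↑ (T=1) | CDL mIoU ↑ (T≤4) | NLCD-S mIoU ↑ (T=1) | NLCD-S mIoU ↑ (T≤4) | ISDASoil mAP ↑ (T=1) | ISDASoil mAP ↑ (T≤4) |
|---|---|---|---|---|---|---|---|---|
| DOFA max | 39.88 | 39.76 ↓ | 12.68 | 14.88 | 14.96 | 18.25 | 52.80 | 50.63 ↓ |
| DINOv3 max | 40.39 | 39.27 ↓ | 10.80 | 13.51 | 11.07 | 16.98 | 47.32 | 48.89 |
| SatMAE max | 34.88 | 37.78 | 12.34 | 12.41 | 15.14 | 16.83 | 49.53 | 50.99 |
| LESSViT max | 51.19 | 54.69 | 24.80 | 26.15 | 30.58 | 31.02 | 57.33 | 58.04 |
| SpectralViT variants (ChronoEarth pretrained) | | | | | | | | |
| SpectralViT max | 43.37 | 44.55 | 16.47 | 20.07 | 19.90 | 19.82 ↓ | 56.49 | 56.72 |
| SpectralViT attention | | 43.77 | | 18.97 | | 22.88 | | 48.98 |
| SpectralViT temporal SSL | | 45.57 | | 22.64 | | 23.59 | | 54.14 |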
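
For reference, the max-pooling baseline can be sketched as an element-wise maximum over per-frame features. This assumes that pooling is applied to encoder features and that sequences shorter than the frame budget are padded and masked; neither detail is specified in the text, and `max_pool_frames` is an illustrative name.

```python
import numpy as np


def max_pool_frames(features, valid):
    """Element-wise max over time, ignoring padded frames.

    features: (T, D) array of per-frame features.
    valid:    (T,) boolean mask, False for padding.
    """
    masked = np.where(valid[:, None], features, -np.inf)
    return masked.max(axis=0)


features = np.array([
    [0.2, 1.0, -0.5],
    [0.7, 0.1, 0.3],
    [9.0, 9.0, 9.0],    # padded frame, must not influence the result
])
valid = np.array([True, True, False])
pooled = max_pool_frames(features, valid)   # one D-dimensional vector
```

Max-pooling is order-invariant, so it cannot represent the direction of change between frames; this is one plausible reason learned temporal aggregation can help on change-sensitive targets.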

Long-Horizon Tasks (mIoU ↑)

This setting evaluates future-state prediction under increasing temporal history (T≤2, T≤4, T≤8). Longer history generally helps for CLCD and NLCD-S, where land cover changes gradually. CDL (crop type) shows less stable gains: longer history may introduce outdated or noisy signals for time-sensitive targets. Temporal SSL pretraining gives SpectralViT consistent gains over max-pooling on CLCD and NLCD-S, but not on CDL, where max-pooling scores higher at every history length.

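| Method | CLCD mIoU ↑ (T≤2) | CLCD mIoU ↑ (T≤4) | CLCD mIoU ↑ (T≤8) | NLCD-S mIoU ↑ (T≤2) | NLCD-S mIoU ↑ (T≤4) | NLCD-S mIoU ↑ (T≤8) | CDL mIoU ↑ (T≤2) | CDL mIoU ↑ (T≤4) | CDL mIoU ↑ (T≤8) |
|---|---|---|---|---|---|---|---|---|---|
| DOFA max | 25.51 | 35.34 | 38.97 | 18.76 | 19.51 | 22.64 | 6.02 | 10.27 | 10.44 |
| DINOv3 max | 20.26 | 33.72 | 37.04 | 18.14 | 20.34 | 21.83 | 5.25 | 9.35 | 11.22 |
| SatMAE max | 20.00 | 31.25 | 29.85 ↓ | 13.66 | 18.22 | 19.40 | 5.43 | 10.19 | 11.80 |
| LESSViT max | 38.80 | 50.08 | 54.64 | 28.36 | 33.75 | 35.52 | 9.97 | 18.83 | 14.43 ↓ |
| SpectralViT variants (ChronoEarth pretrained) | | | | | | | | | |
| SpectralViT max | 35.88 | 40.70 | 49.25 | 25.67 | 28.44 | 30.11 | 11.95 | 13.60 | 11.37 ↓ |
| SpectralViT attention | 24.28 | 32.63 | 37.75 | 25.31 | 26.97 | 27.37 | 7.43 | 7.12 ↓ | 4.53 ↓ |
| SpectralViT temporal SSL | 43.61 | 51.26 | 55.61 | 37.34 | 39.69 | 40.76 | 5.92 | 10.35 | 9.55 ↓ |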

Citation

If you find ChronoEarth-492K useful in your research, please cite our paper:

@misc{si2026chronoearth492klargescalelong,
  title={ChronoEarth-492K: A Large Scale and Long Horizon Spatiotemporal Hyperspectral Earth Observation Dataset and Benchmark},
  author={Haozhe Si and Yuxuan Wan and Yuqing Wang and Minh Do and Han Zhao},
  year={2026},
  eprint={2605.15666},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2605.15666},
}