Global Land Cover Training Dataset (GLanCE) for Multi-Decadal Remote Sensing Analysis

The Global Land Cover (GLanCE) training dataset is a comprehensive resource designed to support the analysis of land cover and land use change from regional to global scales. Covering the period from 1984 to 2020, this dataset offers a 30-meter spatial resolution and is engineered to represent diverse biogeographic regions across the planet. It comprises nearly 2 million training units, each providing up to 23 distinct land cover attributes. This database is particularly valuable for training machine learning models on cloud computing platforms like Google Earth Engine (GEE), facilitating the mapping of both abrupt disturbances and gradual environmental transitions.

Dataset Schema and Attributes

The training database includes metadata for geographical location, temporal range, and various classification levels. Below are the primary columns available in the feature collection:

Attribute Name | Description
Lat / Lon | Geographic coordinates of the sample pixel.
Start_Year / End_Year | The temporal range (1984-2020) for the specific segment.
Glance_Class_ID_level1 | Broad land cover categories (1-7).
Glance_Class_ID_level2 | Detailed land cover sub-classes (1-13).
Leaf_Type | Classification for trees: broadleaf, needleleaf, or mixed.
Impervious_Percent | Density of developed areas: Low (0-30%), Medium (30-60%), High (60-100%).
Veg_Density | Canopy density for woody vegetation.
Change | Boolean (0/1) indicating if a land cover change occurred.
LC_Confidence | Interpreter confidence score (1 to 3).
Continent_Code | Numeric code for the continent (1: N. America, 2: S. America, etc.).
Dataset_Code | Source indicator (e.g., STEP, LCMAP, MapBiomas).
Glance_ID | Unique identifier for each sample record.
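
To make the schema concrete, the sketch below writes one record as a plain JavaScript object and checks it against the value ranges listed in the attribute table. The property values are invented for illustration, and the helper name checkGlanceRecord is hypothetical; neither comes from the dataset or from Earth Engine.

```javascript
// Hypothetical record: property names follow the attribute table,
// values are made up for illustration only.
var sampleRecord = {
  Glance_ID: 'example-0001',
  Lat: -3.25,
  Lon: -60.1,
  Start_Year: 1999,
  End_Year: 2012,
  Glance_Class_ID_level1: 5,
  Glance_Class_ID_level2: 8,
  Change: 0,
  LC_Confidence: 3,
  Continent_Code: 2
};

// Return a list of range violations; an empty list means the record
// is consistent with the ranges documented in the attribute table.
function checkGlanceRecord(rec) {
  var problems = [];
  function inRange(name, lo, hi) {
    var v = rec[name];
    if (typeof v !== 'number' || !(v >= lo && v <= hi)) {
      problems.push(name + ' outside ' + lo + '-' + hi);
    }
  }
  inRange('Start_Year', 1984, 2020);
  inRange('End_Year', 1984, 2020);
  inRange('Glance_Class_ID_level1', 1, 7);
  inRange('Glance_Class_ID_level2', 1, 13);
  inRange('LC_Confidence', 1, 3);
  inRange('Change', 0, 1);
  if (rec.Start_Year > rec.End_Year) {
    problems.push('Start_Year is after End_Year');
  }
  return problems;
}

console.log(checkGlanceRecord(sampleRecord)); // prints an empty array
```

A check of this kind can be useful after exporting features to a local file, before the records are passed to a classifier outside Earth Engine.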

Classification Hierarchy

GLanCE utilizes a two-tier classification system to provide both coarse and granular land cover information.

Level 1 Category | Level 2 Sub-category | Description
Water (1) | Water (1) | Permanent water bodies including lakes and oceans.
Ice/snow (2) | Ice/snow (2) | Areas with >50% perennial ice or snow cover.
Developed (3) | Developed (3) | Urban structures and functionally related land.
Barren (4) | Soil (4), Rock (5), Sand (6) | Areas with <10% vegetation.
Trees (5) | Deciduous (7), Evergreen (8), Mixed (9) | Woody vegetation with >30% cover.
Shrub (6) | Shrub (10) | Vegetation >10% with <30% tree cover.
Herbaceous (7) | Grassland (11), Agriculture (12), Moss (13) | Non-woody vegetation; includes croplands.
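
The two tiers can be linked with a simple lookup. The sketch below transcribes the hierarchy table into a Level 2 to Level 1 mapping; the names LEVEL2_TO_LEVEL1, LEVEL1_NAMES, and level1FromLevel2 are illustrative and are not defined by the dataset.

```javascript
// Level 2 ID -> Level 1 ID, transcribed from the hierarchy table.
var LEVEL2_TO_LEVEL1 = {
  1: 1,                 // Water
  2: 2,                 // Ice/snow
  3: 3,                 // Developed
  4: 4, 5: 4, 6: 4,     // Soil, Rock, Sand -> Barren
  7: 5, 8: 5, 9: 5,     // Deciduous, Evergreen, Mixed -> Trees
  10: 6,                // Shrub
  11: 7, 12: 7, 13: 7   // Grassland, Agriculture, Moss -> Herbaceous
};

var LEVEL1_NAMES = {
  1: 'Water', 2: 'Ice/snow', 3: 'Developed', 4: 'Barren',
  5: 'Trees', 6: 'Shrub', 7: 'Herbaceous'
};

// Collapse a Level 2 class ID to its parent Level 1 class ID.
function level1FromLevel2(level2Id) {
  var level1Id = LEVEL2_TO_LEVEL1[level2Id];
  if (level1Id === undefined) {
    throw new RangeError('Unknown Level 2 class ID: ' + level2Id);
  }
  return level1Id;
}

console.log(LEVEL1_NAMES[level1FromLevel2(8)]); // prints Trees
```

A mapping like this is handy when a model is trained on Level 2 labels but results need to be reported on the coarser Level 1 legend, or for checking that the two class columns of a record agree.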

Methodology and Data Quality

The dataset was curated by analysts at Boston University using a suite of Earth Engine-based tools. The process integrated high-resolution imagery from Google Earth, Landsat time-series data (including Tasseled Cap transforms), and street-level photography. Each unit represents a Landsat-scale pixel corresponding to segments identified by the Continuous Change Detection and Classification (CCDC) algorithm.

Quality assurance was maintained through a multi-stage review process. Samples were cross-validated using machine learning techniques to filter out mislabeled units. Furthermore, the dataset was augmented with existing repositories like STEP and MapBiomas, ensuring a balanced representation of ecosystem types and post-disturbance landscapes.
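
One simple way to inspect how balanced a subset of samples is across classes is to tally records per Level 1 class. The sketch below does this on a small in-memory array of hypothetical records; the helper countByProperty and the demo values are assumptions made for illustration. On the hosted collection, the aggregate_histogram method of ee.FeatureCollection should be the server-side counterpart.

```javascript
// Count how many records share each value of the given property.
function countByProperty(records, property) {
  var counts = {};
  records.forEach(function (rec) {
    var key = rec[property];
    counts[key] = (counts[key] || 0) + 1;
  });
  return counts;
}

// Hypothetical records standing in for exported training units.
var demoRecords = [
  {Glance_Class_ID_level1: 5, Continent_Code: 2},
  {Glance_Class_ID_level1: 5, Continent_Code: 2},
  {Glance_Class_ID_level1: 7, Continent_Code: 1},
  {Glance_Class_ID_level1: 1, Continent_Code: 1}
];

var perClass = countByProperty(demoRecords, 'Glance_Class_ID_level1');
console.log(perClass); // two records in class 5, one each in classes 1 and 7
```

The same helper works for any categorical column, for example Continent_Code, which makes it easy to spot classes or regions that are thin in a filtered subset.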

Implementation in Google Earth Engine

The dataset is hosted as a FeatureCollection in Google Earth Engine. Users can filter the collection by geographic region, time, or specific land cover classes to generate training sets for classification algorithms.

// Load the GLanCE training features
var glanceTrainingData = ee.FeatureCollection("projects/sat-io/open-datasets/GLANCE/GLANCE_TRAINING_DATA_V1");

// Filter for evergreen forest samples in South America
var evergreenSamples = glanceTrainingData.filter(ee.Filter.and(
  ee.Filter.eq('Glance_Class_ID_level2', 8),
  ee.Filter.eq('Continent_Code', 2)
));

// Visualize the distribution of samples
Map.addLayer(evergreenSamples, {color: '006400'}, 'Evergreen Forest Training Units');
print('Total samples found:', evergreenSamples.size());
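
Filtering by time can follow the same pattern. The sketch below assumes that a label applies to every year between Start_Year and End_Year inclusive, and selects the records whose segment covers a target year. It runs on hypothetical in-memory records so the logic can be checked outside Earth Engine; the helper coversYear and the sample values are illustrative only. The commented lines show an assumed Earth Engine counterpart built from ee.Filter.lte and ee.Filter.gte.

```javascript
// True when the segment [Start_Year, End_Year] contains the target year.
function coversYear(rec, year) {
  return rec.Start_Year <= year && rec.End_Year >= year;
}

// Hypothetical records standing in for features from the collection.
var segmentRecords = [
  {Glance_ID: 'a', Start_Year: 1984, End_Year: 1998, Glance_Class_ID_level1: 5},
  {Glance_ID: 'b', Start_Year: 1999, End_Year: 2020, Glance_Class_ID_level1: 7},
  {Glance_ID: 'c', Start_Year: 2005, End_Year: 2020, Glance_Class_ID_level1: 5}
];

var targetYear = 2001;
var selectedIds = segmentRecords
  .filter(function (rec) { return coversYear(rec, targetYear); })
  .map(function (rec) { return rec.Glance_ID; });

console.log(selectedIds); // only record 'b' covers 2001

// Assumed Earth Engine counterpart (not executed here):
// var samplesFor2001 = glanceTrainingData.filter(ee.Filter.and(
//   ee.Filter.lte('Start_Year', targetYear),
//   ee.Filter.gte('End_Year', targetYear)
// ));
```

Selecting samples this way keeps the label year aligned with the imagery year, which matters when training data span 1984 to 2020 and land cover at a location may have changed between segments.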

Data Citation

Stanimirova, R., et al. (2023) A global land cover training dataset from 1984 to 2020. Scientific Data 10, 879. https://doi.org/10.1038/s41597-023-02798-5

License: Creative Commons Attribution 4.0 International Public License.

Tags: Google Earth Engine, Global Land Cover, LULC, Machine Learning, Remote Sensing

Posted on Sun, 28 Jun 2026 16:46:10 +0000 by syd