The Global Land Cover Mapping and Estimation (GLanCE) training dataset supports the analysis of land cover and land use change at regional to global scales. It covers the period from 1984 to 2020 at 30-meter spatial resolution and is designed to represent diverse biogeographic regions across the planet. It comprises nearly 2 million training units, each providing up to 23 land cover attributes. The database is well suited to training machine learning models on cloud computing platforms such as Google Earth Engine (GEE), where it can support the mapping of both abrupt disturbances and gradual environmental transitions.
Dataset Schema and Attributes
The training database includes metadata for geographical location, temporal range, and various classification levels. Below are the primary columns available in the feature collection:
| Attribute Name | Description |
|---|---|
| Lat / Lon | Geographic coordinates of the sample pixel. |
| Start_Year / End_Year | The temporal range (1984-2020) for the specific segment. |
| Glance_Class_ID_level1 | Broad land cover categories (1-7). |
| Glance_Class_ID_level2 | Detailed land cover sub-classes (1-13). |
| Leaf_Type | Classification for trees: broadleaf, needleleaf, or mixed. |
| Impervious_Percent | Density of developed areas: Low (0-30%), Medium (30-60%), High (60-100%). |
| Veg_Density | Canopy density for woody vegetation. |
| Change | Boolean (0/1) indicating whether a land cover change occurred. |
| LC_Confidence | Interpreter confidence score (1 to 3). |
| Continent_Code | Numeric code for the continent (1: N. America, 2: S. America, etc.). |
| Dataset_Code | Source indicator (e.g., STEP, LCMAP, MapBiomas). |
| Glance_ID | Unique identifier for each sample record. |
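
Because each record describes a segment bounded by `Start_Year` and `End_Year`, a common first step is to keep only the samples whose segment covers the year being mapped. The following is a minimal sketch rather than an official recipe: the function name `samplesCoveringYear` is hypothetical, the code assumes the Earth Engine `ee` namespace is available (as in the GEE Code Editor), and it assumes both year columns are inclusive bounds.

```javascript
// Sketch only: assumes the Earth Engine `ee` namespace is in scope
// (as in the GEE Code Editor).
// Assumption: Start_Year and End_Year are inclusive bounds of a segment.
// `samplesCoveringYear` is a hypothetical helper name, not part of the dataset.
function samplesCoveringYear(collection, year) {
  // Keep features whose segment starts on or before `year`
  // and ends on or after `year`.
  return collection.filter(ee.Filter.and(
    ee.Filter.lte('Start_Year', year),
    ee.Filter.gte('End_Year', year)
  ));
}
```

Calling `samplesCoveringYear(collection, 2010)` on the GLanCE FeatureCollection would return the subset of training units whose labeled segment includes 2010.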
Classification Hierarchy
GLanCE uses a two-level classification system that provides both coarse (Level 1) and more detailed (Level 2) land cover information.
| Level 1 Category | Level 2 Sub-category | Description |
|---|---|---|
| Water (1) | Water (1) | Permanent water bodies including lakes and oceans. |
| Ice/snow (2) | Ice/snow (2) | Areas with >50% perennial ice or snow cover. |
| Developed (3) | Developed (3) | Urban structures and functionally related land. |
| Barren (4) | Soil (4), Rock (5), Sand (6) | Areas with <10% vegetation. |
| Trees (5) | Deciduous (7), Evergreen (8), Mixed (9) | Woody vegetation with >30% cover. |
| Shrub (6) | Shrub (10) | Areas with >10% vegetation cover and <30% tree cover. |
| Herbaceous (7) | Grassland (11), Agriculture (12), Moss (13) | Non-woody vegetation; includes croplands. |
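
The hierarchy in the table can be written down as a small lookup that collapses a Level 2 code into its parent Level 1 class. This is an illustrative sketch in plain JavaScript: the codes and names are transcribed from the table above, while the names `LEVEL2_TO_LEVEL1`, `LEVEL1_NAMES`, and `level1FromLevel2` are hypothetical helpers, not part of the dataset.

```javascript
// Level 2 -> Level 1 codes, transcribed from the hierarchy table.
var LEVEL2_TO_LEVEL1 = {
  1: 1,               // Water
  2: 2,               // Ice/snow
  3: 3,               // Developed
  4: 4, 5: 4, 6: 4,   // Soil, Rock, Sand -> Barren
  7: 5, 8: 5, 9: 5,   // Deciduous, Evergreen, Mixed -> Trees
  10: 6,              // Shrub
  11: 7, 12: 7, 13: 7 // Grassland, Agriculture, Moss -> Herbaceous
};

var LEVEL1_NAMES = {
  1: 'Water', 2: 'Ice/snow', 3: 'Developed', 4: 'Barren',
  5: 'Trees', 6: 'Shrub', 7: 'Herbaceous'
};

// Return the parent Level 1 class for a Level 2 code (1-13).
function level1FromLevel2(level2Id) {
  var level1Id = LEVEL2_TO_LEVEL1[level2Id];
  if (level1Id === undefined) {
    throw new RangeError('Unknown Level 2 class ID: ' + level2Id);
  }
  return { id: level1Id, name: LEVEL1_NAMES[level1Id] };
}
```

For example, `level1FromLevel2(8)` resolves Evergreen to the Trees class (Level 1 code 5), which is useful when a model trained on Level 2 labels needs to be evaluated at the coarser level.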
Methodology and Data Quality
The dataset was curated by analysts at Boston University using a suite of Earth Engine-based tools. The process integrated high-resolution imagery from Google Earth, Landsat time-series data (including Tasseled Cap transforms), and street-level photography. Each unit represents a Landsat-scale pixel corresponding to segments identified by the Continuous Change Detection and Classification (CCDC) algorithm.
Quality assurance was maintained through a multi-stage review process. Samples were cross-validated using machine learning techniques to filter out mislabeled units. The dataset was also augmented with existing repositories such as STEP and MapBiomas to ensure a balanced representation of ecosystem types and post-disturbance landscapes.
Implementation in Google Earth Engine
The dataset is hosted as a FeatureCollection in Google Earth Engine. Users can filter the collection by geographic region, time, or specific land cover classes to generate training sets for classification algorithms.
```javascript
// Load the GLanCE training features
var glanceTrainingData = ee.FeatureCollection("projects/sat-io/open-datasets/GLANCE/GLANCE_TRAINING_DATA_V1");

// Filter for evergreen forest samples (Level 2 class 8) in South America (continent code 2)
var evergreenSamples = glanceTrainingData.filter(ee.Filter.and(
  ee.Filter.eq('Glance_Class_ID_level2', 8),
  ee.Filter.eq('Continent_Code', 2)
));

// Visualize the distribution of samples
Map.addLayer(evergreenSamples, {color: '006400'}, 'Evergreen Forest Training Units');
print('Total samples found:', evergreenSamples.size());
```
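
A filtered subset like the one above can then feed a supervised classifier. The sketch below shows one possible workflow, not the method used by the dataset's authors. It assumes the Earth Engine `ee` namespace is in scope; `predictorImage` stands for any multi-band image you supply (for example, a Landsat composite), and the function name `trainLevel1Classifier` and the random forest choice are illustrative assumptions. The 30-meter sampling scale follows the dataset's stated resolution.

```javascript
// Sketch only: assumes the Earth Engine `ee` namespace is in scope.
// `trainLevel1Classifier` and `predictorImage` are hypothetical names;
// random forest is one possible classifier, chosen here for illustration.
function trainLevel1Classifier(predictorImage, trainingPoints, numberOfTrees) {
  // Extract the image's band values at each training unit (30 m scale),
  // carrying the Level 1 label along as a property.
  var samples = predictorImage.sampleRegions({
    collection: trainingPoints,
    properties: ['Glance_Class_ID_level1'],
    scale: 30
  });

  // Train a random forest on the sampled band values.
  return ee.Classifier.smileRandomForest(numberOfTrees).train({
    features: samples,
    classProperty: 'Glance_Class_ID_level1',
    inputProperties: predictorImage.bandNames()
  });
}
```

The returned classifier can be applied with `predictorImage.classify(...)`. In practice the training units should first be restricted to those whose `Start_Year` to `End_Year` range matches the date of the predictor image, so that labels and imagery describe the same period.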
Data Citation
Stanimirova, R., et al. (2023) A global land cover training dataset from 1984 to 2020. Scientific Data 10, 879. https://doi.org/10.1038/s41597-023-02798-5
License: Creative Commons Attribution 4.0 International Public License.