Evaluation of Independent Reference Datasets for Validating Land Cover and Change

  • Sebastien Chognard

Student thesis: Doctoral Thesis, Doctor of Philosophy


Land cover and land cover change maps generated from Earth observation data are critical for addressing local, national and international requirements and obligations, from land management through to policy. To date, many land cover maps have been produced regionally (e.g., Corine Land Cover) and globally (e.g., WorldCover, Global Land Cover) and at different spatial resolutions for one or more years using broad taxonomic classes. Many countries have also developed their own products that are often more detailed, of finer spatial resolution and better align with local to national requirements. Change products are often generated from land cover maps, but many are based on multi-annual comparisons and are frequently domain-specific. Regardless of their coverage, robust and reliable assessments of the accuracy of land cover and change products are needed, to give confidence in their use by end-users and to estimate and identify where improvements might be made. This requirement has been increasingly recognised and, in the vast majority of cases, has led to maps being published alongside estimators of their accuracy. However, a review of the most cited papers on thematic map production and assessment showed that about 87% were validated using reference datasets generated with an active sampling strategy and 70% used experts who were often associated with the teams who created the maps.
Whilst this latter approach may have increased the reliability of these reference data, other criteria such as independence may have been compromised. Most efforts directed towards map validation in an active approach have assigned an appropriate reference label at pre-defined sample locations through visual interpretation of image data (e.g., aerial photography), with these generally acquired at a resolution higher than that of the thematic product being assessed. Whilst this approach allows labels to be assigned to sample locations, ground observations are often needed when additional information is required (e.g., peat depth). The validation of thematic maps produced at local to global scales and over time (in the case of land cover change) also requires a sizeable but appropriate quantity and spatial distribution of reference data. However, to reduce the burden of reference dataset generation, particularly across large and dynamic areas, greater consideration is being given to Volunteered Geographic Information contributed by people ranging from experts to non-experts. The main aim of this research was to investigate the feasibility and requirements of passive approaches, specifically those that use i) volunteers with varying degrees of expertise or ii) automated image extraction procedures, and whether these can provide independent reference datasets suitable for validating land cover and/or change maps that are comparable to those generated using an expert-driven active approach. Particular focus was on establishing whether those generated passively met the requirements of relevance, representation and genericity, with this considering the number, diversity and spatial distribution of records as well as the external validity of the dataset. Consideration was also given to the impact of differing levels of expertise required for ground validation of land cover and land cover change products. Passive independent reference datasets were generated through a) a mobile application (Earthtrack), developed to allow volunteers to record both land cover and change from field observations, and b) deep learning of very high-resolution imagery obtained from aircraft, drone and satellite sensors. To ensure the external validity of the datasets (i.e., capacity to generalise across a region of interest), well-known accuracy metrics (derived from error matrices) were compared with an active approach, wherein reference datasets were generated independently through visual interpretation of imagery (i.e., based on random stratified sampling) and by experts. The study focused on Wales (United Kingdom) in its entirety as 10 m spatial resolution land cover maps were available for 2018, 2019, 2020 and 2021, with these generated from Sentinel-1 C-band radar and Sentinel-2 optical data and according to the Food and Agriculture Organisation (FAO) Land Cover Classification System (LCCS). Within Wales, focused experiments on deep learning were centred on a 216 km² region located in Mid-Wales, on Aberystwyth and Sandy Haven, because of the availability of very high-resolution imagery, and on the Lower Neuadd Reservoir and surrounds, Brecon Beacons National Park, as rapid and recent (post-2018) changes in land cover had occurred.
Comparison of accuracy metrics indicated these were similar regardless of whether the reference datasets were generated passively or actively, but only following removal of records that were deemed incorrect or ambiguous, using experts to filter the dataset. In both cases, the labels assigned were reliable (e.g., > 90% Overall Accuracy) and the amount and diversity of records (in total and per category) were more than three times that suggested in good practice guidelines. Biases were, however, observed in both datasets, with these relating to i) the preferences of users to visit certain land cover types and sample locations that were more homogeneous, ii) the tendency for sample points retained through deep learning to also fall in more homogeneous land covers, and iii) the availability and timing of very high-resolution image acquisitions affecting the spatial location of sample points. A greater level of expertise was needed to interpret changes in land cover, particularly given that human activities and/or natural events or processes leading to change are difficult to discern in the field. Given that changes are less frequent and are commonly concentrated in small areas, the need for an active rather than passive strategy for ground-level recording was established, as experts were able to discern more than three times the number of categories (based on impacts and driving pressures) compared to those with lower levels of expertise, with this assisted by domain-specific or general knowledge of the area and prior reference to time-series of Earth observation data.
In conclusion, the study shows that independent reference datasets generated using passive approaches, whether via volunteers or automated extraction from Earth observation data, can be used to validate maps of land cover. However, consideration needs to be given to biases associated with sampling homogeneous areas and the spatial distribution of records. Robust procedures are also needed to ensure that reference datasets maintain high reliability. Furthermore, they need to be sufficient in quantity and spatial distribution and representative of the landscapes of interest. Whilst experts can be part of this volunteer community, the study established that Volunteered Geographic Information can be used alone to support validation of land cover maps as long as the main requirements of a reference dataset are met. For validating change categories, implementing an active strategy with experts or semi-experts is advocated given their spatial and temporal diversity and complexity.
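The accuracy metrics derived from error matrices that the abstract refers to can be illustrated with a minimal sketch. It is not the thesis's own implementation: the class names and the ten sample labels are invented for illustration, and the function names `error_matrix` and `accuracy_metrics` are hypothetical. It also uses plain sample counts, whereas a stratified sampling design would normally call for area-weighted estimators.

```python
from collections import Counter

# Hypothetical classes and labels, invented purely for illustration.
CLASSES = ["water", "woodland", "grassland"]
map_labels = ["water", "water", "woodland", "woodland", "woodland",
              "grassland", "grassland", "grassland", "grassland", "water"]
ref_labels = ["water", "water", "woodland", "woodland", "grassland",
              "grassland", "grassland", "grassland", "woodland", "water"]


def error_matrix(map_labels, ref_labels, classes):
    """Cross-tabulate labels: rows are map classes, columns are reference classes."""
    counts = Counter(zip(map_labels, ref_labels))
    return [[counts[(m, r)] for r in classes] for m in classes]


def accuracy_metrics(matrix):
    """Return overall accuracy plus per-class user's and producer's accuracy."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(k)]
    row_totals = [sum(row) for row in matrix]  # samples per map class
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # per reference class
    overall = sum(diagonal) / total
    # User's accuracy: share of samples mapped as a class that the reference confirms.
    users = [d / t if t else float("nan") for d, t in zip(diagonal, row_totals)]
    # Producer's accuracy: share of reference samples of a class that the map captures.
    producers = [d / t if t else float("nan") for d, t in zip(diagonal, col_totals)]
    return overall, users, producers


matrix = error_matrix(map_labels, ref_labels, CLASSES)
overall, users, producers = accuracy_metrics(matrix)
print(matrix)
print(overall, users, producers)
```

Running the same functions on an actively and a passively generated reference dataset for the same map would give two sets of metrics that can be compared side by side.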
Date of Award: 2022
Original language: English
Awarding Institution
  • Aberystwyth University
Supervisor: Richard Lucas (Supervisor) & Pete Bunting (Supervisor)
