Spatial Extrapolation

Semi-supervised learning for the spatial extrapolation of soil information

Video Abstract

Motivation

The main objective of this study was to evaluate the ability and accuracy of semi-supervised learning (SSL), compared to supervised learning (SL), in extrapolating soil taxonomic classes in arid regions. To test this, we selected two study areas, one as a reference area and the other as a target area. We tested whether SSL could extrapolate soil classes to the target area using soil data from only the reference area and covariates derived from both the reference and target areas. To the best of our knowledge, this is the first study designed to test SSL for extrapolating soil information.

Methods

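In this study, four modelling approaches were employed to assess the ability of SLR→T and SSLR→T to extrapolate soil information from the reference area to the target area: supervised learning trained and applied in the reference area (SLR), supervised learning trained and applied in the target area (SLT), supervised learning trained in the reference area and applied to the target area (SLR→T), and semi-supervised learning trained with soil observations from the reference area and covariates from both areas (SSLR→T).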

In both areas, SLR and SLT were applied to estimate soil classes (i.e., spatial interpolation). This produced two maps with the highest possible accuracy for further comparisons. Then, SLR→T, which was trained in a supervised manner, was applied to the target area (i.e., spatial extrapolation). Finally, SSLR→T, which was trained in a semi-supervised manner, was used to estimate soil classes in the target area. Specifically, the SSLR→T method used soil observations from the reference area and covariate information from both areas. Note that soil observations from the target area were used only to assess the performance of SLR→T and SSLR→T in extrapolating soil information from the reference area to the target area.
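The semi-supervised idea, labels from the reference area plus unlabeled covariates from both areas, can be illustrated with a minimal self-training sketch. This is not the algorithm used in the study, which this page does not specify. The nearest-centroid classifier, the function names, and the `min_margin` and `max_rounds` parameters are all assumptions chosen to keep the example short and free of dependencies.

```python
import math

def centroids(X, y):
    """Mean covariate vector of each soil class."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in groups.items()}

def predict(cents, row):
    """Return (label, margin): nearest class and distance gap to the runner-up."""
    ranked = sorted((math.dist(row, c), label) for label, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin

def self_training(X_ref, y_ref, X_target, min_margin=1.0, max_rounds=10):
    """Hypothetical self-training loop: reference labels, target covariates only."""
    X_lab, y_lab = list(X_ref), list(y_ref)
    unlabeled = list(X_target)
    for _ in range(max_rounds):
        cents = centroids(X_lab, y_lab)
        confident, rest = [], []
        for row in unlabeled:
            label, margin = predict(cents, row)
            (confident if margin >= min_margin else rest).append((row, label))
        if not confident:
            break  # nothing left that the model is sure about
        for row, label in confident:
            # Pseudo-label confident target sites and add them to the training set
            X_lab.append(row)
            y_lab.append(label)
        unlabeled = [row for row, _ in rest]
    return centroids(X_lab, y_lab)

# Toy covariates: the target area is shifted relative to the reference area
X_ref = [[0, 0], [0, 1], [5, 5], [5, 6]]
y_ref = ["A", "A", "B", "B"]
X_target = [[1, 1], [2, 2], [6, 6], [7, 7]]
model = self_training(X_ref, y_ref, X_target)
```

In this toy case the class centroids move towards the target-area covariates even though no target-area soil observation is used, which is the property that distinguishes the semi-supervised approach from the supervised one.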

Relationships between soil classes

The motivating idea behind digital soil mapping (DSM) approaches is the relationship between soils and their spatially correlated covariates. The distribution of soil classes (i.e., suborders), based on soil observations and in relation to covariates in the reference and target areas, is shown in the alluvial plots in Fig. 5 (Brunson, 2020). Although the relationships are complicated and difficult to follow, some overall trends can be inferred. The complex relationships are expected because soil classes result from the action of multiple soil-forming factors over long periods of time and at different scales.

Dissimilarities in covariates between the two areas

The dissimilarity index (DI) for the covariates indicates how dissimilar the target area is from the reference area, from low to high. The DI is based on distances to the training data in the multidimensional covariate space (Meyer and Pebesma, 2021). It can take values ranging from 0 to ∞. Where the covariates of the two areas are identical, the DI is close to 0, and as the DI increases, the dissimilarity of the covariates between the reference and target areas increases.
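A minimal sketch of the DI calculation follows, using the definition of Meyer and Pebesma (2021): the distance from a new point to its nearest training point, divided by the mean pairwise distance among the training points. The sketch assumes unweighted covariates standardized by the training mean and standard deviation. The function names and the toy covariate values are hypothetical.

```python
import math
from statistics import mean, pstdev

def standardize(train, points):
    """Scale each covariate by the mean and standard deviation of the training data."""
    cols = list(zip(*train))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant covariates
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in points]

def dissimilarity_index(train, new):
    """DI of each new point: nearest-training distance / mean training distance."""
    train_s = standardize(train, train)
    new_s = standardize(train, new)
    # Mean of all pairwise distances among the training points
    pairwise = [math.dist(a, b)
                for i, a in enumerate(train_s)
                for b in train_s[i + 1:]]
    d_bar = mean(pairwise)
    return [min(math.dist(p, t) for t in train_s) / d_bar for p in new_s]

# Toy covariates (two per site): reference sites act as the training data
reference = [[0.0, 10.0], [1.0, 12.0], [2.0, 11.0], [3.0, 13.0]]
target = [[1.0, 12.0], [10.0, 30.0]]
di = dissimilarity_index(reference, target)
```

The first target site coincides with a reference site, so its DI is 0. The second lies far outside the reference covariate space and receives a DI well above 1.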

Spatial interpolation vs spatial extrapolation

Spatial extrapolation (SSLR→T) gave lower accuracy and higher uncertainty than spatial interpolation (SLT) in the target area. This further indicated the difficulty of matching soil-forming factors between the reference and target areas. Note that spatial extrapolation is typically used to transfer soil information from a reference area with soil data to a target area with no or only a few soil data. Therefore, for a fair accuracy assessment of SSLR→T for spatial extrapolation, SLT was trained with different sample sizes (25, 50, 75, and 100 %) in the target area. Our results indicated a positive relationship between model accuracy and sample size. Unexpectedly, SLT_50, trained with only 50 % of the samples (90 samples), reached an overall accuracy close to that of SSLR→T (0.67 vs 0.65). Furthermore, SSLR→T produced a higher prediction accuracy than SLT_25, trained with 25 % of the samples (overall accuracy of 0.65 vs 0.52). This indicated that spatial extrapolation with SSLR→T can outperform spatial interpolation methods trained with a small sample size. However, SLR→T used for spatial extrapolation (overall accuracy of 0.40) did not reach the accuracy of the spatial interpolation methods.
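The comparison above rests on two simple operations: computing overall accuracy and subsampling the target-area observations. The sketch below shows both under stated assumptions. The total of 180 target-area sites is inferred from the statement that 50 % corresponds to 90 samples. Simple random sampling with a fixed seed is an assumption, and the function names are hypothetical.

```python
import random

def overall_accuracy(observed, predicted):
    """Share of validation sites whose predicted class equals the observed class."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    hits = sum(o == p for o, p in zip(observed, predicted))
    return hits / len(observed)

def subsample(sites, fraction, seed=0):
    """Draw a random subset holding `fraction` of the sites, without replacement."""
    k = round(len(sites) * fraction)
    return random.Random(seed).sample(sites, k)

# Training-set sizes for 25, 50, 75, and 100 % of an assumed 180 sites
site_ids = list(range(180))
sizes = [len(subsample(site_ids, f)) for f in (0.25, 0.50, 0.75, 1.00)]

# Toy check: three of four classes predicted correctly
acc = overall_accuracy(["A", "B", "A", "C"], ["A", "B", "C", "C"])
```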

Predicted maps

The maps predicted by the interpolation and extrapolation methods are shown.

Conclusion

In the current research, SSLR→T was successfully developed for extrapolating soil classes from a reference area to a target area. The results were further compared with those obtained by SLR→T spatial extrapolation. Validation statistics showed that SSLR→T had higher accuracy and lower uncertainty than SLR→T. Importantly, the final map of soil classes obtained by SSLR→T showed relatively good agreement with that obtained by spatial interpolation (as a benchmark for comparison). We obtained a lower accuracy and a higher uncertainty for SSLR→T than for spatial interpolation (SLT) when mapping soil classes in the target area. This was expected because of the poor matching of soil-forming factors between the reference and target areas. We showed that SSLR→T, trained using soil observations from the reference area and covariates from both the reference and target areas, could learn some patterns from outside of its covariate space. This shows the high generalization ability of SSLR→T for extrapolating soil information from the reference area to the target area, indicating its effectiveness for spatial extrapolation.

Talk

I gave a talk on spatial extrapolation:

Ruhollah Taghizadeh
Postdoc Researcher

My research interests include soil mapping, data science, and machine learning