publications
List not exhaustive — see Google Scholar.
2026
-
Rapid prediction of soil hydraulic and physicochemical properties using the Nix Pro color sensor
Fatemeh Cheshmberah, Ali Zolfaghari, Ruhollah Taghizadeh-Mehrjardi, and 1 more author
Geoderma Regional, 2026
Color is the most obvious and easily determined soil property that can provide crucial information about soil composition. The NIX™ Pro color sensor (Nix) presents a user-friendly alternative to the traditional Munsell manual method, with reduced sensitivity to environmental and human factors. Therefore, a new method was developed to use Nix in combination with machine learning to quickly and affordably obtain improved estimates of physical, hydraulic, and chemical soil properties. This study extends the application of the Nix sensor from chemical properties to include soil hydraulic properties, providing a rapid, reproducible, and practical prediction tool. The Nix and the Random Forest (RF) algorithm were employed to analyze the spectra of 150 soil samples collected from 0 to 30 cm depth in semi-arid regions of Iran. The results indicated that using the RF algorithm with CIE L*a*b* Color System data from the Nix, the best predictive performance was observed for CaCO₃ (R² = 0.62, RMSE = 2.25) and field capacity (FC) (R² = 0.50, RMSE = 4.99). Predictions for clay (R² = 0.59, RMSE = 7.26), permanent wilting point (PWP) (R² = 0.62, RMSE = 2.04), and sand (R² = 0.49, RMSE = 10.08) also showed good agreement with measured values. Bulk density (BD) (R² = 0.45, RMSE = 0.11) and soil available water (SAW) (R² = 0.51, RMSE = 4.97) predictions exhibited higher errors relative to the observed ranges. The findings indicate that the Nix sensor’s portability and cost-effectiveness, along with the RF algorithm, make it a practical tool for field applications. It provides quick, reliable, and improved estimates of CaCO₃ and FC, with good performance also observed for clay, PWP, and sand.
The results reflect the method’s efficacy, its possible adaptation to other areas with proper adjustment, and its integration into precision agriculture and environmental applications.
-
The Transparency Revolution in Geohazard Science: A Systematic Review and Research Roadmap for Explainable Artificial Intelligence
Moein Tosan, Vahid Nourani, Ozgur Kisi, and 5 more authors
Computer Modeling in Engineering & Sciences, 2026
The integration of machine learning (ML) into geohazard assessment has successfully instigated a paradigm shift, leading to the production of models that possess a level of predictive accuracy previously considered unattainable. However, the black-box nature of these systems presents a significant barrier, hindering their operational adoption, regulatory approval, and full scientific validation. This paper provides a systematic review and synthesis of the emerging field of explainable artificial intelligence (XAI) as applied to geohazard science (GeoXAI), a domain that aims to resolve the long-standing trade-off between model performance and interpretability. A rigorous synthesis of 87 foundational studies is used to map the intellectual and methodological contours of this rapidly expanding field. The analysis reveals that current research efforts are concentrated predominantly on landslide and flood assessment. Methodologically, tree-based ensembles and deep learning models dominate the literature, with SHapley Additive exPlanations (SHAP) frequently adopted as the principal post-hoc explanation technique. More importantly, the review documents how the role of XAI has shifted: rather than being used solely as a tool for interpreting models after training, it is increasingly integrated into the modeling cycle itself. Recent applications include its use in feature selection, adaptive sampling strategies, and model evaluation. The evidence also shows that GeoXAI extends beyond producing feature rankings. It reveals nonlinear thresholds and interaction effects that generate deeper mechanistic insights into hazard processes. Nevertheless, several key challenges remain unresolved within the field.
These persistent issues are especially pronounced when considering the crucial necessity for interpretation stability, the demanding scholarly task of reliably distinguishing correlation from causation, and the development of appropriate methods for the treatment of complex spatio-temporal dynamics.
-
Soil moisture retrieval from Sentinel-1: Lessons learned after more than a decade in orbit
M Rahmati, A Balenzano, L Bechtold, and 8 more authors
Remote Sensing of Environment, 2026
Soil moisture is a critical variable for hydrology, agriculture and climate. However, large-scale soil moisture observation remains difficult due to sparse in situ networks and the inability of optical sensors to capture it under cloud cover. Synthetic aperture radar (SAR) missions, e.g., Sentinel-1, yield unique all-weather, day and night observations with a fine spatial and temporal resolution that makes them of interest for development of global soil moisture monitoring. Consequently, this review discusses the application of C-band SAR observations from the Sentinel-1 satellite mission to estimate high-resolution near-surface soil moisture. First, the importance of SAR backscatter monitoring from Sentinel-1 is emphasized. Next, the current state-of-the-art in soil moisture retrieval from Sentinel-1 is presented. Although considerable progress has been made in near-surface soil moisture retrieval, several limitations remain. Factors such as the effects of vegetation and surface roughness on the signal, sensor and scattering model limitations, spatial and temporal constraints, and uncertainties, e.g. in data assimilation, pose challenges to its usage. While Artificial Intelligence (AI)-based retrieval methods have shown promise, their interpretability, dependence on large datasets, vulnerability to data quality, and computational burden have been major challenges. Beyond methods that rely on backscatter, there have been recent works indicating that SAR interferometric observables have the potential to estimate soil moisture, especially in arid and semi-arid regions where these are particularly sensitive to moisture changes.
To address these challenges, this paper recommends a multi-sensor approach that integrates Sentinel-1 with data from other satellite missions (e.g., Sentinel-2 and Soil Moisture Active Passive, SMAP), refining physical and semi-empirical models, developing advanced AI techniques able to consider physical principles, and combining with emerging data from other high temporal resolution SAR missions (e.g., NASA-ISRO SAR). The review concludes with identification of key research priorities, including standardization of retrieval frameworks, improved validation efforts on standardized reference sets, and cloud processing for real-time use cases. Overall, the review provides a thorough foundation for understanding, refining, and advancing Sentinel-1 based soil moisture retrieval methods.
-
Soil pH and latitude as a major predictor of C:N:P stoichiometry in Germany
Pegah Khosravani, Ndiye Michael Kebonye, Ruhollah Taghizadeh-Mehrjardi, and 3 more authors
CATENA, Mar 2026
Soil stoichiometry governs nutrient cycling to ensure optimal ecosystem functionality. Although the soil carbon‑nitrogen‑phosphorus (C:N:P) stoichiometry and ecosystem functioning are closely related, much less is known about how environmental predictors regulate the spatial distribution of these ratios in temperate regions. Specifically, the statistical relationships between C:N:P stoichiometric patterns at regional scales and soil properties (such as soil pH and clay content), climate variables (like precipitation and temperature), and topographic features (i.e., slope and aspect) remain poorly understood. In our study, we combined Cubist machine learning for spatial predictions with state-of-the-art statistical approaches (generalized additive models and structural equation modeling) to disentangle the quantitative relationships between environmental predictors and soil C:N, C:P, and N:P ratios across Germany. The relative importance analysis of environmental predictors shows that soil pH is the major predictor of stoichiometric ratios, acting through its fundamental control on nutrient availability. Higher soil pH corresponded to lower stoichiometric ratios and vice versa. Latitude emerged as another important predictor due to its effect on temperature, which plays a crucial role in these ratios, such that increasing latitude corresponds to lower ratios. As expected, wall-to-wall spatial distribution maps of the stoichiometric ratios showed varying patterns due to different environmental predictor influences. These findings enhance our understanding of environmental-stoichiometric interactions and offer valuable insights needed for sustainable soil management in temperate regions.
2025
-
Towards explainable AI: interpreting soil organic carbon prediction models using a learning-based explanation method
Nafiseh Kakhani, Ruhollah Taghizadeh-Mehrjardi, Davoud Omarzadeh, and 3 more authors
European Journal of Soil Science, 2025
An understanding of the key factors and processes influencing the variability of soil organic carbon (SOC) is essential for the development of effective policies aimed at enhancing carbon storage in soils to mitigate climate change. In recent years, complex computational approaches from the field of machine learning (ML) have been developed for modelling and mapping SOC in various ecosystems and over large areas. However, in order to understand the processes that account for SOC variability from ML models and to serve as a basis for new scientific discoveries, the predictions made by these data-driven models must be accurately explained and interpreted. In this research, we introduce a novel explanation approach applicable to any ML model and investigate the significance of environmental features to explain SOC variability across Germany. The methodology employed in this study involves training multiple ML models using SOC content measurements from the LUCAS dataset and incorporating environmental features derived from Google Earth Engine (GEE) as explanatory variables. Thereafter, an explanation model is applied to elucidate what the ML models have learned about the relationship between environmental features and SOC content in a supervised manner. In our approach, a post hoc model is trained to estimate the contribution of specific inputs to the outputs of the trained ML models. The results of this study indicate that different classes of ML models rely on interpretable but distinct environmental features to explain SOC variability. Decision tree-based models, such as random forest (RF) and gradient boosting, highlight the importance of topographic features.
Conversely, soil chemical information, particularly pH, is crucial for the performance of neural networks and linear regression models. Therefore, interpreting data-driven studies requires a carefully structured approach, guided by expert knowledge and a deep understanding of the models being analysed.
-
Erosion-SAM: Semantic segmentation of soil erosion by water
Hadi Shokati, Andreas Engelhardt, Kay Seufferheld, and 4 more authors
Catena, 2025
Soil erosion (SE) by water threatens global agriculture by depleting fertile topsoil and causing economic costs. Conventional SE models struggle to capture the complex, non-linear interactions between SE drivers. Recently, machine learning has gained attention for SE modeling. However, machine learning requires large data sets for effective training and validation. In this study, we present Erosion-SAM, which fine-tunes the Segment Anything Model (SAM) for automatic segmentation of water erosion features in high-resolution remote sensing imagery. The data set comprised 405 manually segmented agricultural fields from erosion-prone areas obtained from the rain gauge-adjusted radar rainfall data (RADOLAN) for bare cropland, vegetated cropland, and grassland. Three approaches were evaluated: two pre-processing techniques (resizing and cropping) and an improved version of the resizing approach with user-defined prompts during the testing phase. All fine-tuned models outperformed the original SAM, with the prompt-based resizing method showing the highest accuracy, especially for grassland (recall: 0.90, precision: 0.82, dice coefficient: 0.86, IoU: 0.75). SAM performed better than the cropping approach only on bare cropland. This discrepancy is attributed to the tendency of SAM to overestimate SE by classifying a large proportion of fields as eroded, which increases recall by covering most of the eroded pixels. All three fine-tuned approaches showed strong correlations with the actual SE severity ratios, with the prompt-enhanced resizing approach achieving the highest R² of 0.93. In summary, Erosion-SAM shows promising potential for automatically detecting SE features from remote sensing images.
The generated data sets can be applied to machine learning-based SE modeling, providing accurate and consistent training data across different land cover types, and offering a reliable alternative to traditional SE models. In addition, Erosion-SAM can make a valuable contribution to the precise monitoring of SE with high temporal resolution over large areas, and its results could benefit reinsurance and insurance-related risk solutions.
-
Estimating soil organic carbon using time series Band 11 (SWIR) of multispectral Sentinel-2 satellite images and machine learning algorithms
Mehdi Golkar Amoli, Mahdi Hasanlou, Farhad Samadzadegan, and 2 more authors
Remote Sensing Applications: Society and Environment, 2025
-
Mapping Soil Volumetric Water Content at Multiple Depths for Variable Rate Irrigation Using UAV and Yield Monitor Data With Random Forests
R Taghizadeh-Mehrjardi, R Kerry, B Ingram, and 6 more authors
Soil Use and Management, 2025
The Mountain West has been experiencing severe drought for >20 years. As agriculture uses the greatest amount of the limited fresh water supply, employing variable rate irrigation (VRI) can reduce agricultural water use. To effectively apply VRI, accurate maps of soil volumetric water content (VWC) for the whole root zone (0–120 cm) are essential. This research employs Random Forests (RFs) with topographic, crop reflectance data from an Unmanned Aerial Vehicle and data from a yield monitor to map VWC at four depths. The RF model generally predicted VWC well, within ~1%–3% RMSE, but its performance varied between soil depths and sampling periods. Predictions were slightly more accurate for the top two depths than for the deeper two depths. The models showed that terrain and scaling factors rather than crop attributes were the most influential in predicting VWC at different depths at the scale of an individual field. An exception where crop and yield attributes were important was Fall 2017, which followed a hotter than average Summer. Both high and low spatial resolution data were important to predictions as they relate to features of different scales in the field. A jack-knife procedure showed that, on average, sampling effort could be reduced to 50–60 samples from >100 while maintaining errors of only 2%–3%. Since SCORPAN factors vary little within a field, large samples are still needed to calibrate dense covariates, so testing the RF approach at the farm scale may be more practical for mapping soil water content.
2024
-
Global Soil Salinity Estimation at 10 m Using Multi-Source Remote Sensing
Nan Wang, Songchao Chen, Jingyi Huang, and 8 more authors
Journal of Remote Sensing, Mar 2024
Salinization is a threat to global agricultural and soil resource allocation. Current investigations of global soil salinity are limited to coarse spatial resolution of the available datasets (>250 m) and semiqualitative classification rules (five ranks). Based on these two limitations, we proposed a framework to quantitatively estimate global soil salt content in five climate regions at 10 m by integrating Sentinel-1/2 remotely sensed images, climate, parent material, terrain data, and machine learning. In hyper-arid and arid regions, models established using Sentinel-2 and other geospatial data showed the highest accuracy with R² of 0.85 and 0.62, respectively. In semi-arid, dry sub-humid, and humid regions, models performed best using Sentinel-1, Sentinel-2, and other geospatial data with R² of 0.87, 0.80, and 0.87, respectively. The accuracy of the global models is considerable with field validation in Iran and Xinjiang, and compared with digitized salinity maps in California, Brazil, Turkey, South Africa, and Shandong. The proportion of extremely saline soils in Europe is 10.21%, followed by South America (5.91%), Oceania (5.80%), North America (4.05%), Asia (1.19%), and Africa (1.11%). Climatic conditions, groundwater, and salinity index are key covariates in global soil salinity estimation. Use of radar data improves estimation accuracy in wet regions. The map of global soil salinity at 10 m provides a detailed, high-precision basis for soil property investigation and resource management.
-
Acidification of European croplands by nitrogen fertilization: Consequences for carbonate losses, and soil health
Kazem Zamanian, Ruhollah Taghizadeh-Mehrjardi, Jingjing Tao, and 4 more authors
Science of The Total Environment, May 2024
Soil acidification is an ongoing problem in intensively cultivated croplands due to inefficient and excessive nitrogen (N) fertilization. We collected high-resolution data comprising 19,969 topsoil (0–20 cm) samples from the Land Use and Coverage Area frame Survey (LUCAS) of the European Commission in 2009 to assess the impact of N fertilization on buffering substances such as carbonates and base cations. We have only considered the impacts of mineral fertilizers from the total added N, and a N use efficiency of 60 %. Nitrogen fertilization adds annually 6.1 × 10⁷ kmol H⁺ to European croplands, leading to annual loss of 6.1 × 10⁹ kg CaCO₃. Assuming similar acidification during the next 50 years, soil carbonates will be completely removed from 3.4 × 10⁶ ha of European croplands. In carbonate-free soils, annual loss of 2.1 × 10⁷ kmol of basic cations will lead to strong acidification of at least 2.6 million ha of European croplands within the next 50 years. Inorganic carbon and basic cation losses at such rapid scale tremendously drop the nutrient status and production potential of croplands. Soil liming to ameliorate acidity increases pH only temporarily and with additional financial and environmental costs. Only the direct loss of soil carbonate stocks and compensation of carbonate-related CO₂ correspond to about 1.5 % of the proposed budget of the European Commission for 2023. Thus, controlling and decreasing soil acidification is crucial to avoid degradation of agricultural soils, which can be done by adopting best management practices and increasing nutrient use efficiency.
Regular screening or monitoring of carbonate and base cation contents, especially for soils where the carbonate stocks are at critical levels, is urgently necessary.
-
High-performance soil class delineation via UMAP coupled with machine learning in Kurdistan Province, Iran
Ruhollah Taghizadeh-Mehrjardi, Kamal Nabiollahi, Ndiye M. Kebonye, and 6 more authors
Geoderma Regional, Mar 2024
In response to the demand for spatial information on the soil to support the sustainable management of soil resources, this study applies a digital soil mapping approach to predict soil classes for a 7000 ha area, located in Kurdistan province, Iran. Based on a stratified random sampling design, 91 soil profiles were situated, described, and classified into soil great groups. Environmental covariates used for modelling soil classes included terrain derivatives, remote sensing data, distance-based rasters, and legacy geospatial information (e.g., geological map). To address the issue of data multi-collinearity among the predictors, three dimensionality reduction techniques were tested: the principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and the novel Uniform Manifold Approximation and Projection (UMAP). An initial suite of 160 environmental covariates was reduced to 10 for all the methods and used to train a Random Forest (RF) model. The most effective model coupled UMAP with the Random Forest (RF-UMAP) machine-learner, which yielded a kappa index and overall accuracy values of 0.73 and 0.80, respectively. Within Kurdistan, topography and parent material were the main soil-forming factors influencing the prediction of the soil classes. Overall, the use of UMAP outperformed PCA and t-SNE. This study demonstrates the value of using advanced dimension reduction methods to facilitate the handling of non-linear relationships among predictor variables when using RF.
-
Using local ensemble models and Landsat bare soil composites for large-scale soil organic carbon maps in cropland
Tom Broeg, Axel Don, Alexander Gocht, and 3 more authors
Geoderma, Apr 2024
National soil organic carbon (SOC) maps are essential to improve greenhouse gas accounting and support climate-smart agriculture. Large-scale SOC models based on wall-to-wall soil information from remote sensing remain a challenge due to the high diversity of natural soil conditions and the difficulty of accounting for the spatial location of the soil samples. In this study, we tested if the implementation of local ensemble models (LEM) can be used to improve the SOC predictions from Landsat-based soil reflectance composites (SRC) for Germany. For this, we divided the research area into 30 × 30 km tiles and calculated local generalized linear models (GLM) based on random, nearby observations. Based on the GLMs, local SOC maps were predicted and aggregated using a moving window approach. The local variable importance was analyzed to identify spatial dependencies in the correlation between the SRC and SOC. For the final SOC map, a Random Forest (RF) model was trained using the aggregated local SOC predictions, the SRC, and a full set of training samples from the agricultural soil inventory. The results show that the LEM was able to improve the accuracy (R² = 0.68; RMSE = 5.6 g kg⁻¹), compared to the maps based on a single, global model (R² = 0.52; RMSE = 6.8 g kg⁻¹). The local variable importance of the spectral bands showed clear spatial patterns throughout the research area. Differences can be explained by the local soil conditions, influencing the correlation between SOC and the spectral properties. Compared to the widely adopted integration of distance covariates such as geographical coordinates, the LEM was able to reduce the spatial autocorrelation to a greater extent and to improve the prediction accuracy, especially for underrepresented SOC values.
The LEM presents a new method to integrate spatial information and increase the interpretability of DSM models.
-
A brief review of digital soil mapping in Iran
Ruhollah Taghizadeh-Mehrjardi, Mojtaba Zeraatpisheh, Alireza Amirian-Chakan, and 1 more author
In Remote Sensing of Soil and Land Surface Processes, 2024
This paper offers a brief review of digital soil mapping (DSM) in Iran, which utilizes machine learning and environmental data to create soil maps for better soil management. The review examines the history of DSM in Iran, the latest advances in machine learning methods, and the environmental covariates commonly used in DSM. Despite a short history in Iran, DSM has gained significant interest in recent years, with multiple studies conducted in various regions of the country. The review emphasizes the importance of using advanced machine learning methods, such as random forests and artificial neural networks, for accurate soil property prediction and classification. The significance of environmental covariates, including topography, geology, climate, and land use, in DSM in Iran is also discussed. The paper concludes that DSM has the potential to play a crucial role in soil management and conservation in Iran, and further research is necessary to realize this potential.
2023
-
Transferability of Covariates to Predict Soil Organic Carbon in Cropland Soils
Tom Broeg, Michael Blaschek, Steffen Seitz, and 9 more authors
Remote Sensing, Feb 2023
Precise knowledge about the soil organic carbon (SOC) content in cropland soils is one requirement to design and execute effective climate and food po...
-
Model averaging of machine learning algorithms for digital soil mapping: A minimum variance framework
Patrick Bogaert, Ruhollah Taghizadeh-Mehrjardi, and Nikou Hamzehpour
Geoderma, Sep 2023
In the digital soil mapping framework, machine learning (ML) algorithms are currently the most popular methods for the spatial prediction of soil properties. The fast developments of easy-to-use software implementations for a large panel of ML algorithms have encouraged comparison studies between algorithms, with the goal of ranking their performances and identifying the best ones among them. However, as no firm conclusions can be drawn about the best ML algorithm to be used in general, this suggests that combining a set of them could be a better approach. Numerous methods have been proposed to do so, most of them relying on a linear weighting of the individual algorithms. However, there are almost as many methods for linearly weighting ML algorithms as there are ML algorithms, thus leaving the problem unsolved. Moreover, these weighting methods are mostly used out-of-the-box, without paying proper attention to the associated hypotheses. In this paper, we propose to address this issue by setting the problem in a more formal framework. Starting from classical hypotheses, it is shown how the benefit of averaging various ML algorithms can be estimated from their joint performances. Relying afterwards on the most commonly used linear weighting schemes, it is recalled that, as long as the performance metrics are based on mean square errors, the best averaging method is in essence the best linear (unbiased) predictor. Using a more general Bayesian framework, it is also shown that accounting for conditional biases when weighting ML algorithms is a key issue for obtaining improved predictions, and explicit formulas are proposed for that goal.
Finally, these theoretical results are illustrated and discussed using a soil data set collected over an arid and semi-arid region in Iran where clay content, calcium carbonate equivalent, soil organic carbon and electrical conductivity were measured in topsoil samples.
-
The patchiness of soil 13C versus the uniformity of 15N distribution with geomorphic position provides evidence of erosion and accelerated organic matter turnover
Mitra Ghotbi, Ruhollah Taghizadeh-Mehrjardi, Claudia Knief, and 3 more authors
Agriculture, Ecosystems & Environment, Oct 2023
Farming on hillslopes often affects the accumulation and loss of soil organic matter (SOM) depending on slope position and cropping patterns. Most hillslope studies focus on soil movement to characterize SOM turnover under erosive conditions. In this study, we trace erosion and characterize the erosive impacts of agronomic practices on SOM translocation and transformation along geomorphic positions. To achieve this, we assessed the horizontal distribution (upper 15 cm) and vertical distribution (to 100 cm profiles) of soil δ15N and δ13C isotope abundance individually. We mapped the spatial distribution of δ13C, δ15N, and SOM turnover indices as a novel approach to tracing erosion and degradation of SOM in the field. Except for tillage (conventional vs. reduced tillage), other individual agricultural practices (residue removal with no cover crop vs. retaining residuals, cover cropping, and fertilizer 0, 40, and 80 kg ha⁻¹ nitrogen) caused no significant shifts in δ15N and δ13C values in topsoil (0–15 cm). Among the evaluated factors, topography and depth predicted soil δ15N and δ13C profiles. Trends in δ13C vs. δ15N showed a wider range of δ13C values in topsoil of upslope plots under reduced tillage, while in the depositional location, conventional tillage had the same effect. This suggests erosion under reduced tillage occurred. Erosion and accelerated decomposition gradually slowed δ13C enrichment with soil depth. The digital soil mapping approach depicted low continuity of δ13C vs. high continuity of δ15N with geomorphic position. We attributed the intermediate δ13C values and the steeper slope of δ13C against the logarithm of soil organic carbon (SOC) across the slope to erosion and high SOM turnover, particularly of recently added plant inputs. Current results support the prediction of intensive vs. conservation practices’ effects on upslope soil stability and the fate of SOM in both topsoil and at depth of sloping farmlands.
-
Predictive performance of machine learning model with varying sampling designs, sample sizes, and spatial extents
Abdelkrim Bouasria, Yassine Bouslihim, Surya Gupta, and 2 more authors
Ecological Informatics, Dec 2023
Using machine learning and earth observation data to capture real-world variability in spatial predictive mapping depends on sample size, design, and spatial extent. Nonetheless, there is still ambiguity in answering some basic questions: a) How many samples are necessary for fitting the model? b) Which sampling techniques are suitable for modeling? c) Do results vary with changes in spatial extents? These questions are crucial for spatial modeling projects and require proper investigation. In the present study, we evaluated two sampling designs with different sample sizes, considering three nested spatial extents. Specifically, we adopted the conditioned Latin Hypercube Sampling (cLHS) and Simple Random Sampling (SRS) designs. Based on this, a Random Forest model was used to predict Above-Ground forest Biomass at local, regional, and national spatial extents, comparing different sample sizes (n = 25, 50, 100, 200, 300, and 500). We defined one national extent, five regional extents within the national extent, and a local extent inside each regional extent. Each sampling design and size combination was tested over 100 iterations. The results showed that there was no significant difference between the different sampling designs. The accuracy metrics showed marginal differences for 25 and 50 sample sizes, which were then reduced to minimal and provided similar results. However, a deeper analysis of all 100 repetitions exposed a noteworthy pattern: cLHS outperformed the SRS in terms of RMSE and variability. Regarding the sampling size, the R² values increased with increasing sample size. Nevertheless, beyond a minimum of 300 to 500 samples, the improvement in accuracy became insignificant, emphasizing the diminishing returns with excessively large sample sizes.
Moreover, increasing the size of the spatial extent reduced the accuracy of the model, possibly due to the effect of environmental factors or landscape nature. Therefore, this study demonstrates the potential impact of sample size, sampling design, and spatial extents on model accuracy and emphasizes the importance of reducing the sample size to reduce the model’s complexity.
2022
-
Semi-supervised learning for the spatial extrapolation of soil information
Ruhollah Taghizadeh-Mehrjardi, Razieh Sheikhpour, Mojtaba Zeraatpisheh, and 4 more authors
Geoderma, Nov 2022
Digital soil mapping (DSM) can be used to predict soils at unvisited sites, but problems arise when predictions are needed in areas without any soil observations. In such situations, DSM can still extend the results from reference areas with soil data to target areas that are alike in terms of soil-forming factors and obey the same rules. Such DSM methods have low accuracy due to the complexity of spatial variation in soil, and the difficulty of matching soil-forming factors exactly between reference and target areas. A new approach for extrapolating soil information from reference to target areas is proposed in the current research. We evaluated the ability of a semi-supervised learning (SSLR→T) approach compared to a supervised learning (SLR→T) approach for extrapolating soil classes in two areas (reference and target areas) in central Iran. The SSLR→T used soil observations from the reference area and covariates from both areas. Then, the learned knowledge produced by SSLR→T was transferred to the target area to estimate soil classes. The findings revealed that SSLR→T resulted in higher overall accuracy (0.65) and kappa index (0.44) in the target area compared to the SLR→T (overall accuracy = 0.40 and kappa index = 0.18). Furthermore, the SSLR→T produced lower values of the confusion index (mean = 0.66) compared to the SLR→T (mean = 0.80). This indicated that the SSLR→T could not only increase the accuracy but also decrease the uncertainty of the soil class predictions, compared to the spatial extrapolation predictions derived from the SLR→T.
Generally, these findings indicated that leveraging covariate information from the target area during the training of DSM models in the reference area could successfully improve the generalization power of the models, indicating the effectiveness of SSLR→T for spatial extrapolation.
-
Improving the spatial prediction of soil organic carbon using environmental covariates selection: A comparison of a group of environmental covariatesMojtaba Zeraatpisheh, Younes Garosi, Hamid Reza Owliaie, and 4 more authorsCATENA, Jan 2022In the digital soil mapping (DSM) framework, machine learning models quantify the relationship between soil observations and environmental covariates. Generally, the most commonly used covariates (MCC; e.g., topographic attributes and single-time remote sensing data, and legacy maps) were employed in DSM studies. Additionally, remote sensing time-series (RST) data can provide useful information for soil mapping. Therefore, the main aims of the study are to compare the MCC, the monthly Sentinel-2 time-series of vegetation indices dataset, and the combination of datasets (MCC + RST) for soil organic carbon (SOC) prediction in an arid agroecosystem in Iran. We used different machine learning algorithms, including random forest (RF), Cubist, support vector machine (SVM), and partial least square regression (PLSR). A total of 237 soil samples at 0–20 cm depths were collected. The 5-fold cross-validation technique was used to evaluate the modeling performance, and 50 bootstrap models were applied to quantify the prediction uncertainty. The results showed that the Cubist model performed the best with the MCC dataset (R2 = 0.35, RMSE = 0.26%) and the combined dataset of MCC and RST (R2 = 0.33, RMSE = 0.27%), while the RF model showed better results for the RST dataset (R2 = 0.10, RMSE = 0.31%). Soil properties could explain the SOC variation in MCC and combined datasets (66.35% and 50.82%, respectively), while NDVI was the most controlling factor in the RST (50.22%). Accordingly, results showed that time-series vegetation indices did not have enough potential to increase SOC prediction accuracy. However, the combination of MCC and RST datasets produced SOC spatial maps with lower uncertainty. 
Therefore, future studies are required to explicitly explain the efficiency of time-series remotely-sensed data and their interrelationship with environmental covariates to predict SOC in arid regions with low SOC content.
-
Spatial variability of soil quality within management zones: Homogeneity and purity of delineated zonesMojtaba Zeraatpisheh, Eduardo Leonel Bottega, Esmaeil Bakhshandeh, and 5 more authorsCATENA, Feb 2022Fields are the original management zones used in agricultural ecosystems. Uniformity of soil within management zones (MZ) is crucial for sustainable soil management, long-term productivity, and avoiding environmental problems. When considering a new area for agricultural expansion or for improving the efficiency of existing agricultural practices, it is useful to identify homogeneous areas or MZs so that the land can be more sustainably used in the future. One way to identify MZs could be through soil quality assessment. Management zones were determined for an agroecosystem region in southern Iran with an area of 452 km2, and the homogeneity and purity of delineated zones were examined by soil quality assessment. Soil quality grades were calculated using 421 top-soil samples and two methods: i) the total data set (TDS) and ii) the minimum data set (MDS). The spatial distribution of soil quality grades was mapped using a random forest model. MZs were delineated using a fuzzy k-means classification algorithm based on the MDS. The random forest model mapped the spatial distribution of the soil quality well (R2 > 0.871). Among five soil quality grades, three soil quality grades, high (II), moderate (III), and low (IV), were found to cover 90.74 and 93.11% of the total studied area as predicted by the TDS and MDS, respectively. The subsequent classification of the soil quality data into MZs using fuzzy k-means identified two different MZs (p < 0.05). This means that there were heterogeneous soil quality grades in each of the MZs. Consequently, when fuzzy k-means is used to define MZs for the classification of agricultural ecosystems, areas with different soil quality may occur within each MZ.
Soil management would be theoretically better if soil quality within management zones were homogenous. This study offers a framework to investigate the homogeneity of delineated MZs in terms of soil quality.
-
Assessing changes in soil quality between protected and degraded forests using digital soil mapping for semiarid oak forests, IranKhadijeh Taghipour, Mehdi Heydari, Yahya Kooch, and 3 more authorsCATENA, Jun 2022Soil quality, defined as the capacity of a soil to function, is one of the most important characteristics of soil. Methods for modelling and monitoring soil quality are needed for sustainable soil management and evaluating soil degradation. In Iran, resource demands have led to the deforestation of the mixed semiarid oak forests; however, the impacts of these activities on the spatial patterns of soil quality remains unclear. This study calculates a soil quality index (SQI) from an integrated suite of soil biological, physical, and chemical properties and compares the SQI between a paired degraded/deforested area and a protected forested area in Iran using a digital soil mapping (DSM) approach via geostatistical and machine learning techniques. Here, 50 soil samples were acquired for each of the degraded/deforested and protected forested areas, whereby 14 soil attributes were measured. Results showed that the soil organic carbon, total nitrogen, available potassium, cation exchange capacity, pH, clay, saturated water content, and basal respiration in the protected area were significantly higher than the degraded forest area. Furthermore, the soil quality in the protected area was substantially higher than the degraded area. To select the best modelling approach for mapping SQI, machine learning approaches using regression tree (RT), artificial neural networks (ANN), and Random Forest (RF) models were compared against geostatistical approaches using inverse distance weighted interpolation, global polynomial interpolation, radial basis function interpolation, local polynomial interpolation, and kriging (ordinary, simple, and universal). 
Of the machine learning techniques, the RF model (R2 = 0.66) outperformed ANN and RT, while Universal Kriging outperformed all geostatistical approaches (R2 = 0.71). By comparing the SQI maps between the degraded/deforested and protected forested areas, the soil quality was substantially higher for the protected areas. This study demonstrates a framework for assessing the impacts of deforestation on the spatial patterns of soils using DSM techniques, which will facilitate effective land use planning and sustainable forest resource management strategies.
2021
-
Enhancing the accuracy of machine learning models using the super learner technique in digital soil mappingRuhollah Taghizadeh-Mehrjardi, Nikou Hamzehpour, Maryam Hassanzadeh, and 4 more authorsGeoderma, Oct 2021Digital soil mapping approaches predict soil properties based on the relationships between soil observations and related environmental covariates using techniques such as machine learning (ML) models. In this research, a wide range of ML models (12 base learners) were tested in predicting and mapping soil properties. Furthermore, a super learner approach was used to improve model accuracy by combining the predictions of the base learners. A major challenge of using super learner and complex models is that the exact contribution of individual covariates in the overall prediction is not always known. To address this issue, permutation feature importance (PFI) analysis was applied as a model-agnostic interpretation tool. The weights assigned to each ML base learner obtained from super learner, and feature importance values obtained from each ML base learner were used to quantify the contribution of individual covariates on the final prediction. The super learner and PFI techniques were tested by predicting a variety of soil physical and chemical properties of the Urmia Lake playa in Iran. As expected, the results indicated that the super learner had substantially higher accuracies for predicting soil properties in comparison to the individual base learners. For instance, the super learner showed an improved performance in comparison to linear regression by decreasing the root mean square error by an average of 46%. The PFI analysis revealed the important contribution of geomorphic and groundwater data in predicting soil properties. Overall, the proposed approach may be used for improving accuracy of ML models in digital soil mapping.
-
Improving the spatial prediction of soil salinity in arid regions using wavelet transformation and support vector regression modelsRuhollah Taghizadeh-Mehrjardi, Karsten Schmidt, Norair Toomanian, and 7 more authorsGeoderma, Feb 2021The low potential of agricultural productivity in the majority of central Iran is mainly attributed to high levels of soil salinity. To increase agricultural productivity, while preventing any further salinization, and implement effective soil reclamation programs, precise information about the spatial patterns and magnitude of soil salinity is essential. In this study, soil salinity was predicted and mapped using machine learning (ML) and digital soil mapping approaches. Specifically, support vector regression (SVR) was combined with wavelet transformation (W-SVR) of a wide range of environmental covariates derived from a digital elevation model, remote sensing, and climatic data. Predictions of soil salinity were carried out for six standard depth increments (0–5, 5–15, 15–30, 30–60, 60–100, 100–200 cm). Cross-validation was carried out by partitioning the data into 70% used for training the model and 30% for testing the model. Uncertainty of the ML algorithms was quantified using the uncertainty estimation based on local errors and clustering (UNEEC) method. The results indicated that W-SVR performed better in predicting soil salinity for all six depth increments. The differences were most apparent for the lowest soil depth increments where W-SVR resulted in ~1.4 times higher correlation coefficient when compared to the SVR. At lower soil depths increments, covariate importance analysis indicated that topographic derivatives were the most relevant covariates in the models. For topsoil salinity, remote sensing covariates were the most relevant predictors of soil salinity. Regardless of soil depth, climatic predictors were the most important predictors. 
Uncertainty analysis also indicated that for all depth increments, the estimated prediction interval for SVR obtained by the UNEEC method was wider than that of W-SVR and further indicating the higher performance of W-SVR in comparison to the SVR. The predicted salinity maps showed the highest salinity for soils in the eastern parts of central Iran, which was consistent with the Agro-climatic Zoning of Isfahan Province.
-
High resolution middle eastern soil attributes mapping via open data and cloud computingRaúl Roberto Poppiel, José Alexandre Melo Demattê, Nícolas Augusto Rosin, and 25 more authorsGeoderma, Mar 2021Soil presents a high vulnerability to the environmental degradation processes especially in arid and semiarid regions, requiring research that leads to its understanding. To date, there are no detailed soil maps covering a large extension of the Middle East region, especially for calcium carbonate content. Thus, we used topsoil data (0–20 cm) from more than 5000 sites for mapping near 3,338,000 square km of the Middle East. To do this, we used covariates obtained from remote sensing data and random forest (RF) algorithm. Around 65% of the soil information was acquired from Iranian datasets and the remaining from the World Soil Information Service dataset. By using 30 covariates layers—soil, climate, relief, parent material and age features— we then trained and tuned RF regression models—in R software— and used the optimal ones (according to the minimum root mean square error) for making spatial predictions—within Google Earth Engine— of topsoil attributes and associated uncertainties at 30 m resolution. All covariates were relatively important for mapping topsoil attributes, ranging from 4% to 98%. Annual precipitation, temperature annual range and elevation were the most important ones (>31%). Overall, the prediction models trained by RF explained around 40–66% of the variation present in topsoil attributes. The ratio of the performance to interquartile distance (RPIQ) ranged between 1.59 and 2.83, suggesting accurate models. Our predicted maps indicated that sandy and loamy soils with poor organic carbon levels, alkaline reaction and high calcium carbonate content were widespread in middle eastern topsoils. Our framework overcomes some limitations related to high computational requirements and enables accurate predictions of topsoil attributes.
Our maps presented correct pedological correspondences and had realistic spatial representations and interesting levels of uncertainties.
-
Assessing agricultural salt-affected land using digital soil mapping and hybridized random forestsKamal Nabiollahi, Ruhollah Taghizadeh-Mehrjardi, Aram Shahabi, and 4 more authorsGeoderma, Mar 2021Salinization and alkalization are predominant environmental problem world-wide which their accurate assessment is essential for determining appropriate ways to deal with land degradation, for better soil and crop management. In the current research, a combination of random forests and covariate data were used to assess spatial variability of soil salinity and sodicity in 436 km2 agricultural salt-affected land in Kurdistan Province, Iran. Using the conditioned Latin hypercube sampling method, 295 soil samples were sampled across the study area, and then soil reaction (pH), electrical conductivity (EC), and sodium adsorption ratio (SAR) were measured. Covariate data including terrain attributes, remotely-sensed data, groundwater table, and categorical maps were acquired. Random forest (RF) models were used to predict the spatial distribution of pH, EC, and SAR by making a relationship between soil data and covariates. Furthermore, three optimization algorithms (particle swarm optimization-PSO, genetic algorithm-GA, and bat algorithm-BAT) were used to explore if the hybridized RF works better than the standard RF. Results of 10-fold cross-validation with 100 replications indicated that the accuracy of RF + PSO was higher for predicting pH (RMSE = 0.52 and R2 = 0.67), EC (RMSE = 2.32 dSm−1 and R2 = 0.57), and SAR (RMSE = 8.98 and R2 = 0.54, respectively) in comparison to the other implemented models. Furthermore, the results disclosed that the most important covariates to predict pH, EC, and SAR were groundwater table, categorical maps, salinity index, and multi-resolution ridge top flatness. 
Besides, the results indicated that the mean values for pH, EC, and SAR in lowland and bare land were significantly different from the other physiographic units and land uses, respectively. Importantly, the classified map of salt-affected soils highlighted areas with a high risk of exceeding critical threshold values of pH, EC, and SAR, which is located in the center of the study area, and showed that 6.30%, 3.1%, and 4.6% of the study area are saline-sodic soil, saline soil, and sodic soil, respectively. These up to date spatial soil information on severity of soil salinity and sodicity is crucial for agricultural management of affected areas and the proposed method can be used to the other similar regions.
-
Bio-Inspired Hybridization of Artificial Neural Networks: An Application for Mapping the Spatial Distribution of Soil Texture FractionsRuhollah Taghizadeh-Mehrjardi, Mostafa Emadi, Ali Cherati, and 9 more authorsRemote Sensing, Mar 2021Soil texture and particle size fractions (PSFs) are a critical characteristic of soil that influences most physical, chemical, and biological properti...
-
Using environmental variables and Fourier Transform Infrared Spectroscopy to predict soil organic carbonMaryam Ghebleh Goydaragh, Ruhollah Taghizadeh-Mehrjardi, Ali Asghar Jafarzadeh, and 2 more authorsCATENA, Jul 2021Soil Organic Carbon (SOC) content is a key element for soil fertility and productivity, nutrient availability and potentially represents a measurement of the sink for greenhouse gas abatement. Improving our knowledge on the spatial distribution of SOC is hence essential for sustainable nutrient management and carbon storage capacity. The objective of this study was to evaluate the performance of six tree-based machine-learning models when using environmental variables (i.e., remote sensing and terrain attributes - scenario 1), Fourier Transform Infrared Spectroscopy (FTIR) data (scenario 2) and combination of environmental variables and FTIR data (scenario 3) as predictors in prediction of SOC content. The models included Random Forest, Cubist, Conditional Inference Forest, Conditional Inference Trees, Extreme Gradient Boosting and Classification, Regression Trees. Furthermore, we explored if the Bat optimization algorithm can improve the prediction accuracy of the models. The study was conducted across a 7000 ha field in the Miandoab County, Northern Iran, with a total of 80 soil samples collected systematically in a regular grid (700 × 1000 m). According to Leave-One-Out Cross-Validation, the best prediction performance was achieved by the Cubist+Bat model when environmental variables and FTIR spectra (scenario 3) were used (Coefficient of determination = 0.73, Concordance Correlation Coefficient = 0.77, Root Mean Square Error = 0.36, Mean Absolute Error = 0.31, Median Absolute Error = 0.28). FTIR data had the highest influence on the prediction accuracy of SOC. Therefore, it can be concluded that the combination of environmental variables and FTIR data with Cubist+Bat model as a precise approach to monitor SOC in semi-arid soils of Iran. 
The final Digital Soil Map (DSM) of SOC revealed that improvements in prediction might be possible with the collection of more soil samples in areas where the land use and topography changed over short spatial scales.
2020
-
Multi-task convolutional neural networks outperformed random forest for mapping soil particle size fractions in central IranR. Taghizadeh-Mehrjardi, M. Mahdianpari, F. Mohammadimanesh, and 4 more authorsGeoderma, Oct 2020Knowledge about the spatial distribution of soil particle size fractions (PSF) is critical for sustainable management and resource assessment of the agricultural regions. Although conventional machine learning algorithms, such as random forest (RF) or support vector machine, have been extensively used in digital soil mapping to predict the PSF, less research examined the potential of state-of-the-art deep learning approaches for such processing. Importantly, deep learning approaches such as convolutional neural networks (CNNs) are able to incorporate contextual information about the landscape, which is of great use for DSM analysis. Accordingly, this study addresses this much-needed investigation by using a patch-based, multi-task CNN for predicting PSF of clay, sand, and silt contents at six standard layers given as soil depth increments as recommended by the GlobalSoilMap.net (i.e., 0–5, 5–15, 15–30, 30–60, 60–100, 100–200 cm). The depth functions were derived from equal-area smoothing splines in a region covering large parts (~140,000 km2) of central Iran. The robustness of the proposed architecture is evaluated against RF. Additionally, to allow a fairer comparison between RF and CNN models, we used simple smoothing (mean) filters to effectively reproduce the auxiliary data which are then fed in the RF (RF*). To evaluate the three models, we established a training (75%) and test set (25%). According to the test set, for all soil depths and all PSFs, the results demonstrate that CNN consistently outperforms RF and RF* in terms of root mean square error (RMSE) and coefficient of determination (R2). 
At the top layer, for example, CNN decreased the RMSE values for clay, sand, and silt contents compared to the RF (22.4%, 18.9%, and 10.7%) and RF* (18.0%, 7.4%, and 9.6%). These findings indicate that even the use of feature-engineered auxiliary data did not enable the RF* models to reach the performance of CNN. The resulting maps can be used as valuable baseline soil information for the effective management of agricultural and environmental resources in the study area and beyond.
-
Synthetic resampling strategies and machine learning for digital soil mapping in IranRuhollah Taghizadeh-Mehrjardi, Karsten Schmidt, Kamran Eftekhari, and 5 more authorsEuropean Journal of Soil Science, 2020Most common machine learning (ML) algorithms usually work well on balanced training sets, that is, datasets in which all classes are approximately represented equally. Otherwise, the accuracy estimates may be unreliable and classes with only a few values are often misclassified or neglected. This is known as a class imbalance problem in machine learning and datasets that do not meet this criterion are referred to as imbalanced data. Most datasets of soil classes are, therefore, imbalanced data. One of our main objectives is to compare eight resampling strategies that have been developed to counteract the imbalanced data problem. We compared the performance of five of the most common ML algorithms with the resampling approaches. The highest increase in prediction accuracy was achieved with SMOTE (the synthetic minority oversampling technique). In comparison to the baseline prediction on the original dataset, we achieved an increase of about 10, 20 and 10% in the overall accuracy, kappa index and F-score, respectively. Regarding the ML approaches, random forest (RF) showed the best performance with an overall accuracy, kappa index and F-score of 66, 60 and 57%, respectively. Moreover, the combination of RF and SMOTE improved the accuracy of the individual soil classes, compared to RF trained on the original dataset and allowed better prediction of soil classes with a low number of samples in the corresponding soil profile database, in our case for Chernozems. Our results show that balancing existing soil legacy data using synthetic sampling strategies can significantly improve the prediction accuracy in digital soil mapping (DSM). Highlights Spatial distribution of soil classes in Iran can be predicted using machine learning (ML) algorithms. 
The synthetic minority oversampling technique overcomes the drawback of imbalanced and highly biased soil legacy data. When combining a random forest model with synthetic sampling strategies the prediction accuracy of the soil model improves significantly. The resulting new soil map of Iran has a much higher spatial resolution compared to existing maps and displays new soil classes that have not yet been mapped in Iran.
-
Investigation of the spatial and temporal variation of soil salinity using random forests in the central desert of IranHassan Fathizad, Mohammad Ali Hakimzadeh Ardakani, Hamid Sodaiezadeh, and 2 more authorsGeoderma, Apr 2020Traditional soil salinity studies, especially over large areas, are expensive and time-consuming. Therefore, it is necessary to employ new methods to examine salinity of large areas to reduce the time and cost of analysis. This study investigates soil salinity trends in the Yazd-Ardakan plain of Iran using remote sensing with emphasis on historic and projected land use and groundwater change between 1986 and 2030. A random forest model was used to estimate soil salinity. To predict the salinity of the Yazd-Ardakan plain in 2030, the relationships between soil and auxiliary data from 2016 were used. Land use parameters and groundwater quality parameters that are projected to change by 2030 were selected. A sensitivity analysis of a forage management model was conducted in conjunction with soil salinity modeling and the most important auxiliary data were found to be groundwater parameters and digital elevation derivatives of vegetation indices. Based on 10-fold cross-validation, random forest model predicted soil salinity with R2 value of 0.73. Comparison of soil salinity trends from 1986 to 2016 shows that during this period the size of the area with salinities in the range of 4–8 dS/m and >32 dS/m were increased from 1.6 to 3.1% (~1.5%↑) and from 13.1 to 18.3% (~5.1%↑), respectively. However, the size of the fairly high (8–12 dS/m), high (12–16 dS/m) and very high (16–32 dS/m) classes were decreased from 13.6 to 11.9% (~1.7%↓), from 20.2 to 16.5% (~3.8%↓), and from 50.2 to 49% (~1.1%↓), respectively. In other words, it can be said that during this 30-years, we see an increase in salinity levels and a decrease in soil quality.
The results of the changes in soil salinity show that between 2016 and 2030, the area of the class with >32 (dS/m) (43159.2 ha, 8.83%↑) increased, while the class with <4 (dS/m), was eliminated in 2030 and the 4–8 (dS/m) class is on the verge of disappearing. The trend of salinity changes in the region shows an increase from east to west, which is consistent with the trend of changes in the most important ancillary variables identified.
-
Improving the Spatial Prediction of Soil Organic Carbon Content in Two Contrasting Climatic Regions by Stacking Machine Learning Models and Rescanning Covariate SpaceRuhollah Taghizadeh-Mehrjardi, Karsten Schmidt, Alireza Amirian-Chakan, and 17 more authorsRemote Sensing, Mar 2020Understanding the spatial distribution of soil organic carbon (SOC) content over different climatic regions will enhance our knowledge of carbon gains...
-
Land Suitability Assessment and Agricultural Production Sustainability Using Machine Learning ModelsRuhollah Taghizadeh-Mehrjardi, Kamal Nabiollahi, Leila Rasoli, and 7 more authorsAgronomy, Apr 2020Land suitability assessment is essential for increasing production and planning a sustainable agricultural system, but such information is commonly sc...
-
Conventional and digital soil mapping in Iran: Past, present, and futureMojtaba Zeraatpisheh, Azam Jafari, Mohsen Bagheri Bodaghabadi, and 5 more authorsCatena, 2020
-
Spatio-temporal dynamic of soil quality in the central Iranian desert modeled with machine learning and digital soil assessment techniquesHassan Fathizad, Mohammad Ali Hakimzadeh Ardakani, Brandon Heung, and 5 more authorsEcological Indicators, Nov 2020Soil degradation reduces soil quality by a loss of organic matter, reduced soil fertility, and structural breakdown, erosion, undesirable changes in salinity, acidity or alkalinity and the effects of toxic chemicals, contaminants or excessive flooding. In this study, we investigated spatial and temporal changes of soil quality between 1986 and 2030 in the Central Iranian desert. 201 topsoil samples (0–20 cm) in Yazd-Ardakan plain of Iran were chosen with conditioned Latin hypercube sampling strategy and analysed for soil organic carbon, electrical conductivity, dry bulk density, aggregate stability, and soil heavy metals. A soil quality index was calculated using weighted index method. To predict the spatial distribution of the soil quality index, we implemented a random forest model based on a set of covariates with a coefficient of determination of 0.69 between soil quality index and covariates. The results of soil quality index changes from 1986 to 2030 show that during this period, the areal extent of the Very Low SQI Class (<0.4) increased by 78,242 ha (increase of 16.2%), while the extent of the Good SQI Class (>0.6) decreased by 366,018 ha (decrease of 75.8%). This indicates that the soil quality in the study area, over time, is progressively becoming poorer. The results of the study of soil quality changes can be used in land evaluation, environmental studies and integrated planning and management in order to properly and reasonably utilize natural resources and reduce future soil degradation.
2019
-
PMT: New analytical framework for automated evaluation of geo-environmental modelling approachesOmid Rahmati, Aiding Kornejady, Mahmood Samadi, and 8 more authorsScience of The Total Environment, May 2019Geospatial computation, data transformation to a relevant statistical software, and step-wise quantitative performance assessment can be cumbersome, especially when considering that the entire modelling procedure is repeatedly interrupted by several input/output steps, and the self-consistency and self-adaptive response to the modelled data and the features therein are lost while handling the data from different kinds of working environments. To date, an automated and a comprehensive validation system, which includes both the cutoff-dependent and –independent evaluation criteria for spatial modelling approaches, has not yet been developed for GIS based methodologies. This study, for the first time, aims to fill this gap by designing and evaluating a user-friendly model validation approach, denoted as Performance Measure Tool (PMT), and developed using freely available Python programming platform. The considered cutoff-dependent criteria include receiver operating characteristic (ROC) curve, success-rate curve (SRC) and prediction-rate curve (PRC), whereas cutoff-independent consist of twenty-one performance metrics such as efficiency, misclassification rate, false omission rate, F-score, threat score, odds ratio, etc. To test the robustness of the developed tool, we applied it to a wide variety of geo-environmental modelling approaches, especially in different countries, data, and spatial contexts around the world including, the USA (soil digital modelling), Australia (drought risk evaluation), Vietnam (landslide studies), Iran (flood studies), and Italy (gully erosion studies). 
The newly proposed PMT is demonstrated to be capable of analyzing a wide range of environmental modelling results, and provides inclusive performance evaluation metrics in a relatively short time and user-convenient framework whilst each of the metrics is used to address a particular aspect of the predictive model. Drawing on the inferences, a scenario-based protocol for model performance evaluation is suggested.
-
Some practical aspects of predicting texture data in digital soil mappingAlireza Amirian-Chakan, Budiman Minasny, Ruhollah Taghizadeh-Mehrjardi, and 3 more authorsSoil and Tillage Research, Nov 2019Soil texture is the most well-known composition in soil science. When separate components of the texture (sand, silt, and clay) are predicted independently in digital soil mapping (DSM), there is no guarantee that the separate estimates will sum to 100%. Log-ratio transformations before DSM modelling are alternatives to guarantee a constant sum of the estimates. Little is known about the effect of non-summing to 100% and transformations of particle-size fractions (PSFs) when DSM products were used to predict soil functional properties using pedotransfer functions (PTFs). Therefore, this study was conducted to investigate the effect of different soil texture modelling methods on the estimation of available soil water capacity (AWC) and the total amount of irrigation water (TIW) required for wheat production on a 4600 ha area in Khuzestan province, southwestern Iran. Specifically, this study aimed i) to assess the performance of random forest models (RF) to predict untransformed (UT) and transformed PSFs using environmental covariates; ii) to study the effects of three widely used log-ratio transformations including additive, centroid and isometric log-ratio transformations (alr, clr, and ilr respectively) on the estimations of AWC and TIW. A total of 150 soil samples were collected from the surface layers (0–30 cm) based on the conditioned Latin hypercube sampling (cLHS) procedure. Results indicated that, in terms of root mean square error (RMSE), RF provided similar accuracies in predicting PSFs for both untransformed and transformed data. However, transformation resulted in biased estimates. In addition, RF prediction based on untransformed data resulted in more correctly soil texture classes allocation when compared to transformed data. 
The spatial distribution of the sum of the predicted untransformed fractions indicated only small parts of the area conformed to the 100% sum. Almost the same accuracies for estimates of AWC were obtained when both untransformed and transformed predicted texture components were used as the inputs to PTFs. Data transformation can result in biased estimates of AWC. The findings indicated no significant difference between transformation methods in predicting AWC and TIW. The general patterns of the spatial distribution of the predicted AWCs across the whole area were the same for transformed and untransformed data (except for clr transformation).
2018
-
Assessing the effects of slope gradient and land use change on soil quality degradation through digital mapping of soil quality indices and soil loss rate
K. Nabiollahi, F. Golmohamadi, R. Taghizadeh-Mehrjardi, and 2 more authors
Geoderma, May 2018
Slope gradient and land use change are known to influence soil quality, and the assessment of soil quality is important in determining sustainable land-use and soil-management practices. In this study, soil quality indices (SQIs) were developed by quantifying several soil properties to discriminate the effects of slope gradient and land use change on soil quality in 480 km2 of agricultural land in Kurdistan Province, Iran. Three SQIs were used. Each index was calculated using two scoring methods (linear and non-linear) and two soil indicator selection approaches, a Total Data Set (TDS) and a Minimum Data Set (MDS). Nine soil quality indicators, namely pH, Electrical Conductivity (EC), Organic Carbon (OC), Cation Exchange Capacity (CEC), Total Naturalized Value (TNV), Soil Erodibility (K), Porosity (P), Mean Weight Diameter (MWD), and Bulk Density (BD), together with soil loss rate, were measured for 110 soil samples (0–30 cm depth). Soil quality index maps were developed using digital soil mapping methods. The >10% slope class had the highest soil loss rate and the highest percentage of soils with very low quality (grade V) based on all SQIs. The results showed that soil quality was better estimated using the Weighted Additive Soil Quality Index (SQIw) (r2 = 0.78) compared to SQIa (the Additive Soil Quality Index) and SQIn (the Nemoro Soil Quality Index). The agreement values of all SQIs for the non-linear scoring method were higher than those for the linear scoring method. The mean values of all SQIs and the soil loss rate were higher and lower in rangeland than cropland, respectively, but they were not significantly different because of intensive grazing.
Slopes with a large gradient and where land use was converted to agriculture were characterized by low values of SQIs, suggesting a recovery of soil quality through changing to sustainable practices and abandoning overgrazing in these areas.
-
Development and analysis of the Soil Water Infiltration Global database
Mehdi Rahmati, Lutz Weihermüller, Jan Vanderborght, and 126 more authors
Earth System Science Data, Jul 2018
In this paper, we present and analyze a novel global database of soil infiltration measurements, the Soil Water Infiltration Global (SWIG) database. In total, 5023 infiltration curves were collected across all continents in the SWIG database. These data were either provided and quality checked by the scientists who performed the experiments or they were digitized from published articles. Data from 54 different countries were included in the database with major contributions from Iran, China, and the USA. In addition to its extensive geographical coverage, the collected infiltration curves cover research from 1976 to late 2017. Basic information on measurement location and method, soil properties, and land use was gathered along with the infiltration data, making the database valuable for the development of pedotransfer functions (PTFs) for estimating soil hydraulic properties, for the evaluation of infiltration measurement methods, and for developing and validating infiltration models. Soil textural information (clay, silt, and sand content) is available for 3842 out of 5023 infiltration measurements (~76%) covering nearly all USDA soil textural classes except for the sandy clay and silt classes. Information on land use is available for 76% of the experimental sites with agricultural land use as the dominant type (~40%). We are convinced that the SWIG database will allow for a better parameterization of the infiltration process in land surface models and for testing infiltration models. All collected data and related soil characteristics are provided online in *.xlsx and *.csv formats for reference, and we add a disclaimer that the database is for public domain use only and can be copied freely by referencing it.
Supplementary data are available at https://doi.org/10.1594/PANGAEA.885492 (Rahmati et al., 2018). Data quality assessment is strongly advised prior to any use of this database. Finally, we would like to encourage scientists to extend and update the SWIG database by uploading new data to it.
2017
-
Artificial bee colony feature selection algorithm combined with machine learning algorithms to predict vertical and lateral distribution of soil organic matter in South Dakota, USA
Ruhollah Taghizadeh-Mehrjardi, Ram Neupane, Kunal Sood, and 1 more author
Carbon Management, May 2017
The main purpose of this study is to evaluate an advanced feature selection technique, the artificial bee colony (ABC) algorithm, for reducing the number of auxiliary variables derived from a digital elevation model (DEM) and remotely sensed data (e.g. Landsat images). A combination of depth functions (e.g. power, logarithmic and spline) and data mining methods (artificial neural network, ANN, and support vector regression, SVR) was applied for three-dimensional mapping of soil organic matter (SOM) in the Big Sioux River watershed, South Dakota, USA. Unsurprisingly, the ABC feature selection algorithm indicated that remote sensing data (e.g. NDVI) are powerful predictors at the soil surface; however, with increasing soil depth, the terrain parameters (e.g. wetness index) became more relevant. Our findings demonstrated that both spatial models generally performed well. The mean R2 values calculated by 10-fold cross validation suggested that SVR and ANN models could explain approximately 50 and 57% of total SOM variability, respectively. However, the predictive power of both models increased when the ABC feature selection algorithm was applied, particularly when it was combined with the ANN model. Results showed that DSM approaches are powerful tools for explaining the 3D spatial distribution of SOM across the study watershed.
-
Assessment of soil quality indices for salt-affected agricultural land in Kurdistan Province, Iran
Kamal Nabiollahi, Ruhollah Taghizadeh-Mehrjardi, Ruth Kerry, and 1 more author
Ecological Indicators, Dec 2017
Soil quality indices (SQIs) are an important tool for evaluating agro-ecosystems. Salinization and alkalization are major environmental problems that have threatened agricultural productivity since ancient times. The aim of this study is to assess soil quality in salt-affected agricultural land in Kurdistan Province, Iran, using three indices: the Additive Soil Quality Index (SQIa), the Weighted Additive Soil Quality Index (SQIw), and the Nemoro Soil Quality Index (SQIn). Each of the soil quality indices was calculated using a Total Data Set (TDS) and a Minimum Data Set (MDS) approach. The TDS consisted of nine soil quality parameters measured on 150 samples (0–30 cm depth): pH, Electrical Conductivity (EC), Organic Carbon (OC), Cation Exchange Capacity (CEC), Calcium Carbonate Equivalent (CCE), Exchangeable Sodium Percentage (ESP), Sodium Adsorption Ratio (SAR), Mean Weight Diameter (MWD), and Bulk Density (BD). Principal components analysis (PCA) was used to determine which indicators were to be included in the MDS. Indicator Kriging (IK) highlighted areas with a high risk of exceeding critical threshold values of EC, ESP, and SAR and having low soil quality. In non-salt-affected areas soil quality and the risk of exceeding critical threshold values and having low soil quality were lower and higher, respectively, compared to salt-affected regions. The MDS method showed a decrease in the area and proportion of grades with high and very high quality (I and II) and an increase in grades with low and very low quality (IV and V) compared to the TDS. The results of linear correlation, match, and kappa statistic analysis showed that soil quality was better estimated using the SQIw compared to the SQIa and the SQIn.
In addition, there were higher values of agreement (match and kappa statistic) for the TDS than for the MDS. However, using the SQIw index and MDS method can adequately represent the TDS (R2 = 0.82) and thus reduce the time and cost involved in evaluating soil quality.
2016
-
Digital mapping of soil organic carbon at multiple depths using different data mining techniques in Baneh region, Iran
R. Taghizadeh-Mehrjardi, K. Nabiollahi, and R. Kerry
Geoderma, 2016
-
Using the nonparametric k-nearest neighbor approach for predicting cation exchange capacity
A. A. Zolfaghari, R. Taghizadeh-Mehrjardi, A. R. Moshki, and 4 more authors
Geoderma, Mar 2016
The objectives of this study were to apply a k-NN approach to predict CEC in Iranian soils and compare this approach with the popular artificial neural network (ANN) model. In this study, a data set of 3420 soil samples from different parts of Iran was used. Two different sets of cheaper-to-measure soil attributes were selected as potential predictors. The first set consisted of clay, silt, sand and organic carbon (OC) contents. The second set was constructed using OC and clay contents. Two design parameters should be optimized before application of the k-NN approach. Results showed that the algorithm efficiency is not dependent on these parameters. A wide range of suboptimal values around the optimal values may cause a slight error in terms of estimation accuracy. However, the optimal settings of the design parameters depend on the size of the development/reference data set. In both k-NN and ANN models, a higher number of input variables can relatively improve the estimation of CEC, but this improvement was not statistically significant at the 0.05 level. Furthermore, the results showed that increasing the size of the reference data set to a certain amount (N = 1200) reduced the estimation error significantly in terms of root-mean-squared residuals (RMSE). However, no significant difference in the accuracy of k-NN and ANN methods was detected for reference data set sizes of N > 1200. Results showed no significant difference between this approach and ANN models, suggesting the competitive advantage of the k-NN technique over other techniques to develop pedotransfer functions (PTFs); for example, the redevelopment of PTFs is not necessarily required as new data become available.
-
Probabilistic inversion of EM38 data for 3D soil mapping in central Iran
Davood Moghadas, Ruhollah Taghizadeh-Mehrjardi, and John Triantafilis
Geoderma Regional, Jun 2016
Accurate determination of near surface soil electrical properties is important for agricultural and environmental management. In this respect, low frequency electromagnetic induction (EMI) has been widely used to measure soil apparent electrical conductivity. However, the potential to model soil subsurface layering has not been fully realized using EMI data. In this paper, we applied a probabilistic optimization approach, namely DREAM(ZS), to Geonics EM38 data to explore the robustness of this approach for soil subsurface conductivity mapping. The EM38 data were measured in the Ardakan region in the province of Yazd, located in central Iran. Several soil samples were taken and further analyzed in a laboratory to derive soil textural and physical parameters. The probabilistic inversion was performed in a joint multi-configuration framework considering a five-layered model. The estimated values are mainly in agreement with the clay map (as the most influential factor); nevertheless, soil salinity data and inversely estimated conductivity values are poorly correlated for deeper layers due to the aridic condition and high clay content in the study area. The DREAM(ZS) optimization approach appears to be promising for accurate retrieval of the soil conductivity depth profile from EM38 data.
-
Predicting and mapping of soil particle-size fractions with adaptive neuro-fuzzy inference and ant colony optimization in central Iran
R. Taghizadeh-Mehrjardi, N. Toomanian, A. R. Khavaninzadeh, and 2 more authors
European Journal of Soil Science, 2016
In arid regions, knowledge of the variation in soil texture is crucial for land management because it affects soil physical, chemical, biological and most importantly hydrological properties. The availability of information on soil texture is scarce even though it is required to support land-use management and sustainable development. Because it is costly to obtain information about the individual particle-size fractions (PSFs), we used digital soil mapping methods (DSM) with environmental covariates that are less costly to obtain. Specifically, we explored the use of a digital elevation model and remote sensing data as environmental covariates to predict the vertical (i.e. 0–0.15, 0.15–0.3, 0.3–0.6 and 0.6–1 m) and lateral variation in PSFs over a 150-km2 area in central Iran. We used a combination of equal-area spline depth functions and three data-mining techniques: multiple linear regression (MLR), artificial neural networks (ANN) and the neuro-fuzzy inference system (ANFIS). In addition, we explored the effect of the reduction in dimension of feature space with ant colony optimization (ACO) and correlation-based feature selection (CFS) on the accuracy of prediction of spatial models for each PSF. The results showed that the prediction of clay at 0–0.15-m depth with ACO indicated the importance of including Landsat ETM+, the digital numbers of band 7 of Landsat images (B7) and clay index, whereas at 0.60–1-m depth the wetness index and multi-resolution valley bottom flatness index (MRVBF) were important.
Model evaluation by leave-one-out cross-validation with 191 soil observations indicated that the predictions by the ACO-based ANFIS model (RMSE = 4.51% and R2 = 0.74 for clay at 0–0.15 m) were more accurate than those by MLR and ANN. Spatial prediction was also better for the topsoil (0–0.15 m) than at depth (RMSE = 7.1% for clay at 0.6–1 m); therefore, we conclude that the environmental covariates tested cannot resolve subsurface variation as accurately. Nevertheless, we recommend prediction by the ACO-based ANFIS model and splines of lateral and vertical distribution of PSFs in other arid regions of Iran with the same agro-ecological conditions.
Highlights: Digital soil mapping of particle-size fractions (PSFs) by adaptive neuro-fuzzy inference and ant colony optimization. Use of ant colony optimization (ACO) to assist in feature selection of environmental covariates. Neuro-fuzzy inference system (ANFIS) superior to multiple linear regression (MLR) and artificial neural networks (ANN). PSF prediction by ACO-based ANFIS model and splines is optimal.
2015
-
Comparing data mining classifiers to predict spatial distribution of USDA-family soil groups in Baneh region, Iran
R. Taghizadeh-Mehrjardi, K. Nabiollahi, B. Minasny, and 1 more author
Geoderma, Sep 2015
Digital soil mapping involves the use of auxiliary data to assist in the mapping of soil classes. In this research, we investigate the predictive power of six data mining classifiers, namely logistic regression (LR), artificial neural network (ANN), support vector machine (SVM), K-nearest neighbour (KNN), random forest (RF), and decision tree model (DTM), to create a digital soil map across an area of 3000 ha in Kurdistan Province, north-west Iran. In this area, using the conditioned Latin hypercube sampling method, 217 soil profiles were selected, sampled, analysed and allocated to taxonomic classes according to Soil Taxonomy up to family level. To test the user accuracy (UA) we established a calibration and validation set (70:30%). Of the five soil family classes we map, the highest overall accuracy (0.71) and kappa index (0.69) are achieved using the DTM and ANN methods. More specifically, the UA of prediction was up to 18.33% better in comparison to LR. Moreover, our results showed that no improvement was obtained in the prediction accuracy of the DTM algorithm by minimizing taxonomic distance compared to minimizing misclassification error (0.71). Overall, our results suggest that the developed methodology could be used to predict soil classes in other regions of Iran.
-
Modeling Soil Salinity along a Hillslope in Iran by Inversion of EM38 Data
J. Huang, R. Taghizadeh-Mehrjardi, B. Minasny, and 1 more author
Soil Science Society of America Journal, Jul 2015
Electromagnetic (EM) induction has been used to characterize the spatial distribution of salinity. However, most studies have been undertaken to map the areal distribution of the average profile salinity using measurements of the apparent electrical conductivity (ECa, mS m−1). In this study, an EM38 was used to map the distribution of salinity with depth along a 26-km hillslope in central Iran. We generated electromagnetic conductivity images by inverting EM38 ECa data collected at various heights in the EM4Soil software. A number of parameters including forward modeling (cumulative function, CF, and full solution, FS), inversion algorithms (S1 and S2), damping factor (λ), and combinations of different heights were considered to generate calculated soil true electrical conductivity (σ, mS m−1). By comparing different σ against the electrical conductivity of a saturated soil-paste extract (ECe, dS m−1) at various depths, we found that the strongest correlation and smallest modeling error were achieved using the FS, S1, λ = 12, and ECa data collected at 0.4 m alone. We then compared the results achieved by developing a linear regression between σ and ECe at various depths with those achieved using multiple linear regression (MLR) established between ECa and ECe. The inversion method was less time consuming and more robust than the MLR approach. The predicted ECe increased from the crest to the base of the toposequence. The results were consistent with the underlying geology, climate, and local topography. The methodology can be used as guidance for baseline salinity monitoring and management.
2014
-
Digital mapping of soil salinity in Ardakan region, central Iran
R. Taghizadeh-Mehrjardi, B. Minasny, F. Sarmadian, and 1 more author
Geoderma, Jan 2014
Salinization and alkalinization are the most important land degradation processes in central Iran. In this study we modelled the vertical and lateral variation of soil salinity (measured as electrical conductivity in saturation paste, ECe) using a combination of regression tree analysis and equal-area smoothing splines in a 72,000 ha area located in central Iran. Using the conditioned Latin hypercube sampling method, 173 soil profiles were sampled from the study area, and then analysed for ECe and other soil properties. Auxiliary data used in this study to represent predictive soil forming factors were terrain attributes (derived from a digital elevation model), Landsat 7 ETM+ data, apparent electrical conductivity (ECa) measured using an electromagnetic induction (EMI) instrument, and a geomorphologic surfaces map. To derive the relationships between ECe (from the soil surface to 1 m) and the auxiliary data, regression tree analysis was applied. In general, results showed that the ECa surfaces are the most powerful predictors for ECe at three depth intervals (i.e. 0–15, 15–30 and 30–60 cm). In the 60–100 cm depth interval, topographic wetness index was the most important parameter used in the regression tree model. Validation of the predictive models at each depth interval resulted in R2 values ranging from 78% (0–15 cm) to 11% (60–100 cm). Thus, we recommend that this technique be applied to map soil salinity in other parts of Iran.
2012
-
Estimating Mass Fractal Dimension of Soil Using Artificial Neural Networks for Improved Prediction of Water Retention Curve
Behzad Ghanbarian-Alavijeh, Ruhollah Taghizadeh-Mehrjardi, and Guanhua Huang
Soil Science, Aug 2012
Fractal geometry appears to be a useful tool to simulate a porous medium that can be quantified by scaling exponent(s), which is a fractal dimension(s). The objective of this study was to estimate the mass fractal dimension of the Rieu and Sposito (RS) model from readily available parameters, such as clay, silt, and sand contents; geometric mean diameter and geometric S.D. of soil particles; and total soil porosity by developing an artificial neural network (ANN) model. Two databases with a total of 190 soil samples of 12 soil texture classes were used to develop and validate the ANN model. To determine the mass fractal dimension, the RS model was fitted to measured soil-water retention data. A sensitivity analysis was also performed on the RS model parameters. The results of sensitivity analysis showed that the most sensitive parameter of the RS model is the mass fractal dimension, whereas this model is less sensitive to air entry value and soil porosity. We used the cross-validation technique, for example, repeated random splitting of the data set into subsets for the development and validation processes of the ANN model. To evaluate the developed ANN model, the estimated mass fractal dimension, measured soil porosity, and air entry value combined with the RS model were consequently used to determine soil-water content corresponding to each prescribed tension head. Results showed that the developed ANN model estimated the soil-water retention curve accurately.
-
Digital soil mapping of soil classes using decision trees in central Iran
R. Taghizadeh-Mehrjardi, B. Minasny, A. B. McBratney, and 2 more authors
In Digital Soil Assessments and Beyond, 2012