
RF, XGBoost and NGBoost Machine Learning Approaches for Landslide Mapping: Insights from Trabzon Province, Turkey

Landslides are among the most destructive natural hazards, causing significant loss of life and property. According to a World Bank report, approximately 66 million people live in areas with a high probability of landslides, while around 300 million are exposed to the risks. It is therefore essential to identify landslide-prone zones and the conditioning factors that contribute to mass movements.

Landslide susceptibility maps (LSMs) are an effective tool to address this challenge, as they illustrate the geospatial distribution of both potential landslide and non-landslide areas. These maps provide critical information to decision-makers and public enterprises for planning infrastructure and investments. Furthermore, LSMs are considered a fundamental component of integrated emergency management. With the help of these maps, idle agricultural lands can be reactivated, and at-risk man-made structures, such as roads and bridges, can be detected in advance. This proactive approach can help prevent the adverse impacts of landslides.

Advances in technology, particularly the integration of remote sensing and Geographic Information Systems (GIS) with artificial intelligence, have significantly enhanced researchers' ability to assess landslide risk. Previous studies have used a variety of machine learning techniques, including Support Vector Machines (SVM), Artificial Neural Networks (ANN), Logistic Regression (LR), and Naive Bayes, to produce landslide susceptibility maps. Kavzoglu and Teke (2022), of the Department of Geomatics Engineering at Gebze Technical University in Turkey, applied three ensemble machine learning models, Random Forest (RF), XGBoost, and NGBoost, to model landslide susceptibility in Macka County of Trabzon Province, Turkey.

Turkey is particularly susceptible to landslides, especially following seismic events. Approximately 25% of residential units are impacted by mass movement phenomena. The economic repercussions of these landslides are substantial, amounting to an estimated US$80 million annually. This vulnerability is especially pronounced in the northeastern region, which is characterized by specific climatic conditions, diverse geological structures, and a mountainous landscape. As a result, there is a continuous push for research aimed at understanding and mitigating landslide susceptibility in this critical area.

The study was conducted in Trabzon Province, Turkey, between latitudes 40° 36′ N and 40° 55′ N and longitudes 39° 19′ E and 39° 47′ E. The region has a mild, humid climate with an annual rainfall of approximately 2200 mm. The terrain is predominantly rugged and hilly, with elevations ranging from 120 to 2670 meters and slopes reaching up to 65°.

Geologically, Trabzon Province features Late Jurassic to Early Cretaceous carbonate rocks, overlain by Late Cretaceous to Paleocene volcanic and magmatic rocks. These primarily include basalt, andesite, dacite, granite, granodiorite, and diorite. The youngest volcanic products in the area are Eocene-aged rocks, alongside Quaternary alluvial deposits in certain locations. The principal geological unit within the region is Cru1, which is primarily composed of pyroclastic materials and volcanic rock.

The study employed twelve conditioning factors related to landslides: aspect, drainage density, elevation, lithology, the normalized difference vegetation index (NDVI), plan curvature, profile curvature, distance to rivers, road density, distance to roads, slope, and the topographic position index (TPI). All predisposing factor maps were generated using SAGA GIS version 2.3.2 and ArcGIS version 10.5. The landslide data consisted of polygon features representing 34 landslide zones, sourced from the General Directorate of Mineral Research and Exploration in Turkey. To ensure data balance, 38 non-landslide polygons were also created. The dataset was divided into 70% for training and 30% for testing purposes.
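To illustrate the 70/30 division, here is a minimal sketch of a stratified split over the 34 landslide and 38 non-landslide polygons. The function name `stratified_split`, the seed, and the choice to split at polygon level are assumptions for illustration; the source does not say whether the study split polygons or individual pixels.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=42):
    """Split indices into train/test sets, keeping the class ratio in both.

    Hypothetical helper: each class is shuffled and cut separately.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)

# 1 = landslide polygon, 0 = non-landslide polygon (counts from the text)
labels = [1] * 34 + [0] * 38
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting each class separately keeps the landslide to non-landslide ratio roughly the same in the training and testing sets.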

Before the analysis, feature importance was assessed with Information Gain (IG), and the factors were checked for multicollinearity. The IG scores indicate that elevation, with a score of 0.666, is the most significant predisposing factor for landslides. It is followed by slope (0.608), road density (0.315), drainage density (0.246), lithology (0.184), distance to rivers (0.181), distance to roads (0.133), aspect (0.119), profile curvature (0.106), topographic position index (TPI) (0.074), and plan curvature (0.060). The normalized difference vegetation index (NDVI), with a score of 0.053, was identified as the least significant factor.
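As a rough illustration of how an information gain score is obtained, the sketch below computes the entropy-based gain of a binned factor against binary landslide labels. The toy data, the bin names, and the function names are assumptions; the study's own binning and any ratio normalization are not described in this post.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_bins, labels):
    """Entropy of the labels minus the entropy left after grouping by bin."""
    n = len(labels)
    groups = {}
    for b, y in zip(feature_bins, labels):
        groups.setdefault(b, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy data (hypothetical): 1 = landslide, 0 = non-landslide
labels = [1, 1, 1, 1, 0, 0, 0, 0]
elevation_bin = ["low"] * 4 + ["high"] * 4          # separates the classes perfectly
aspect_bin = ["N", "S", "N", "S", "N", "S", "N", "S"]  # carries no class information

ig_elevation = information_gain(elevation_bin, labels)
ig_aspect = information_gain(aspect_bin, labels)
print(ig_elevation, ig_aspect)
```

A factor whose bins separate landslide from non-landslide samples well scores high, while a factor whose bins mix the two classes evenly scores close to zero.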

Multicollinearity analysis checks whether two or more independent variables are strongly correlated with one another. The tolerance (TOL) and variance inflation factor (VIF) measures, along with correlation analysis, were used for this purpose. A VIF value > 10 or a TOL value < 0.1 indicates a critical level of multicollinearity. The maximum VIF recorded was 3.385, for the topographic position index (TPI), while the lowest TOL was 0.263, for drainage density. Accordingly, all selected variables were deemed suitable for further analysis.

In addition, Pearson's correlation matrix was used to assess pairwise relationships between the variables, with a correlation above 0.7 taken as a sign of potential multicollinearity. The highest correlation observed was 0.64, between TPI and plan curvature. Since this value is below the threshold, all factors were retained for the analysis.
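The two multicollinearity checks can be sketched in plain Python as follows. This is an illustrative sketch, not the study's workflow: it uses the standard identity that each factor's VIF equals the corresponding diagonal element of the inverse correlation matrix, with TOL = 1 / VIF. The toy columns and all function names are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif_and_tol(columns):
    """Return {factor: (VIF, TOL)} from the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: (inv[i][i], 1.0 / inv[i][i]) for i, name in enumerate(names)}

def high_correlations(columns, threshold=0.7):
    """List factor pairs whose absolute Pearson correlation exceeds the threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                flagged.append((a, b, r))
    return flagged

# Toy factor values (hypothetical, not from the study)
columns = {
    "slope": [1, 2, 3, 4, 5],
    "elevation": [2, 1, 4, 3, 5],
    "aspect": [2, 5, 3, 1, 4],
}
diagnostics = vif_and_tol(columns)
critical = [n for n, (vif, tol) in diagnostics.items() if vif > 10 or tol < 0.1]
flagged = high_correlations(columns)
print(diagnostics, critical, flagged)
```

In this toy example the slope and elevation columns are correlated at about 0.8, so the pair is flagged by the 0.7 correlation rule even though no factor reaches the critical VIF or TOL level.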

The study evaluates model performance using the Area Under the Curve (AUC) as the primary metric for comparison. NGBoost achieved the highest AUC, 0.898, ahead of XGBoost (0.871) and Random Forest (0.863). McNemar's test was then used to check whether the differences between the models are statistically significant. A test statistic greater than 3.84, the chi-squared critical value for one degree of freedom at the 95% confidence level, indicates a statistically significant difference, while a smaller value indicates no significant difference. The test yielded 0.35 between RF and XGBoost, suggesting similar performance, but 7.19 and 4.39 when comparing NGBoost with RF and XGBoost, respectively. NGBoost's performance is therefore statistically distinct from that of both RF and XGBoost.
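Both evaluation steps can be sketched in a few lines. The AUC is computed here as the fraction of positive/negative pairs that the model ranks correctly, and McNemar's statistic is computed from the samples on which exactly one of two models is correct. The toy data and function names are assumptions, and the source does not state whether the study applied the continuity correction, so it is exposed as a parameter.

```python
def auc_score(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mcnemar_statistic(y_true, pred_a, pred_b, continuity=True):
    """Chi-squared McNemar statistic from the discordant predictions of two models."""
    only_a = sum(1 for y, a, b in zip(y_true, pred_a, pred_b) if a == y and b != y)
    only_b = sum(1 for y, a, b in zip(y_true, pred_a, pred_b) if a != y and b == y)
    if only_a + only_b == 0:
        return 0.0  # the models never disagree in correctness
    diff = abs(only_a - only_b) - 1 if continuity else abs(only_a - only_b)
    return diff ** 2 / (only_a + only_b)

# Toy AUC example (hypothetical scores): 1 = landslide, 0 = non-landslide
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.5, 0.2, 0.1]
auc = auc_score(y_true, scores)

# Toy McNemar example: model A alone is right 12 times, model B alone 2 times
truth = [1] * 20
pred_a = [1] * 16 + [0] * 4
pred_b = [1] * 4 + [0] * 12 + [1] * 2 + [0] * 2
stat = mcnemar_statistic(truth, pred_a, pred_b)
significant = stat > 3.84  # chi-squared critical value, 1 degree of freedom
print(auc, stat, significant)
```

Only the cases where the two models disagree in correctness enter the statistic, which is why two models with similar error patterns can produce a small value even when their AUC scores differ.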

The SHAP technique was used to make the model's outputs more interpretable. The SHAP analysis showed that slope, elevation, and road density had the strongest influence on the model's predictions, whereas plan curvature, TPI, and profile curvature had minimal impact. This observation is broadly consistent with the IG feature ranking.
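SHAP builds on Shapley values, which distribute the difference between a prediction and a baseline prediction among the input features. The sketch below computes exact Shapley values for a tiny toy scoring function by averaging each feature's marginal contribution over all feature orderings. It does not use the SHAP library or the study's models; `toy_score`, its weights, and the baseline are purely hypothetical.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings.

    Feasible only for a handful of features, since the number of
    orderings grows factorially.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch this feature to its real value
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orders) for name, total in totals.items()}

def toy_score(x):
    """Hypothetical susceptibility score with a slope/road-density interaction."""
    return 0.5 * x["slope"] + 0.3 * x["elevation"] + 0.2 * x["slope"] * x["road_density"]

instance = {"slope": 1.0, "elevation": 1.0, "road_density": 1.0}
baseline = {"slope": 0.0, "elevation": 0.0, "road_density": 0.0}
contributions = shapley_values(toy_score, instance, baseline)
print(contributions)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is shared equally between slope and road density.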

In conclusion, the study of landslide susceptibility in Trabzon Province, Turkey, underscores the importance of identifying and analyzing the predisposing factors that contribute to mass movements. Landslide susceptibility maps (LSMs) serve as a vital tool for local authorities and decision-makers, allowing for informed planning and risk mitigation strategies. According to the IG analysis, elevation emerges as the most influential factor for landslide occurrence, followed by slope and road density.

 

Figure: Methodology (Source: https://doi.org/10.1007/s13369-022-06560-8)

Figure: LSM by NGBoost (Source: https://doi.org/10.1007/s13369-022-06560-8)
 

 

Reference

Kavzoglu, T., Teke, A. Predictive Performances of Ensemble Machine Learning Algorithms in Landslide Susceptibility Mapping Using Random Forest, Extreme Gradient Boosting (XGBoost) and Natural Gradient Boosting (NGBoost). Arab J Sci Eng 47, 7367–7385 (2022). https://doi.org/10.1007/s13369-022-06560-8
