Landslide susceptibility mapping on global scale using method of logistic regression

This paper proposes a statistical model for mapping global landslide susceptibility based on logistic regression. After investigating explanatory factors for landslides in the existing literature, five factors were selected to model landslide susceptibility: relative relief, extreme precipitation, lithology, ground motion and soil moisture. When building the model, 70 % of landslide and nonlandslide points were randomly selected for logistic regression, and the others were used for model validation. To evaluate the accuracy of the predictive models, this paper adopts several criteria, including the receiver operating characteristic (ROC) curve. Logistic regression experiments found all five factors to be significant in explaining landslide occurrence on a global scale. During the modeling process, the percentage correct in the confusion matrix of landslide classification was approximately 80 % and the area under the curve (AUC) was nearly 0.87. During the validation process, the above statistics were about 81 % and 0.88, respectively. These results indicate that the model is robust and performs stably. The model found that, at a global scale, soil moisture can be dominant in the occurrence of landslides and the topographic factor may be secondary.


Introduction
Landslides are a pervasive natural hazard, causing significant casualties and economic loss around the world (Budimir et al., 2015). Major news websites and online blogs from experts (such as The Landslide Blog, a thematic blog maintained by Prof. Dave Petley at the University of East Anglia) show that landslides occur almost every day. It is important to find out where the global landslide hotspots are and what factors influence the occurrence of landslides. Such information would provide a crucial reference for researchers, for decision makers in industries such as insurance, and for project managers in nongovernmental organizations (NGOs). For international and national insurance or reinsurance companies, such a map provides clear knowledge of landslide hotspots at a macro level, which helps them concentrate on susceptible areas and form relevant marketing strategies such as risk transfer (Bednarik et al., 2010). Geographers could also find it useful for revealing spatial patterns of landslide distribution. To answer these questions, studies of global landslide susceptibility are required. Such research will help give a global perspective on landslides, which may encourage international cooperation for disaster risk reduction.
At present, research methods for landslide susceptibility mapping can be divided into three major categories: qualitative factor overlay, statistical models and geotechnical process models (Dai and Lee, 2002). Generally, geotechnical process methods are developed from slope stability analyses and are applicable for site-specific landslides or when the ground conditions are quite uniform in the study area. This approach also requires the landslide types to be known and relatively easy to analyze (Terlien et al., 1995; Wu and Sidle, 1995), and hence it is seldom used in large-scale landslide susceptibility mapping. In qualitative methods, landslide experts select landslide controlling factors and combine these factors into a susceptibility map, based on their knowledge and experience of landslide investigation (Anbalagan, 1992; Pachauri and Pant, 1992). In contrast, statistical methods determine the combination of explanatory factors statistically (Carrara et al., 1991; Dhakal et al., 1999). Of these three types of methodology, qualitative and statistical methods are widely applied to large-scale landslide susceptibility mapping. However, poor reproducibility of results and subjectivity in landslide modeling are apparent disadvantages of qualitative factor overlay. In recent times, large landslide inventories and multi-source data on landslide factors have gradually become accessible to researchers, and statistical methods are therefore frequently used in landslide susceptibility mapping.
Among statistical methods, logistic regression models have been frequently used in geological hazard research and employed to explore the factors that influence landslides and to determine landslide probability (Ayalew and Yamagishi, 2005; Van Den Eeckhaut et al., 2006). Compared with other statistical approaches, Brenning (2005) found that logistic regression models have a relatively low rate of error. Logistic regression can model a dichotomous dependent variable (e.g., whether a landslide occurred) using independent variables that are either categorical or continuous (Chang et al., 2007; Atkinson and Massari, 1998). The fact that landslide explanatory factors can be included in the model as either categorical or continuous variables gives logistic regression models a great advantage over multiple regression models, which can only include continuous variables. Finally, logistic regression models can be used to draw susceptibility maps when combined with GIS (Lee, 2005; Bai et al., 2010).
A landslide inventory provides the basis for quantitative zoning of landslide susceptibility. Location, date, type, size, causal factors and damage are supposed to be included in this database. No commonly used landslide inventory exists yet, but some regional or national landslide databases are now well developed. In Europe, currently 22 out of 37 contacted countries have national landslide databases, and six other countries only have regional landslide databases. Those national databases contain about 633 700 landslides in total, of which about 75 % are in Italy, and more than 10 000 landslides are in Austria, the Czech Republic, France, Norway, Poland, Slovakia, and the UK. In these 37 European countries, only six have sufficient information to perform risk analysis and one to perform a hazard analysis, while 14 countries can carry out at least a susceptibility analysis. Therefore, at a continental scale, landslide zoning seems to be limited to landslide susceptibility modeling only. Restricted access to the data also makes it difficult for these data to be applied in scientific research (Van Den Eeckhaut and Hervás, 2012).
In the existing literature, only a few studies of landslide susceptibility have been carried out on a global scale; those that exist mainly used qualitative or semi-qualitative methodologies. For example, Mora and Vahrson (1994) proposed a method for assessing landslide susceptibility in tropical earthquake-prone areas that included three fundamental factors (slope, soil moisture, and lithology) and two triggering factors (extreme precipitation and ground motion). Nadim et al. (2006) applied the research of Mora and Vahrson (1994) to assess global landslide susceptibility and risk. Hong et al. (2007) selected six influencing factors (slope, elevation, soil type, soil texture, land cover type and drainage density) in a weighted linear combination (WLC) model. To obtain an optimal combination of weights, they tried different combinations of factor weights to make the model results similar to the existing landslide susceptibility map of the USA. Finally, they drew a global landslide susceptibility map using the weight combination obtained above. Some scholars have also attempted to study global landslides with statistical methods. Farahmand and AghaKouchak (2013) used a global landslide inventory compiled by the National Aeronautics and Space Administration (NASA) to build a global landslide susceptibility model based on a support vector machine (SVM), which includes three variables: satellite-sensed precipitation, digital elevation model (DEM) and land cover type. Compared with complex numerical methods such as SVM, logistic regression provides a simple method to produce a global landslide susceptibility map; this simplicity would help in disseminating the research and could encourage further model development. Moreover, the result from logistic regression can illustrate the relative importance of different factors in explaining landslides, which cannot be achieved by some numerical methods such as SVM.
This paper addresses the gap in creating global landslide susceptibility maps using the widely used statistical method of logistic regression, and in demonstrating the relative significance of different explanatory factors on a global scale. In this paper, a global landslide inventory database is constructed and used for building a stepwise logistic regression model to evaluate global landslide susceptibility. Finally, a global landslide susceptibility map that visualizes this model is produced. In the landslide susceptibility model, five factors (extreme precipitation, soil moisture, lithology, relative relief and ground motion) are included as explanatory factors in stepwise logistic regression. In total, 70 % of landslide and nonlandslide events are randomly selected for logistic regression and the rest are used for model validation. It is found that this model has good explanatory power and performs well in model prediction. Landslide explanatory factors and the extent to which these factors influence landslide occurrence can be derived from model results directly without expert experience, which is rare in statistical assessments of global landslide susceptibility.
Explanatory factors from previous work (Table 1) fall into seven general categories: topography, geology, hydrology, soil, precipitation, land cover and ground motion. Generally speaking, explanatory factors for landslides can be divided into fundamental factors and triggering factors (Nadim et al., 2006). Fundamental factors include environmental conditions that generate the potential for landslide occurrence, such as topography, lithology and soil. Triggering factors explain direct effects that drive slope instability, such as ground motion and extreme precipitation. In the existing literature, the combination of trigger and susceptibility can influence landslide hazard level (Nadim et al., 2006). However, landslide models without landslide information such as time and magnitude (e.g., size, speed, kinetic energy or momentum of mass) cannot be correctly defined as hazard models (Guzzetti et al., 1999). Hence, in this paper, both fundamental factors and triggering factors are included to evaluate landslide susceptibility.
In existing studies of landslides at a regional scale, topography is regarded as a powerful explanatory factor for the occurrence of landslides (Dai and Lee, 2002; Lee and Min, 2001), and this has also been demonstrated at a global scale (Hong et al., 2007). For most studies, topography includes relief characteristics such as elevation, slope gradient and slope aspect. At a global scale, factors such as elevation and slope gradient can be replaced by a topographic index or relative relief, which indicate macroscopic differences in topography. Especially for landslide data with low location precision, using factors such as elevation or slope gradient that relate precisely to landslide location will reduce the accuracy of landslide susceptibility analysis (Farahmand and AghaKouchak, 2013). Therefore, a general factor such as relative relief is more appropriate, and in this paper, relative relief is used to represent topography. Relative relief is defined as the difference between maximum and minimum elevation values within an area (Chauhan et al., 2010). Relative relief has been shown to be an important explanatory factor, and landslide occurrence is generally higher in high relative relief areas (Anbalagan, 1992).
For geology, attributes such as rock age and rock type can be chosen, with data mainly coming from small regional geological surveys and field studies. Studies of global landslide susceptibility have shown that lithology is a fundamental factor (Nadim et al., 2006). Landslides are more likely to occur in relatively young rocks with lower strength and are less likely in relatively old rocks with sufficient solidification and high strength. Hence the factor of lithology is included in the landslide model.
The water condition of the land surface also affects landslides. With the development of large data-sharing frameworks for meteorological data, precipitation information is easily available and hence frequently used in landslide analysis (Farahmand and AghaKouchak, 2013). However, as Nadim et al. (2006) propose, soil moisture can also be a proxy of the water condition, because it represents the average moisture condition of the soil. Compared with mean annual precipitation, it is less affected by extreme precipitation events, so it objectively reflects the possibility of slope instability in the long term and can be taken as a fundamental factor of landslide occurrence. Farahmand and AghaKouchak (2013) also recommend the use of soil moisture data in studies of global landslide susceptibility. Therefore, soil moisture is adopted as an explanatory factor in this paper.
Ground motion and extreme precipitation are commonly analyzed as triggering factors of landslides, using data from field surveys and monitoring observations. Landslides are generally triggered by earthquakes or by heavy precipitation. Strong ground motion during earthquakes causes rocks to rupture, thus inducing landslides. As for rainfall, rain and/or meltwater that reaches the ground surface infiltrates into the ground and forms groundwater. During this process, the pressure of the water that fills the void spaces between soil particles and rock fissures rises as the amount of water infiltrating into the ground increases. A rise in pore-water pressure causes a drop in effective stress, affecting the stability of a slope, and thus is a major cause of landslides and other sediment-related disasters (Matsuura et al., 2008). Intense rainfall is believed to be a cause of shallow landslides (Caine, 1980). Current studies of landslides consider ground motion and extreme precipitation as triggering factors (Umar et al., 2014; Nowicki et al., 2014; Nadim et al., 2006). Therefore, in this paper, ground motion, expressed as peak ground acceleration (PGA), and monthly extreme precipitation are used as triggering factors. In summary, this paper uses relative relief, soil moisture, lithology, monthly extreme precipitation and PGA as explanatory factors for global-scale landslide susceptibility. The first three are fundamental factors, and the last two are triggering factors.
Methodology and data

Study area
This paper considers global continental areas from 72° N to 72° S, excluding Greenland and the Antarctic continent. Because this research is specific to terrestrial landslides, oceans and areas covered by glaciers or ice sheets are excluded. The scope of this paper is also limited by data coverage for explanatory factors. As the lithology data cover the area from 72° N to 72° S, the final susceptibility map is limited to this boundary.

Logistic regression model
Logistic regression models are commonly fitted in a stepwise manner (Budimir et al., 2015). The general form of a logistic regression model is as follows:

$$ y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + e \qquad (1) $$

In Eq. (1), y is the dependent variable that reflects landslide occurrence, x_i is the independent variable related to the ith of n explanatory factors, β_0 is a constant, β_i is the regression coefficient for that explanatory factor, and e is the random error. The probability p of the dependent variable y can be expressed as follows in Eq. (2):

$$ p = \frac{1}{1 + \exp\left[-\left(\beta_0 + \sum_{i=1}^{n} \beta_i x_i\right)\right]} \qquad (2) $$
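As a minimal sketch of how Eqs. (1) and (2) are evaluated, the snippet below assumes the standard logistic form, p = 1 / (1 + exp(-z)) with z = β_0 + Σ β_i x_i. The function names and the coefficient values are hypothetical placeholders chosen for illustration; they are not the fitted values of this paper.

```python
import math


def linear_predictor(beta0, betas, xs):
    """Return beta0 + sum(beta_i * x_i), the deterministic part of Eq. (1)."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    return beta0 + sum(b * x for b, x in zip(betas, xs))


def landslide_probability(beta0, betas, xs):
    """Map the linear predictor to a probability with the logistic function."""
    z = linear_predictor(beta0, betas, xs)
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical constant, coefficients and factor levels (illustration only).
# Here z = -4.0 + 0.5 * 4 + 0.25 * 8 = 0, so the probability is 0.5.
example_p = landslide_probability(-4.0, [0.5, 0.25], [4, 8])
```

A linear predictor of zero corresponds to a probability of 0.5, and each unit increase in a factor level multiplies the odds p / (1 - p) by exp(β_i).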

Independent variables
In this paper, explanatory factors are entered into the stepwise logistic regression model as independent variables. All data layers for these explanatory factors are converted to the WGS 1984 geographical coordinate system. The original resolution of each factor is preserved, because simple resampling does not add to the accuracy or precision of the information in the layers. Topographic data come from GTOPO30 (USGS, 2012), which is a global elevation data set from the Earth Resources Observation and Science (EROS) Center. Its spatial resolution is 30 arcsec (approximately 1 km), and it covers the earth surface from 90° N to 90° S and 180° E to 180° W. After obtaining the data, relative relief is calculated by a moving window method in ArcGIS with a window size of 0.5°. The existing literature contains few statements about the proper classification method for relative relief. Relative relief is hence divided into 10 types with successive ordinal values from 1 to 10, using the natural breaks method of classification (Table 2).
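The moving-window calculation of relative relief can be sketched as follows. This is an illustrative pure-Python version, not the ArcGIS tool used in the paper; the function name, the square window clipped at the grid edge, and the tiny example grid are assumptions. On a 30 arcsec grid, a 0.5° window would span about 60 cells.

```python
def relative_relief(elev, half_window):
    """Maximum minus minimum elevation in a square window around each cell.

    elev is a list of equal-length rows of elevations. The window extends
    half_window cells in each direction and is clipped at the grid edge.
    """
    n_rows, n_cols = len(elev), len(elev[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            r0, r1 = max(0, r - half_window), min(n_rows, r + half_window + 1)
            c0, c1 = max(0, c - half_window), min(n_cols, c + half_window + 1)
            window = [elev[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out[r][c] = max(window) - min(window)
    return out


# Hypothetical 3 x 3 elevation grid in metres.
dem = [
    [100, 120, 130],
    [110, 300, 140],
    [105, 115, 125],
]
relief = relative_relief(dem, half_window=1)
```

The nested loops keep the example dependency-free; for a global raster, a vectorized maximum and minimum filter would be the practical choice.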
Lithology data come from a geological map of the world at a 1 : 25 000 000 scale (the third version) published by the Commission for the Geological Map of the World (CGMW, 2010) and UNESCO. In the Mercator projection, the northern and southern boundaries of this map are set as 72° N and 72° S. As a consequence, a large extent of the Antarctic continental coastline is visible with a better delimitation of the Southern Ocean. The southern half of Greenland is also visible (Bouysse, 2010). The lithology data are rasterized with a spatial resolution of 0.01°. Following Nadim et al. (2006), global lithology data are divided into six categories (Table 2). The spatial resolution of 0.01° was used because the primary electronic map is vector-based. Most of its information is preserved when it is converted into a raster with a small cell size, and such a raster also fits the coastline well.
In this paper, the soil moisture index is used to represent the local soil humidity level. With data products from the Center for Climatic Research at the University of Delaware, Willmott and Feddema (1992) proposed a new soil moisture index. In this index, soil moisture was normalized to a range from −1.0 to 1.0 with a spatial resolution of 0.5°. Nadim et al. (2006) classified soil moisture data into levels from 1 to 5 (Table 2), with higher values indicating greater humidity. Monthly extreme precipitation with a return period of 100 years is calculated using historical precipitation grid data over 50 years (from 1961 to 2010) from the GPCC Full Data Reanalysis (Schneider et al., 2011). As no typical classification method for extreme precipitation exists in the literature, these precipitation data, which have a spatial resolution of 0.5°, are divided into 10 levels (Table 2) using the natural breaks classification method.
For ground motion, PGA with an exceedance probability of 10 % over 50 years is included (that is, a return period of 475 years). Data are from the global seismic hazard map created by the Global Seismic Hazard Assessment Program (GSHAP) of the International Lithosphere Program (ILP). The map shows PGA with an exceedance probability of 10 % over 50 years at a spatial resolution of 0.1° (Giardini et al., 2003). Based on the methodology of Nadim et al. (2006), PGA is divided into 10 levels (Table 2), with higher values denoting greater seismic hazard.
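The correspondence between a 10 % exceedance probability over 50 years and a period of about 475 years can be checked with a short calculation. The sketch below assumes that exceedances follow a Poisson process, so that p = 1 − exp(−t / T); the function name is illustrative.

```python
import math


def return_period(exceedance_prob, exposure_years):
    """Return period T (years) for which the probability of at least one
    exceedance within exposure_years equals exceedance_prob, assuming a
    Poisson process: p = 1 - exp(-t / T)."""
    if not 0.0 < exceedance_prob < 1.0:
        raise ValueError("exceedance_prob must lie strictly between 0 and 1")
    return -exposure_years / math.log(1.0 - exceedance_prob)


# 10 % exceedance probability over 50 years gives roughly 475 years.
pga_return_period = return_period(0.10, 50)
```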

Dependent variables
The dependent variables that enter the model are global landslide inventory data and simulated nonlandslide data. The landslide data combine two databases, the NASA global landslide inventory and the World Geological Hazard Inventory. Sources of the NASA inventory include online regional and national newspaper articles and media sources. The best resolution of the NASA global landslide inventory is 2 km. The items in the World Geological Hazard Inventory were collected manually from news reports (e.g., mass media in China, Xinhua News, and Sina News) and records in books and journals (e.g., Galli and Guzzetti, 2007, and Gao, 1999). We searched the internet for information about landslides using keywords such as landslide and debris flow. We then read the descriptions carefully to determine whether each event was a landslide, located it, and entered it into the database. Thus the main source of the World Geological Hazard Inventory is news data. By investigating these news data, we can find landslides that are of large volume or of high danger, because these kinds of landslides have high news value. A wide range of literature, including not only reviewed academic books and journals but also newspapers and local chronicles, served as information sources, so as to investigate geological hazards that happened a long time ago or in remote areas. Rich information sources can provide as many landslides as needed to reduce the uncertainty brought by a limited landslide database. The best resolution of the World Geological Hazard Inventory is 0.001° (about 100 m). Two teams were assigned to develop and maintain this inventory. One team (about 10 persons) was responsible for collecting information from the literature, and the other team (about four persons) checked and reviewed the collected items for data quality control. When combining these two databases, the time of occurrence provides a crucial criterion. When two landslide events have different times (months), they are both retained in the new database. If two events have the same occurrence time and their locations are close, the details in the sources are investigated to determine whether they are from the same disaster. If so, the record with higher spatial resolution is retained and the one with lower resolution is dropped. An example of this inventory can be found in Table 3.
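The merging rule described above can be sketched as follows. The record fields (`year`, `month`, `lat`, `lon`, `res_deg`), the 0.25° closeness threshold and the `same_event` callback are all hypothetical; in the paper the final decision on whether two records describe the same disaster was made by inspecting the sources, which the callback merely stands in for. Longitude wrap-around is ignored for brevity.

```python
def same_month(a, b):
    """True if two records share the same year and month of occurrence."""
    return (a["year"], a["month"]) == (b["year"], b["month"])


def is_close(a, b, max_deg):
    """True if two records lie within max_deg degrees in both latitude and longitude."""
    return (abs(a["lat"] - b["lat"]) <= max_deg
            and abs(a["lon"] - b["lon"]) <= max_deg)


def merge_inventories(first, second, max_deg=0.25, same_event=lambda a, b: True):
    """Combine two inventories, keeping the finer-resolution record of each duplicate.

    same_event stands in for the manual source check; by default every
    same-month, nearby pair is treated as one disaster.
    """
    merged = list(first)
    for cand in second:
        dup_index = None
        for i, kept in enumerate(merged):
            if (same_month(kept, cand) and is_close(kept, cand, max_deg)
                    and same_event(kept, cand)):
                dup_index = i
                break
        if dup_index is None:
            merged.append(cand)            # different time or place: keep both
        elif cand["res_deg"] < merged[dup_index]["res_deg"]:
            merged[dup_index] = cand       # same disaster: keep the finer record
    return merged


# Hypothetical records (coordinates and resolutions are made up).
inventory_a = [{"year": 2008, "month": 5, "lat": 31.0, "lon": 103.4, "res_deg": 0.02}]
inventory_b = [
    {"year": 2008, "month": 5, "lat": 31.05, "lon": 103.45, "res_deg": 0.001},
    {"year": 2010, "month": 8, "lat": 33.8, "lon": 104.4, "res_deg": 0.001},
]
merged = merge_inventories(inventory_a, inventory_b)
```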
In the World Geological Hazard Inventory, the earliest event can be dated to 1618. In this database, there are 117 landslides before 1975, 84 between 1975 and 2000, and 274 between 2000 and 2014. The landslide events in the NASA global landslide inventory mainly happened in 2003, 2007, 2008 and 2009. Hence these two databases are complementary and they can be merged to produce a more complete landslide database. In all, the combined database stores landslide information such as hazard type, occurrence time, location (including geographical coordinates and locating precision), fatalities and data sources. Currently, this database contains 2005 landslides; their locations are shown in Fig. 1. This combined database includes landslides (debris slides, rotational slides, and slumps) and debris flows, following the landslide classification of Varnes (1984) and Cruden and Varnes (1996).
To demonstrate the representativeness of the landslide data used in this research, the spatial distribution of landslides in Europe in this research is compared with that in the study of Van Den Eeckhaut et al. (2012). As shown in Fig. 2, the spatial distribution of landslide samples used in that study of European landslide susceptibility modeling is quite similar to that of the combined landslide database in this research. It is estimated that there is about 60 % agreement between these two landslide distributions in general. The landslides in Europe are mainly distributed in mountainous areas such as the Alps and the Balkans.
Nonlandslide events are generated as random points. Because landslide location accuracy is approximately 0.25°, a buffer zone with a radius of 0.25° is created around the existing landslide points to represent the location range of each landslide event. The buffer zone is then removed from the global continental area, and the remaining part of the global continental area forms the potential nonlandslide area. The quantity of nonlandslide points should be carefully considered. Most studies use an equal number of landslide points and nonlandslide points (Dai and Lee, 2002; Kawabata and Bandibas, 2009; Chau and Chan, 2005; Costanzo et al., 2014; Regmi et al., 2014; Mathew et al., 2009). However, a few authors prefer an unequal number (Van Den Eeckhaut et al., 2012; Felicisimo et al., 2013). For example, Van Den Eeckhaut et al. (2006) use 5 times as many nonlandslide cells as landslide cells, and Farahmand and AghaKouchak (2013) use 10 times as many nonlandslide cells as landslide cells. To carry out a sensitivity test on the landslide susceptibility model in this paper and to reduce the uncertainty introduced by random nonlandslide points, five nonlandslide sets, each with the same number of points as there are landslides, were created using random sampling without replacement. To validate the landslide model, the method of splitting data sets is applied (Van Den Eeckhaut et al., 2012). For each data set, 70 % of landslides and nonlandslides are randomly selected for modeling, and the remaining 30 % are used for validation. A confusion matrix and Akaike's information criterion (AIC) value (Allison, 2001; Van Den Eeckhaut et al., 2006) are applied to assess model performance. In addition, this paper adopts a receiver operating characteristic (ROC) curve to evaluate model effectiveness. The ROC curve helps to validate a model graphically (Swets, 1988), providing an analysis based on true-positive and false-positive rates. A higher area under this curve (AUC) indicates that the model performs well in prediction (Mathew et al., 2009).
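The split-sample validation and the two summary statistics used above can be sketched in a few lines. The snippet is illustrative only: the 0.5 classification threshold is an assumption (the paper does not state the cut-off used in its confusion matrix), the function names are hypothetical, and the AUC is computed with the rank (Mann-Whitney) formulation rather than by integrating a plotted curve.

```python
import random


def split_70_30(samples, rng):
    """Shuffle a copy of samples and split it into 70 % and 30 % parts."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def percentage_correct(labels, probs, threshold=0.5):
    """Percentage of samples whose thresholded probability matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return 100.0 * hits / len(labels)


def auc(labels, probs):
    """Area under the ROC curve: the share of (landslide, nonlandslide) pairs
    in which the landslide receives the higher probability; ties count 0.5."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one landslide and one nonlandslide")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = landslide, 0 = nonlandslide) and model outputs.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
accuracy = percentage_correct(labels, probs)
area = auc(labels, probs)

train, test = split_70_30(list(range(10)), random.Random(0))
```

In this toy example four of six points are classified correctly, and eight of the nine landslide-nonlandslide pairs are ranked in the right order.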

Results
The results and validation of the logistic regression models for the five data sets are shown in Table 4. Among these five data sets, the percentage correct in the confusion matrix ranges from 78.7 to 80.4 % during the modeling process and from 79.9 to 82.1 % during the validation process. Generally, the logistic regression models in this study show high accuracy in the confusion matrix. For the five data sets, the AUC values range from 0.8685 to 0.8846 when modeling (Fig. 3) and from 0.8809 to 0.8933 when validating (Fig. 4). On average, the AUC value of the logistic regression model is approximately 0.88, which indicates good predictive performance.
Based on the criteria of a high percentage correct in the confusion matrix, a high AUC value and a low AIC value, the regression model from data set 2 was selected as the global landslide susceptibility model. This model is then used to analyze the importance of the explanatory factors for landslides and is employed in landslide susceptibility mapping. The best model takes the logistic form of Eq. (2) with the five factors as predictors:

$$ P = \frac{1}{1 + \exp\left[-\left(\beta_0 + \beta_S S + \beta_A A + \beta_L L + \beta_R R + \beta_E E\right)\right]} \qquad (3) $$

where P stands for the probability of landslides; S, A, L, R, and E stand for the landslide explanatory factors of soil moisture, PGA, lithology, relative relief and extreme precipitation, respectively; and β_0 and β_S to β_E are the fitted constant and coefficients. In the model above, all variables are significant at the 1 % significance level. The coefficients of each factor show that the greatest contribution to landslide occurrence comes from soil moisture, which has a coefficient of approximately 0.6. The next most important factors are relative relief and extreme precipitation, each with a coefficient of approximately 0.3. The contributions of PGA and lithology are relatively low, with coefficients of approximately 0.2 and 0.1, respectively.
Table 5 lists the number of landslides on each continent in the global inventory and in each data set used for modeling and validation, which helps readers to understand how spatially representative the data sets are. Only a small number of landslide records are in Africa. However, in both the modeling process and the validation process, different numbers of landslides and nonlandslides in Africa were selected for each data set. Figures 3 and 4 show that the results from all five data sets are relatively stable and high, which suggests that the model can be applied effectively in Africa; otherwise, the results of the five data sets would differ.
A global landslide susceptibility map can be drawn using the model in Eq. (3). Based on existing susceptibility classification methods from Guzzetti et al. (2006) and Van den Eeckhaut et al. (2012), this map classifies susceptibility levels according to breakpoints of 0.4, 0.6, 0.7 and 0.9. These breakpoints define a susceptibility map with five levels, i.e., very low, low, moderate, high, and very high (Fig. 5).
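The classification of modeled probabilities into the five susceptibility levels can be sketched as below. The breakpoints and level names come from the text; how a probability exactly equal to a breakpoint is treated is not stated, so assigning it to the higher level is an assumption, as are the names used in the code.

```python
from bisect import bisect_right

BREAKPOINTS = (0.4, 0.6, 0.7, 0.9)
LEVELS = ("very low", "low", "moderate", "high", "very high")


def susceptibility_level(p):
    """Return the susceptibility level for a landslide probability p.

    Assumption: a probability equal to a breakpoint falls in the higher level.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return LEVELS[bisect_right(BREAKPOINTS, p)]


# Classify a few hypothetical probabilities.
example_levels = [susceptibility_level(p) for p in (0.10, 0.45, 0.65, 0.80, 0.95)]
```

With these breakpoints, levels 3 to 5 correspond to probabilities of at least 0.6 and levels 4 and 5 to probabilities of at least 0.7.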
The susceptibility map shows that global landslide hotspots are the Alps, the Iranian Plateau, the Pamirs, the southern Qinghai-Tibet Plateau, the mountainous region of southwestern China, the islands in the western Pacific Ocean, including Japan, the Philippines, Malaysia, Indonesia, New Zealand, northeastern North America, Central America and the Andes in South America.

Discussion
To evaluate the accuracy of the susceptibility map produced in this research, the global landslide susceptibility map is compared with four studies from the current literature that focus on large-scale landslide susceptibility. At a regional scale, two landslide susceptibility maps, i.e., European (Van Den Eeckhaut et al., 2012) and Chinese (Liu et al., 2013), are selected. At a global scale, the studies of Nadim et al. (2006) and Hong et al. (2007) are selected.
Comparing the European landslide susceptibility map drawn by Van Den Eeckhaut et al. (2012) with the European part of the susceptibility map in this study (Fig. 6a) reveals similar areas of high landslide susceptibility. The former map includes two levels (labeled high and very high) as high susceptibility with a landslide probability of over 0.8, and this study also includes two levels (levels 4 and 5) as high susceptibility with a probability over 0.7. Thus, for Europe, the landslide susceptibility map in this study agrees with an existing related study.
Comparing the Chinese landslide susceptibility map drawn by Liu et al. (2013) with the China part of the susceptibility map in this study (Fig. 6b) shows that the former map includes two levels (levels 4 and 5) as susceptible with a landslide probability of over 0.6. The map in this study includes three levels (levels 3, 4 and 5) as susceptible with a landslide probability of over 0.6. The main differences between the two maps are in the western Sichuan Basin and southern Tibet, which are known for their high elevation and intense relative relief. This study uses many landslide cases in these areas. However, in the landslide database of Liu et al. (2013), only a few landslides occur in these areas. This discrepancy is the reason for the differences between the two maps.
As for landslide susceptibility at a global scale, Nadim et al. (2006) and Hong et al. (2007) have made substantial efforts on this topic. One global landslide susceptibility map (please refer to Fig. 7 in Nadim et al., 2006) has five levels (levels 5, 6, 7, 8 and 9) as susceptible, while the map from this study includes three levels (levels 3, 4 and 5) as susceptible. In general, the susceptible areas of these two maps are fairly similar, except in Madagascar and the eastern Indochinese Peninsula.
Another global landslide susceptibility map (please refer to Fig. 3a in Hong et al., 2007) has two levels (levels 4 and 5) as susceptible, compared to the map in this study, which has three levels (levels 3, 4 and 5) as susceptible. These two maps are similar over Asia, Europe and Africa. However, the map of Hong et al. (2007) differs from the map of this study in that it shows high landslide susceptibility in central and southern India, and low landslide susceptibility in equatorial islands such as Malaysia, Indonesia, and the Philippines. We believe that the classification of landslide susceptibility in this research is closer to the existing conditions.
With the development of global DEM products, a DEM with finer resolution is now available to the public. The NASA Shuttle Radar Topographic Mission (SRTM; Jarvis et al., 2012) has provided digital elevation data for over 80 % of the globe. These data are currently distributed free of charge. The SRTM data are available as a 3 arcsec (approx. 90 m resolution) DEM covering the globe from 60° N to 60° S. The 1 arcsec data product was also produced and is now available for all countries. To explore the sensitivity of the model result to the DEM, experiments were also performed following all the procedures stated above, but using the SRTM 90 m DEM as the source of topography. As shown in Table 6, the landslide susceptibility model with the 90 m DEM showed no significant difference (only an increase of about 0.005 in AUC) from the models using the 1 km DEM (AUC in modeling, from 0.8768 to 0.8818; AUC in validation, from 0.8871 to 0.8929). When the location precision of the landslides is low, using a finer DEM does not increase the accuracy of landslide susceptibility analysis. A DEM with a coarser resolution (i.e., 1 km DEM) is therefore recommended as the topographical factor in global landslide susceptibility mapping.
The accuracy of the logistic regression model in this paper is quite high compared with that of similar experiments performed at the national scale (Lin et al., 2017) or local scale (Wang et al., 2016), which exceeds our expectations.
Explaining the occurrence of past landslide events on a global scale with one single model may be difficult, but the results in this paper show that the selected factors and their weights can provide a good explanation of global landslide occurrence within one model.
Regarding the incompleteness of the landslide inventory in the global geological hazard database of this study, the main focus of this research is global landslide susceptibility assessment, and hence the landslides in the database should be representative on a global scale, i.e., having a large volume or causing significant loss. The landslides in our database meet such requirements: they are either of large magnitude or caused severe loss of life or economic loss, and are hence commonly reported by news agencies. A global landslide susceptibility map built on this database will inevitably underestimate the landslide susceptibility in some sparsely populated or less developed areas. However, if we did not follow these selection criteria, our database would contain a large number of landslides from countries with a good landslide catalogue and only a few from countries with a poor landslide catalogue. A model built on such data may overestimate landslide susceptibility in countries with rich landslide records and underestimate it in countries with poor landslide records, which would not improve the accuracy of the global landslide susceptibility map. Hence we think that the landslide database in our research is relatively high in representativity and reliability. In future research, we will explore the use of big data on the internet to build a more comprehensive landslide database, and we will try to enhance the studies of landslide susceptibility once landslide catalogues from various countries become easily accessible.

Conclusions
This paper applies a stepwise logistic regression model to study landslide susceptibility on a global scale. After investigating the explanatory factors for landslides in the existing literature, five explanatory factors (extreme precipitation, lithology, relative relief, ground motion, and soil moisture) are selected. These factors are used to build a landslide susceptibility model through stepwise logistic regression based on landslides recorded in a combined global landslide inventory. It is found that the five explanatory factors perform well in explaining the occurrence of landslides on a global scale. The percentage correct in the confusion matrix of landslide classification during modeling ranges from 78.7 to 80.4 %, with an AUC value from 0.8685 to 0.8846. During validation, the percentage correct in the confusion matrix ranges from 79.9 to 82.1 %, with an AUC value from 0.8809 to 0.8933. The results from those data sets are similar, and the coefficients and ranks of each explanatory factor are relatively stable, which suggests that the model is both robust and accurate.
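As an illustration of the two accuracy criteria reported above, the following Python sketch computes the percentage correct from a confusion matrix and the AUC from predicted probabilities. It is a minimal sketch under stated assumptions, not the implementation used in this study: the 0.5 probability cutoff, the function names, and the toy labels and probabilities are all invented for illustration.

```python
def percentage_correct(labels, probs, threshold=0.5):
    """Percentage of points classified correctly at the given cutoff.

    The 0.5 cutoff is an assumption made for this sketch.
    """
    hits = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return 100.0 * hits / len(labels)


def auc(labels, probs):
    """AUC as the probability that a randomly chosen landslide point
    receives a higher score than a randomly chosen non-landslide point
    (ties count as one half)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration: 1 = landslide, 0 = non-landslide.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

print(percentage_correct(labels, probs))  # 75.0
print(auc(labels, probs))                 # 0.875
```

The pairwise formulation of AUC is quadratic in the number of points; it is chosen here for readability rather than efficiency.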
Existing studies of landslide susceptibility generally use topography as an explanatory factor (Budimir et al., 2015). However, on a global scale, topography is not always the primary factor for landslide occurrence. For example, Hong et al. (2007) give priority to slope when building their global landslide model, whereas friction has the highest regression coefficient in the model for earthquake-induced landslides (Nowicki et al., 2014). The present study shows that on a global scale, soil moisture is the most important factor, while topography (relative relief in this study) is secondary. Additionally, this study shows that soil moisture has significant explanatory power for landslide occurrence on a global scale. This suggests that future work on global landslide susceptibility should consider the influence of soil water conditions and long-term precipitation.
Data availability. The datasets of independent variables in this study can be accessed by reviewing the data sources stated in the section of Methodology and Data. The datasets of dependent variables can be accessed by emailing the corresponding author.
Competing interests. The authors declare that they have no conflict of interest.

Figure 1. Landslide location in the combined landslide database.

Figure 2. Comparison of landslide overlay in Europe.

Figure 3. ROC curve of the modeling process.

Figure 4. ROC curve of the validation process.

Figure 6. Comparison of existing studies with the related parts of this study.

Table 2. Input variables used in logistic regression analysis.

Table 3. Example of landslide inventory in World Geological Hazard Inventory created by ADREM, BNU.

Table 4. Model results of stepwise logistic regression for each data set. The best fit model is in bold font. a These statistics of AIC are based on the model with intercept and covariates. b Coefficients are significant at the 1 % significance level. c Coefficients are significant at the 0.1 % significance level.

Table 5. Numbers of landslides and nonlandslides in each data set. Numbers on the left of the colon represent numbers of landslides; numbers on the right of the colon represent nonlandslides.

Table 6. Results of model based on global SRTM DEM (90 m). a These statistics of AIC are based on the model with intercept and covariates. b Coefficients are significant at the 1 % significance level. c Coefficients are significant at the 0.1 % significance level.