Rapid post-earthquake modelling of coseismic landslide intensity and distribution for emergency response decision support

Abstract. Current methods to identify coseismic landslides immediately after an earthquake using optical imagery are too slow to effectively inform emergency response activities. Issues with cloud cover, data collection and processing, and manual landslide identification mean even the most rapid mapping exercises are often incomplete when the emergency response ends. In this study, we demonstrate how traditional empirical methods for modelling the total distribution and relative intensity (in terms of point density) of coseismic landsliding can be successfully undertaken in the hours and days immediately after an earthquake, allowing the results to effectively inform stakeholders during the response. The method uses fuzzy logic in a GIS (Geographic Information Systems) to quickly assess and identify the location-specific relationships between predisposing factors and landslide occurrence during the earthquake, based on small initial samples of identified landslides. We show that this approach can accurately model both the spatial pattern and the number density of landsliding from the event based on just several hundred mapped landslides, provided they have sufficiently wide spatial coverage, improving upon previous methods. This suggests that systematic high-fidelity mapping of landslides following an earthquake is not necessary for informing rapid modelling attempts. Instead, mapping should focus on rapid sampling from the entire affected area to generate results that can inform the modelling. This method is therefore suited to conditions in which imagery is affected by partial cloud cover or in which the total number of landslides is so large that mapping requires significant time to complete. The method therefore has the potential to provide a quick assessment of landslide hazard after an earthquake and may therefore inform emergency operations more effectively compared to current practice.


Introduction
Coseismic landslides are one of the most widespread and destructive hazards to result from earthquakes in mountainous environments. Fatalities in earthquakes with landslides have been shown to be up to ten times higher than in comparable 25 earthquakes without landslides (Budimir et al., 2014). Landslides are also a key inhibitor of relief and reconstruction via the blocking of critical infrastructure and present a chronic hazard, with post-earthquake landslide rates remaining elevated compared to pre-earthquake rates for at least several years (Marc et al., 2015). Rapidly identifying the distribution of landslides following an earthquake is therefore crucial for understanding the total earthquake impacts (Robinson and Davies, 2 2013), aiding immediate emergency response efforts, including search and rescue (SAR), and assessing the longer-term postearthquake risks. If an assessment of landsliding is to be useful for emergency response, it needs to be rapid, i.e. generating outputs within the same timeframe (hours to days after the mainshock) as a response is being coordinated, understandable, and communicated to appropriate stakeholders. However, post-earthquake landslide mapping is a difficult and timeconsuming task, hindered by issues relating to the collection and processing of appropriate satellite or aerial images, cloud 5 cover, and the slow speeds associated with manually identifying and mapping large numbers of landslides. Following the 2015 M w 7.8 Gorkha earthquake in Nepal, efforts to rapidly identify and map coseismic landslides using satellite imagery were undertaken by various international groups. However, these initial efforts were not completed until more than one month after the earthquake occurred (e.g., Kargel et al. 2016), and despite most suitable imagery being collected during this time, complete inventories containing polygon data took considerably longer (e.g. Martha et al., 2016;Roback et al., 2017). 10 By comparison, the post-earthquake emergency response began to transition to a recovery phase within days to weeks after the event (Nepal Army, 2016). Consequently, the results of efforts to manually map coseismic landslides can come too late to effectively inform the emergency response.
This issue can potentially be helped through the use of empirically-based models of coseismic landsliding that describe the locations where landslides are more or less likely to have occurred. Several previous attempts have been made to produce 15 landslide models that can be applied rapidly after an earthquake (e.g. Jibson et al., 2000;Godt et al., 2008;Nowicki et al., 2014;Gallen et al., 2016), but with somewhat limited success. These models have attempted to predict the locations where landslides are most likely to have occurred, based either on statistical analysis of the locations of identified landslides from one or more prior earthquakes, or on a simplified Newmark analysis. Approaches using statistical analysis of landslides from previous earthquakes analyse the predisposing factors present at landslide sites and then identify 20 locations with similar combinations of the same predisposing factors. Historically, such approaches are location specific, only using landslides identified from a previous earthquake in the study area to train the model. However, this is not possible in regions without previously compiled landslide inventories. To address this, several attempts to develop global relationships between predisposing factors and landslide occurrence have been undertaken (e.g. Nowicki et al., 2014;, allowing the model to be location independent and thus rapidly applicable to any area of interest 25 following an earthquake. However, these models suffer from issues related to the quality and accuracy of input data, availability of training inventories from a sufficient number of different representative environments, and an inability to effectively extrapolate relationships beyond the input data limits. Approaches using a simplified Newmark analysis adapt the well-established Newmark sliding block model (Newmark, 1965) to predict slope stability using peak ground acceleration (PGA), local slope angle, and estimates of material shear 30 strength properties (Jibson et al., 2000;Gallen et al., 2016). They can therefore be rapidly applied following an earthquake using only a digital elevation model (DEM) and established ground motion prediction equations. This method is limited however, because estimates of shear strength properties are difficult to obtain and are generally unknown at scales relevant to landsliding . Gallen et al. (2016) applied a rapid Newmark analysis model immediately following the 3 2015 Gorkha earthquake, predicting 2987 landslides compared to 2214 landslides that they were able to observe. However, their model showed significant discrepancies from the observed spatial pattern of landslides and subsequent mapping (Martha et al., 2016;Roback et al., 2017) has suggested up to ten times more landslides occurred than Gallen et al's (2016) model predicted. This failure is crucial, as the locations where landslides occur dictate the required response as much or perhaps more than the number or volume of landslides. Developing a method that can accurately model both the spatial 5 distribution and intensity of landsliding within hours to days of a large earthquake is therefore vital.
This study demonstrates how rapid modelling of landslide intensity (in terms of point density per unit area) and distribution following an earthquake can be successfully undertaken using traditional statistical analyses. Instead of attempting to identify global relationships for predisposing factors, this study undertakes a more typical location specific statistical analysis using fuzzy logic in Geographic Information Systems (GIS) based on small samples of landslides identified by 10 initial mapping efforts soon after the earthquake. The aim is to investigate whether small samples of rapidly mapped landslides can be used to quickly forecast the locations of as-yet unidentified landslides, and what the necessary requirements of these initial mapping efforts are in terms of spatial coverage and number of landslides required. To do this, we first apply the approach to the 2015 Gorkha earthquake using a relatively large training sample of landslides (n = 2006) mapped within 12 days of the event before iteratively reducing the number of training landslides in order to assess the effect 15 on the output landslide forecasts. The purpose of the study is to demonstrate that meaningful assessments of the intensity and total distribution of coseismic landslides can be undertaken within a short time after an earthquake based on location specific data, allowing the outputs to effectively inform emergency response.

Gorkha earthquake application
The M w 7.8 Gorkha earthquake occurred on 25 April 2015 with an epicentre ~80 km north-west of Kathmandu ( Fig. 1), 20 rupturing a ~150 km section of the Main Himalayan Thrust (Avouac et al., 2015;Hubbard et al., 2016). It resulted in intense ground shaking throughout central Nepal leading to large-scale damage to built infrastructure and > 8000 fatalities. The earthquake predominantly affected the Lesser Himalayan region north of Kathmandu, which is characterised by steep slopes and elevations ranging from ~1000 -3000 m. Martha et al. (2016) and Roback et al. (2017) documented between 15,000 and 25,000 coseismic landslides with a total area of ~90 km 2 and affecting an area > 14,000 km 2 (Fig. 1). The earthquake was 25 followed by a series of powerful aftershocks, including a M w 7.3 on 12 May 2015, but most of the damage is believed to have resulted from the mainshock, with just 213 landslides attributed to the 12 May aftershock (Martha et al. 2016).
Landsliding from the mainshock led to widespread losses in remote communities, as well as impacts on roads and trails that provide the only access to those communities.
Efforts to rapidly identify and map the resulting landslides led to one of the largest-ever NASA-led satellite image 30 acquisition responses to an earthquake disaster (Kargel et al., 2016). However, persistent cloud cover in the affected region, combined with the time required to download and georeference large amounts of imagery, hindered these efforts. By 4 May 4 2015 (9 days after the mainshock), a joint group from Durham University (DU) and the British Geological Survey (BGS) had mapped the location of just 279 landslides, which increased to 2006 by 7 May, 12 days after the mainshock (Table 1; EwF, 2015). A parallel mapping effort was undertaken by a joint group led by the International Centre for Integrated Mountain Development (ICIMOD), NASA, and the University of Arizona (UA), and identified 4312 landslides ( Fig. 1) in the period to 2 June 2015 (38 days after the mainshock) (Kargel et al., 2016). 5 Kargel et al. (2016 argued that this was one of the fastest and broadest emergency remote sensing efforts undertaken by NASA. Importantly, however, the on-the-ground emergency response following the earthquake was decelerated from the end of May, 36 days after the mainshock, and officially ended on 15 June 2015, 51 days after the mainshock (Nepal Army, 2016). Consequently, much of the results of these mapping efforts came too late to inform any humanitarian activities during the emergency response. It is therefore clear that the current approach of manually mapping landslides post-earthquake is not 10 sufficiently fast to effectively inform or support decision making during an emergency response.
In this study, we demonstrate an approach that can rapidly assess the potential total distribution of landslides based on an initial, small sample of identified landslides collected soon after the earthquake. To do this, we first model the spatial to 2 June 2015. While this larger dataset is thought to contain some of the 2006 landslides from the DU-BGS group, differences in mapping approaches (e.g. landslide top points versus centroids) between the groups result in minor location differences between the two datasets ( Fig. 1). Following this, we iteratively reduce the number of initial training landslides and evaluate the effect on consequent landslide forecasts in order to examine the minimum number of landslides required to achieve a successful forecast. 20

Data
The initial training landslides were manually mapped by the authors in GIS in the immediate aftermath of the earthquake using imagery from a range of sources, including web-hosted high-resolution optical data in Google TM Crisis Response (e.g. UK-DMC2 © DMCii, WorldView ©, DigitalGlobe TM Inc., SPOT © CNES), imagery accessed via the Disaster Charter, imagery available from USGS HDDSExplorer, and imagery specifically tasked over regions of interest (e.g. Pleiades © 25 CNES). All data, apart from that which was hosted on web-based GIS platforms, variously required georeferencing, pansharpening, and orthorectification. A database of cloud cover was maintained to monitor where ground had been visible and mapped, and where it remained obscured. Cloud cover predominantly obscured major ridgelines (i.e. watershed boundaries) resulting in the training dataset being systematically biased against landslides that occurred at or near watershed boundaries. All identified landslides were mapped as a single polyline along the long-axis (crest to toe) allowing data on 30 landslide location, extent, and the intersection with infrastructure to be quickly extracted. Point landslide positions at the crest end of the mapped polyline are extracted for this analysis, representing the location of initial failure. Mapping was 5 undertaken at a scale of approximately 1:10,000 and was incrementally posted to the Humanitarian Data Exchange portal, and shared more widely with stakeholders in Nepal in the days and weeks after the earthquake. This mapping effort, and the lessons learned, is described in detail in Williams et al. (forthcoming).
Topographic factor analysis undertaken herein has used the open-source ASTER GDEM V2, which has a cell resolution of 30 m, while seismic data was downloaded at the time from the USGS (PGA models and fault plane solutions) and Global 5 Centroid Moment Tensor (CMT) catalog (slip vector).

Method and data analysis
Statistical approaches to modelling regional-scale landslide hazard have been summarised in several comparative overviews (e.g., Aleotti and Chowdhury, 1999;Guzzetti et al., 1999;Wang et al., 2005). The present study uses fuzzy logic as the method is computationally simple, can consider highly uncertain data inputs, has been shown to match or out-perform other 10 approaches (Pradhan, 2010;Bui et al., 2012;Pourghasemi et al., 2012), and, importantly for this study, is fast to apply.
While other approaches, such as multi-variate statistical analysis, may provide more accurate landslide forecasts, we argue that the marginal gain in forecast accuracy for such approaches is outweighed by the time required to undertake them.

Fuzzy logic
Fuzzy logic derives from fuzzy set theory in which an event is assigned a degree of membership, varying between 0 and 1, to 15 a given set, where 1 indicates that the event is entirely related to the set (full membership) and 0 indicates that the event is entirely unrelated to the set (no membership). When utilised for landslide hazard models, this membership value describes how different values of a predisposing factor influence landslide occurrence, with higher values representing greater influence. For instance, steeper slope angles are known to produce higher rates of landsliding compared to shallower slope angles, meaning the influence of slope angle on landsliding increases with increasing slope angle. Influence (I) is measured 20 for unique values of each predisposing factor using a kernel density estimate (KDE) by comparing the prevalence of a given unique value at landslide locations to its prevalence throughout the whole study area: where I x is the influence of factor value x, Lx KDE is the KDE of value x for landslide locations, Sx KDE is the KDE of value x throughout the study area, and n is the total number of unique factor values considered. Consequently, if the prevalence of a 25 value is greater at landslide locations than throughout the whole study area, that value has higher influence on landslide occurrence. I is then normalised to a 0-1 scale by The factors selected for analysis typically have a broad physical expectation in their influence on landsliding, such as increasing landslide frequency with increasing slope angle, or decreasing landslide frequency with decreasing ground 30 6 shaking. These physical expectations are based on the current understanding of landslide processes and previous observations. Factors that match the broadly expected relationship between influence and factor values are carried forward, while factors that do not match the expected relationship and cannot otherwise be explained based on understanding of landslide processes (e.g. increasing landslide frequency with decreasing slope angle) are removed from the analysis. This assumes that these factors are not playing an important role in landslide occurrence in the study area. 5 Factors carried forward undergo a semi data-driven linear regression to find the most appropriate function to describe the distribution of I norm , using the coefficient of determination (R²) to assess goodness of fit. A semi data-driven approach is preferred as this allows the user to manually alter the function if necessary (e.g., by using multiple functional forms) to achieve better goodness of fit for specific influence values at the expense of overall goodness of fit. For instance, in circumstances where the best overall goodness of fit is primarily derived from better fitting low influence values rather than 10 high influence values, the output hazard model will be optimistic in its forecasting of landslide hazard. In situations where a conservative forecast is more appropriate, altering the function to better fit high influence values at the expense of overall goodness of fit may be necessary. Typical functional forms considered include linear, exponential, power-law, and polynomial functions. The final function is known as the membership function, ( ), which describes how I norm is distributed against all values of a given factor. ( ) acts as a filter to convert factor maps ( Fig. 2), which show the 15 spatial distribution of unique values within each factor, into influence maps (Fig. 3), which show the spatial variation of I norm .
The resulting influence maps are aggregated on a pixel-by-pixel basis to produce a landslide hazard map that can have values [0, 1]. This output represents the relative likelihood of landsliding due to the combination of influence from all factors: values < 0.2 suggest landsliding is very weakly anticipated, while values > 0.8 suggest landsliding is very strongly anticipated. Landslides are therefore more likely to occur at larger aggregate influence values as landsliding is more strongly 20 anticipated at these locations.
Various approaches to aggregation exist, however the Fuzzy Gamma approach has been shown to be the most effective for landslide modelling (Bui et al., 2012;. This approach provides a compromise between the increasing effect of Fuzzy Sum and the decreasing effects of Fuzzy Product (ESRI, 2016). Fuzzy Gamma establishes the combined effect of multiple membership functions for each pixel such that 25 where H LS is landslide hazard, I normF is the membership function for factor F, where F = 1, 2, …, j; j is the number of factors to be aggregated; and γ is a user-defined parameter between 0 and 1. The effect of changes in γ has previously been tested, with the optimal value for landslide modelling shown to be ~0.9 . Values less than 0.9 were shown to better forecast landslide non-occurrence, at the expense of landslide occurrence, while values 30 greater than 0.9 achieve better forecasts for landslide occurrence but generally predict high landslide hazard everywhere; 0.9 provides the best compromise .

7
This approach can therefore identify locations in the study area where the combination of predisposing factors results in landsliding being most strongly anticipated, and hence where as-yet unidentified landslides are more likely to have occurred.

Data analysis
In the present study, landslides mapped by the DU-BGS group prior to 7 May 2015 are used as the initial training dataset (n=2006) and landslides mapped by the ICIMOD-NASA-UA group prior to 2 June 2015 are used as the test dataset 5 (n=4312). The initial training dataset is used to derive the KDEs and membership functions, and the test dataset is used to measure the accuracy and sensitivity of the model via the area under receiver operating characteristic (ROC) curves. The ROC curve is a fundamental tool for evaluating model accuracy and plots the true positive rate of the model against the false positive rate using various cut-off values (Metz, 1978;Zweig and Campbell, 1993). Landslides occurring in cells with H LS values above the cut-off are considered true positives, while cells above the cut-off without landslides are considered false 10 positives. To be considered successful, models must typically achieve area under the curve (AUC) values > 0.7 .
Distances from fault plane, rivers, and confluences are calculated as the two-dimensional Euclidean (straight line) distance between pixel centroids. For fault plane distance this does not account for depth, assigning pixels directly above the plane 20 with distance = 0. Rivers are defined using a flow accumulation tool with an upstream contributing area threshold of 5 km², and river confluences are defined as the point where two or more rivers intersect. NRSD is calculated as the distance between a river and the nearest watershed boundary, where a value of 0 represents a pixel in the river and a value of 1 represents a pixel on the watershed boundary. Relief is calculated as the standard deviation of elevation values within a square window of radius a) 30 cells (hillslope relief); and b) 10 cells (sub-hillslope relief). Az Epi is derived by combining local slope aspect (As) 25 with the Euclidean direction to the epicentre (Dir), such that The broad physical expectations for each factor in terms of influencing landsliding are based on understanding of landslide processes and previous observations (Keefer, 2000;Khazai and Sitar, 2004;Lee et al., 2008;Meunier et al., 2008;Kritikos et 8 al., 2015;Parker et al., 2015). For slope angle, increasing gradient is expected to yield increasing landslide frequency as the angle between normal stress and gravity increases, producing greater downslope force. Increasing distance from fault plane is expected to produce decreasing landslide frequency due to regional attenuation of ground motions and wave amplitudes.
More landslides are expected at higher elevations due to a combination of greater gravitational potential and rock weakening processes such as glacial erosion. Increasing distances from rivers and river confluences are expected to return decreasing 5 landslide frequency as the amount of undercutting of basal hillslopes and downwearing by the rivers also decreases, with river confluences potentially acting as knickpoints where greater erosion rates occur. Planform curvature is used to identify ridges and slope shoulders that can amplify ground shaking and result in higher landslide frequencies (Meunier et al., 2008).
NRSD is therefore expected to produce a quadratic relationship with landslide frequency, with higher landslide frequencies expected close to rivers (NRSD ~ 0) and ridges (NRSD ~ 1). Larger relief values may result in larger landslide frequencies as 10 gravitationally-induced shear stresses increase and rock mass strength decreases with the local and regional height of hillslopes (Schmidt and Montgomery, 1995). Slope aspect in relation to epicentre and slip vector is expected to increase landslide frequency on either facing or opposing slopes due to directional patterns of topographic amplification resulting from the incidence angle of seismic waves (Meunier et al., 2008).
With the exception of earthquake-specific parameters (D F , PGA, Az Epi and Az SV ), each of these factors can be directly derived 15 from a global DEM, such as the ASTER GDEM V2 used herein. This reduces the initial data requirements, helping to facilitate rapid modelling. This is especially important for locations such as Nepal where data for other predisposing factors, such as lithology, structural geology or rainfall, are either absent or difficult (and thus time-consuming) to obtain. PGA, fault plane solutions and CMT data are available from the USGS and Global CMT catalog within a few hours of the earthquake occurring. Consequently, no pre-event data collection is necessary to implement this model, as all necessary data can be 20 gathered post-earthquake from openly available sources.

Peak ground acceleration
PGA is given special consideration in the model, separate to the other predisposing factors. PGA models following an earthquakesuch as those produced by the USGS ShakeMapoften change frequently as more acceleration data become available (Fig. 4). Rapid assessment of the influence of PGA on landslide occurrence may therefore give misleading results, 25 as the best PGA model available in the immediate earthquake aftermath may not yet accurately reflect the true pattern. For instance, initial PGA models following the Gorkha earthquake showed that PGA was highest in the region south of Kathmandu. This suggested that the majority of landslides, which occurred north of Kathmandu, occurred in locations with low-to-moderate PGA (Fig. 4). The time required to generate a final (or near-final) PGA model can be on the order of several days or weeks, and the shaking model may therefore be completed too late to be effectively incorporated into the 30 rapid assessments considered herein. Furthermore, there is commonly a high degree of scatter in models that attempt to predict the number, size, and location of coseismic landslides from PGA (e.g. Keefer, 1984;Meunier et al., 2007). Results 9 relying on initial PGA models are therefore likely to have such large uncertainties that they are impractical for informing emergency responders.
To account for difficulties in modelling landslides from PGA, we therefore consider PGA as a threshold factor. This involves setting I norm = 0.0 for all values below a given threshold, while values above the threshold are assigned I norm = 0.5. This ensures that locations that do not exceed the threshold cannot sustain landslides, while landslide influence in locations above 5 the threshold is dictated by the combination of other factors. From Fig. 4 and Martha et al. (2016) and Keefer (1984) this threshold is suggested to be ~0.08 g.
Importantly, while the location and magnitude of PGA values varied between initial USGS ShakeMap models over the first few days, the area contained within the 0.08 g contour was nearly invariant (Fig. 4). Given that ShakeMap interpolates PGA between seismic stations (Wald et al., 2005), it is easier to rapidly define the footprint of the area experiencing shaking (i.e. 10 PGA > 0.08 g) compared to defining the highest values within this zone, as the number of recording stations increases with distance from epicentre, even in regions with low recording density.

Factor influence modelling
Of the 12 predisposing factors considered, seven show a clear and definable relationship between influence and factor value, 15 and these are carried forward in the modelling process. The remaining factors showed a relationship that did not match the physical expectation and could not otherwise be explained, and are therefore removed from the analysis. For instance, NRSD shows the expected relationship at locations close to rivers but not for locations close to ridgelines (Fig. 3i). This is in contrast to the relationship observed for planform curvature (Fig. 3e). This may be a consequence of ridgelines herein for NRSD being defined using watershed boundaries, which are systematically obstructed from view in the images used to 20 collect the initial training inventory. Further, landslides occurring on slope 'shoulders' within a watershed are not accounted for by NRSD, but are accounted for by planform curvature. This highlights an issue with using NRSD to represent topographic amplification on ridgelines as it cannot account for intra-watershed ridgelines. Similarly, the expected relationship with elevation is not recorded and there is no clear reason why landslides should be strongly influenced at elevations of 2000 -3000 m asl and only weakly influenced at higher elevations (Fig. 3h). More likely this reflects the fact 25 that higher elevations were predominantly outside of the 0.08 g contour. For these reasons, NRSD and elevation are among the four factors that are not carried forward.
For the factors carried forward, increases in slope angle correlate with increased influence up to slope angles of ~65° (Fig.   3a), at which point the relationship ends due to the small area covered by slope angles > 70° (Fig. 2a). Slopes <~15° show no influence suggesting these slope angles are insufficient to sustain landsliding (i.e. I norm = 0). Increased distances from the 30 fault plane, rivers, and river confluences generally result in decreased influence (Fig. 3b-d), despite a small and unexplained increase at distances of ~3000 m from rivers. Increases in relief at both hillslope and sub-hillslope scales are associated with increased influence up to ~200 m and ~90 m respectively (Fig. 3f-g). Similar to slope angle, the variation above these values likely reflects the small area covered by such values (Fig. 2f-g). Larger planform curvature values also correlate with increased influence, with notably higher influence on laterally convex (ridgeline) pixels compared to laterally concave (valleys) pixels (Fig. 3e).
These findings are similar to those of other studies (Khazai and Sitar, 2004;Parker et al., 2015; in that 5 they show coseismic landslides are most strongly encouraged on steep, laterally convex hillslopes close to rivers, and most weakly encouraged on shallow, linear slopes situated far from rivers.

Model success
Using the membership curves derived in Fig. 3 showing the variation in I norm for each factor, we aggregate different 10 combinations of the seven predisposing factors carried forward to find the best performing combination of variables to model the spatial distribution and intensity of the 4312 test landslides. Using all seven predisposing factors results in AUC = 0.838 but the best performance (AUC = 0.870) is achieved by excluding sub-hillslope relief and planform curvature (Fig. 5).
The worst performing model (AUC = 0.761) excludes hillslope relief and planform curvature (Fig. 5). Importantly, all models achieve AUC values > 0.7, suggesting that any combination of these factors is able to model spatial landslide 15 distribution with reasonable accuracy.
The output of the most successful model is shown in Fig. 6 and shows the locations where landslides are strongly-to-very strongly anticipated (H LS > 0.6) in red and weakly-to-very weakly anticipated (H LS < 0.4) in blue. The maximum modelled H LS = 0.93 suggesting landsliding from this earthquake was very strongly anticipated in at least some locations. The total area where landslide occurrence is mildly-to-very strongly anticipated (H LS > 0.4) covers ~18,000 km 2 , approximately the 20 same size as the total affected area (~14,000 km 2 ) mapped by Martha et al. (2016) and Roback et al. (2017). The area where landsliding is very strongly anticipated (H LS > 0.8) covers ~85 km 2 , similar to the 90 km 2 total landslide area mapped by Martha et al. (2016). This area is concentrated in the major river valleys in Gorkha, Dhading, Rasuwa, and Sindhupalchok districts (Fig. 6), corresponding well with observed landsliding (Fig. 1) and confirming the high AUC value. With a combination of a high maximum H LS value and a large area where landsliding is strongly anticipated, the model suggests that 25 widespread landsliding from this earthquake was highly likely.

Landslide intensity and spatial distribution
In order to be useful for informing emergency responders, it is essential that the final output model explicitly demonstrates both the spatial extent and intensity, either in terms of number, area or volume, of landsliding. To do this, we estimate landslide intensity in terms of number density per unit area as a function of the kernel density of pixels with H LS values 30 exceeding some user-defined threshold (e.g. all cells with H LS ≥ 0.9). The total spatial extent of landsliding is then represented by the area where kernel density is > 0. This assumes that the intensity of landsliding in any given area is directly linked to the frequency of cells above the corresponding threshold within that area. An area dominated by cells with high H LS is expected to experience a higher intensity of landsliding (i.e. a higher number density) than a comparable area dominated by cells with low H LS .
Taking the most successful hazard model (Fig. 6), we set threshold limits at the 99 th , 95 th , and 90 th percentile of H LS values 5 where the area with values above the threshold accounts for 1%, 5%, and 10% of the total area (H LS > 0) respectively, corresponding to H LS = 0.74, H LS = 0.65 and H LS = 0.60 (Fig. 6 inset) . The corresponding cells are extracted and the kernel density for each threshold is calculated using a 1 km 2 moving window (Fig. 7). To test how well the training inventory is able to model total landslide intensity and distribution, the results are compared against the kernel density of all 4312 test landslides, also defined using a 1 km 2 moving window. Because this study considers individual pixels, the total number of 10 cells exceeding each threshold is large (1% = 388,694; 5% = 1,927,614; 10% = 3,827,036) and consequently the corresponding kernel densities (100 -1000 per km 2 ) are not directly comparable to real landslide number densities (1 -100 per km 2 ). This occurs because groups of adjacent pixels exceeding the threshold may in reality represent only one or two landslides, but each pixel is considered individually in the kernel density analysis. To account for this, we scale all kernel density models onto a 0-1 scale in which the maximum kernel density = 1 and all other values are scaled proportionally. The 15 results therefore consider relative landslide intensity (i.e. how locations compare relative to each other) rather than absolute intensity.
By setting a threshold at the 99 th percentile (H LS > 0.74), the model is able to accurately predict both spatial distribution and relative intensity of landsliding (Fig. 7). Generally, the model performs well, although it over-predicts in Gorkha district near the mainshock epicentre and under-predicts in Sindhupalchok district near the M w 7.3 aftershock. In particular, it is notable 20 that the model successfully highlights the high relative intensity of landsliding near to the mainshock epicentre, despite no training landslides coming from this region (Fig. 1), as well as closely matching the observed spatial pattern of landsliding.
However, it fails to identify landslides in the hills south of Kathmandu, suggesting that landslides here were not influenced in the same way, or by the same factors, as landslides further north. None of the training landslides are located in this region ( Fig. 1). Comparatively, setting thresholds at the 95 th (H LS > 0.65) and 90 th (H LS > 0.60) percentiles substantially over-25 estimates both the relative intensity, particularly close to the Nepal-China border, and the total landslide pattern, whilst still underestimating landslide intensity near the M w 7.3 aftershock (Fig. 7).

Minimum landslide numbers
The current method is trained on 2006 landslides mapped within the first 12 days following the mainshock. We test the impact of reducing the number of landslides involved in training the model in order to estimate how soon after an earthquake 30 this method could be implemented, assuming that the number of landslides mapped immediately after an earthquake may be quite small (Table 1). In reducing the number of training landslides, we also consider the effect of their spatial distribution on the model results by comparing landslide sub-samples randomly distributed across the affected area with sub-samples that are clustered (Fig. 8). These conditions represent mapping efforts undertaken with both partial cloud cover (randomly distributed sub-samples) and more regular cloud cover with intermittent gaps (clustered sub-samples). This allows us to test the effect of both the number and spatial distribution of training landslides on model performance, as well as consider the weather conditions under which this approach may be best suited.
For the randomly distributed sub-samples, we iteratively select 500, 250, and 100 landslides from the original 2006 training 5 landslides. We repeat this five times, producing five sub-samples that mimic the incremental mapping of landslides following an earthquake. For the clustered sub-samples, we split the original training landslides into 10 separate clusters based on landslide proximity to each other. To test the effect of each sub-sample on the model, we calculate the distribution of I norm as before for the seven predisposing factors for which membership functions have been derived. We then evaluate the R 2 fit of each sub-sample to the corresponding membership curves ( ( )) and compare the results with the R² values 10 achieved for the original 2006 training landslides (Fig. 3).
For the majority of factors, we achieve R 2 values within 10% of the original using 500 randomly distributed landslides (Fig.   8). For all factors other than distance from fault plane and distance from confluences, we achieve R 2 values within 25% of the original when using 250 randomly distributed landslides, while both relief factors and planform curvature achieve the same result for 100 randomly distributed landslides (Fig. 8). However, for the clustered sub-samples there is little 15 consistency between factors in regards to the number of landslides and the R² values achieved. For the sub-hillslope relief, distance from rivers, and distance from confluences factors, the largest clusters have the greatest discrepancies in R² values compared to the original, while smaller clusters are able to achieve R² values within 10% of the original (Fig. 8).
This suggests that almost identical estimates of landslide hazard (Fig. 6), intensity and spatial pattern (Fig. 7) could have been made using just several hundred training landslides, provided that these landslides had a sufficiently wide spatial 20 distribution. Had these landslides been clustered then it is unlikely that similar membership curves, and thus hazard models, would have been derived. We therefore suggest that our model is most suited to conditions of partial cloud cover where the total visible area of ground, and therefore mapped landslides, is small but covers a wide area, or where large numbers of landslides are visible and thus the time to complete mapping is dependent on slow manual identification methods. 6 Discussion 25

Implications for post-earthquake emergency response decision support
This study has derived a model that can be applied rapidly post-earthquake to aid emergency response. Comparison of modelled relative landslide intensity and pattern (Fig. 7) with the location of training landslides (Fig. 1) reveals locations where landslide intensity was expected to be high, but landslides had, at the time, not been mapped. In particular, the model anticipates that landsliding in Gorkha and Dhading districts occurred at similar intensities to Sindhupalchok district despite 30 very few training landslides in the former locations. Likewise, the model correctly identifies that the training inventory under-represents the intensity of landsliding in Rasuwa district.
While the successful output landslide intensity map (Fig. 7) is coarse in detail, we argue that the output is useful for helping to inform emergency response planning. Overlaying the intensity map with population distributions and critical network data can quickly allow emergency responders to identify regions where landslides are expected to have caused losses and therefore require urgent aid. These results can potentially be used to alert decision makers to the need for response in specific locations before landslides have been confirmed either remotely or on the ground (e.g., in Gorkha and Dhading). In regions 5 badly affected by landslides, ground communications may be severely impacted, meaning the need for response can be overlooked until impacts, which might include landslides, are identified, potentially many days after the event. This study may therefore reduce this risk by enabling decision makers to identify regions likely to have sustained intense landsliding, potentially allowing them to target these areas for response before landslides are conclusively identified. Such an approach may lead to improving response times for badly affected and isolated regions following major earthquakes. 10

Potential speed of application
The method presented herein has been developed and applied after the fact and therefore it is necessary to discuss the potential speed with which it could be applied following a future earthquake. Firstly, this study has deliberately focussed on predisposing factors that can be rapidly and directly derived from a DEM, allowing the model to be applied with only minimal additional data requirements. Global DEM datasets at 30 m resolution are freely available from NASA and can be 15 downloaded for the relevant region in a matter of minutes. Likewise, the USGS ShakeMap programme rapidly assesses earthquake parameters within minutes of the earthquake occurring.
We have also shown that it is possible to apply this method from only several hundred landslides (Fig. 8). While the mapping efforts described herein took up to 38 days or more to identify >4000 landslides (Kargel et al., 2016), the joint DU-BGS group had identified 279 landslides within 9 days of the mainshock (Table 1). This was despite the majority of the 20 affected area suffering from cloud cover, suggesting that in events with less cloud cover this time could potential be reduced to just a few days. The proliferation of rapidly available crowdsourced data from unconventional but quickly available sources such as social media may also be used to inform this model (Earle et al., 2010;Bruns and Liang, 2012). Importantly however, the results herein suggest that when cloud free imagery is available for the majority of the affected region, systematically mapping from this imagery in considerable detail, as is currently done, is not required. Instead, mapping could 25 be undertaken in far less detail using a random sample of grid squares covering the entire affected area, with the results then fed into the model described herein. Such an approach may allow landslide mapping and modelling to combine more effectively following an earthquake to inform emergency response operations.
Furthermore, various automated landslide mapping techniques are currently being developed (e.g. Booth et al., 2009;Borghuis et al., 2007;Li et al., 2016), and some methods are even capable of identifying landslides through cloud cover 30 (Kimura and Yamaguchi, 2000). Such techniques can accurately map landslide locations much faster than traditional manual mapping, potentially reducing the time required to map sufficient numbers of landslides. However, whilst automated mapping offers many advantages, at present the range of imagery, the often complex topography, and the poor radiometric 14 distinction between intact slopes and landslides means that manual mapping is often the most reliable approach, particularly in regions without extensive pre-earthquake research.

Other predisposing factors
There are several other notable factors that have been shown to influence coseismic landsliding elsewhere that have not been considered herein, notably bedrock lithology (Keefer, 2000;Parise and Jibson, 2000;Khazai and Sitar, 2004;Dai et al., 5 2011), structural geology (Hoek et al., 2005;Selby, 2005;Moore et al., 2009), and rainfall (Iverson, 2000;Dellow and Hancox, 2006). Whilst these factors are certainly important for landslide occurrence, obtaining sufficiently accurate data is often difficult and time-consuming. This is especially true for lower income countries such as Nepal, where the necessary data either may not exist or may not be readily accessible. Further, we have shown that accurate landslide models can be created without the need for these factors, so long as other important factors, in particular slope angle, are considered (also 10 see Pawluszek and Borkowski (2017) for a comparison of the role of topographic factors and non-topographic factors). Of course, if these data-sets do exist, then they can easily be incorporated into the factor analysis.

Benefits, limitations, and uncertainties
Several other models exist that allow rapid landslide modelling (Jibson et al., 2000;Godt et al., 2008;Nowicki et al., 2014; including the model of Gallen et al. (2016), which was applied following the Gorkha earthquake. The 15 present study provides a useful complement to these existing models, as well as a more general understanding of factors influencing slope failure. One of the primary benefits of our model over others is its ability to accurately model both the spatial distribution and relative intensity of landsliding. Gallen et al. (2016) were able to accurately model the total number of landslides in the Gorkha earthquake but found large discrepancies between their predicted landslide pattern and the observed pattern. Incorrectly modelling the locations of landslides may have detrimental effects on emergency response by 20 focussing responders on areas not requiring immediate response.
Our model also benefits from being trained on landslides known to have occurred during the event. Consequently, the approach allows predisposing factors and the corresponding membership curves to be tailored to the specific location and earthquake under consideration, rather than relying on global or regional relationships (e.g. . No preevent knowledge of landsliding in the affected area is necessary as the approach allows the influence of various factors to be 25 rapidly assessed post-earthquake. The model has been shown to be successful despite systematic bias in the initial landslide inventories, with cloud cover predominantly obscuring the major ridgelines. This suggests that despite the training inventories containing few landslides from major ridgelines, the model forecasts these cells as high hazard, successfully representing the numerous landslides in the test inventory on major ridgelines. This is inferred to be a result of the use of the planform curvature factor, which models high landslide influence in convex cells (i.e. ridgelines), and further highlights the 30 issues with using NRSD to mimic topographic amplification on ridgelines. The lack of pre-event data requirements is also beneficial, as it does not necessitate the collection and storage of global datasets. Where data connections are limited or vulnerable, setting up geodatabases of topography and its derivatives alongside a platform to run models such as the one described prior to an earthquake would be a prudent earthquake preparedness measure.
However, there are important limitations in the model to consider. Firstly, the model is unable to estimate individual landslide area or volume, which becomes of vital importance at the local scale and for assessing post-earthquake impacts like sediment aggradation and flood hazard (e.g., Huang and Fan, 2013). Additional information on the size of landslides would 5 be beneficial, as larger landslides have greater runout and therefore potentially pose a greater hazard. Furthermore, the model only considers relative intensity of landsliding, rather than absolute intensity. Whilst this allows determination of the relative impacts between locations, it does not assess absolute impacts, which may be important. The success of the model is also highly dependent on the spatial distribution of the initial inventories, requiring that the training landslides have a large spatial distribution with limited clustering. In conditions of widespread cloud cover this will be difficult to achieve as landslides can 10 only be mapped through a few isolated gaps in the cloud, resulting in highly clustered training datasets. Ensuring a wide spatial coverage may also hinder the speed at which this approach can be applied by requiring the collection and processing of numerous satellite images. During the response to the Gorkha earthquake, the majority of time required to manually map landslides was taken up by downloading and processing the necessary images (Williams et al., forthcoming). Consequently, despite this approach requiring just several hundred training landslides, currently image collection and processing speeds are 15 the major components influencing the speed of application. However, this is likely to improve with the increase in more medium resolution satellites, such as Sentinel 2, which can rapidly provide suitable resolution imagery for large areas, enabling modelling methods such as that described herein. Finally, as with all landslide hazard models, the accuracy of this method is reliant on the accuracy and resolution of the input data. Errors in locating landslides can result in incorrect estimates of I norm and therefore the eventual hazard models. Similarly, the model is reliant on the resolution and accuracy of 20 the input DEM from which all non-seismic influence factors are derived.

Conclusions
This study has addressed the need for more rapid assessments of coseismic landslide intensity and distribution following a major earthquake. Manual mapping methods can be too slow to effectively inform emergency response operations and are too readily affected by issues with image collection and visibility. The present study has demonstrated an empirical method 25 to model landslide intensity and distribution using fuzzy logic based on initial inventories with small numbers of landslides.
We have shown that landslides mapped in the first few days following an earthquake can be successfully used to assess the influence of various predisposing factors, provided that these landslides have sufficiently wide spatial distribution. The influence of these predisposing factors can be used to model the total pattern and relative intensity of landsliding prior to total landslide inventories being completed using a simple threshold analysis. Applied to the 2015 Gorkha earthquake, our 30 approach identifies: slope angle; distance from fault plane, rivers and river confluences; hillslope and sub-hillslope relief; and planform curvature as the key pre-disposing factors influencing landsliding during this event. The model output suggests that landsliding during the event was strongly anticipated in the major river valleys in Gorkha, Dhading, Rasuwa, and Sindhupalchok districts, agreeing with observed patterns of landsliding. Calculating the kernel density of cells in the 99 th percentile of modelled hazard values is able to accurately and simultaneously represent both the spatial pattern and relative intensity of landsliding during the event. Assessments of the minimum number of landslides required to achieve these results suggest that just several hundred landslides throughout the affected area need be identified for the method to be successfully 5 applied. Consequently, this suggests that systematic high fidelity mapping of landslides following an earthquake may not be necessary for informing rapid modelling attempts. Instead, mapping should focus on rapid sampling from grid squares covering the entire affected area and using the results to inform the model herein. Doing so could potentially have allowed this model to have been implemented within hours to days after the Gorkha earthquake occurred, permitting results to be available to emergency responders far earlier than allowed by traditional mapping techniques. 10 The method herein improves upon other approaches which have been shown to accurately simulate either landslide intensity or distribution, but not both simultaneously. A key strength of our study is the use of open access data sources, allowing the model to be implemented globally without the need for pre-event collection and storage of data. This may therefore present a useful approach for rapidly assessing landslide hazard following an earthquake to effectively inform decision makers during emergency response operations. 15 Author contribution. TRR conceived and designed the model with contributions from NJR and ALD. JGW, MEK, JB, and HJAB undertook the collection and preparation of imagery and mapped the landslide data. TRR undertook the data analysis and modelling, and prepared the manuscript with contributions from all co-authors.