Sensitivity analysis and calibration of a dynamic physically based slope stability model

Physically based modelling of slope stability on a catchment scale is still a challenging task. When applying a physically based model on such a scale (1 : 10 000 to 1 : 50 000), parameters with a high impact on the model result should be calibrated to account for (i) the spatial variability of parameter values, (ii) shortcomings of the selected model, (iii) uncertainties of laboratory tests and field measurements or (iv) parameters that cannot be derived experimentally or measured in the field (e.g. calibration constants). While systematic parameter calibration is a common task in hydrological modelling, this is rarely done using physically based slope stability models. In the present study a dynamic, physically based, coupled hydrological–geomechanical slope stability model is calibrated based on a limited number of laboratory tests and a detailed multitemporal shallow landslide inventory covering two landslide-triggering rainfall events in the Laternser valley, Vorarlberg (Austria). Sensitive parameters are identified based on a local one-at-a-time sensitivity analysis. These parameters (hydraulic conductivity, specific storage, angle of internal friction for effective stress, cohesion for effective stress) are systematically sampled and calibrated for a landslide-triggering rainfall event in August 2005. The identified model ensemble, including 25 “behavioural model runs” with the highest portion of correctly predicted landslides and non-landslides, is then validated with another landslide-triggering rainfall event in May 1999. The identified model ensemble correctly predicts the location and the supposed triggering timing of 73.0 % of the observed landslides triggered in August 2005 and 91.5 % of the observed landslides triggered in May 1999. Results of the model ensemble driven with raised precipitation input reveal a slight increase in areas potentially affected by slope failure. 
At the same time, the peak run-off increases more markedly, suggesting that precipitation intensities during the investigated landslide-triggering rainfall events were already close to or above the soil’s infiltration capacity.


Introduction
Shallow landslides are abundant geomorphological phenomena in mountain regions across the world. The related processes are usually understood as translational sliding movements of soil material along a pre-defined slip surface at a depth of up to 2 m (Cruden and Varnes, 1996; Lateltin et al., 2005). In Austria, shallow landslides are typically triggered by heavy rainfalls (Andrecs et al., 2002; Markart et al., 2007; Zieher et al., 2016), causing damage to residential structures and infrastructure, as well as a loss of agricultural land. To prevent future impacts, it is essential to identify potentially affected areas. For this task, various modelling techniques are currently applied, including (i) expert-based (e.g. Kienholz, 1977), (ii) statistically based (e.g. Carrara et al., 1991) and (iii) physically based approaches (e.g. Baum et al., 2010). The latter are typically based on the limit equilibrium concept and employ physical laws to relate resisting to driving forces. Their result is a dimensionless factor of safety (FOS), which is a quantitative measure of slope stability. Many physically based approaches include a hydrological and a geomechanical model element and can be further divided into (i) steady-state (e.g. Dietrich and Montgomery, 1998; Montgomery and Dietrich, 1994) and (ii) dynamic models (e.g. Baum et al., 2010; Crosta and Frattini, 2003). In contrast to steady-state models, dynamic models allow for the spatio-temporal assessment of hillslope hydrology and stability. Physically based slope stability models can be upscaled to medium scale (1 : 10 000 to 1 : 50 000) using a raster-based geographical information system (GIS). However, such spatially distributed models require data on the spatial distribution of the included parameters (van Westen et al., 2006). To overcome the problem of usually unknown material characteristics throughout the study area, probabilistic approaches have proven feasible (Hammond et al., 1992; Raia et al., 2014).
T. Zieher et al.: Calibration of a dynamic physically based slope stability model
Before applying a spatially distributed physically based model, parameter values are often calibrated to minimize the difference between observations and simulation results. One way of achieving this is to vary the model input parameter values in order to find optimum values or value ranges which yield a general agreement between observations and simulations (back calculation). This task is common in hydrological modelling involving a high-dimensional parameter space (e.g. Dobler and Pappenberger, 2013; Tang et al., 2007). The underlying principles also apply to physically based slope stability models. Theoretically, calibration is not necessary as long as the parameter values are based on a sufficient number of direct measurements or laboratory tests. However, a calibration is advisable (i) if the spatial distribution and variability of parameter values is unknown, (ii) to account for model shortcomings compared to the represented physical processes, (iii) to account for uncertainties of laboratory tests and field measurements or (iv) if parameter values cannot be derived experimentally or measured in the field (e.g. calibration constants). The calibration procedure should be based on physical reasoning and only involve sensitive parameters (i.e. parameters with a distinct impact on the model's outcome) (Bathurst et al., 2005; Wagener and Kollat, 2007). To identify sensitive parameters, a sensitivity analysis is usually performed. A simple but often applied method is based on the local assessment (one representative raster cell) of the impact of systematic variations of one-parameter-at-a-time (OAT) on the model's results (e.g. Hammond et al., 1992). This method is also frequently used for parameter value calibration (e.g. Gioia et al., 2016; Salciarini et al., 2006). However, the OAT assessment of parameter sensitivity becomes unreliable with an increasing number of considered parameters, correlated parameters and non-linear model behaviour (Wagener and Kollat, 2007). As an alternative, global methods which cover the whole parameter space can overcome this drawback (Dobler and Pappenberger, 2013; Tang et al., 2007). Their main disadvantage is the high computational effort, usually requiring a high-performance computing cluster (HPCC). Depending on the sampling technique, a multitude of parameter value combinations is tested and evaluated based on observations. However, instead of identifying a single parameter set which explains the observations best, an ensemble of "behavioural model runs" is often used for the final prediction. These model runs are in general agreement with the observations, while their disagreement reflects model uncertainty (Bathurst et al., 2005; Wagener and Kollat, 2007).
In the present study, the parameters of a revised form of the spatially distributed, dynamic, physically based slope stability model TRIGRS 2.0 (Transient Rainfall Infiltration and Grid-Based Regional Slope-Stability Analysis; Baum et al., 2008, 2010) are systematically tested and calibrated. The four main steps of the analysis are shown in Fig. 1. First, sensitive parameters of the revised model are identified with a local OAT sensitivity analysis. The tested parameter space is derived from a limited number of laboratory tests and relevant literature. Then, the four identified sensitive parameters (hydraulic conductivity, specific storage, angle of internal friction for effective stress, cohesion for effective stress) are systematically sampled from a uniform distribution. Unlike in probabilistic parameter sampling strategies (e.g. Raia et al., 2014), the parameters are sampled with defined, constant increments. In the calibration procedure, the best 25 "behavioural model runs" are identified out of 10 000 conducted simulations considering each sampled parameter value combination. The ensemble of these 25 model runs optimally predicts the location and the supposed triggering timing of observed shallow landslides, triggered during a rainfall event in August 2005. The predictive performance of this model ensemble is then tested for another landslide-triggering rainfall event which occurred in May 1999. Finally, the model ensemble is re-run with positively scaled input precipitation maps to give an estimate of potential impacts of increasing precipitation intensities on slope stability.
The objectives of the present study are  to evaluate the capability of the identified model ensemble for quantifying potential changes in slope stability associated with increasing precipitation intensity.

Study area
The study area is located in the Laternser valley in Vorarlberg, the westernmost province of Austria (Fig. 2a). It covers the catchment area (52.1 km²) of the river Frutz, a tributary of the Rhine. The valley extends about 13 km in the east-west direction, following the strike angle of the Bregenzerwald Mountains. Its highest point is the Hoher Freschen (2004 m) at the head of the valley. The outlet at approximately 500 m is characterized by a steeply incised gorge. In the Laternser valley about half of the catchment area is covered by forest (2001: 51.0 %; 2006: 50.9 %). A majority of the forest stands are composed of fir (Abies alba Miller) and spruce (Picea abies L. Karsten), with beech (Fagus sylvatica L.) occurring below 1300 m (Amann et al., 2014). Around 1.2 % of the catchment area is occupied by settlements and infrastructure. The remaining area is predominantly used as hay meadow or pasture or a combination of both.

Geology
The Laternser valley is built up by different tectonic units, including a variety of geological units (Fig. 2c, Table 1). Helvetic nappes in the western and northern part of the valley include competent limestones (e.g. Schrattenkalk, Seewerkalk) and marls with calcareous layers (e.g. Drusbergschichten). To the south-east, Ultrahelvetic nappes are superimposed, which are mainly built up of clayey marls and shales (e.g. Leimernmergel). On top in the south-east of the catchment area, Penninic nappes make up more than half of the valley. These nappes include mainly sandstones (e.g. Reiselsberger Sandstein, Planknerbrückenserie) and thinly layered marls (e.g. Piesenkopfschichten) (Friebe, 2007; Heissel et al., 1967; Oberhauser, 1982, 1998). Widespread till deposits and hillside debris cover more than 57 % of the catchment area. These units are overly susceptible to shallow landsliding (Zieher et al., 2016). In numerous cases, subglacial till is reported to act as an impermeable layer and slip surface for the unconsolidated material on top. Furthermore, marls of the Ultrahelvetic nappes, as well as less competent sandstones of the Penninic nappes, are particularly susceptible to shallow landsliding (Zieher et al., 2016).

Climate
Oceanic air masses advecting from the north-west dominate the warm temperate climate of Vorarlberg. On the Alpine rim in northern Vorarlberg, precipitation amounts are higher due to blocking of the inflowing air masses (Werner and Auer, 2001a, b). Because of the valley's orientation, it is prone to north and north-westerly weather conditions. At Innerlaterns station (location mapped in Fig. 2c), mean annual precipitation exceeds 1800 mm a⁻¹ (period 1981-2010). Considering a potential evaporation in Vorarlberg on the order of 600 mm a⁻¹ (Werner and Auer, 2001a), a year-round high amount of seepage water can be assumed.
On the synoptic scale, the landslide-triggering rainfall events in May 1999 and August 2005 occurred in the course of so-called Vb weather situations (van Bebber, 1891; Formayer and Kromp-Kolb, 2009). Such synoptic meteorological situations are characterized by a low forming south of the Alps, subsequently moving to the north-east. The moisture taken up over the Mediterranean and Adriatic Sea is transported to eastern-central Europe, potentially causing heavy rainfalls in large parts of Austria (Seibert et al., 2007).

Landslide-triggering rainfall events
Figure 3a, b show the daily and cumulative deviation of precipitation from the long-term mean (1981-2010) covering 1 year before the landslide-triggering rainfall events in May 1999 and August 2005 for the region around the Laternser valley. For the period of June 1998 to January 1999, the cumulative deviation of precipitation was balanced overall, including a dry August and a wet September and October (Fig. 3a). Afterwards, the second half of February 1999 in particular was exceptionally wet. Locally, fresh snow depth exceeded 2 m within 3 days, leading to catastrophic snow avalanches (Bollinger et al., 2000; Heumader, 2000). In March and April 1999, precipitation corresponded to the long-term mean, but precipitation in February and April provided an elevated level of the cumulative deviation. From 11 to 14 May, a rainfall event with a total sum of 144.4 mm occurred. No shallow landslides are reported for this event. However, increased soil moisture must be assumed before the onset of the landslide-triggering rainfall event on 21-22 May, with a total sum between 134.0 mm at Frastanz station and 212.8 mm at Thüringen station (Fig. 3c).
Monthly precipitation sums from November 2004 to June 2005 generally fell below the long-term mean, except for February and May (Fig. 3b). Therefore it can be expected that no exceptional antecedent soil moisture preceded the rainfall event in August. However, the amount of precipitation in July and the first half of August corresponds to the long-term mean. Therefore, no exceptionally dry conditions preceded the landslide-triggering rainfall event. After days with repeated minor rainfalls, a phase of intense precipitation started on 22 August. At Innerlaterns station, the 24 h cumulative sum amounted to 244 mm. The highest precipitation intensity was recorded in the late evening on 22 August and during the night (21:00 to 22:00: 19.4 mm h⁻¹). The triggering time of four landslides was reconstructed from protocols of the local voluntary fire brigade (Fig. 3d). Most landslides occurred over the course of the night from 22 to 23 August.
www.nat-hazards-earth-syst-sci.net/17/971/2017/ Nat. Hazards Earth Syst. Sci., 17, 971-992, 2017
Table 1. Information on the geological units shown in Fig. 2c and their respective lithology (Heissel et al., 1967; Oberhauser, 1958, 1982; Friebe, 2007). Only geological units covering more than 1 % of the catchment area are listed.
Figure 2. The slope angle map is based on a digital terrain model derived from airborne laser scanning (ALS) in 2011, serving as input data for modelling (resampled to a spatial resolution of 10 m). The box plots show the slope angle distribution for forest and non-forest areas. In the geological map only geological units covering more than 1 % of the catchment area are listed in the legend (data source: Heissel et al., 1967; Oberhauser, 1982). The shallow landslide inventory shows landslides triggered by the rainfall events in May 1999 (82; yellow) and August 2005 (356; red) occurring on undisturbed hillside slopes (Zieher et al., 2016). The areas covered by forest were derived from ALS data acquired in 2011.

Shallow landslide inventory
A comprehensive shallow landslide inventory was compiled for the catchment area of the Laternser valley, based on the systematic interpretation of nine orthophoto series covering the period from 1950 to 2012 (Zieher et al., 2016). Landslide mapping was aided by digital terrain models (DTMs) derived from two airborne laser scanning (ALS) campaigns and their differential digital terrain model (dDTM). In addition, data from two field surveys conducted immediately after two landslide-triggering rainfall events in May 1999 and August 2005 and associated archive data were included in the inventory. In total, 82 shallow landslides attributed to the rainfall event in May 1999 and 356 shallow landslides triggered in August 2005 were used for this study (Fig. 2d). Only rainfall-triggered shallow landslides which occurred on undisturbed hillside slopes were considered. They account for three quarters of the observed landslides for both rainfall events. Observed shallow landslides on other slope types may involve additional causative factors for slope failure, which are not included in the model (e.g. weakened foot slope). Of the considered landslides, 28 (34.1 %; May 1999) and 88 (24.7 %; August 2005) are located within forests.

TRIGRS 2.0 model
The dynamic, physically based, coupled hydrological–geomechanical model TRIGRS 2.0 was developed by Baum et al. (2008, 2010) and is written in the Fortran programming language (USGS, 2016). TRIGRS 2.0 is based on a raster environment and implements a hydrological model element (a run-off model and two types of infiltration models) and a geomechanical model element (infinite slope stability model). It is suitable for modelling the spatio-temporal progression of slope stability in the course of rainfall events with a duration of up to a few days (Baum et al., 2010).
In the model, the infiltration process and associated effects on slope stability are computed dynamically for each raster cell in defined time intervals. Run-off $R_d$ is routed downslope from raster cells where the precipitation intensity $P$ plus the incoming run-off $R_u$ from adjacent raster cells above exceed the infiltration capacity (equal to the hydraulic conductivity $K_s$; Baum et al., 2008):

$$R_d = \begin{cases} P + R_u - K_s & \text{if } P + R_u - K_s \geq 0 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

However, the amount of run-off is not passed on to the next time interval. The available amount of water ready for infiltration on each raster cell is passed on to the infiltration model. For tension-saturated initial conditions, a generalized pore pressure diffusion model after Iverson (2000) can be applied. The predictive performance of Iverson's model has been tested in the Laternser valley on a plot scale (Zieher et al., 2017). For unsaturated conditions, an analytical solution for unsaturated flow following Srivastava and Yeh (1991) can be applied. However, the exponential model describing the soil water retention curve (Gardner, 1958) used for linearizing the Richards equation is considered suitable for coarse-grained materials (Baum et al., 2008) and hence not suitable for the application in the Laternser valley. The details of the infiltration models have been presented in previous studies (e.g. Baum et al., 2010; Iverson, 2000; Kim et al., 2013; Park et al., 2013; Salciarini et al., 2006). The result of both infiltration models is the evolution of pore pressures with depth and time as a response to the infiltration of time-varying precipitation. Pore pressures $\psi(d, t)$ are passed on to the infinite slope stability model, which relates driving to resisting stresses to obtain the FOS (Eq. 2). However, the original version of TRIGRS 2.0 does not account for effects of vegetation. Kim et al. (2013) extended the model to include vegetation effects on hydrology and slope stability. They conclude that root reinforcement and tree surcharge can affect slope stability, while interception has only minor effects during landslide-triggering rainfall events. Following Kim et al. (2013), lateral root cohesion $c_r$ (Pa) and tree surcharge $s_t$ (Pa) were added to Eq. (2). Instead of adding a constant value for $c_r$ (e.g. Kim et al., 2013), a linear decrease of $c_r$ with depth up to a given rooting depth $d_r$ (m) was assumed, accounting for the distribution of roots with depth as observed in other studies (e.g. Bischetti et al., 2005, 2009). If the rooting depth exceeds the regolith depth, $c_r$ is only considered down to the regolith-bedrock interface (roots are not expected to penetrate the bedrock). For the revised form of TRIGRS 2.0, three additional parameters ($c_r$, $s_t$ and $d_r$) must be given. The three parameters are allowed to vary spatially and can be prepared as parameter maps.
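The run-off rule and the revised stability calculation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the TRIGRS source code: the FOS expression is the standard infinite-slope form with $c_r$ and $s_t$ added in the way that seems most natural, the root cohesion profile is assumed to fall linearly from a surface value to zero at $d_r$, and all function names and input values are hypothetical.

```python
import math

GAMMA_W = 9810.0  # unit weight of water in N m^-3 (assumed constant)


def runoff(p, r_u, k_s):
    """Run-off R_d: the part of P + R_u that exceeds the infiltration capacity K_s."""
    return max(p + r_u - k_s, 0.0)


def root_cohesion(depth, c_r0, d_r, regolith_depth):
    """Root cohesion at a given depth (Pa).

    Assumption: c_r falls linearly from c_r0 at the surface to zero at the
    rooting depth d_r and is ignored below the regolith-bedrock interface.
    """
    if depth > regolith_depth:
        return 0.0
    return c_r0 * max(0.0, 1.0 - depth / d_r)


def factor_of_safety(depth, psi, slope_deg, phi_deg, c_eff, gamma_s,
                     c_r=0.0, s_t=0.0):
    """Infinite-slope FOS with optional root cohesion c_r and surcharge s_t (Pa).

    Assumed form: tan(phi)/tan(beta)
                  + (c' + c_r - psi*gamma_w*tan(phi)) / ((gamma_s*d + s_t)*sin(beta)*cos(beta))
    """
    beta = math.radians(slope_deg)
    tan_phi = math.tan(math.radians(phi_deg))
    driving = (gamma_s * depth + s_t) * math.sin(beta) * math.cos(beta)
    cohesion_term = c_eff + c_r - psi * GAMMA_W * tan_phi
    return tan_phi / math.tan(beta) + cohesion_term / driving


# Hypothetical forested cell: slip surface at 1.5 m depth on a 35 degree slope.
c_r = root_cohesion(depth=1.5, c_r0=2500.0, d_r=1.0, regolith_depth=2.0)
fos_dry = factor_of_safety(1.5, 0.0, 35.0, 30.0, 2000.0, 19000.0, c_r, 2500.0)
fos_wet = factor_of_safety(1.5, 1.0, 35.0, 30.0, 2000.0, 19000.0, c_r, 2500.0)
print(round(fos_dry, 2), round(fos_wet, 2))
```

With a rooting depth of 1.0 m and a slip surface at 1.5 m, the roots contribute nothing in this sketch, so only the surcharge distinguishes the forested from the non-forested case.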

Model parameters
Table 2 shows the required parameters and their values considered in previous studies with the original TRIGRS model (versions 1.0 and 2.0) and a revised form (Kim et al., 2013). In the cited studies, the time-varying precipitation intensities are derived from meteorological stations in or near the study area. The slope angle maps are calculated using digital elevation models (based on interpolated contour lines) of various spatial resolutions. Regolith depth maps are prepared as a function of the slope angle (Salciarini et al., 2006), using a geomorphologically indexed model (Zizioli et al., 2013), using a spline interpolation of direct measurements (Kim et al., 2013) and with spatially constant values (Park et al., 2013; Vieira et al., 2010). The initial depth of the water table $d_{wi}$ (positive in downward direction) is assumed to be either at the regolith-bedrock interface (Kim et al., 2013; Park et al., 2013; Vieira et al., 2010) or at a depth relative to it (Salciarini et al., 2006; Zizioli et al., 2013). For the background infiltration rate describing a steady-state infiltration component, constant values (e.g. Kim et al., 2013; Vieira et al., 2010) or multiples of $K_s$ (e.g. Park et al., 2013) were used.
For the landslide-triggering rainfall events considered in the present study, hourly precipitation maps were prepared for the whole province of Vorarlberg.Based on hourly precipitation records from available meteorological stations throughout the province, hourly precipitation maps were generated using a spline interpolation.Figure 4 shows the respective time series and the resulting cumulative precipitation maps for the Laternser valley.The temporal course of the precipitation intensities differs distinctly (August 2005: short and intense; May 1999: prolonged and less intense), while cumulative precipitation sums over the considered duration are of the same order (May 1999: 263 mm; August 2005: 252 mm).For modelling the temporal evolution of slope stability, FOS maps were computed for nine (May 1999; Fig. 4a) and seven (August 2005; Fig. 4c) time steps with intervals of 9 h to completely cover both rainfall events.
The required slope angle map (Fig. 2b) was derived area-wide from a DTM after Wood (1996). The DTM was generated with ALS data acquired in 2011, with a reported accuracy of 10 cm horizontally and 7.5 cm vertically (Wiedenhöft and Vatslid, 2014). The data quality of the DTM from 2011 exceeds the quality of the DTM from 2004, particularly in areas covered by forest. The spatial resolution of the prepared parameter maps was set to 10 m with regard to the most abundant size of observed landslide scar areas, which is on the order of 100 m² (Zieher et al., 2016). Furthermore, the chosen spatial resolution was considered a compromise between the topographical representation of the surface, the computational efficiency for the modelling and the required minimum length-to-depth ratio (on the order of 8 : 1) for the application of the infinite slope stability model (Milledge et al., 2012).
Regolith depth, also referred to as soil depth (e.g. Lanni et al., 2013) or soil thickness (e.g. Catani et al., 2010; Segoni et al., 2012), is still one of the most difficult and laborious parameters to measure on a catchment scale, yet it is crucial for physically based modelling of slope stability (Dietrich et al., 1995; Lanni et al., 2012; Segoni et al., 2012). It is defined as the thickness of unconsolidated material covering the earth's surface, i.e. the depth from surface to bedrock (Fairbridge, 1968). Regolith depth can be assessed by (i) direct measurements (e.g. Lanni et al., 2012; Wiegand et al., 2013), (ii) means of geophysics (e.g. Davis and Annan, 1989; Sass, 2007) and (iii) modelling (e.g. Dietrich et al., 1995; Heimsath et al., 1997). Furthermore, the depth of past landslides can be derived from multitemporal, remotely sensed elevation data (Zieher et al., 2016). For regolith depth mapping, regression models correlating regolith depth to either elevation, slope angle or other derivatives were used in previous case studies on shallow landslide susceptibility (Baum et al., 2010; Lanni et al., 2012; Salciarini et al., 2006; Segoni et al., 2012). For the assessment of regolith depth in the Laternser valley, 126 dynamic cone penetration tests (DCPTs) were conducted along four transects. A lightweight dynamic cone penetrometer with a 10 kg hammer dropped from a height of 0.5 m onto an anvil of 6 kg was used (e.g. Wiegand et al., 2013). Following ÖNORM EN ISO 22476-2:2012, the number of strokes for penetrating vertical increments of 10 cm was recorded in the field. After completing 50 strokes, the penetration tests were stopped if the penetrated increment was less than 10 cm (ÖNORM EN ISO 22476-2:2012). The final depth was recorded to the nearest centimetre, with the maximum detectable depth of 6.0 m exceeded only once. Furthermore, the maximum vertical depths of 96 shallow landslides triggered on 21-22 May 1999 and of 249 shallow landslides triggered on 22-23 August 2005 are available for validation (Fig. 5b). The landslide depths were measured in the field after the triggering event in May 1999 (Andrecs et al., 2002) and derived from the analysis of a dDTM for the landslides triggered in August 2005 (Zieher et al., 2016). The final depths of the DCPTs were used to train generalized linear models (GLMs) with local morphometric parameters as predictors, including elevation, slope angle, minimum and maximum curvature (Wood, 1996), and the topographic wetness index (Beven and Kirkby, 1979). A stepwise backward predictor selection revealed a linear model with the slope angle yielding the best agreement with the cumulative landslide depths from 1999 and 2005 (Fig. 5a). It outperforms the curvature and the combined slope angle-curvature model, particularly for depths below 2.0 m. The resulting empirical relationship expresses regolith depth $d_{max}$ as a linear function of the slope angle $\beta$. The derived regolith depth map (Fig. 5b) also matches the field observation that on slopes which are inclined more than approximately 60° the surficial cover of unconsolidated material is of minor depth or not present at all. Furthermore, on very steep slopes there is a transition from sliding to toppling and falling as the predominant types of failures (Baum et al., 2010).
For the derivation of the geotechnical and hydrological parameter values suitable for the Laternser valley, a limited number of laboratory tests were conducted. On the south-facing slopes of the study area, geotechnical samples were collected from eight sites where shallow landslides had been triggered in 1999 (BIN-02), 2002 (ROH-01), 2005 (BIN-01, BON, MAZ, REU, ROH-02) and 2013 (INN), close to populated areas in the Laternser valley (Fig. 2c, Table 3). The abbreviations were chosen according to the closest settlements (BIN: Bingadels; BON: Bonacker; INN: Innerlaterns; MAZ: Mazona; REU: Reute; ROH: Rohnen). In the geological map (Fig. 2c), the sampled sites are mapped as hillslope debris (BIN-01, BIN-02), till deposits (INN, MAZ, REU, ROH-01), Leimernmergel (BON) and Drusbergschichten (ROH-02). Back walls were laid open at the top of the landslide scarps. Two undisturbed and one disturbed sample were taken at two depths at each site except for location ROH-02. There, samples of one depth were considered sufficient because of the homogeneously structured regolith. The undisturbed samples were collected with the help of core cutters (diameter 9.6 cm) and stored airtight. Furthermore, buckets of material were taken from the respective depths. The grain size distributions (Fig. 6b), wet and dry bulk densities and water contents were determined for all samples. With the lower samples, geotechnical parameters ($\varphi'$, $c'$, Atterberg limits) were derived from the respective laboratory tests (Fig. 6a, d). The upper samples were used to obtain estimates for the specific storage $S_s$, based on the constrained modulus $E_s$ (Pa) derived from oedometer tests (Rowe and Barden, 1966). The respective values for $S_s$ (m⁻¹) were derived from

$$S_s = \rho_w \, g \, (\alpha_s + n \, \beta_w)$$

where $\rho_w$ (kg m⁻³) is the density of water, $g$ is the acceleration of gravity (9.81 m s⁻²), $n$ is porosity, $\beta_w$ is the compressibility of water (4.4 × 10⁻¹⁰ m² N⁻¹) and $\alpha_s$ (m² N⁻¹) is the compressibility of bulk soil (Fig. 6c), derived from $E_s$ and Poisson's ratio $\nu$, for which a constant value of 1/3 was assumed (e.g. Lu and Godt, 2013; Schmidt et al., 2014). $E_s$ depends on the prevailing stress level (i.e. overburden height; Schmidt et al., 2014) and was derived for a depth of 1-2 m (e.g. Berti and Simoni, 2010). The hydraulic diffusivity $D_0$ (m² s⁻¹) was derived from

$$D_0 = \frac{K_s}{S_s}$$

However, $K_s$ was not tested in the field or laboratory. Its parameter values were calibrated over several orders of magnitude. The background infiltration rate was set to zero to consider a conservative estimate of pore pressure conditions assuming a slope-parallel groundwater flow (Baum et al., 2008, 2010).
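The chain from an oedometer result to the hydraulic diffusivity can be illustrated numerically. The sketch below is an assumption-laden example rather than the procedure used in the study: the conversion from $E_s$ to $\alpha_s$ assumes isotropic linear elasticity (bulk compressibility as the inverse bulk modulus), which the text does not confirm, and the sample values for $E_s$, $n$ and $K_s$ are hypothetical.

```python
RHO_W = 1000.0     # density of water (kg m^-3)
G = 9.81           # acceleration of gravity (m s^-2)
BETA_W = 4.4e-10   # compressibility of water (m^2 N^-1)


def bulk_compressibility(e_s, nu=1.0 / 3.0):
    """alpha_s (m^2 N^-1) from the constrained modulus E_s (Pa).

    Assumption: isotropic linear elasticity, alpha_s = 1/K with
    K = E_s * (1 + nu) / (3 * (1 - nu)). The exact formula of the study
    is not reproduced in the text.
    """
    return 3.0 * (1.0 - nu) / ((1.0 + nu) * e_s)


def specific_storage(alpha_s, porosity):
    """S_s (m^-1) = rho_w * g * (alpha_s + n * beta_w)."""
    return RHO_W * G * (alpha_s + porosity * BETA_W)


def hydraulic_diffusivity(k_s, s_s):
    """D_0 (m^2 s^-1) = K_s / S_s."""
    return k_s / s_s


# Hypothetical sample: E_s = 5 MPa, porosity 0.4, K_s = 1e-6 m s^-1.
alpha_s = bulk_compressibility(e_s=5.0e6)
s_s = specific_storage(alpha_s, porosity=0.4)
d_0 = hydraulic_diffusivity(k_s=1.0e-6, s_s=s_s)
print(alpha_s, s_s, d_0)
```

For soft soils the term $n \beta_w$ is several orders of magnitude smaller than $\alpha_s$, so $S_s$ is controlled almost entirely by the soil skeleton in this example.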
For the parameters representing the effects of vegetation on slope stability in the revised model, spatially constant parameter values are assumed within the area covered by forest. A conservative set of parameter values is derived from respective literature with $c_r$ set to 2.5 kPa (e.g. Bischetti et al., 2009; Steinacher et al., 2009), $s_t$ set to 2.5 kPa (e.g. Steinacher et al., 2009) and $d_r$ set to 1.0 m (e.g. Bischetti et al., 2009; Kutschera and Lichtenegger, 2002). However, these values were only applied within areas covered by forest. A forest cover map was prepared, based on the normalized digital surface model (nDSM) derived from the ALS data from 2011. The areas covered by forest at the time of the two landslide-triggering rainfall events in August 2005 and May 1999 were adapted manually, using high-resolution orthophotos from 2006 (ground sampling distance of 0.125 m) and 2001 (ground sampling distance of 0.25 m) respectively.

One-parameter-at-a-time sensitivity analysis
Following Hammond et al. (1992), the model's sensitivity to each parameter is tested individually. For each parameter, central, minimum and maximum values are defined based on laboratory tests, field investigations and respective literature (Table 4). The resulting $\mathrm{FOS}_{p_i}$ for each parameter $p_i$ sampled over the specified range is related to the respective $\mathrm{FOS}_{p_{central}}$ based on the defined central parameter values:

$$\Delta\mathrm{FOS} = \frac{\mathrm{FOS}_{p_i} - \mathrm{FOS}_{p_{central}}}{\mathrm{FOS}_{p_{central}}}$$

The resulting relative deviation $\Delta\mathrm{FOS}$ reflects the model's sensitivity to each parameter. However, interactions between parameters are not considered (Dobler and Pappenberger, 2013; Hammond et al., 1992).
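The one-at-a-time procedure can be sketched as a loop that moves a single parameter across its range while all others stay at their central values. The example is illustrative only: a dry infinite-slope FOS stands in for the full model, and the central values and ranges are hypothetical, not those of Table 4.

```python
import math


def toy_fos(p):
    """Dry infinite-slope FOS used as a stand-in for the full model (assumption)."""
    beta = math.radians(p["slope_deg"])
    tan_phi = math.tan(math.radians(p["phi_deg"]))
    driving = p["gamma_s"] * p["depth"] * math.sin(beta) * math.cos(beta)
    return tan_phi / math.tan(beta) + p["cohesion"] / driving


def oat_sensitivity(model, central, ranges, steps=5):
    """Relative FOS deviation when one parameter at a time leaves its central value."""
    fos_central = model(central)
    deviations = {}
    for name, (low, high) in ranges.items():
        series = []
        for i in range(steps):
            value = low + (high - low) * i / (steps - 1)
            trial = dict(central)
            trial[name] = value  # all other parameters stay at their central value
            series.append((value, (model(trial) - fos_central) / fos_central))
        deviations[name] = series
    return deviations


# Hypothetical central values and ranges (not the values of Table 4).
central = {"slope_deg": 35.0, "phi_deg": 30.0, "cohesion": 2000.0,
           "gamma_s": 19000.0, "depth": 1.5}
ranges = {"phi_deg": (20.0, 40.0), "cohesion": (0.0, 4000.0)}
result = oat_sensitivity(toy_fos, central, ranges)
for name, series in result.items():
    print(name, [round(dev, 3) for _, dev in series])
```

Because each series is computed with the other parameters frozen, the result says nothing about parameter interactions, which is the limitation noted above.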

Parameter calibration and validation
In previous studies, local OAT parameter tests were used for the calibration of parameter values (e.g. Gioia et al., 2016). In the present study, the calibration of the four identified sensitive parameters ($\varphi'$, $c'$, $K_s$, $S_s$; Sect. 4.1) is based on systematic testing of parameter value combinations for the whole catchment area (global calibration), computed with an HPCC (162 nodes, 1944 Intel Xeon Gulftown compute cores). For each parameter, 10 values are sampled from a uniform distribution in equal increments from the defined minimum to maximum (e.g. Beven and Freer, 2001). Because of the limited number of laboratory tests, it is not possible to infer probability distributions of the parameter values. The hydrological parameters are sampled on the logarithmic scale (Table 5).
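The sampling scheme amounts to a full factorial grid: 10 equally spaced values for each of the four parameters give 10⁴ combinations. A minimal sketch follows, with the hydrological parameters spaced equally on the logarithmic scale; the ranges used here are placeholders, since the calibrated ranges are those listed in Table 5.

```python
import itertools
import math


def linear_values(low, high, n=10):
    """n values in equal increments from low to high (inclusive)."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def log_values(low, high, n=10):
    """n values in equal increments of log10 from low to high (inclusive)."""
    return [10.0 ** x for x in linear_values(math.log10(low), math.log10(high), n)]


# Placeholder ranges for illustration only (see Table 5 for the actual ones).
phi_values = linear_values(20.0, 40.0)    # friction angle (degrees)
c_values = linear_values(0.0, 9000.0)     # cohesion (Pa)
k_s_values = log_values(1.0e-8, 1.0e-4)   # hydraulic conductivity (m s^-1)
s_s_values = log_values(1.0e-4, 1.0e-1)   # specific storage (m^-1)

# Every combination of the four parameter values is one calibration run.
combinations = list(itertools.product(phi_values, c_values, k_s_values, s_s_values))
print(len(combinations))
```

A regular grid of this kind makes the runs easy to distribute across cluster nodes, at the cost of growing exponentially with the number of calibrated parameters.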
The predictive performance of each FOS map resulting from the 10 000 calibration runs with seven time steps each (514.9 GB of data) was assessed with the receiver operating characteristic (ROC) principle (Begueria, 2006). Using physically based slope stability models, a FOS < 1.0 indicates a potential slope failure, while a FOS ≥ 1.0 suggests a stable slope. The coordinates of the point in the ROC plot where the FOS falls below 1.0 represent the correctly predicted fractions of observed landslides (true positives; TP) and non-landslides (true negatives; TN). The basic idea of the calibration procedure is to identify parameter value combinations which result in an optimum prediction of observed landslides and non-landslides, at a FOS threshold falling below 1.0, by minimizing the distance to the perfect classification (D2PC; Formetta et al., 2016; Mergili et al., 2017; Fig. 7). Data processing and analysis were carried out with the open-source GRASS GIS 6.4 (GRASS Development Team, 2014), the Python 2.7 programming language (Python Software Foundation, 2016) and the R statistical software (R Core Team, 2016).
The identification of "behavioural model runs" out of the 10 000 calibration runs is based on the following observations and assumptions:
1. At the beginning of the simulations, the slopes throughout the Laternser valley must be stable (FOS ≥ 1.0).
2. Most shallow landslides were triggered after the highest precipitation intensity occurred (FOS falls below 1.0).
3. Optimum parameter values can be derived from the simulations which correctly predict the most observed landslides and non-landslides (minimized D2PC) while satisfying the first two assumptions.
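The three assumptions can be read as a sequence of filters over the calibration runs. The sketch below is one possible formalization under stated assumptions, not the authors' code: each run is a hypothetical dictionary holding the fraction of the catchment that is unstable at t = 0 and the TPR and TNR per output time step. Assumption 2 is interpreted as "the highest TPR is reached at or after the time step of maximum precipitation intensity". The tolerance of 0.5 % of the catchment area at t = 0 is the value reported for the calibration in the results.

```python
import math

def d2pc(tpr, tnr):
    """Distance to the perfect classification in the ROC plot."""
    return math.hypot(1.0 - tpr, 1.0 - tnr)

def select_behavioural(runs, peak_step, max_unstable_t0=0.005, n_keep=25):
    """Return the ids of the n_keep runs with the lowest D2PC that pass both filters."""
    kept = []
    for run in runs:
        # Assumption 1: the slopes are (almost) entirely stable at t = 0.
        if run["unstable_frac_t0"] > max_unstable_t0:
            continue
        # Assumption 2 (one possible reading): the best time step is not before the peak.
        best = max(range(len(run["tpr"])), key=lambda i: run["tpr"][i])
        if best < peak_step or run["tpr"][best] == 0.0:
            continue
        # Assumption 3: rank the remaining runs by their D2PC at the best time step.
        kept.append((d2pc(run["tpr"][best], run["tnr"][best]), run["id"]))
    kept.sort()
    return [run_id for _, run_id in kept[:n_keep]]

# Hypothetical toy data with four time steps and the peak intensity at step 2.
toy_runs = [
    {"id": "a", "unstable_frac_t0": 0.000, "tpr": [0.0, 0.0, 0.5, 0.7], "tnr": [1.0, 1.0, 0.9, 0.8]},
    {"id": "b", "unstable_frac_t0": 0.200, "tpr": [0.8, 0.8, 0.9, 0.9], "tnr": [0.5, 0.5, 0.4, 0.4]},
    {"id": "c", "unstable_frac_t0": 0.001, "tpr": [0.6, 0.6, 0.6, 0.6], "tnr": [0.9, 0.9, 0.9, 0.9]},
    {"id": "d", "unstable_frac_t0": 0.000, "tpr": [0.0, 0.0, 0.0, 0.0], "tnr": [1.0, 1.0, 1.0, 1.0]},
    {"id": "e", "unstable_frac_t0": 0.004, "tpr": [0.0, 0.0, 0.9, 0.8], "tnr": [1.0, 1.0, 0.9, 0.9]},
]
print(select_behavioural(toy_runs, peak_step=2))  # ['e', 'a']
```

In the toy data, run b fails the first filter, run c predicts its landslides before the peak, and run d never predicts any landslide.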
www.nat-hazards-earth-syst-sci.net/17/971/2017/ Nat. Hazards Earth Syst. Sci., 17, 971-992, 2017
The necessary observations for the assessment of the predictive performance are obtained from the shallow landslide inventory. For 261 out of 356 shallow landslides triggered in August 2005, the scar areas are available, delineated with the help of a dDTM (Zieher et al., 2016). A shallow landslide is regarded as correctly predicted if the FOS falls below 1.0 in at least one raster cell intersecting the scar area. This strategy was chosen because of the discrepancy between the regular raster environment (input and output maps) and the mapped shallow landslide scar area polygons. The spatial resolution of 10 m results from a compromise between the size of most shallow landslide scar areas, the constraints of the infinite slope stability model and the representation of the topography. However, it remains unknown which pixel represents an actually observed shallow landslide. This results from positional uncertainties of the involved data sets, as well as from the smoothed representation of the topography associated with the coarse raster resolution. It is therefore assumed that the raster cell with the lowest FOS intersecting the scar area polygon represents the respective landslide (e.g. Montgomery and Dietrich, 1994; Casadei et al., 2003; Keijsers et al., 2011). For landslides with no scar area mapped (95 landslides triggered in August 2005 and the landslides triggered in May 1999), a planimetric circle with a radius of 5.6 m (resulting in an area of 100 m²) around the scar point (mapped in the visual centre of the scar areas) is used instead.
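The matching rule between raster cells and scar areas can be illustrated with a small sketch. It assumes that the intersection of a scar polygon with the raster has already been rasterized into a boolean mask containing at least one cell. The function names and the toy FOS values are hypothetical.

```python
import math
import numpy as np

def landslide_predicted(fos_map, scar_mask, threshold=1.0):
    """True if the lowest FOS among the cells intersecting the scar is below the threshold."""
    return bool(np.min(fos_map[scar_mask]) < threshold)

def buffer_radius(area_m2=100.0):
    """Radius (m) of a circle with the given area, used where no scar polygon is mapped."""
    return math.sqrt(area_m2 / math.pi)

# Toy FOS map (3 x 3 cells) and a scar intersecting two cells in the middle row.
fos_map = np.array([[1.40, 1.20, 1.10],
                    [1.30, 0.95, 1.05],
                    [1.50, 1.25, 1.15]])
scar_mask = np.zeros_like(fos_map, dtype=bool)
scar_mask[1, 1:] = True

print(landslide_predicted(fos_map, scar_mask))  # True
print(round(buffer_radius(), 1))                # 5.6
```

The second value reproduces the 5.6 m radius quoted for a circle of 100 m², which equals the area of one 10 m raster cell.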

One-parameter-at-a-time sensitivity analysis
The OAT sensitivity analysis of the geomechanical model element's parameters reveals that an increase in parameter values can have positive (ϕ, c and c_r) and negative effects (β, d_max, s_t) on slope stability (Fig. 8a). For the tested parameterization, variations of d_r do not show effects on the FOS. In the calibration procedure, the parameters representing the effects of vegetation are kept constant within the respective areas covered by forest, while the parameters ϕ and c are tested systematically.
For the parameters of the hydrological model element, the sensitivity analysis is based on the precipitation time series from 22 to 23 August 2005 to account for time-dependent responses. The model's sensitivity against the precipitation input is tested with scaled time series of this rainfall event. Depending on the previous precipitation input, the parameters K_s, S_s and d_wi have different effects on the resulting FOS. Reducing K_s by 2 orders of magnitude increases the FOS by up to 30 % due to the lowered infiltration, while higher parameter values result in a reduced FOS as a reaction to the enhanced infiltration. The magnitude of S_s essentially controls the temporal dynamics of the modelled infiltration process. By reducing S_s, the value of D_0 increases (Eq. 7), leading to a quicker infiltration of the precipitation input. Thus, lowering S_s by 1 order of magnitude leads to a reduction of the FOS by more than 20 %, while higher parameter values lead to an enhanced FOS. Decreasing d_wi by 100 % (initial water table at the surface) reduces the FOS by 24 %. Compared to the other hydrological parameters, the model's sensitivity against the scaled precipitation time series is lower. By varying the precipitation input within a range of ±50 %, the resulting FOS changes by −4 to +9 %. For the calibration procedure with the rainfall event in August 2005, the precipitation input is given by the interpolated hourly precipitation sums and d_wi is set to the regolith-bedrock interface, while the parameters K_s and S_s are tested systematically.

Calibration with the landslide-triggering rainfall event in August 2005
The temporal prediction rates and the respective coordinates for a FOS falling below 1.0 in the ROC plot for the shallow landslides triggered on 22-23 August 2005 are shown in Fig. 9. Table 6 shows the respective minimum and maximum prediction rates for the calibration steps. The value ranges are presented for the best-performing output time step of each simulation. The D2PC is given for the coordinates of the FOS falling below 1.0, while the area under the ROC curve (AUC) as a measure of the overall predictive performance (Begueria, 2006) is based on the full FOS range of the resulting maps. Considering all 10 000 calibration runs (Fig. 9a), many parameter combinations yield completely stable conditions over all computed time steps (no correctly predicted landslides; TPR = 0.0 %), while others yield unstable conditions already at time step t = 0. Allowing for 0.5 % of the catchment area to fail at time step t = 0, 7300 calibration runs remain (Fig. 9b). However, many of the remaining calibration runs predict slope failures before the onset of the landslide-triggering rainfall event. Assuming that most shallow landslides were triggered after the maximum precipitation intensity, 1134 calibration runs remain (Fig. 9c). Several of these remaining calibration runs do not predict any of the observed shallow landslides over time (TPR = 0.0 %). Therefore, the 25 calibration runs with the highest sum of correctly predicted landslides and non-landslides are selected, while minimizing the D2PC ("behavioural model runs", Fig. 9d). With these model runs, the location and the supposed triggering timing of 46.6 to 70.5 % of the observed shallow landslides can be predicted, while 71.0 to 90.3 % of the observed non-landslides remain stable. It is assumed that this identified model ensemble is able to represent the spatial and temporal occurrence of shallow landslides triggered on 22-23 August 2005. The resulting parameter value combinations are regarded as best for the dynamic modelling of slope stability in the Laternser valley.

Validation with the landslide-triggering rainfall event in May 1999
To test the identified model ensemble's predictive performance, it is applied for the landslide-triggering rainfall event in May 1999. Despite the different nature of the rainfall events (August 2005: short and intense; May 1999: prolonged and less intense), most landslides are again predicted after the highest precipitation intensity (time step 6; after 45 h; Fig. 10). Hence, assuming that the landslides observed for the rainfall event on 21-22 May 1999 were triggered after the maximum precipitation intensity occurred, the model ensemble is able to predict the location and the supposed triggering timing of most of these landslides. However, the melting of the accumulated snow from the preceding winter may have led to an enhanced soil moisture and a rise of the water table. Therefore, three scenarios for the d_wi were considered (100, 75 and 50 % of the regolith depth; Fig. 10, Table 7).
Assuming the d_wi to be at the regolith-bedrock interface, between 43.9 and 79.3 % of the observed landslides are predicted correctly. Setting the d_wi to 75 % of the regolith depth raises the true positive rate to 51.2-89.0 %, with up to 4.9 % of the landslides predicted at t = 0. Setting the d_wi to 50 % of the regolith depth raises the true positive rate further to 58.5-95.1 %, while up to 30.3 % of the landslides are predicted at t = 0. Setting d_wi to 75 % of the regolith depth is therefore considered adequate for simulating slope stability for the landslide-triggering rainfall event in May 1999.

Comparison of the model ensemble's predictive performance
In total, the model ensemble correctly predicts 73.0 % of the landslides triggered in August 2005, where a landslide counts as correctly predicted if at least one ensemble model run predicts it.
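The difference between the prediction rate of a single run and the total rate of the ensemble can be made explicit with a small sketch. It assumes a hypothetical boolean matrix with one row per ensemble run and one column per observed landslide. The toy values are not taken from the study.

```python
import numpy as np

def ensemble_tpr(hits):
    """hits[r, l] is True where run r correctly predicts landslide l."""
    hits = np.asarray(hits, dtype=bool)
    per_run = hits.mean(axis=1)           # true positive rate of each individual run
    by_any_run = hits.any(axis=0).mean()  # share predicted by at least one run
    return per_run, by_any_run

# Two toy runs that each predict a different one of four landslides.
per_run, total = ensemble_tpr([[1, 0, 0, 0],
                               [0, 1, 0, 0]])
print(per_run.tolist())  # [0.25, 0.25]
print(float(total))      # 0.5
```

Because different runs can predict different landslides, the total rate of the ensemble can exceed the highest rate of any single run, which is how a total of 73.0 % can coexist with an individual maximum of 70.5 %.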

Calibrated parameter values
Unlike the OAT sensitivity analysis, the presented calibration procedure can reveal parameter interactions. The calibrated parameter values are shown in Fig. 12. For the geotechnical parameters, ranges of 21-35° for the angle of internal friction for effective stress and 4-8 kPa for the cohesion for effective stress are optimum value ranges. The results of four of the eight conducted shear tests are within these ranges. Furthermore, the distribution of the calibrated geotechnical parameters suggests that lower angles of internal friction for effective stress can be compensated by increasing the cohesion for effective stress and vice versa. This can be expected from Eqs. (2) and (3). In case of the hydrological parameters, the calibration procedure reveals optimal value ranges between 10⁻⁶ and 10⁻⁵ m s⁻¹ for the hydraulic conductivity and between 10⁻² and 10⁻¹ m⁻¹ for the specific storage. Compared to the experimentally derived range of the specific storage, the calibrated parameter values show a tendency towards higher values. The resulting hydraulic diffusivity (Eq. 7) is in the range of 10⁻⁵-10⁻³ m² s⁻¹. These ranges theoretically cover a variety of materials, from sands to clays (e.g. Prinz and Strauß, 2011).
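The reported diffusivity range follows from the calibrated ranges of the two hydrological parameters. The short check below assumes that Eq. (7) has the form D_0 = K_s / S_s, which is the definition used in TRIGRS. The function name is illustrative.

```python
def hydraulic_diffusivity(k_s, s_s):
    """D_0 (m2/s) from hydraulic conductivity k_s (m/s) and specific storage s_s (1/m)."""
    return k_s / s_s

# Lowest conductivity with highest storage, and highest conductivity with lowest storage.
d0_min = hydraulic_diffusivity(1e-6, 1e-1)
d0_max = hydraulic_diffusivity(1e-5, 1e-2)
print(f"{d0_min:.1e} to {d0_max:.1e} m2/s")  # 1.0e-05 to 1.0e-03 m2/s
```

The same relation explains why lowering S_s at a fixed K_s speeds up the modelled infiltration: the diffusivity rises in inverse proportion.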

Model ensemble's sensitivity against increased precipitation intensity
According to the Austrian Assessment Report (Kromp-Kolb et al., 2014), frequency and magnitude of extreme precipitation events are expected to increase over Austria in a future climate. Using the model ensemble, the impact of increasing precipitation intensity on shallow landslide susceptibility is assessed. The precipitation input from August 2005 is scaled up to 125 % in increments of 5 % (Fig. 13a). The resulting change in the proportion of unstable areas is shown in Fig. 13b. It increases from 7.6 % (±2.4 %; 1 standard deviation) for the original rainfall event in August 2005 to 8.5 % (±2.7 %) for the same rainfall event scaled to 125 %. At the same time, the predicted mean surface run-off after 40 h (time step with the highest run-off) increases distinctly. It rises from 9.8 × 10⁻⁴ m s⁻¹ (±1.3 × 10⁻³ m s⁻¹; 1 standard deviation) for the original rainfall event in August 2005 to 1.7 × 10⁻³ m s⁻¹ (±1.8 × 10⁻³ m s⁻¹) for the same rainfall event scaled to 125 %. This is an increase of 76.0 % compared to the run-off generated with the original rainfall input from August 2005 (Fig. 13c).
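The scaling of the rainfall input can be sketched as follows. The hourly values are hypothetical placeholders, and the function is an illustration of the described procedure rather than the authors' code.

```python
import numpy as np

def scaled_events(hourly_precip, max_factor=1.25, step=0.05):
    """Return a dict mapping each scaling factor to the scaled hourly time series."""
    n = int(round((max_factor - 1.0) / step)) + 1
    factors = np.linspace(1.0, max_factor, n)
    series = np.asarray(hourly_precip, dtype=float)
    return {round(float(f), 2): series * f for f in factors}

# Hypothetical hourly precipitation sums (mm), for illustration only.
events = scaled_events([2.0, 8.0, 4.0])
print(list(events))  # [1.0, 1.05, 1.1, 1.15, 1.2, 1.25]
```

Each of the six scaled series would then drive all 25 ensemble runs, so the reported means and standard deviations summarize 25 results per scaling factor.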

Discussion
The OAT sensitivity analysis reveals a high impact of the slope angle and the regolith depth on the resulting FOS. The slope angle map is derived area-wide from a DTM based on ALS data. Their accuracy is considered sufficient for the derivation of slope angles at a spatial resolution of 10 m. However, resulting slope angles may differ, depending on the respective calculation method (e.g. Horn, 1981; Wood, 1996). The regolith depth map used in this study is based on a linear model with the slope angle as the only predictor. It is shown that this model is suitable to predict the cumulative distribution of regolith depth for depths up to 2.0 m. However, its spatial distribution may be better reproduced with techniques including further predictors, like geomorphology or land cover (e.g. Catani et al., 2010; Tesfa et al., 2009).
Compared to the impact of the geotechnical parameters, the effect of the vegetation parameters is rather small. This can be attributed to the conservative set of parameter values assumed for the three vegetation parameters.
For the calibration procedure, the tested parameters are assumed to be constant throughout the catchment area. In other studies, property zones according to the geological substratum are defined with varying parameter value ranges. However, for the proposed calibration procedure, interactions between such property zones would have had to be included (e.g. enhanced run-off from zones above with lower infiltration capacity). Considering such interactions would have exceeded the available computational capabilities.
The parameter value ranges considered in the calibration procedure are derived from laboratory tests conducted on samples from eight sites. It is assumed that these ranges are representative for the whole catchment area. However, results of additional laboratory tests conducted on samples from other locations could further extend these ranges. In contrast, the tested parameter space already covers a wide range of material properties and raises the question of whether laboratory tests are required for the suggested calibration procedure at all. Such parameter value ranges could be derived from textbooks as well (e.g. Prinz and Strauß, 2011). Nevertheless, results of laboratory tests can be helpful for interpreting and validating the parameter combinations of the identified model ensemble.
Four parameters with a high impact on the model outcome were systematically sampled from a uniform distribution with defined increments and ranges. Hence, the subsequent calibration procedure, which considers each parameter value combination, remains deterministic. However, the combination of the results of the identified model ensemble must not be confused with a probability of failure, since the sampling and selecting of the parameter values is done systematically. Probabilistic approaches (e.g. Hammond et al., 1992; Raia et al., 2014), including a randomized parameter sampling strategy, could overcome this limitation while considering the uncertainty of the input parameters. If the probability distributions of the parameters throughout the study area are known, probabilistic approaches can be applied to derive the probability of failure. Theoretically, the resulting parameter value combinations of the identified model ensemble could provide insights into the area-wide probability distributions of the tested parameters. However, further investigations are necessary, including an enhanced sampling strategy. Improved and optimized models (e.g. Alvioli and Baum, 2016) will facilitate this objective.
The constrained set of 25 simulations, which optimally predict the observed landslides and non-landslides, is selected by minimizing the D2PC at a FOS threshold right below 1.0. Further performance indicators could be used for this task instead (e.g. Formetta et al., 2016; Mergili et al., 2017). However, for validating the results of physically based slope stability models, a performance indicator which is independent of a threshold (such as the AUC) can be misleading. As shown in Table 6, the AUC is less sensitive over the tested parameter value ranges compared to the D2PC. As a consequence, a high D2PC for the coordinates of the FOS threshold right below 1.0 (indicating a bad model performance) can coincide with a high AUC (typically indicating a good model performance). Thus, for validating the results of physically based slope stability models, a performance indicator considering a FOS threshold right below 1.0 must be preferred over an indicator independent of a threshold. Nevertheless, the minimum D2PC increased during the calibration procedure from 0.34 to 0.41, suggesting worsening results. However, the simulations with lower D2PCs are associated with an unrealistic early triggering of the observed landslides before the onset of the rainfall event. Therefore, in case of dynamic slope stability models, the temporal progression of the performance indicators must be considered.
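A toy example illustrates how a high AUC can coincide with a high D2PC. The FOS values are hypothetical. The AUC is computed here through its rank interpretation (the probability that a landslide cell has a lower FOS than a non-landslide cell), which is equivalent to the area under the ROC curve. The D2PC uses the common Euclidean definition and a threshold of 1.0.

```python
import math

def auc(fos_landslides, fos_stable):
    """Share of (landslide, non-landslide) pairs ranked correctly; ties count half."""
    wins = 0.0
    for a in fos_landslides:
        for b in fos_stable:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(fos_landslides) * len(fos_stable))

def d2pc_at_threshold(fos_landslides, fos_stable, threshold=1.0):
    """D2PC for the ROC coordinate at which cells with FOS < threshold are unstable."""
    tpr = sum(f < threshold for f in fos_landslides) / len(fos_landslides)
    fpr = sum(f < threshold for f in fos_stable) / len(fos_stable)
    return math.hypot(1.0 - tpr, fpr)

# All landslide cells rank below all stable cells, but none falls below FOS = 1.0.
fos_landslides = [1.05, 1.10, 1.20]
fos_stable = [1.50, 2.00, 3.00]
print(auc(fos_landslides, fos_stable))                # 1.0
print(d2pc_at_threshold(fos_landslides, fos_stable))  # 1.0
```

The ranking is perfect (AUC = 1.0), yet no landslide is predicted at the physically meaningful threshold, so the D2PC takes the value of a model that predicts stable conditions everywhere.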
In the calibration procedure, FOS maps were calculated for seven time steps with intervals of 9 h. For computational reasons it was not possible to compute hourly output for all 10 000 simulations. A re-calculation of hourly output maps with the parameter combinations of the identified model ensemble showed that, in the time intervals between the original output time steps, slightly more observed landslides were predicted correctly in some cases. Theoretically, even more observed landslides could be predicted within the hourly time steps. Therefore, the proposed calibration procedure might yield a different model ensemble if more output time steps were considered. In the same way, parameter values were tested in discrete intervals. Using parameter values in between these intervals could enhance the model's predictive performance. Hence, the assessed predictive performance must be taken as a conservative estimate.
T. Zieher et al.: Calibration of a dynamic physically based slope stability model
For both landslide-triggering rainfall events, some of the ensemble model runs show a decrease in the temporal true positive rate after the maximum precipitation intensity. This observation is associated with decreasing pore pressures due to less infiltrating water. For some observed landslides, which are predicted to fail around the maximum precipitation intensity, the reduced pore pressure later causes the FOS to rise above 1.0, and hence stable slopes are predicted again. However, this behaviour also suggests a sufficient calibration of the parameter values, since the model reacts to the temporally varying precipitation intensity.
With the model runs of the identified model ensemble, between 46.6 and 70.5 % of the observed landslides triggered in August 2005 and between 51.2 and 89.0 % of the observed landslides triggered in May 1999 can be predicted correctly. In total, the model ensemble correctly predicts 73.0 % of the landslides triggered in August 2005 and 91.5 % of the observed landslides triggered in May 1999. A direct comparison with prediction rates of further studies conducted in other study areas is difficult, since site-specific characteristics (e.g. soil material, conditions prior to landslide triggering, size of the study area) and data availability and quality (e.g. landslide inventory, DTM) may vary considerably. Still, the model ensemble fails to predict the remaining 27.0 % of the landslides triggered in August 2005 and 8.5 % of the landslides triggered in May 1999. Furthermore, the identified model ensemble cannot explain why landslides triggered in August 2005 were not triggered in May 1999. Areas predicted as unstable are in good agreement for both rainfall events. Further local factors may control the triggering of the landslides (e.g. local precipitation patterns, preferential flow, concentrated surface run-off, locally weak layers). Such local effects and properties are covered neither by the model nor by the input parameter maps. Moreover, the geomechanical model element includes a simplified representation of landslide geometry, while an instant failure mechanism of the whole landslide is assumed. The model's simplifications of complex processes, together with the applied parametrization, may explain the shortfall in spatial and temporal prediction accuracy.
The resulting slope stability maps of the identified model ensemble show a bias from east to west. Compared to the observed landslides, the predicted landslide density is noticeably higher in the eastern half of the catchment area. This bias might be related to the lithology. The south-eastern part of the Laternser valley is built up of sandstones (Penninic nappes), while the western and northern part is underlain by limestones, marls and shales (Helvetic and Ultrahelvetic nappes). Furthermore, the unconsolidated material located in the cirques of the south-eastern part of the valley is mostly coarse-grained debris originating from debris slides/debris flows and rockfalls from source areas above. Therefore, the material may feature higher angles of internal friction compared to the respective value range considered in the model ensemble. As a result, the slopes may remain stable in nature while they are predicted to fail by the ensemble.
The results of the identified model ensemble suggest a lower prediction rate of shallow landslides located in the forest. Therefore, the chosen representation of the effects of vegetation on slope stability in the revised model may be too simple. Furthermore, a conservative, spatially constant set of parameter values was chosen for the parameters describing the effects of vegetation. In forest stands, these parameter values vary spatially according to tree species, age and density. Parameter maps for the effects of vegetation accounting for these attributes could further improve the model's predictive performance (e.g. Schwarz et al., 2010, 2012).
The results of the model ensemble based on a scaled precipitation intensity suggest a slight positive trend of unstable areas, while the surface run-off increases markedly. However, since subsurface flow is not considered and the run-off is calculated for each time step individually, the model will fail in predicting actual stream flow. Nevertheless, this result suggests that the precipitation intensities during landslide-triggering rainfall events are already close to or above the infiltration capacity under present-day conditions. A potential increase in precipitation intensity might thus lead to an increase in surface run-off rather than slope failure. However, considering the uncertainty indicated by the model ensemble, neither trend is significant.

Conclusions
In the present study, a revised form of the model TRIGRS 2.0 is calibrated based on a limited number of laboratory tests and a detailed shallow landslide inventory. The parameter space of four identified sensitive parameters is tested systematically. A model ensemble including 25 "behavioural model runs" is identified which correctly predicts most landslides and non-landslides for a landslide-triggering rainfall event in August 2005. The predictive performance of this ensemble is tested for a landslide-triggering rainfall event in May 1999. Finally, the ensemble is used to quantify potential changes in slope stability associated with increasing rainfall intensities.
It is shown that despite the simplified representation of the involved processes, the location and the supposed triggering timing of 73.0 % of the observed landslides triggered in August 2005 and 91.5 % of the observed landslides triggered in May 1999 are predicted correctly by the identified model ensemble. The inability of the model to correctly predict the remaining landslides may be in part related to the simplifications of the related processes. To overcome these issues, additional processes should be included in the model (e.g. subsurface flow). However, the spatial variability of the input parameter values remains an unresolved issue.
The assessment of changes in slope stability associated with scaled precipitation input shows a slight increase in potentially affected areas. At the same time, the peak run-off increases markedly. Even though neither trend is significant, this could indicate that the precipitation intensities of past landslide-triggering rainfall events were already close to the soil's infiltration capacity. However, a general increase in precipitation intensity could lead to an increase in the frequency of landslide-triggering rainfall events. Rainfall events which did not trigger any shallow landslides in the past may become trigger events under a changing climate in the future.

Figure 1. Workflow with the main steps of the analysis.

Figure 2. Location of the Laternser valley (a), slope angle map (b), geological map with sampled sites (c) and shallow landslide inventory (d). The slope angle map is based on a digital terrain model derived from airborne laser scanning (ALS) in 2011, serving as input data for modelling (resampled to a spatial resolution of 10 m). The box plots show the slope angle distribution for forest and non-forest areas. In the geological map only geological units covering more than 1 % of the catchment area are listed in the legend (data source: Heissel et al., 1967; Oberhauser, 1982). The shallow landslide inventory shows landslides triggered by the rainfall events in May 1999 (82; yellow) and August 2005 (356; red) occurring on undisturbed hillside slopes (Zieher et al., 2016). The areas covered by forest were derived from ALS data acquired in 2011.
where d (m) is the vertical depth (positive in downward direction), t (s) is time, ϕ (deg) is the angle of internal friction for effective stress, β (deg) is the slope angle, c (Pa) is the cohesion for effective stress per unit area, γ_w (9806.6 N m⁻³) is the unit weight of water and γ_s (N m⁻³) is the unit weight of soil. Raster cells where the FOS falls below 1.0 are considered slope failures. Each cell with a FOS < 1.0 represents a single shallow landslide (Milledge et al., 2012). The model's results are FOS maps showing a quantitative measure of slope stability in space and time.

Figure 3. Landslide-triggering rainfall events in the Laternser valley on 21-22 May 1999 (a, c) and 22-23 August 2005 (b, d). The map (e) shows the meteorological stations considered. Regional daily mean (07:00-07:00) and cumulative deviation of precipitation from the long-term mean (1981-2010) are shown for the period of 1 year before the rainfall events (a, b). Cumulative precipitation for 3 days covering the landslide-triggering rainfall events is shown for meteorological stations within and surrounding the Laternser valley (c, d). Hourly precipitation sums are shown for Ebnit station in May 1999 (c), because at Innerlaterns station missing values are present in the respective hourly time series. Estimated triggering times of four shallow landslides were derived from protocols by the voluntary fire brigade. Data source: Hydrographic Service of Vorarlberg (HD), Central Institute for Meteorology and Geodynamics (ZAMG).

Figure 4. Hourly precipitation time series (a, c) and spatially interpolated precipitation sums (b, d) for the duration of the landslide-triggering rainfall events in 1999 (a, b; 07:00 on 20 May to 07:00 on 23 May) and 2005 (c, d; 07:00 on 21 August to 12:00 on 23 August).The error bars and the shading for the cumulative precipitation sum in (a) and (c) indicate the range of the interpolated hourly precipitation sums within the catchment area.

Figure 5. Cumulative distribution of d_max derived from observations and models (a) and the resulting regolith depth map (b).

Figure 6. Results of the conducted laboratory tests. (a) Direct and triaxial shear tests, (b) grain size distributions, (c) compressibility of bulk soil and (d) Atterberg limits.

Figure 7. Principle of the receiver operating characteristic (a; modified after Metz, 1978) and application in the calibration procedure (b). FOS: factor of safety; TPR: true positive rate; FNR: false negative rate; TNR: true negative rate; FPR: false positive rate; D2PC: distance to perfect classification; AUC: area under the ROC curve.

Figure 8. OAT sensitivity of the model results (change in factor of safety) for tested parameter value ranges for the geomechanical model element (a) and the hydrological model element (b). Respective parameter values are listed in Table 4.

Figure 11 shows the resulting areas of slope failures predicted by the model ensemble for both rainfall events. The colours indicate the number of model runs predicting slope failures per raster cell. Areas shown in red indicate a high agreement of the model ensemble, while yellow areas are identified by only one model run. The coordinates in the ROC plots associated with the number of agreeing model runs are shown in Fig. 11c for the rainfall event in August 2005 and Fig. 11f for the rainfall event in May 1999. The area which is predicted to fail by at least one model run of the model ensemble includes the most observed landslides (highest TPR), while the TNR is comparatively low. With all 25 model runs in agreement, the rate of correctly predicted landslides is distinctly lower, while the TNR increases markedly. The prediction rates of the 25 model runs are shown in Fig. 11d for the rainfall event in August 2005 and Fig. 11e for the rainfall event in May 1999. Respective maximum and minimum prediction rates are listed in Table 8. Generally, the model ensemble is better at predicting the landslides triggered in May 1999. However, non-landslides are better predicted for the rainfall event from August 2005.

Figure 9. Temporal prediction rate for the seven time steps and rate of correctly predicted landslides (true positives) and non-landslides (true negatives) at a FOS falling below 1.0 for the calibration runs.All 10 000 calibration runs (a), calibration runs which satisfy assumption 1 (b; n = 7300), calibration runs which satisfy assumption 2 (c; n = 1134) and the 25 calibration runs which predict most landslides and non-landslides (d).In (d), only the coordinates with the highest true positive rate for the 25 calibration runs are shown.The grey lines in (d) indicate the D2PCs of these runs.

Figure 11. Predictive performance of the model ensemble. The maps show areas predicted to fail in response to the rainfall event in August 2005 (a) and in May 1999 (b). The colours indicate the number of model runs predicting the respective areas to fail. The ROC plots likewise show the coordinates of the correctly predicted landslides and non-landslides for August 2005 (c) and May 1999 (f) associated with the number of model runs which are in agreement. The predictive rates of the model ensemble (see Table 8) are visualized for August 2005 (d) and May 1999 (e). The colours indicate the true positive rate. TPR: true positive rate; TNR: true negative rate; FPR: false positive rate; FNR: false negative rate; AUC: area under the ROC curve.

Figure 12 .

Figure 13. Scaled rainfall event of August 2005 (a) and resulting changes in slope stability (b) and surface run-off (c), based on the ensemble runs with a scaled precipitation input. The area shaded in grey shows one standard deviation.

Table 2. Parameters for the revised TRIGRS 2.0 model and parameter values considered in previous studies. DEM: digital elevation model; K_s: saturated hydraulic conductivity.

Table 3. Metadata for the eight sampled landslide sites and results of the conducted laboratory tests.
* Results for a depth of 2.0 m.

Table 4. Parameter value ranges and central values considered in the OAT sensitivity analyses.
Variations in β and d_max result in non-linear effects on slope stability. An increase in β or d_max lowers the FOS. Both parameters are derived from a DTM and direct field measurements. Increased parameter values for ϕ and c distinctly enhance the FOS. Their impact is greater than the effects of the parameters associated with the vegetation (c_r, s_t, d_r). While variations of s_t have minor

Table 5. Tested parameter values used for the calibration runs.

Table 7. Prediction rates of the model ensemble for the landslide-triggering rainfall event in May 1999. Three scenarios for the initial depth of the groundwater table in relation to regolith depth are considered. TPR: true positive rate; TNR: true negative rate; FPR: false positive rate; FNR: false negative rate; D2PC: distance to perfect classification; AUC: area under the ROC curve.

Table 8. Prediction rates of the model ensemble for the landslide-triggering rainfall events in May 1999 and August 2005. For the rainfall event in May 1999, an initial depth of the water table of 0.75 × regolith depth was considered. TPR: true positive rate; TNR: true negative rate; FPR: false positive rate; FNR: false negative rate; D2PC: distance to perfect classification; AUC: area under the ROC curve.