The main assumption on which landslide susceptibility assessment by means of stochastic modelling rests is that the past is the key to the future. As a consequence, a stochastic model able to classify past known landslide events should also be able to predict a future unknown scenario. However, storm-triggered multiple debris flow events in the Mediterranean region could limit the operative validity of such an expectation, as they typically display randomness in time recurrence and magnitude, as well as great spatial variability, even at the scale of small catchments. This is the case for the 2007 and 2009 storm events, which recently hit north-eastern Sicily with different intensities, resulting in largely different disaster scenarios.
The study area is the small catchment of the Itala torrent (10 km²).
Both predictions achieved largely acceptable performances in terms of fitting, skill and reliability. However, a loss of performance and differences in the selected predictors arose between the self-validated and the chrono-validated models. These are interpreted as effects of the non-linearity, in the domain of the trigger intensity, of the relationships between predictors and slope response, as well as of the different spatial paths of the two triggering storms at the catchment scale.
Debris flows are among the most hazardous geological phenomena, directly threatening human lives owing to their high energy and rapid propagation over slopes and drainage systems. In order to predict these phenomena, physically based approaches, which are mainly focused on detecting the rainfall thresholds responsible for their triggering (e.g. Peres and Cancelliere, 2014; Bordoni et al., 2015) and on physically modelling the propagation phase (e.g. Schraml et al., 2015), must be complemented by susceptibility models (Brabb, 1984), suitable for depicting prediction images of the sites where these phenomena are more likely to activate at a catchment/regional scale. Combining the two approaches allows optimization of the use of early warning systems (e.g. Lagomarsino et al., 2015; Segoni et al., 2015; Stähli et al., 2015), thereby mitigating debris flow risk at a catchment/regional scale.
Landslide susceptibility assessment can be achieved by means of different methods, among which the stochastic approach has gained increasing importance in regional assessment applications over the last 2 decades. In fact, statistical models produce objective, quantitative and verifiable estimates of the spatial probability for new landslides in a given study area. Moreover, the stochastic approach is easily implemented in geographic information systems (GIS), taking advantage of the now widespread databases of physical–environmental attribute layers. These methods are based on some generally accepted assumptions, the basic one being that the past is the key to the future.
Unfortunately, susceptibility assessment studies are very often affected by a lack of temporal information in the landslide inventory, which makes it impossible to perform a pure temporal validation or chrono-validation.
Based on the scheme described above, strategies for the validation of the predictive models can be defined to circumvent the lack of temporal information. Specifically, when seasonal or event inventories (Guzzetti et al., 2012) are not available, a validation can be performed by following a random or spatial partition criterion.
It is evident that the whole scheme of the stochastic approach strictly depends on the basic assumption holding. Any change in the real relationships between preparatory causes and landslide activity will affect the prediction skill of the obtained susceptibility models. Extreme events produce morphodynamic responses that can lie outside the general rule: under an intense trigger, such as a storm, the same area can produce an “out-of-range” slope response, which would not be correctly predicted by a model skilled in fitting “normal” landslide scenarios. This could result from the non-linearity, in the domain of the trigger intensity, of the relationship between preparatory causes and landslides. Moreover, Mediterranean storms typically display randomness in time recurrence and magnitude, as well as great spatial variability, even at the scale of small catchments. It is therefore necessary to check for this kind of behaviour in order to find a strategy which maximizes the ability of a susceptibility model to predict extreme events.
In spite of the wide diffusion of landslide susceptibility studies based on statistical modelling, few studies focus on detecting predictive limits when facing storm-triggered multiple debris flow events (e.g. von Ruette et al., 2011; Tseng et al., 2015). In particular, the application of specific validation strategies to evaluate how the trigger phenomena modify the predictive performance of the models is very rare. A contribution to this topic is presented here, drawing on a case study in north-eastern Sicily, where two recent storm events (2007 and 2009) hit the Ionian side of the Peloritani Mountains (Fig. 1) with different intensities.
Specifically, the study area is the Itala catchment (nearly 10 km²).
Study area:
In order to investigate our topic, the debris flows activated during the two extreme events were mapped by integrating remote and field surveys, and a simple set of predictors was prepared by utilizing a DEM and a geological map.
Testing a susceptibility model against future landslides is quite a hard
task, especially because it would require researchers to “wait for the future to happen”
(Guzzetti, 2005). Nevertheless, when a multi-temporal landslide inventory is
available, the validation can be performed using a temporal criterion to
separate calibration and validation data sets. Among others, Guzzetti et
al. (2005) performed a “temporal verification procedure” which evaluates
the effect of five landslide inventory updates on the performance of a
susceptibility model. Similarly, other authors used a temporal criterion to
validate the results of landslide susceptibility analysis at different
scales (Zêzere et al., 2004; Vergari et al., 2011; Wang et al., 2014), but
none of them worked with storm-triggered debris flows and event inventories.
von Ruette et al. (2014) adopted a spatial partition scheme, with a partial insight into temporal validation, limited to predicting the landslides triggered by two rainfall events in two close, but different, catchments. Chang et al. (2014) focused on exploring the role of rainfall in controlling the chrono-validation performance for a much larger-scale case, demonstrated in a larger area (2868 km²).
The Messina area (Fig. 1) and the debris flow event of 2009 have been the focus of several scientific articles centred on different topics. Several studies have been devoted to the implementation of remote and semi-automatic techniques for landslide recognition and mapping of such a significant regional multiple-landslide event (Ardizzone et al., 2012; Mondini et al., 2011; Ciampalini et al., 2015). Del Ventisette et al. (2012) focused their research on the Giampilieri village area, analysing the triggering mechanism and estimating the volumes involved in the debris flow. They also applied a method based on conditional analysis in order to obtain a susceptibility map. Goswami et al. (2011) and De Guidi and Scudero (2013) explored the relationship between tectonic setting and landslide susceptibility, taking the Giampilieri and Scaletta catchments as study areas. Reichenbach et al. (2014) evaluated the influence of land-use change on debris flow susceptibility for the Briga catchment. Stancanelli and Foti (2015) compared two different numerical models for simulating the 2009 debris flow event in the lower coastal sector of the affected area. Aronica et al. (2012a) published a detailed description of the 2009 event, with an insight into the saturation conditions of the soils and an evaluation, via DEM differencing, of the total volume of mobilized material for the Giampilieri catchment. Rainfall thresholds for the landslide activations have been investigated by Gariano et al. (2015), in the framework of a regional study, and by Peres and Cancelliere (2014), who conducted a specific study on the Ionian-Peloritan area, hit by the 2009 event. Lombardo et al. (2014) tested spatial exportation techniques for logistic regression-based susceptibility models in the Briga and Giampilieri catchments.
With regard to the 2007 event, Aronica et al. (2012b) applied a physically based modelling tool to simulate the debris flows affecting a very small catchment located 5 km south of the Itala stream.
In contrast to the above-mentioned research, in this paper, by studying two well-separated event inventories produced by two triggering events of different intensities, the relationships between controlling factors, trigger intensity and morphodynamic responses are examined, and their effects on the predictive performance of stochastic susceptibility modelling are verified. Moreover, until now, no study on the 2007 event has been published for the Itala catchment, nor have chrono-validated models and maps been produced for the 2009 event.
The study area is located at the north-easternmost edge of Sicily (southern Italy), on the Ionian slopes of the Peloritan ridge, 20 km south of the town of Messina (Fig. 1a). Specifically, the Itala catchment is located in the Itala municipality territory, covering nearly 10 km².
According to the Köppen classification (Köppen, 1923), the climate in the region is classified as Mediterranean (Csa)-type, being therefore characterized by a dry season from April to September and a wet season from September to March, with an average yearly rainfall of nearly 900 mm. In addition, due to the warm water of the Mediterranean Sea and the proximity of the ridge to the sea coast, storm events are frequent in the autumn season in this area of Sicily.
The Ionian-Peloritan torrents are short and steep; although they are usually almost dry, during rainfall their discharge can increase rapidly, causing floods which affect the infrastructure (especially roads) located near the riverbanks. Moreover, during autumn storm events, the combination of the hydrologic regime and the geomorphologic setting occasionally determines severe morphodynamic responses, including multiple debris flow and debris flood events, such as those which occurred in 2007 and 2009. The potential occurrence of this kind of event makes the whole area of the Ionian-Peloritan catchments one of the zones most exposed to hydrogeological risk in Sicily.
The inhabited areas of the Itala catchment are located in very dangerous areas, either at the base of very steep terraced slopes, or near the outlet of the streams. With respect to the land use, the area can be divided into an eastern and a western sector. The former is highly terraced and mainly cultivated with citrus groves; the latter is characterized by chestnut forests and pastures. The study area is strongly affected by wildfires during the summer season; this influences the density of vegetation, the soil structure and the erosional processes acting on the slopes.
Bar plot showing the cumulative (1-, 3-, 7-, 10- and 20-day) rainfall in mm for the main nine events recorded in the Itala catchment area.
Time series of 2 months' precipitation for the Messina (Ist.
Geofisico) and Briga rain gauges:
Overview of the area hit by the 2009 event:
The storm events of 2007 and 2009 have been analysed on the basis of two rain gauges belonging to the “Osservatorio delle Acque Sicilia”, located in Briga and Messina Osservatorio (Fig. 1a). In particular, as the Peloritan area was historically hit by other storm events, a detailed analysis of antecedent rainfall conditions and of the historical record of debris flow events was carried out. The most important extreme meteorological events were selected first and, on the basis of the historical archive of the two main local newspapers (“Gazzetta del Sud” and “Giornale di Sicilia”), the associated landslide activity was identified. However, the estimation of the severity of the slope responses to the triggering storms cannot be accurately assessed at a basin scale from this kind of historical data. Therefore, with the exception of the 2007 and 2009 inventories, the classification of the debris flow events was limited to a qualitative ordinal scale (no landslides: N-L; tens of landslides: T-L; hundreds of landslides: H-L), based on the significance and frequency of damage reported for the Itala catchment area.
By analysing the daily cumulated rainfall from 1975 to 2011 (with the exception of 6 years with no rainfall data: 1987, 1988, 1989, 2003, 2004 and 2005), the nine heaviest rainfall events were detected on the basis of a 100 mm threshold, which corresponds approximately to the rain quantity recorded during the 2007 event. Figure 2 shows the 1-, 3-, 7- and 20-day cumulated rainfall for the nine events, together with the corresponding debris flow activity reported for the Itala catchment area (indicated by red labels on the bar plot). Among the nine selected events, only five caused significant multiple debris flow activations, whose effects were reported in local newspapers. In fact, for the events of 2 December 1996, 8 September 2000 and 20 January 2009, no landslides were reported in local newspapers, which could indicate either that no landslides were activated or that they were not significant enough in terms of damage caused to the villages. In these cases, the daily peak of rain was not preceded by significant rainfall in the previous days. The more intense event of 1 March 2011 was responsible for the activation of tens of debris flows in a sector located about 5 km south of Messina, but no landslides were reported for the Itala catchment.
Among the events which caused reported landslides, the 30 October 1985 and 4 October 1996 events have very similar characteristics. In both cases, the main events were preceded by significant precipitation in the antecedent 72 h.
In contrast, the event of 24 November 1995 was recorded with 123 mm day⁻¹.
The 26 October 2007 and the 1 October 2009 events are quite distinct when compared to the others. On the one hand, the 2007 daily rainfall event was preceded by 3 dry days but heavy rainfall within the previous week; on the other hand, the severity of the 2009 rainfall event is evident both in the daily (more than 200 mm) and in the 10- and 20-day precipitation, which exceeded 350 and 400 mm, respectively. In particular, Fig. 3a shows that the main event in 2007 (registered at Briga with 102 mm of rain in 24 h) was preceded by a longer rainy period, which lasted from 20 to 23 October, resulting in a cumulative weekly rainfall of 220.4 mm. The storm triggered hundreds of debris flows in the whole area, but only 73 in the Itala catchment. The 2009 event (Fig. 3b) presented the highest daily rainfall (nearly 220 mm); moreover, it followed two previous events (49.2 mm in 6 h on 16 September; 79.6 mm in 10 h on 23–24 September), determining a cumulative rain quantity which exceeded 412 mm in 20 days. As a consequence, on 1 October 2009, hundreds of debris flows were activated in an area of less than 10 km².
To illustrate the large spatial variability of rainfall storms in this area, it is worth noting that, although intense rainfall was recorded at the Briga rain gauge (102 and 220 mm day⁻¹), far lower amounts were recorded at the Messina Osservatorio rain gauge.
The application of binary logistic regression (BLR) for landslide susceptibility assessment typically requires the following steps: the partition of the study area into mapping units, which are then characterized with respect to a set of potential predictors; the assignment of stability conditions to each mapping unit, based on its spatial relation with a set of known landslides (e.g. inclusion or intersection); the extraction of a balanced (stable/unstable) data set from the whole set of mapping units; the regression of the modelling function; and the verification of the performance of the model in correctly predicting stability conditions for each pixel, the latter defined on the basis of a set of unknown landslides.
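The steps above can be sketched with scikit-learn on synthetic data; the array names, sizes and the simulated predictor–response relationship are illustrative assumptions, not the actual study data:

```python
# Sketch of the BLR susceptibility workflow: characterize mapping units,
# assign stability status, balance the data set, regress, verify.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# 1) Mapping units characterized by a set of predictors (3 synthetic ones here).
n_cells = 5000
X = rng.normal(size=(n_cells, 3))

# 2) Stability status: 1 if the cell intersects a known landslide, else 0.
#    Simulated so that the first predictor controls instability.
p_true = 1 / (1 + np.exp(-(X[:, 0] * 2 - 1)))
y = rng.binomial(1, p_true)

# 3) Balanced data set: all positives plus an equal number of random negatives.
pos = np.flatnonzero(y == 1)
neg = rng.choice(np.flatnonzero(y == 0), size=pos.size, replace=False)
idx = np.concatenate([pos, neg])

# 4) Regress the modelling function on the balanced set.
model = LogisticRegression().fit(X[idx], y[idx])

# 5) Verify: susceptibility (probability of instability) for every cell.
susceptibility = model.predict_proba(X)[:, 1]
print(round(roc_auc_score(y, susceptibility), 2))
```

In the real application the predictors come from GIS layers and the positives from the landslide inventory; the balancing and verification logic is the same.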
This section describes the methods and the model-building strategies adopted to investigate the main research topic: exploring the skills and limits of predicting the source areas of storm-triggered debris flows.
Comparison of morphologies between two different images resulting
in five different cases:
Debris flow event inventories:
Discrete variables:
The landslides activated during the 2007 and 2009 events are mainly classified as channelized debris flows and debris avalanches or hillslope debris flows (Varnes, 1978; Hutchinson, 1988; Hungr et al., 2001, 2014), which affected the weathered mantle of the metamorphic bedrock on the very steep slopes of the Itala catchment (Fig. 4). However, as this paper aims to study susceptibility to new activations, i.e. the prediction of source areas, the whole set of phenomena was processed as a single type, using in the following the term “debris flow” in its general sense.
Landslide recognition was performed by integrating a field survey, carried out soon after the 2009 disaster, and orthophoto analysis, which allowed the slopes to be visualized at different dates. In particular, high-resolution lidar (Light Detection And Ranging) data from two different acquisitions, in 2008 and 2009, were used. These data were provided by the Territory and Environment Department of the Sicilian government (ARTA 2008 – Assessorato Regionale Territorio e Ambiente) and the National Civil Protection (PCN 2009 – Protezione Civile Nazionale). The ARTA 2008 data (acquired in August) include 0.25 m pixel orthophotos and a DEM with 2 m horizontal and 0.22 m vertical resolution. The PCN 2009 data were acquired 6 days after the 2009 event and include 15 cm pixel orthophotos and a 1.1 m cell DEM. In addition, multi-temporal (2005, 2006, 2010 and 2012) Google Earth™ (GE) images were analysed in order to compare the 2007 and 2009 mapped phenomena with the previous and subsequent slope conditions.
Continuous variables:
An event inventory (Guzzetti et al., 2012) must report only those landslides triggered by a single specific trigger occurrence, such as an earthquake, rainfall or snowmelt. To satisfy this constraint, a first landslide mapping was carried out on the 2008 and 2009 images, obtaining a first version of the 2007 and 2009 inventories. However, the mapped landslides had to have been activated on 26 October 2007 for the first inventory and on 1 October 2009 for the second. Therefore, the morphologies mapped for 2007 were also compared with the 2006 GE images. By combining the data obtained from the three time frames, five different cases were distinguished (Fig. 5): (a) debris flows mapped on the 2007 orthophotos but activated before the 2007 event; (b) debris flows activated during the 2007 event which did not reactivate or retreat during the 2009 event; (c) debris flows activated during the 2007 event which retreated or reactivated during the 2009 event; (d) debris flows activated during the 2007 event which were completely eroded during the propagation phase of the 2009 event; and (e) debris flows activated during the 2009 event in previously stable areas.
The final event inventories (Fig. 6) contained 73 debris flows for 2007, corresponding to cases (b), (c) and (d), and 616 for 2009, corresponding to case (e). Each landslide inventory was stored in two separated vector layers: the first containing a polygon representing the source areas, and the second containing the landslide identification points (LIP), corresponding to the highest point along the crown of each mapped phenomenon (Costanzo et al., 2012b, 2014; Lombardo et al., 2014).
Binary logistic regression (BLR) is a multivariate statistical technique, based on a frequentist approach, which models the expected value of a response variable (the outcome) as a function of a linear combination of continuous and/or discrete predictor variables (Hosmer and Lemeshow, 2000). With respect to other frequentist methods (e.g. discriminant analysis), it does not require any linearization or transformation to obtain normally distributed covariates. Moreover, the outcome of BLR is easily interpretable for applied scientists.
In binary logistic regression the response variable Y is dichotomous, assuming the value 1 for unstable and 0 for stable mapping units. The relationship between the predictors and the probability p for the response variable to assume the value 1 is linearized by the logit function: logit(p) = ln[p/(1 − p)] = β0 + β1x1 + β2x2 + … + βnxn, where β0 is the intercept and β1, …, βn are the coefficients of the predictors x1, …, xn. The odds ratios (OR), which are calculated by simply exponentiating the regression coefficients (ORi = exp(βi)), express the multiplicative change in the odds of instability produced by a unit increase in the corresponding predictor. In order to estimate the best intercept and coefficient values, a maximum-likelihood fitting procedure is applied.
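As a minimal numerical illustration of the logit link and the odds ratio, with invented coefficient values (an intercept of −4 and a coefficient of 0.1 for a single hypothetical predictor, e.g. slope steepness in degrees):

```python
import math

# Hypothetical fitted coefficients: intercept and one predictor.
b0, b1 = -4.0, 0.1

def probability(x):
    """Inverse logit: p = 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Odds ratio: multiplicative change in the odds for a +1 unit change in x.
odds_ratio = math.exp(b1)

p30, p40 = probability(30.0), probability(40.0)
odds30, odds40 = p30 / (1 - p30), p40 / (1 - p40)
# A 10-unit increase multiplies the odds by exp(b1 * 10) = odds_ratio ** 10.
print(round(p30, 3), round(p40, 3), round(odds40 / odds30, 3))
```

This is only an arithmetic illustration of the link function, not the coefficients of the study's models.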
In the present research, we applied BLR under a stepwise selection routine, which has already been successfully adopted in landslide and debris flow susceptibility studies (Begueria, 2006; Meusburger and Alewell, 2009; Atkinson and Massari, 2011; Costanzo et al., 2014; Heckmann et al., 2014; Lombardo et al., 2014). Stepwise selection is an iterative procedure which selects the best performing and most parsimonious set of predicting variables. It can be performed either in forward or in backward mode. Forward stepwise selection starts from an “intercept only” model and, at each step, tries all the remaining variables and adds the one that maximizes the log-likelihood value. On the contrary, backward stepwise selection starts from a full model, including all the variables, and iteratively removes variables until the model reaches the best fitting.
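A forward stepwise selection driven by the log-likelihood can be sketched as follows; the synthetic data, the five candidate predictors and the crude stopping threshold are all assumptions made for illustration, not the routine actually used in the study:

```python
# Forward stepwise selection: start from an empty model and, at each step,
# add the variable that maximizes the log-likelihood, until the gain is small.
import numpy as np
from sklearn.linear_model import LogisticRegression

def log_likelihood(X, y):
    """Log-likelihood of a fitted logistic model evaluated on (X, y)."""
    p = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))                               # 5 candidate predictors
y = rng.binomial(1, 1 / (1 + np.exp(-(2 * X[:, 2] + X[:, 0]))))  # 2 informative ones

selected, remaining = [], list(range(5))
best_ll = -np.inf
while remaining:
    scores = {j: log_likelihood(X[:, selected + [j]], y) for j in remaining}
    j_best = max(scores, key=scores.get)
    if scores[j_best] - best_ll < 2.0:   # illustrative stopping rule
        break
    best_ll = scores[j_best]
    selected.append(j_best)
    remaining.remove(j_best)

print(selected)  # the informative variables (2 and 0) should be among these
```

Backward selection is the mirror image: start from all five variables and drop, at each step, the one whose removal costs the least log-likelihood.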
All the statistical analyses which are hereafter discussed were performed by using open source software (TANAGRA: Rakotomalala, 2005).
The first step in modelling debris flow susceptibility using a stochastic approach is to select the mapping units into which the study area is to be partitioned. Mapping units are the basic spatial elements for which the model will be able to produce a prediction. Two main types of mapping units are adopted in the literature: hydro-geomorphological units and regular grids. The former allows the model to take advantage of the morphodynamic homogeneity of the area included in each single unit, corresponding to hydrological or slope units; the latter optimizes the matching with the spatial resolution of the source layers of some important predictors, which typically have the same grid structure as the DEM.
In the present research, a raster-based structure was adopted by partitioning the study area into a grid of 8 m square cells, which also required the rasterization of the spatial distribution of all the covariates.
Starting from a DEM and a geological map, the following eight potential predictors were selected and their values assigned to each cell into which the study area was partitioned (Figs. 7 and 8): outcropping lithology (GEO), land use (USE), aspect (ASP), steepness (SLO), topographic wetness index (TWI), plan (PLAN) and profile (PROF) curvature and distance from tectonic features (DFAULT).
Outcropping lithology and tectonic features are proxy variables expressing the mechanical properties of the bedrock and the weathered mantle. These variables were derived from an available 1:50 000 geological map (Lentini et al., 2007), which was based on 1:10 000 field surveys.
Information on land use allows the model to summarize those potential modifications of the natural structure of the regolithic mantle and the bedrock which are related to anthropogenic activities. In order to express these properties, a land-use map, based on the analysis of the orthophotos ARTA 2007/2008 and PCN 2009 and field recognition, was prepared. The final land-use map contains six classes: (i) medium-high vegetated terraces (MHVT); (ii) low vegetated terraces (LVT); (iii) chestnut forests (CF); (iv) pastures (P); (v) urbanized areas (UA); (vi) river beds and beaches (RB).
Slope steepness and plan and profile curvatures are related to the energy of the relief. Steepness is commonly used as a predictor in landslide susceptibility studies and very often demonstrates a very high importance. In fact, especially for debris flow analysis, it is expected to be one of the most significant variables, because it is directly linked to the shear stress acting on the potential shallow failure surface. Moreover, for shallow failures with slide or flow mechanisms, the topographic surface and the rupture plane or zone can be considered as almost parallel; in this case, slope steepness is a proxy for the real inclination of the potential failure surface. Steepness also controls the overland and subsurface flow velocity and the runoff rate. At the same time, the topographic curvatures control the divergence and convergence of both surface runoff and shallow gravitational stresses (Ohlmacher, 2007). Curvatures are expected to be the best proxy variables for convergent water flow (plan curvature) and changes in flow velocity (profile curvature). In this study the plan and profile curvatures were classified into concave, planar and convex classes.
The topographic wetness index is defined as ln(As/tan β), where As is the specific catchment area (upslope contributing area per unit contour width) and β is the local slope angle; it expresses the tendency of each cell to accumulate water.
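A minimal sketch of the index, assuming the common units (specific catchment area in metres, slope angle in degrees):

```python
import math

def twi(specific_catchment_area_m, slope_deg):
    """Topographic wetness index: ln(As / tan(beta)).

    specific_catchment_area_m : upslope area per unit contour width (m)
    slope_deg                 : local slope angle (degrees)
    """
    return math.log(specific_catchment_area_m / math.tan(math.radians(slope_deg)))

# A gentle cell with a large contributing area is "wetter" (higher TWI)
# than a steep cell with a small contributing area.
print(round(twi(500.0, 5.0), 2), round(twi(50.0, 35.0), 2))
```

In practice the index is computed on the DEM grid by a GIS (SAGA, in this study), which also handles the flat and zero-slope cells that this bare formula would not.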
Aspect controls the intensity of solar insolation at the Earth's surface and, as a consequence, evapotranspiration and the distribution and abundance of flora and fauna. These in turn control the erosional processes related to chemical and physical weathering, operated by water, temperature and vegetation, which are relevant in determining landslide susceptibility. Furthermore, ASP frequently acts as a proxy variable for the attitude of the rock layers.
The source for the calculation of the topographic attributes was the ARTA 2007/2008 DEM, subsequently resampled at 8 m pixel size with the nearest neighbour approach. The resampling of the original DEM (2 m pixel size) smoothed the effects of microtopography and possible noise in the original data.
All the factors were calculated using SAGA GIS (System for Automated Geoscientific Analysis, Conrad, 2007).
Once the layers of the predictors were obtained, they were combined in a multivariate grid whose cell status (stable/unstable) was defined on the basis of the intersection with the LIPs. Each cell hosting at least one LIP was set as unstable, so as to calibrate the models in predicting the locations of future LIPs, which in our scheme correspond to debris flow initiation areas.
Model validation is a mandatory component of susceptibility assessment studies (Carrara et al., 2003; Guzzetti et al., 2006; Frattini et al., 2010; Rossi et al., 2010). No matter the method adopted in modelling the susceptibility, rigorous and quantitative validation procedures are the only criterion for accepting or rejecting a predictive model.
The validation of a model requires the availability of a calibration and a validation set of landslides or outcomes. The training landslides are used to calibrate the maximum-likelihood fitting, so that the regression coefficients are optimized; the predicted probability generated by the model is then compared to the actual unknown target pattern defined by the validation landslide set. The accuracy of a model is thus evaluated by comparing the produced prediction image to the known (calibration) and unknown (validation) target patterns. In particular, the degree of fit expresses the ability of the model to classify the known cases, while the prediction skill is the ability to predict the unknown cases.
As proposed by Chung and Fabbri (2003), calibration and validation data sets can be obtained by time partition, random time partition or spatial partition. The first is possible when multi-temporal landslide inventories are available, the second is based on randomly partitioning single-epoch data sets and the third on sub-dividing the study area into two similar sub-sectors. Random time partition procedures can be applied either to the landslide inventory (Conoscenti et al., 2008a) or to the mapping units database (Conoscenti et al., 2008b), whilst spatial partition can also be performed on non-nested or non-adjacent areas, such as in studies aimed at susceptibility model exportation (von Ruette et al., 2011; Costanzo et al., 2012a; Lombardo et al., 2014).
However, validating a model requires precision, robustness and geomorphological adequacy or coherence to be tested, in terms of both the predictive performance and the inner structure of the model; the latter corresponds, in a stepwise BLR procedure, to the rank and the coefficients of the selected predictors (Frattini et al., 2010; Costanzo et al., 2014; Lombardo et al., 2014). Moreover, as BLR works on balanced (positive/negative cases) data sets, a single regressed data set must contain the positive cases (unstable cells) and an equal number of randomly selected negatives (Atkinson et al., 1998; Süzen and Doyuran, 2004; Nefeslioglu et al., 2008; Bai et al., 2009; Van Den Eeckhaut et al., 2009; Frattini et al., 2010; Costanzo et al., 2014), which could determine a low representativeness of the analysed cases. In particular, in this study, each pixel containing a LIP was considered as belonging to the diagnostic area (Rotigliano et al., 2011), while the negative cases were randomly selected in the catchment, outside the landslide polygons. In order to obtain a better dispersion of points and to avoid spatial autocorrelation of the variables, the distance between points in the random selection was maximized. Therefore, every model was composed of 146 balanced (positive/negative) cases for 2007 and 1232 balanced cases for 2009. This heavily reduces the number of actually analysed cases to a very small percentage of the cells into which the study area is partitioned, so that a need to test the representativeness of the worked subset also arises. To control the possible effects introduced by this procedure, multiple extractions of negatives have to be performed and more than one data set regressed. Specifically, a multiple extraction produces a suite of replicated data sets, allowing the stability of both the performance and the selected predictors to be verified.
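The balanced multi-extraction step might be sketched as below; the coordinates, the 5× candidate oversampling used to approximate the distance maximization, and the replicate count are illustrative assumptions rather than the exact procedure of the study:

```python
# Balanced sampling: all unstable cells plus an equal number of negatives
# drawn outside the landslide areas, repeated to give replicated data sets.
import numpy as np

rng = np.random.default_rng(2)
n_cells = 10000
xy = rng.uniform(0, 1000, size=(n_cells, 2))       # cell centre coordinates (m)
unstable = np.zeros(n_cells, dtype=bool)
unstable[rng.choice(n_cells, size=73, replace=False)] = True  # e.g. 73 LIPs (2007)

def extract_balanced(seed):
    r = np.random.default_rng(seed)
    pos = np.flatnonzero(unstable)
    # Crude proxy for "maximizing dispersion": oversample candidate negatives
    # and keep the ones farthest from any positive cell.
    cand = r.choice(np.flatnonzero(~unstable), size=pos.size * 5, replace=False)
    d = np.linalg.norm(xy[cand, None, :] - xy[pos][None, :, :], axis=2).min(axis=1)
    neg = cand[np.argsort(d)[-pos.size:]]
    return np.concatenate([pos, neg])

replicates = [extract_balanced(s) for s in range(10)]  # 10 balanced data sets
print(len(replicates), len(replicates[0]))
```

Each replicate keeps the same 73 positives but a different negative draw, so differences between the replicated models isolate the effect of the negative selection.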
Value prediction and confusion matrix of cross-folded validation for the 2007 data set.
Value prediction and confusion matrix of cross-folded validation for the 2009 data set.
In this research, two suites of 10 data sets were extracted, for the 2007 and the 2009 models respectively; a 10-fold cross-validation procedure was then applied to each data set, giving a total of one hundred probability estimates (10 replicates × 10 folds) per suite.
Selected variables for the 2007 suite of models:
Selected variables for the 2009 suite of models:
Once a cut-off for the estimated probability is fixed to split positive and negative predictions, the crossing with a target pattern results in true positive (TP), true negative (TN), false positive (FP: type I error) and false negative (FN: type II error) cases. Contingency tables are used to summarize these data and to compute the model error rate, given by the ratio of misclassified cases to the total: (FP + FN)/(TP + TN + FP + FN).
A cut-off-independent technique for estimating the accuracy of a predictive model is represented by receiver operating characteristic (ROC) curves, which depict the trade-off between successes and failures as the probability threshold decreases, in sensitivity versus 1 − specificity plots. The area under the curve (AUC) of the ROC plot is the most widely adopted metric for the accuracy of predictive models.
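A minimal example contrasting the cut-off-dependent error rate with the cut-off-independent ROC-AUC, on invented probabilities and outcomes:

```python
# Confusion-matrix counts and error rate at a fixed cut-off, versus ROC-AUC.
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]                 # actual target pattern
p_hat  = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]  # estimated probabilities

cutoff = 0.5
y_pred = [int(p >= cutoff) for p in p_hat]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

error_rate = (fp + fn) / (tp + tn + fp + fn)   # misclassified / total
auc = roc_auc_score(y_true, p_hat)             # cut-off independent

print(tp, tn, fp, fn, error_rate, round(auc, 4))  # 3 3 1 1 0.25 0.9375
```

Moving the cut-off changes TP/TN/FP/FN and therefore the error rate, while the AUC summarizes performance over all cut-offs at once.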
The precision and accuracy of the model can also be represented in spatial terms, by preparing prediction and error maps. For each mapping unit, the mean susceptibility and the dispersion of its estimates are plotted and compared to the actual distribution of the unknown positives.
In order to investigate the main research topic, two kinds of modelling procedures have been conducted. A self-validation scheme was applied for each of the two event inventories (2007 and 2009), by randomly splitting (90/10 %) the 10 extracted balanced data sets of the two temporal suites into a calibration and a validation subset. For each data set, the random splitting procedure was applied 10 times, resulting in one hundred self-validated replicates.
A chrono-validation scheme was then applied, by calibrating the model with the whole event inventory of each epoch and validating the performance in matching the event inventories of the other. We hereafter refer to forward chrono-validation, if calibrating with 2007 and validating with 2009, and vice versa to backward chrono-validation, if calibrating with 2009 and validating with 2007. For each temporal model suite, we produced 10 prediction images based on the 10 data sets of the other suite, again having one hundred backward and one hundred forward chrono-validated replicates.
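A schematic of the forward/backward scheme on two synthetic "epoch" data sets (the generator, sample sizes and effect sizes are illustrative assumptions, not the study's inventories):

```python
# Sketch of chrono-validation: calibrate on one epoch's full inventory,
# validate against the other. Both epochs are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

def make_epoch(n=200, shift=0.8):
    X = rng.normal(size=(n, 5))
    y = np.repeat([0, 1], n // 2)
    X[y == 1] += shift  # class separation, differing between "storms"
    return X, y

X07, y07 = make_epoch(shift=0.8)   # stand-in for the 2007 inventory
X09, y09 = make_epoch(shift=0.6)   # stand-in for the 2009 inventory

# Forward: calibrate with 2007, validate with 2009; backward: vice versa.
forward_auc = roc_auc_score(
    y09, LogisticRegression().fit(X07, y07).predict_proba(X09)[:, 1])
backward_auc = roc_auc_score(
    y07, LogisticRegression().fit(X09, y09).predict_proba(X07)[:, 1])
print(round(forward_auc, 3), round(backward_auc, 3))
```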
The results of the cross-validation procedures for the one hundred 2007 and 2009 self-validated models are presented in Tables 1 and 2, respectively. Overall, the 2009 models (Table 2) produced better-performing predictions, with lower (0.336 for 2007; 0.219 for 2009) and more stable error rates. Similarly, the ROC-AUCs (Table 3) attested to the good quality of the models, with a higher performance for the 2009 model (AUC of 0.85 for 2009 against 0.70 for 2007) and no evidence of overfitting.
Distribution of the AUC and error rate values calculated on the 10 replicates for 2007 and 2009 modelling and 100 models during the chrono-validation process.
With regard to the predictors, the 2007 model suite selected five variables (Fig. 9), four of which had a frequency greater than 5/10: west and south-west slope aspect, steepness and the phyllites-to-meta-arenites (FDNb) outcropping lithology emerged as the main causative factors for the 2007 debris flows. A larger set of variables (17) was included by BLR in the 2009 model suite (Fig. 10), 15 of which were selected more than five times. Among the topographic variables, the most important were steepness, all the aspect classes without any northward component, profile curvature (both concave and convex) and plan convex curvature of slopes. Together with the topographic variables, the FDNb and paragneiss-to-mica-schist (MLEa) lithologies, distance from tectonic elements (DFAULTS) and the chestnut forest (CF) and pasture (P) land-use classes were always selected, with high and stable rankings.
HK values for the 100 replicated chrono-validations (in bold, the maximum values).
Comparison of the mean ROC curves obtained for the self-validated and chrono-validated models.
Susceptibility and error maps for the 2007 and the
2009 calibrated models:
Map of residuals calculated as percentage differences between the two (2007 and 2009) mean susceptibilities.
Dispersion density plot calculated using a 2-D binned kernel density algorithm (range for density calculation 0.045 xy). Positive cases for 0.5 cut-off values are reported for the two inventory events.
Once the overall quality of the predictive performance of the 2007 and 2009 models was assessed, regressions were run for the 10 full (without splitting into calibration and validation subsets) data sets of each event inventory, which maximized the fitting of the models. For both these full self-validated models (Fig. 11), the obtained ROC-AUCs are above the good performing threshold (> 0.81 for 2007; > 0.87 for 2009), with average error rates of 0.26 for 2007, and 0.22 for 2009. The 2007 and 2009 full models were then submitted to forward and backward chrono-validation, respectively, resulting in largely acceptable ROC-AUCs (> 0.75) and error rates (< 0.3), although a loss in the predictive performance of both the temporal predictions was observed. In particular, by comparing the self-validation and the chrono-validation performances, a decrease in AUC from 0.81 to 0.77 for 2007, and from 0.87 to 0.78 for 2009, arose. Also, the mean error rate values increased from 0.26 to 0.30 for 2007, and from 0.20 to 0.28 for 2009. It is worth noting the strong decrease in performance affecting the 2009 model, which led the two chrono-validations to be almost equivalent. In Fig. 12, the calculated mean (over 100 replicates) ROC curves are shown. Coherently, the HK mean scores are comparable between forward and backward validations, presenting a maximum of 0.433 and 0.446, respectively (Table 3).
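Assuming HK denotes the Hanssen–Kuipers skill score (the true skill statistic, sensitivity + specificity − 1), it can be sketched from a contingency table as follows; the counts below are hypothetical:

```python
# Sketch of the Hanssen-Kuipers (HK) skill score for a fixed cut-off,
# computed from hypothetical contingency-table counts.
def hk_score(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return sensitivity + specificity - 1

print(round(hk_score(tp=40, tn=42, fp=18, fn=20), 3))  # → 0.367
```

A perfect classifier scores 1, while a no-skill (random) classifier scores 0, which makes HK a convenient cut-off dependent complement to the ROC-AUC.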
A spatial view of the obtained prediction images for the 2007 and 2009 models is given in Fig. 13. In particular, the susceptibility maps show the spatial distribution of the mean probabilities over the 10 replicates, whilst the error maps describe the dispersion of the estimates, represented by a 2σ interval.
At first glance, the two susceptibility maps appear quite different: the 2007 map shows a more diffuse and graduated susceptibility, with the north-western and south-eastern sectors of the catchment hosting highly susceptible areas. On the contrary, the 2009 map is characterized by a marked spatial separation between the highly susceptible north-eastern sector and the remaining, larger part of the catchment, which has a low susceptibility. In terms of error maps, the 2007 model is affected by a generally higher level of error, with maximum values located in the central sector and minimum values along the stream network. The 2009 model, on the contrary, produced lower errors, with the exception of the stream network, which is characterized by relatively higher values, and of two small areas corresponding to the outcrops of sparsely outcropping lithologies (see Fig. 1).
To compare the two landslide susceptibility maps in light of the distribution of the debris flows which occurred in 2007 and 2009, the LIPs were plotted on a map of the residuals, calculated as the difference between the two (2007 and 2009) mean susceptibilities (Fig. 14). The residuals confirmed the dissimilarity between the two models in estimating the susceptibility of the catchment, with higher probabilities in the southern and north-western sectors for the forward-validated models and in the north-eastern sector for the backward-validated models.
Comparing the two susceptibility estimates in a dispersion density plot (Fig. 15) confirms the above-described trend. The two models linearly agree in the higher susceptibility range, whilst a larger dispersion exists in the lower and intermediate ranges. In particular, for the stable areas (near the origin of the plot), the higher-density pixels are shifted toward angles greater than 45°.
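Such a dispersion density comparison can be sketched as below, using SciPy's Gaussian KDE as a stand-in for the 2-D binned kernel density algorithm; the paired, correlated probability vectors are synthetic:

```python
# Sketch: 2-D density of paired susceptibility estimates on a unit grid;
# mass far from the 45-degree line marks disagreement between the models.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
s_a = rng.uniform(size=2000)                                  # model A estimates
s_b = np.clip(s_a + rng.normal(scale=0.1, size=2000), 0, 1)   # correlated model B

kde = gaussian_kde(np.vstack([s_a, s_b]))
xx, yy = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
print(density.shape)  # (50, 50)
```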
From a binarized perspective, setting the cut-off value for stable/unstable discrimination to 0.5, joint predictions (sectors II, for TP, and IV, for FN) amounted to 77 %, whilst disjoint predictions (sectors I and III of the plot) reached 23 %. The two chrono-validated models predicted the whole set of observed positives with different results:
the backward-calibrated model produced 46
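With all plotted cases being observed positives, the joint/disjoint counting at the 0.5 cut-off can be sketched as below (the paired probability vectors are hypothetical):

```python
# Sketch: quadrant counting for observed-positive cases at a 0.5 cut-off.
import numpy as np

p07 = np.array([0.8, 0.7, 0.2, 0.9, 0.3, 0.6])  # hypothetical 2007 estimates
p09 = np.array([0.9, 0.6, 0.1, 0.4, 0.7, 0.8])  # hypothetical 2009 estimates

both_pos = (p07 >= 0.5) & (p09 >= 0.5)   # sector II: joint TP
both_neg = (p07 < 0.5) & (p09 < 0.5)     # sector IV: joint FN
joint = float(np.mean(both_pos | both_neg))  # fraction of agreeing predictions
disjoint = 1 - joint                         # sectors I and III
print(round(joint, 3))  # → 0.667
```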
The findings of previous studies (Zêzere et al., 2004; Guzzetti et al., 2005; Vergari et al., 2011; Wang et al., 2013) regarding the effectiveness of temporal partition procedures in predicting future landslides are generally confirmed here, even in the case of debris flows triggered by an extreme rainfall event. The above-described results attest to a symmetry between forward and backward chrono-validations, supporting the main assumption on which stochastic modelling is based. However, the analysis of the self-validated models showed that the 2009 model attained a higher predictive performance, with a higher number of selected variables. This could be interpreted as a direct consequence of the greater number of debris flows composing the 2009 inventory (one order of magnitude more), so that a larger spectrum of multivariate slope conditions was involved in failures and included in the data sets used for fitting the models. Nevertheless, the first four predictors selected for the 2009 model correspond to those composing the structure of the 2007 model: slope morphology (steepness, curvature and aspect), soil use and outcropping lithology.
The comparison between the performances of the self-validated and the chrono-validated models highlighted a loss in accuracy which is slightly more marked for the better-performing self-validated 2009 model. Therefore, although a large difference between the accuracies of the two self-validated models is observed, the comparison between the forward and backward chrono-validated models shows very small differences in terms of ROC-AUC and error rates. This suggests that, in spite of the higher performance obtained by the 2009 model in classifying the same 2009 event, its skill in back-predicting the 2007 debris flow source areas is the same as that shown by the 2007 model in forward-predicting the 2009 debris flow source areas.
The loss in performance demonstrated by the 2009 model suggests that using self-validated models for temporal prediction can mislead the user when estimating the performance of the model. In fact, one would expect the model calibrated with the largest landslide inventory to be the best-performing in chrono-validation, as it also includes the less extreme morphodynamic responses. However, in spite of the similar inner structure of the 2007 and 2009 models, the predictive performance of the 2009 backward model lowered to the same ROC-AUC and error rates as the 2007 forward
model. The reason for this behaviour could be connected to the different
local characteristics of the two storm events, which hit the
slopes differently, even in such a small catchment. This would indicate, for this study case, that even inside a 10 km² catchment the rainfall field of a single storm cannot be considered spatially homogeneous.
At the same time, a non-linearity in the morphodynamic response of the slopes (different coefficients and/or predictors) could affect the performance in chrono-validation: a larger event does not simply produce a larger response that subsumes the response to less intense storms, but rather a different one. The larger the difference between the triggering events, the greater the distinction in the response of the preparatory conditions.
In the domain of the predictors, this is highlighted by the different inner structures of the models. Compared to the 2007 event, the 2009 event also activated east- and south-east-facing pixels, as well as high metamorphic-grade (MLEa) lithologies and terraced deposits; topographic curvatures, distance from faults and soil use (the latter with negative coefficients) also played an important role in controlling the distribution of the debris flow source areas. However, this richer model structure does not increase the predictive ability with respect to the distribution of the 2007 debris flows, as the backward chrono-validation does not demonstrate any greater accuracy. This suggests that the 2007 debris flows were activated through different, even if largely overlapping, mechanisms.
In the domain of geographical space, the map of the residuals provided a spatial view of the different behaviour of the two models, giving the interpreter clues to the actual paths followed by the two storm fronts inside the Itala catchment. The 2009 model markedly overestimated the susceptibilities in the central-northern sector of the catchment, whilst the 2007 model produced higher susceptibilities than the 2009 one in the north-western inner mountain sector. Regardless of the different intensities, this spatial trend suggests that the 2009 storm path was limited to the coastal area, whilst the 2007 storm affected the whole catchment more homogeneously, also activating the slopes of the mountain sector. This interpretation is confirmed by the different spatial distributions of the debris flows of the two event inventories, and it agrees with the findings obtained on a much larger scale by Chang et al. (2014).
From a risk perspective, however, the difference between the two models did not produce a significant loss in prediction, as only a limited number of cases resulted in false positive predictions; in fact, the mapped debris flows are largely located in the more susceptible pixels. Nevertheless, the results of the present research confirmed that the largest differences between the two models were observed in the intermediate susceptibility interval, which is the same region of the error plots where the self-validated models show poor precision. This difference is also attested by the HK scores, which confirmed the good prediction skill, but with maximum values close to 0.45. Under the considered triggering conditions, the multivariate relationship between debris flow activation and the predictors is in fact gradual, so that no single marked probability cut-off accurately discriminates positives from negatives. Nevertheless, it is worth highlighting that the selection of a 0.5 cut-off value resulted in a higher performance for the temporal prediction of the positive cases (forward chrono-validation) of the 2007-calibrated model.
Finally, it is worth comparing the results obtained here for chrono-validation (AUC of 0.77 and 0.78 for the forward and backward models, respectively) with those obtained by applying spatial validation strategies.
The results obtained in this research confirmed that the basic assumption on which susceptibility modelling is based (the past is the key to the future) must be critically accepted in the case of extreme events. In fact, in the case of the two storm events considered here, the dissimilarities in the intensity and the real path of the two storm fronts produced measurable differences in the behaviour of the two derived predictive models, both in the domain of the predictors and in the spatial pattern of the susceptibility maps. Two main causes have been recognized here: on the one hand, the slopes did not linearly respond to the trigger intensity, so that different predictors and coefficients were fitted by the two regressed models; on the other hand, effects produced by the spatial non-homogeneity of the rain intensity for each single storm event, even at the scale of such small catchments, were detected.
In terms of the operative use of susceptibility maps, the identified effects attest to the risk of either over- or underestimating the susceptibility, for both the 2009 and 2007 models. In particular, limits arise for the common practice of using the most severe available inventory to calibrate the best-performing model. In fact, this research verified that the best-performing self-validated model was not the most accurate in chrono-validation, as it also underestimated susceptibility and produced false negatives.
In the present study, the differences between the two models basically reside in the intermediate susceptibility interval, so that a precautionary approach in reclassifying the susceptibility map could be adopted, accepting the precision limits in the intermediate probability classes. However, larger differences between the triggering storms to which calibration and validation event inventories are connected could result in larger predictive limits and more misleading susceptibility maps.
The strict relation between trigger intensity, slope response and prediction performance also emerges from the comparison of this study with another one, carried out by applying spatial partition (transferability) validation strategies in two adjacent catchments for the same 2009 trigger, which obtained a better predictive performance. In the opinion of the authors, this difference confirms the limitations of the chrono-validation procedure when working with extreme rainfall events. For this reason, the choice between transferability and chrono-validation should be evaluated case by case, on the basis of the availability of historical records of the phenomena, of information on the trigger event, and of the similarity with other areas where debris flow events have already occurred. At the same time, the production of susceptibility maps such as those presented in this paper constitutes a basic starting point for modelling the propagation, run-out and magnitude associated with the predicted phenomena, so that an estimation of the debris flow hazard within a given area can be achieved.
The findings and discussion of this research are the results of research activities carried out in the framework of the PhD research projects of Mariaelena Cama and Luigi Lombardo at the “Dipartimento di Scienze della Terra e del Mare” of the University of Palermo (XXV cycle). Luigi Lombardo's PhD thesis is internationally co-tutored with the Department of Geography of the University of Tübingen (Germany).
This research was supported by the project SUFRA_SICILIA, funded by the ARTA-Regione Sicilia, and the FFR 2012/2013 project, funded by the University of Palermo.
M. Cama, C. Conoscenti, L. Lombardo and E. Rotigliano have commonly shared all parts of the research as well as the manuscript preparation. V. Agnesi has taken part in the final discussion of the data.
The authors wish to thank two anonymous referees for providing suggestions and comments which greatly enhanced the quality of this paper.

Edited by: F. Guzzetti