Influence of uncertain identification of triggering rainfall on the assessment of landslide early warning thresholds

Uncertainty in rainfall datasets and landslide inventories is known to have negative impacts on the assessment of landslide-triggering thresholds. In this paper, we perform a quantitative analysis of the impacts that the uncertain knowledge of landslide initiation instants has on the assessment of landslide intensity-duration early warning thresholds. The analysis is based on an ideal synthetic database of rainfall and landslide data, generated by coupling a stochastic rainfall generator and a physically based hydrological and slope stability model. This dataset is then perturbed according to hypothetical "reporting scenarios" that simulate possible errors in landslide-triggering instants, as derived from historical archives. The impact of these errors is analysed by combining different criteria to single out rainfall events from a continuous series and different temporal aggregations of rainfall (hourly and daily). The analysis shows that the impacts of the above uncertainty sources can be significant. Errors influence thresholds in such a way that they are generally underestimated. Potentially, the underestimation can be large enough to induce an excessive number of false positives, hence limiting possible landslide mitigation benefits. Moreover, the uncertain knowledge of triggering rainfall limits the possibility of setting up links between thresholds and physio-geographical factors.


Introduction
Thresholds estimating rainfall conditions correlated to landslide occurrence are useful for landslide early warning systems (Guzzetti et al., 2007; Highland and Bobrowsky, 2008; Sidle and Ochiai, 2013). Commonly, thresholds are derived by empirical approaches based on the direct statistical analysis of historical rainfall series and landslide inventories, from which a line roughly separating triggering from non-triggering conditions is drawn. Among the various threshold types, precipitation intensity-duration power law thresholds (hereafter referred to as ID thresholds), introduced by Caine (1980), have been derived for many regions of the Earth and are still considered a valid empirical model (Caracciolo et al., 2017; Gariano et al., 2015; Peruccacci et al., 2017; Vennari et al., 2014), though they are affected by several theoretical and practical limitations (Bogaard and Greco, 2018).
Thresholds derived for different geographical areas vary significantly, and some attempts have been made to find a rationale underlying this variability by linking threshold parameters to physio-geographical and climatic features (Guzzetti et al., 2007, 2008). Nevertheless, rainfall and landslide data quality issues, reported in almost all of the papers on threshold determination, are known to potentially hamper the assessment of this link. As reported in many studies, the triggering instants available from real landslide inventories are imprecise. For instance, Guzzetti et al. (2007, 2008) reported that in a global database of 2626 landslides, the vast majority (68.2 %) had no explicit information on the date or the time of occurrence of slope failure; for most of the remaining events only the date of failure was known, and more precise information was available for only 5.1 % of landslides. These issues are confirmed with reference to an updated dataset of landslides that occurred in Italy (Peruccacci et al., 2017). In their analysis, only information with an accuracy of at least 1 day was retained from the larger available dataset. Still, for this trimmed dataset, triggering instants were available with high precision (minute or hour) for only 37.3 % of the data, with only the day or part of it available for the majority (27.6 and 35.1 %, respectively).
Other data artifacts include (i) rainfall measurement delays related to manual collection of data, (ii) different criteria to identify rainfall events, (iii) lack of completeness of landslide catalogues, and (iv) imprecise location of landslides, or precipitation measurements available only at a significant distance from the location of failure. Though there is general agreement that these factors affect the accuracy of landslide-triggering thresholds, their influence has only partially been quantified in the literature. In particular, to the authors' knowledge, only the effect of rain gauge location and of the density of rainfall networks (point iv) has been analysed (Nikolopoulos et al., 2014), showing that the use of rainfall measured at some distance from the debris flow location can lead to an underestimation of the triggering thresholds.
Quantitative assessments of the influence of the sources of error listed above are difficult to base on observational datasets, since it cannot be ensured that these are free of errors. In this paper we capitalize on the synthetic rainfall-landslide dataset of a preceding study (Peres and Cancelliere, 2014) to quantify the effects of the imprecise identification of triggering rainfall on the assessment and performance of landslide-triggering thresholds. The dataset is in principle "error-free" in the sense that the instants of landslide triggering are exactly known, as well as the triggering rainfall time history. We then fictitiously introduce errors in the triggering instants and in the rainfall series based on hypothetical scenarios of landslide data retrieval and analysis, and analyse the implications for the accuracy of ID thresholds. The quality of information available in real datasets is generally intermediate between those corresponding to the hypothesized scenarios. These scenarios are combined with different criteria for event rainfall identification and different aggregations of rainfall data (hourly and daily, and daily in the presence of a shift due to manual collection of data), so the effects of these other two sources of uncertainty are analysed as well (items i and ii of the above list). The synthetic data used for our analyses are based on the characteristics of hillslopes in the landslide-prone region of the Peloritani Mountains, in northeastern Sicily, southern Italy.
2 Dataset: generation of synthetic rainfall and landslide data

We refer to the dataset developed in Peres and Cancelliere (2014). Here we provide a basic description of the methodology used for its generation, which includes the following steps:

- Synthetic generation of hourly rainfall time series: a seasonal Neyman-Scott rectangular pulses (NSRP) stochastic rainfall model (Cowpertwait et al., 1996; Rodríguez-Iturbe et al., 1987a, b) is used for the generation of 1000 years of hourly rainfall data. The model is calibrated on approximately 9 years of hourly observations from the Fiumedinisi rain gauge located in the area (Fig. 1).
- Computation of hillslope pressure-head response: a two-state hydrological model is used for the computation of pressure head. States 1 and 2 are activated separately, during rainfall events and during no-rain intervals, respectively. Rainfall events are defined as a section of the rainfall series preceded and followed by no rainfall for a minimum time interval of 24 h. Within state 1 the TRIGRS-v2 model (Baum et al., 2010) is applied, which is based on the Richards' equation for one-dimensional vertical infiltration with a Gardner negative exponential soil water characteristic curve. This is the least simplified form of the Richards' equation for which an analytical solution has been derived so far (Srivastava and Yeh, 1991). A leakage flux at the soil-bedrock interface is considered, assuming a vertical hydraulic conductivity of the bedrock strata equal to $c_D = 0.1$ times the saturated conductivity $K_S$ of the pervious soil layer. Within state 2 a linear reservoir water table recession model is activated to simulate subhorizontal drainage and is used to compute the water table height at the beginning of the next passage to state 1. The linear reservoir scheme computes a drainage flow that depends on the water table level, determining a negative-exponential decay of pressure head at the bottom of the regolith layer, with recession constant $\tau_M$.
- Derivation of virtual landslide occurrence times: an infinite slope model is applied to compute the factor of safety $F_S$ for slope stability. For this schematization, the failure surface coincides with the regolith-bedrock interface. The time instants at which a downward crossing of $F_S = 1$ occurs are assumed to be the instants at which landslides are triggered.
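The last step of the list above can be illustrated with a minimal sketch. It assumes that a factor-of-safety series is already available at a fixed time step (the values below are invented for illustration and are not taken from the dataset), and it adopts the convention that a crossing occurs when the series moves from at or above the level to strictly below it; the function name `downward_crossings` is hypothetical.

```python
from typing import List, Sequence


def downward_crossings(fs: Sequence[float], level: float = 1.0) -> List[int]:
    """Return the indices k at which fs passes from >= level to < level.

    The >= / < convention is an assumption made for this sketch.
    """
    return [k for k in range(1, len(fs)) if fs[k - 1] >= level and fs[k] < level]


# Illustrative (made-up) factor-of-safety values at consecutive time steps.
fs_series = [1.30, 1.12, 1.01, 0.97, 0.99, 1.05, 1.02, 0.98]
triggering_steps = downward_crossings(fs_series)
print(triggering_steps)
```

Note that a step where the series stays below the level (0.97 to 0.99 in the example) is not counted again: only the passage from stable to unstable conditions marks a virtual landslide.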
The dataset is generated considering the soil hydraulic and geotechnical properties shown in Table 1, which can be considered representative of hillslopes in the Peloritani Mountains landslide-prone area (see Fig. 1). Application to a hillslope of definite characteristics enables us to isolate the impact of triggering rainfall identification uncertainty. Regional determination of thresholds also contains factors of uncertainty related to the heterogeneity of landslide characteristics. However, the assessment of this combined uncertainty is beyond the scope of our present analysis. The Peloritani area has been affected several times by catastrophic shallow landslide phenomena in the past, including the 1 October 2009 disaster, which has been analysed and described in several studies (Cama et al., 2017; Schilirò et al., 2015a, b, 2016; Stancanelli 2014). Nevertheless, for the purposes of this study, we focus our analysis mainly on the hypothetical case of no pressure head memory ($\tau_M = 0$), so that the main source of uncertainty considered in threshold determination is that related to the identification of triggering rainfall events. In other words, in the "ideal" simulations described above, the only uncertainty present is that of intra-event rainfall intensity variability, which is relatively small, so that a landslide-triggering threshold expressed in terms of rainfall duration and intensity performs almost perfectly (Peres and Cancelliere, 2014). For completeness, however, we present a secondary analysis including antecedent rainfall memory with $\tau_M = 2.75$ days. Table 2 shows some characteristics of the 1000-year-long synthetic databases, which do not change among the different scenarios illustrated in the following section.

Simulation of uncertainty in triggering rainfall identification
As already mentioned, the available triggering instants from real landslide inventories are seldom precise. On the other hand, the instants at which landslides are triggered are known exactly (at hourly resolution) for the synthetic series illustrated in Sect. 2. We then introduce errors into this synthetic dataset by hypothesizing the way such information may be retrieved from newspapers and similar resources (blogs and fire brigade reports), which are the main primary sources available to build historical landslide inventories (e.g. Guzzetti and Tonelli, 2004). We suppose that only the date of the landslide is reported, with some delay (see Fig. 2). For a landslide to be reported on day $D$, it has to be spotted within a time interval we denote as the "observers' day" $D'$. Then the user of the landslide archive (the analyser) makes an interpretation of the available information, i.e. chooses an instant of the reported day of landslide occurrence from which to search backwards for the triggering rainfall. In particular, the $i$th landslide, observed at $t_i$ within the observers' day $D'$, i.e. hours $[T_O - 24\,\mathrm{h}, T_O]$ of day $D$, is assumed by the analyser to be triggered $T_A$ hours after the start of day $D$ (civil day $D$ starts at 00:00). The observers' day is made up of the hours in which observers can report a landslide on day $D$. We assume that the observers' day goes from 18:00 of day $D-1$ to 18:00 of day $D$ ($T_O = 18$ h); this choice is an attempt to resemble usual working hours and the fact that landslides occurring at night may be reported the morning after. The analyser time, $T_A$, is the instant of landslide triggering as considered by whoever analyses the data (the "analyser") to derive landslide-triggering thresholds, counted from the beginning of day $D$. This way of processing the data introduces a sampling error and a shift between the actual instant at which the generic landslide $i$ is triggered, $t_i$, and that assumed by whoever analyses the data, $t'_i$. Hence, the error for the $i$th landslide is given by

$$e_i = t'_i - t_i. \qquad (1)$$

These errors are implicitly random since, though the $t'_i$ are deterministically chosen, the actual instant $t_i$ varies in an aleatory fashion according to the rainfall time history.

Figure 2. [...], which may induce a random error $e_i = t'_i - t_i$ in landslide-triggering instants. In particular, a landslide that occurs within the observers' day is reported at day $D$ and attributed to the end of the same day (small delay reporting scenario, RS1) or to its beginning (anticipated reporting scenario, RS3). It can also be reported at day $D+1$ and then attributed to the end of it (large delay reporting scenario, RS2). These scenarios can be described in terms of two parameters: $T_O$, the ending hour of the observers' day, and $T_A$, the triggering instant, referred to 00:00 of day $D$, assumed by an analyser who interprets the newspaper-like information.
A positive error can in general be considered more likely than a negative one, since landslides are typically reported some time after they have occurred (Guzzetti et al., 2007, 2008; Peres and Cancelliere, 2013). This, however, does not exclude the possibility of a significant number of negative errors, because of temporal shifts in rainfall data, as will be discussed later.
The two parameters $T_O$ and $T_A$ can be set to simulate a range of scenarios, for which real situations may represent intermediate cases. We perform our analysis based on four scenarios (which include the "ideal" one), hereafter referred to as landslide information "reporting scenarios" (RS), and illustrated in Fig. 2:

- Ideal scenario, RS0 ($T_O = 0$, $T_A = 0$; $e_i = 0$ for all landslides). This is the error-free scenario (described in Sect. 2) that is considered for the definition of the actual instants of landslide triggering, $t_i$.
- Small delay reporting, RS1 ($T_O = 18$ h, $T_A = 24$ h; random in the range $0 \le e_i \le 30$ h). A landslide occurring within the interval from the night hours of day $D-1$ until the evening of day $D$ (i.e. within the observers' day $D'$) will be reported at day $D$. Here we suppose that the analyst attributes the landslide to the end of day $D$ ($T_A = 24$ h), i.e. searches for the triggering event backwards from that instant.
- Large delay reporting, RS2 ($T_O = 18$ h, $T_A = 48$ h; random in the range $0 \le e_i \le 54$ h). This scenario is similar to the previous one, but here larger errors are hypothesized. We suppose that the landslide occurring during the observers' day $D'$ is reported on day $D+1$, which is also erroneously assumed by the analyser to be the day on which the landslide was triggered. The analyser then attributes the landslide to the end of day $D+1$ ($T_A = 48$ h). These timing errors may also be likely when landslides occur on weekends.
- Anticipated reporting, RS3 ($T_O = 18$ h, $T_A = 0$ h; random in the range $-18 \le e_i \le 6$ h). This case is the same as RS1, but here the analyst searches backwards for the triggering event from the beginning of day $D$, i.e. from 00:00 (instead of from 24:00).
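The reporting scenarios listed above can be sketched in a few lines. This is only an illustration under stated assumptions: time is counted in hours from 00:00 of an arbitrary day 0, the observers' day is treated as the half-open interval (T_O − 24 h, T_O], and RS0 is handled as a special error-free case. The names `SCENARIOS`, `assumed_instant` and `timing_error` are hypothetical and do not come from the original study.

```python
import math

# (T_O, T_A) in hours, as in the scenario list; None marks the error-free RS0.
SCENARIOS = {"RS0": None, "RS1": (18, 24), "RS2": (18, 48), "RS3": (18, 0)}


def assumed_instant(t: float, scenario: str) -> float:
    """Instant (hours) that an analyser would assign to a landslide occurring at t."""
    params = SCENARIOS[scenario]
    if params is None:  # RS0: the true instant is known exactly
        return t
    t_o, t_a = params
    # Day D whose observers' day (24*D + T_O - 24, 24*D + T_O] contains t.
    day = math.ceil((t - t_o) / 24)
    return 24 * day + t_a


def timing_error(t: float, scenario: str) -> float:
    """Error e_i = t'_i - t_i for a landslide occurring at time t."""
    return assumed_instant(t, scenario) - t


# A landslide at 06:00 of day 1 (t = 30 h) falls in the observers' day of day 1.
for name in SCENARIOS:
    print(name, timing_error(30.0, name))
```

With these conventions the errors returned for RS1, RS2 and RS3 stay within the ranges quoted in the list, and the sign of the RS3 error depends on whether the landslide occurs before or after midnight.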
Within the context of sampling errors, another point is related to the way rainfall data are collected, specifically for daily data manually measured until some decades ago. A significant number of papers derive landslide-triggering thresholds using daily rainfall data (Berti et al., 2012; Leonarduzzi et al., 2017; Li et al., 2011; Terlien, 1998). In an ideal situation rainfall should be aggregated from 00:00 to 23:59, i.e. over a civil "calendar day", as illustrated in Fig. 3. With reference to manual collection of rainfall data, this requires that the rain gauge be read at midnight of each day, which is an inconvenient hour. Manual collection of daily data is usually carried out at easier hours. For instance, in Italy, where the widest source of information is the hydrological bulletins (locally known as Annali Idrologici), the operator would measure the rainfall collected in the rainfall bucket every day at 09:00. Thus, daily rainfall in a given day is the amount of rainfall that occurred in the 24 h preceding 09:00 of the same day. As illustrated in Fig. 3, in this case the reported daily rainfall amounts can be dramatically different from the actual amounts (see also Caracciolo et al., 2017).

Identification of triggering rainfall is uncertain also because of the different criteria that one can apply to isolate rainfall events from a continuous time series; Table 3 lists a range of criteria adopted in the literature. Here we analyse how the different criteria can impact the identification of triggering rainfall, both in the case that uncertainty in the triggering instants is present (datasets RS1-RS3) and in the case in which it is not (dataset RS0).
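The difference between calendar-day aggregation and a manual 09:00 reading, described earlier in this section, can be made concrete with a small sketch. The hourly series is invented for illustration, and the function `daily_totals` with its `read_hour` parameter is a hypothetical helper, not part of the original analysis.

```python
from typing import List, Sequence


def daily_totals(hourly: Sequence[float], read_hour: int = 24) -> List[float]:
    """Aggregate hourly rainfall into daily totals.

    hourly[k] is the rainfall of hour k, with k = 0 being 00:00-01:00 of day 0.
    The total attributed to day d covers the 24 h preceding read_hour of day d:
    read_hour=24 gives the calendar day, read_hour=9 mimics a 09:00 reading.
    Hours before the start of the series are simply unavailable (first day partial).
    """
    n_days = len(hourly) // 24
    totals = []
    for d in range(n_days):
        end = 24 * d + read_hour
        start = max(end - 24, 0)
        totals.append(sum(hourly[start:min(end, len(hourly))]))
    return totals


# Three days of data with 30 mm falling between 12:00 and 18:00 of day 1.
hourly = [0.0] * 72
for k in range(36, 42):
    hourly[k] = 5.0

print(daily_totals(hourly))               # calendar-day aggregation
print(daily_totals(hourly, read_hour=9))  # 09:00 manual reading
```

In this example the afternoon rainfall of day 1 is attributed to day 2 when the gauge is read at 09:00, which is the kind of one-day shift the text refers to.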
The automatic procedure we adopt for isolating events is as follows (see diagram in Fig. 4). First, a minimum rainfall threshold $s_{min}$ is applied to all rainfall pulses at the fixed temporal aggregation. This means that from the original series a new one is obtained, where precipitation pulses less than $s_{min}$ are replaced by zeros. In the diagram, these pulses are coloured in light grey. Afterwards, rainfall events are singled out when separated by zero-rain intervals longer than $u_{min}$. This is the most important parameter for the identification of rainfall events. With the aim of quantifying how the impact of the errors implied by the different reporting scenarios changes with rainfall identification criteria, various pairs of $s_{min}$ and $u_{min}$ have been set (see Table 4). The described algorithm defines the rainfall event regardless of whether or not it is associated with a landslide. For attributing a rainfall event to a landslide, the cases where the triggering instant is within a dry or a wet period should be analysed separately. In the first case, the landslide is associated with the whole closest event occurring before the landslide; in the other case it is associated with the part of the rainfall event occurring before the triggering instant. Automatic procedures have the advantage of being objective and reproducible and thus more scientifically sound than subjective judgement (Melillo et al., 2015; Vessia et al., 2014); nevertheless, algorithms can reproduce the latter with a certain level of fidelity (Berti et al., 2012).
Finally, triggering rainfall identification uncertainty is simulated by combining the reporting scenarios, different parameters of the rainfall event identification algorithm, and three rainfall aggregation schemes (hourly, daily correct and daily shifted). This results in 28 combinations for each recession constant value $\tau_M$ (see Table 4): three parameter sets for hourly data plus two for each of the two daily aggregations give seven set-ups, each considered for the four reporting scenarios.
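The event-isolation and attribution procedure described above can be sketched as follows. This is a simplified reading, not the authors' implementation: pulses below s_min are discarded, a dry gap of at least u_min time steps is assumed to separate two events (whether the comparison should be strict is not specified here), and the triggering step itself is counted as part of the triggering rainfall. The function names and the sample series are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Event = Tuple[int, int]  # (first wet step, last wet step), both inclusive


def identify_events(rain: Sequence[float], s_min: float, u_min: int) -> List[Event]:
    """Split a rainfall series into events separated by at least u_min dry steps."""
    # Pulses below s_min are treated as zero rainfall.
    wet = [k for k, r in enumerate(rain) if r > 0 and r >= s_min]
    events: List[Event] = []
    for k in wet:
        if events and k - events[-1][1] - 1 < u_min:
            events[-1] = (events[-1][0], k)  # dry gap too short: extend the event
        else:
            events.append((k, k))            # start a new event
    return events


def triggering_rainfall(events: List[Event], t: int) -> Optional[Event]:
    """Rainfall attributed to a landslide at step t (None if no event precedes it)."""
    prior = [(s, e) for s, e in events if s <= t]
    if not prior:
        return None
    start, end = prior[-1]
    # Wet period: keep only the part up to t; dry period: keep the whole event.
    return (start, min(end, t))


rain = [0, 1.0, 0.1, 2.0, 0, 0, 0, 3.0, 0.5, 0]
events = identify_events(rain, s_min=0.2, u_min=3)
print(events)
print(triggering_rainfall(events, 5), triggering_rainfall(events, 7))
```

Raising u_min from 3 to 4 steps in this example merges the two events into one, which illustrates why the choice of the minimum dry interval matters for the duration and mean intensity assigned to a triggering event.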

Threshold definition, calibration and testing performance
Table 3 notes. (1) More precisely "the algorithm scans a rainfall time series and detects the rainfall events using a moving-window technique: a new event starts when the precipitation cumulated over $D_T$ days exceeds a certain threshold $E_T$, and ends when it goes below this threshold. For instance, if $D_T = 3$ days and $E_T = 2$ mm, the rainfall event starts when the cumulative rainfall exceeds 2 mm in 1, 2, or 3 days (that is if 2 mm are exceeded on the first day, the rainfall starts at day 1). Then, the rainfall event stops when it rains less than 2 mm in 3 days; the end of the event is defined as the last of the three days in which the rainfall is greater than zero". $D_T = 3$ days and $E_T = 5$ mm were chosen. (2) The algorithm can be only approximately expressed in terms of $s_{min}$ and $u_{min}$. In particular, the algorithm additionally excludes "sub-events" with a total event rainfall below a seasonally variable threshold.

Table 4. Set-up of the numerical experiments. Each set of algorithm parameters is considered for the four hypothesized landslide reporting scenarios.

Aggregation | Event identification algorithm parameters
Hourly | $u_{min} = 24$ h, $s_{min} = 0.2$ mm
Hourly | $u_{min} = 12$ h, $s_{min} = 0.2$ mm
Hourly | $u_{min} = 6$ h, $s_{min} = 0.2$ mm
Daily correct and daily shifted | $u_{min} = 1$ day, $s_{min} = 0$ mm (Italian database)
Daily correct and daily shifted | $u_{min} = 1$ day, $s_{min} = 5$ mm

Seventeen [...] reported at rainfallthresholds.irpi.cnr.it (last accessed 15 January 2018). In spite of this variety, the most widely used threshold is rainfall intensity-duration (ID), as 96 out of 125 (about 77 %) thresholds are of this type, if one includes equivalent rainfall depth-duration (ED) thresholds. Therefore, our analysis adopts this threshold type, which may be defined as follows:

$$I = \alpha D^{-\beta}, \qquad (2)$$

where $I$ (L T$^{-1}$) is the mean rainfall event intensity and $D$ (T) is the rainfall event duration (both defined according to the scheme of Fig. 4); $\alpha, \beta > 0$ are respectively the intercept and slope parameters of the threshold. ED thresholds are equivalent to IDs, since rainfall intensity $I$ is the ratio between event rainfall $E$ (the total depth of a rainfall event) and its duration $D$; thus, they can be converted into the ID type by simply subtracting 1 from the exponent of duration.
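The equivalence between ID and ED thresholds can be checked numerically. The sketch below assumes a power-law threshold of the form I = α D^(−β), so that the event rainfall is E = I · D = α D^(1−β); the parameter values are those quoted later in the text for the reference scenario, while the function names and the chosen duration are hypothetical.

```python
def id_threshold(duration: float, alpha: float, beta: float) -> float:
    """Mean intensity on an ID threshold I = alpha * D**(-beta)."""
    return alpha * duration ** (-beta)


def ed_threshold(duration: float, alpha: float, beta: float) -> float:
    """Event rainfall on the equivalent ED threshold E = alpha * D**(1 - beta)."""
    return alpha * duration ** (1.0 - beta)


alpha, beta = 101.0, 0.80   # reference threshold reported for RS0
duration = 10.0             # illustrative duration, in the same units as the threshold
intensity = id_threshold(duration, alpha, beta)
depth = ed_threshold(duration, alpha, beta)
print(intensity, depth, depth / duration)
```

Dividing the ED value by the duration returns the ID value, which is the "subtract 1 from the exponent" rule stated above.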
The procedures for the identification of the best threshold parameters have grown in complexity over time. Earlier works considered lower boundary curves of the triggering events traced with subjective criteria (Caine, 1980). More objective procedures were then proposed, still based on the triggering events only, such as the so-called "frequentist" method (e.g. Brunetti et al., 2010). More advanced approaches are currently used, and these are derived from the analysis of both triggering and non-triggering events. These procedures are more transparent than methods based on triggering events only, as the uncertainty of the thresholds can be assessed through indices based on the confusion matrix or the receiver-operating characteristics (ROCs), that is, in terms of the count of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) (Table 5). More importantly, these methods are also more robust, since the presence of non-triggering data points makes the choice of the threshold less sensitive to possible errors in the attribution of triggering rainfall event duration and intensity. Here we use these more recent methods, implicitly assuming that the impact of the uncertainty under analysis is likely to be higher on thresholds derived from procedures based on triggering rainfall only.
Best thresholds can be calibrated by maximizing their performance expressed in terms of suitable metrics. One widely used metric is the true skill statistic (Ciavolella et al., 2016; Peres and Cancelliere, 2014; Staley et al., 2013), originally proposed by Peirce (1884):

$$\mathrm{TSS} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} - \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}. \qquad (3)$$

An apparently alternative approach is given by Bayesian analysis (Berti et al., 2012). Indeed, this approach can be interpreted as a special case of the ROC analysis, since

$$P(L|R) = \frac{P(R|L)\,P(L)}{P(R)} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad (4)$$

where $P(L|R)$ is the probability of landslide occurrence given rainfall exceeding the threshold (a posteriori probability), $N_T$ is the total number of rainfall events (triggering and non-triggering), $P(R) = (\mathrm{TP} + \mathrm{FP})/N_T$ is the probability of rainfall events exceeding the threshold, $P(L) = (\mathrm{TP} + \mathrm{FN})/N_T$ is the (a priori) probability of landslide occurrence, and $P(R|L) = \mathrm{TP}/(\mathrm{TP} + \mathrm{FN})$ is the probability of a rainfall event exceeding the threshold, given that a landslide has occurred (known as the likelihood).

Table 5. Confusion matrix for evaluation of landslide-triggering thresholds (assumed here to be of the ID type: $I = f(D)$).

Predicted | Actual: landslide | Actual: no landslide
Threshold exceeded | TP | FP
Threshold not exceeded | FN | TN
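A short sketch shows how the two indices discussed above follow from the four entries of the confusion matrix. It assumes the usual definition of the true skill statistic, TSS = TP/(TP + FN) − FP/(FP + TN), together with the probabilities P(R), P(L) and P(R|L) as defined in the text; the counts used in the example are invented for illustration.

```python
def true_skill_statistic(tp: int, fn: int, fp: int, tn: int) -> float:
    """True positive rate minus false positive rate."""
    return tp / (tp + fn) - fp / (fp + tn)


def posterior_landslide_probability(tp: int, fn: int, fp: int, tn: int) -> float:
    """P(L|R) obtained from Bayes' theorem using the definitions in the text."""
    n_t = tp + fn + fp + tn          # total number of rainfall events
    p_l = (tp + fn) / n_t            # prior probability of a landslide
    p_r = (tp + fp) / n_t            # probability of exceeding the threshold
    p_r_given_l = tp / (tp + fn)     # likelihood
    return p_r_given_l * p_l / p_r


# Made-up counts: 10 triggering events and 990 non-triggering events.
counts = dict(tp=8, fn=2, fp=10, tn=980)
print(true_skill_statistic(**counts))
print(posterior_landslide_probability(**counts))
```

Working through the algebra, the Bayesian posterior reduces to TP/(TP + FP), so it carries no information beyond what is already in the confusion matrix.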
Different papers discuss advantages and disadvantages of various indices proposed in natural-hazard forecasting, as one single index is not sufficient to fully describe the confusion matrix (Frattini et al., 2010; Murphy, 1996; Stephenson, 2000). Nevertheless, the choice of a single index is essential to keep the calibration procedure simple, i.e. a single-objective optimization problem. Hence, here we calibrate thresholds by maximizing the TSS. One advantage of the TSS is that it includes all the entries of the confusion matrix, and thus its maximization yields thresholds that result in a good trade-off between correct and incorrect warnings/non-warnings.
Once thresholds for each RS scenario are derived, the TSS and the confusion matrix provide a measure of the uncertainty inherent in the data, as assessable by whoever derives the threshold and is not aware of the errors. On the other hand, it is also of interest to test how a threshold derived from erroneous data may perform when, after its determination, it is applied to precise monitored data, which are potentially free of the errors present in the threshold calibration dataset. In order to do this, the calibrated thresholds are applied to the error-free synthetic dataset (Sect. 5). The performance in this test is indicative of the impact of errors when thresholds are actually used.
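Calibration by TSS maximization can be sketched with a brute-force grid search. This is only one possible way of solving the single-objective problem and is not claimed to be the optimization procedure used in the study; the power-law form I = α D^(−β), the candidate grids, the sample events and the function names are all assumptions made for illustration.

```python
import itertools
from typing import Iterable, List, Tuple

RainEvent = Tuple[float, float, bool]  # (duration, mean intensity, triggered?)


def tss_for_threshold(events: List[RainEvent], alpha: float, beta: float) -> float:
    """TSS of the threshold I = alpha * D**(-beta) on a set of rainfall events.

    Requires at least one triggering and one non-triggering event.
    """
    tp = fn = fp = tn = 0
    for duration, intensity, triggered in events:
        exceeds = intensity >= alpha * duration ** (-beta)
        if triggered and exceeds:
            tp += 1
        elif triggered:
            fn += 1
        elif exceeds:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn) - fp / (fp + tn)


def calibrate(events: List[RainEvent], alphas: Iterable[float],
              betas: Iterable[float]) -> Tuple[float, float]:
    """Return one (alpha, beta) pair with maximum TSS among the candidates."""
    return max(itertools.product(alphas, betas),
               key=lambda ab: tss_for_threshold(events, *ab))


# Made-up events: three triggering and four non-triggering.
sample = [(1, 50, True), (10, 10, True), (24, 5, True),
          (1, 5, False), (10, 1, False), (24, 0.5, False), (5, 2, False)]
best = calibrate(sample, alphas=[5, 10, 20, 40, 80], betas=[0.4, 0.6, 0.8, 1.0])
print(best, tss_for_threshold(sample, *best))
```

When the two classes are well separated, as in this toy example, several parameter pairs reach the same maximum TSS, so the optimum is not unique; the sketch simply returns the first one encountered.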
4 Impact of uncertain identification of triggering rainfall on threshold calibration

Hourly data
Results relative to the use of hourly data are shown in Fig. 5, for a given separation algorithm ($s_{min} = 0.2$ mm, $u_{min} = 24$ h). For the reference dataset RS0, there is a negligible overlap between triggering and non-triggering events (Fig. 5a), due to intra-event rainfall intensity variability. In fact, in this case the best ID threshold ($I = 101\,D^{-0.80}$) performs almost perfectly, with a TSS of 0.99 (for $u_{min} = 24$ h). The presence of small delay reporting errors (RS1) has little impact on the position of triggering rainfall points (Fig. 5b), which in general are shifted slightly down along the intensity axis; this is related to the higher durations produced by positive errors in triggering instants, combined with an induced decrease in mean rainfall event intensities, a general behaviour exhibited by extreme events (cf. the negative slope of well-known rainfall intensity-duration-frequency curves; see Bogaard and Greco, 2018). Only two rainfall events (2.5 % of triggering events) are highly impacted: see the two events in Fig. 5b whose duration moved to 1 h. The latter, but mainly the former effect, contributes to slightly flattening the threshold for TSS maximization (decrease in $\beta$ to 0.7). When large delay sampling errors are present (RS2), the effects may no longer be negligible, unlike in the previous case, as more highly impacted rainfall events are present, now also for significant durations (up to 24 h in the plot, Fig. 5c). These erroneous data points are difficult for an analyser to identify, and thus their impact on threshold determination can be significant and can lead to a lower slope and intercept, i.e. an underestimation of the threshold, which changes to $I = 19\,D^{-0.50}$ (the reference is $I = 101\,D^{-0.80}$). The impact of these errors may be more dramatic when thresholds are assessed making use of triggering rainfall events only, following "traditional", less robust approaches.
Negative errors, introduced by an anticipation of the real landslide instant (RS3), can have very high impacts, as can be seen from the corresponding plot in Fig. 5d and from the loss of the correct position of many of the triggering points. The best threshold corresponds to TSS = 0.49, which reflects the high degree of uncertainty implied by these kinds of data errors.
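The size of the underestimation can be gauged by evaluating the two hourly thresholds quoted above, I = 101 D^(−0.80) for RS0 and I = 19 D^(−0.50) for RS2, at a few durations. The sketch below only does this arithmetic; the durations are illustrative, units are those of the thresholds as reported (not restated here), and the helper names are hypothetical.

```python
def intensity(duration: float, alpha: float, beta: float) -> float:
    """Threshold intensity I = alpha * D**(-beta)."""
    return alpha * duration ** (-beta)


def crossover_duration(a1: float, b1: float, a2: float, b2: float) -> float:
    """Duration at which a1 * D**(-b1) equals a2 * D**(-b2) (requires b1 != b2)."""
    return (a1 / a2) ** (1.0 / (b1 - b2))


REFERENCE = (101.0, 0.80)  # RS0, hourly data
RS2 = (19.0, 0.50)         # large delay reporting, hourly data

for d in (1.0, 6.0, 24.0, 100.0):
    ratio = intensity(d, *RS2) / intensity(d, *REFERENCE)
    print(f"D = {d:6.1f}: RS2 / reference = {ratio:.2f}")

print(crossover_duration(*REFERENCE, *RS2))
```

The RS2 threshold lies below the reference one for all the durations tested, and the two curves only meet at a duration of a few hundred time units, well beyond the 24 h range discussed for Fig. 5c.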

Daily data
Shallow landslides can be triggered by rainfall events that are only a few hours long (Bogaard and Greco, 2016; Highland and Bobrowsky, 2008; Sidle and Ochiai, 2013), and various studies have shown that small-scale intra-event rainfall intensity variability can have a significant effect on landslide triggering (D'Odorico et al., 2005; Peres and Cancelliere, 2014, 2016). Hence, apart from the errors in the dataset, it is of interest to see how the change from hourly to daily data may affect threshold determination. This can be done by comparing thresholds determined from the hourly and daily datasets.
Figure 6 shows the results of calibration obtained with correctly aggregated daily rainfall data and $s_{min} = 0$ and $u_{min} = 1$ day. As can be seen from the plots, the impact of delayed reporting of landslides (errors RS1 and RS2) is less significant than with hourly data. In fact, though $\alpha$ and $\beta$ are lower than those determined from hourly data, the threshold determined from daily data passes more or less through the same zone for durations in its range of validity, $D > 1$ day. This is because the smaller slope $\beta$ in the log-log plane compensates for the smaller intercept $\alpha$. The effect of anticipating landslide time location (RS3) here also has high impacts on the thresholds (Fig. 6d).
Figure 7 plots the results relative to daily rainfall data affected by a delay in the aggregation interval, as present for instance in Italian datasets, and related to availability of data from non-automatic rain gauges.The impacts of this systematic rainfall error can be high (Fig. 7a, b, and d).There is, however, the possibility that the errors due to rainfall aggregation and reporting landslide time interval compensate for each other, as in the case of scenario RS2 (delayed reporting of landslides), Fig. 7c (note that this plot is similar to Fig. 6b).If analysers are aware of the rainfall-aggregation shift, then they should correct as much as possible for this error -in this specific case by shifting the entire daily rainfall dataset 1 day forward.

Possible effects of rainfall separation criteria and antecedent rainfall
Table 6 shows the results obtained for the different parameter settings of the rainfall event separation algorithm, in the hourly, daily correct, and daily shifted aggregation cases. From the TSS values obtained for hourly data, it can be seen that the impact of RS1 and RS2 increases with decreasing minimum interarrival value $u_{min}$. For RS3, the differences obtained with different $u_{min}$ are not relevant, since the performances are poor in general (TSS about 0.5). In the case of daily data, the importance of different criteria for separating events (values of the minimum daily rainfall threshold $s_{min}$) is relatively lower than in the hourly data case. Though differences in the TSS are not significant, this may not be true for the threshold parameters, which can vary significantly. In fact, higher thresholds are obtained from an increase in $s_{min}$, because of the decrease in the number of days counted as rainy.

Table 6. Threshold calibration results for all simulations, in the case of nulled effects of antecedent precipitation ($\tau_M = 0$).

The behaviour observed for hourly data is due to the fact that, by choosing a lower $u_{min}$, events generally become shorter, and thus it is more likely that a landslide event is attributed to only a part of the actual triggering event. In this case the effect of preceding rainfall events cannot be neglected in general. In other words, our analysis suggests that the choice of $u_{min}$ is crucial and must be based on the timescales of the hydrological processes governing landslide triggering, in terms of long- and short-term responses (Iverson, 2000). This means that the effect of different criteria for rainfall separation is somehow related to that of antecedent precipitation. The effects of antecedent precipitation are specifically taken into account by performing Monte Carlo simulations with $\tau_M = 2.75$ days (results shown in Table 7). For this simulation, regardless of the rainfall separation time interval, the initial water table height measured from the bottom of the soil column is in general greater than zero, becoming negligible after a dry interval of $3\tau_M = 3 \times 2.75 = 8.25$ days (exponential decay). As can be seen, the results are qualitatively similar to the no-memory case; the main difference is that lower TSS values are obtained because of the added uncertainty due to antecedent conditions, and the thresholds are lower, since less event rainfall is needed on average to trigger a landslide because of non-zero initial wetness conditions.
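The statement that the initial water table becomes negligible after a dry interval of three recession constants follows directly from the exponential decay. The sketch below assumes a pure negative-exponential recession with constant τ_M, as in the linear reservoir scheme of Sect. 2; the function name is hypothetical.

```python
import math


def remaining_fraction(dry_time: float, tau: float) -> float:
    """Fraction of the initial water table height left after dry_time (tau > 0)."""
    return math.exp(-dry_time / tau)


TAU_M = 2.75                 # recession constant, days
dry_interval = 3 * TAU_M     # three recession constants, in days
print(dry_interval, remaining_fraction(dry_interval, TAU_M))
```

After three recession constants only about 5 % of the initial height is left (e^(−3) ≈ 0.05), which is why the memory of antecedent rainfall can be neglected beyond that interval.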

Impact of uncertain identification of triggering rainfall on threshold use
Thresholds determined from historical datasets are meant to be used in early warning systems, for which more detailed meteorological and landslide monitoring is subsequently set up. It is thus reasonable to hypothesize that, after thresholds are determined, they are applied to high-quality datasets, which suffer less from the limitations and errors present in the datasets used for threshold calibration, which are generally not conceived for that specific purpose. This might induce modification of the thresholds in view of the new data, but this is a process whose implementation may take several years. Hence, with the aim of determining the consequences of building an early warning system with thresholds derived from historical data with errors, Fig. 8 shows a visual comparison between the thresholds determined in the various numerical experiments and the ideal hourly dataset, for results related to the hourly (Fig. 8a) and daily datasets (Fig. 8b). For the sake of clarity, it is worth recalling that the dataset of triggering and non-triggering points has been used in calibrating the thresholds only for the RS0 scenario, with hourly data and u_min = 24 h and s_min = 0.2 mm (the related threshold is shown in Fig. 8 as a thick black line). Thus, the other thresholds are tested against this ideal dataset, which differs from the one used for their calibration.
The plots show that the presence of errors can induce a significant variability of thresholds which is completely unrelated to the different characteristics of a site (i.e. the geomorphological, hydraulic, geotechnical and land use characteristics). This allows for speculation that a significant part of the variability of landslide-triggering thresholds reported in the literature (cf. Guzzetti et al., 2007) may be due to the sources of uncertainty discussed here. As a consequence, it is challenging to search for links between the variability of physio-geographical characteristics and that of thresholds, as determined from different sites.
The presence of errors in the landslide dataset yields thresholds that are in general underestimated, i.e. lower than the correct ones. Many thresholds in Fig. 8 are significantly lower than the correct ones, and the number of false positives can be relatively high and not balanced by true positives. A good trade-off between correct and wrong predictions is essential for the success of an early warning system, since with a high number of false alarms the so-called cry-wolf effect may occur, inducing the population not to take precautionary actions when warnings are issued (Barnes et al., 2007).
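The link between an underestimated threshold and false positives can be made concrete with a small sketch. It assumes a power-law threshold written as I = α D^(−β) with β > 0 (a sign convention chosen here for illustration), the standard definitions of the true skill statistic (true positive rate minus false positive rate) and of precision, and a made-up set of events; none of the numbers come from the study.

```python
def confusion_counts(events, alpha, beta):
    """Count TP, FP, FN, TN for the threshold I = alpha * D**(-beta).

    events: iterable of (duration, intensity, triggered) tuples.
    An event is a positive prediction when its intensity is on or
    above the threshold at its duration.
    """
    tp = fp = fn = tn = 0
    for duration, intensity, triggered in events:
        exceeded = intensity >= alpha * duration ** (-beta)
        if exceeded and triggered:
            tp += 1
        elif exceeded and not triggered:
            fp += 1
        elif triggered:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def true_skill_statistic(tp, fp, fn, tn):
    """TSS = true positive rate - false positive rate."""
    return tp / (tp + fn) - fp / (fp + tn)


def precision(tp, fp):
    """Fraction of issued warnings that correspond to a landslide."""
    return tp / (tp + fp)


# Hypothetical events: (duration in h, intensity in mm/h, landslide triggered?)
events = [
    (1, 20.0, True), (6, 8.0, True), (24, 3.0, True),
    (2, 5.0, False), (12, 2.0, False), (48, 0.5, False),
    (6, 4.0, False), (24, 1.5, False),
]

# Two illustrative thresholds: a reference one and a lower, flatter one.
for label, alpha, beta in (("reference", 15.0, 0.6), ("lowered", 8.0, 0.5)):
    tp, fp, fn, tn = confusion_counts(events, alpha, beta)
    print(label, (tp, fp, fn, tn),
          true_skill_statistic(tp, fp, fn, tn), precision(tp, fp))
```

In this toy case the lowered threshold keeps all true positives but turns one non-triggering event into a false alarm, so both the TSS and the precision decrease.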

Conclusions
We have analysed and discussed the possible effects of uncertain triggering rainfall identification on the assessment of empirical landslide early warning ID thresholds, capitalizing on a synthetic rainfall-landslide dataset generated by Monte Carlo simulation. To this aim, we have investigated the effect of a set of hypothesized scenarios of landslide information retrieval and interpretation which can induce errors in the identification of instants of landslide occurrence. Moreover, we have analysed how the impact of reasonable scenarios may vary depending on rainfall aggregation (hourly or daily) and rainfall event identification criteria. Real situations may be a mixture of the considered scenarios, and thus the impacts are presumably intermediate between the ones hypothesized.
The errors in the time instants can be, in an algebraic sense, positive or negative, according to whether a landslide is reported after its actual occurrence or before, respectively. According to the literature, positive errors are more likely than negative ones, since a landslide is typically reported some time after its actual occurrence. Our analyses have shown that if such errors are limited to less than 30 h (about 1 day), their impacts on the threshold may be relatively low; yet if the delay is longer, impacts can be significant. Negative errors, though less probable, can also exist, depending on how an analyst interprets the information retrieved from landslide historical archives. The impact of these errors can be dramatic, as the location of triggering events in the log D-log I plane can be completely altered. Errors in landslide-triggering instants can lead to triggering events that are shorter than the actual ones, so that their effect is to induce an incorrect identification of triggering rainfall for short durations. For longer durations (> 1 day), the location of triggering events seems to be more robust, except when negative errors are present. This behaviour induces a flattening of the ID thresholds (i.e. a lower slope β) and an underestimation of the position parameter of the threshold (i.e. a lower intercept α).
The impact of reporting errors can change significantly depending on the algorithm adopted for rainfall event identification. Specifically, a shorter "maximum dryness" interval for event separation induces an increase in the impacts of all reporting scenarios.
From our analysis no significant impacts seem to be induced by the use of daily data in itself; however, it is of fundamental importance to check, and correct where possible, for the presence of delays in the rainfall accumulation interval, that is, whether precipitation reported for a given day is the total amount that occurred in a shifted period (e.g. within the 24 h preceding 09:00 of that day rather than before midnight). Such a shift affects, for instance, the Italian Hydrological Annual Reports, which constitute the largest rainfall data collection in Italy. The impacts of these shifts are potentially dramatic.
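The effect of a shifted accumulation interval can be sketched as follows. The function name, the choice of a 09:00 window end and the toy storm are assumptions for illustration, and the hourly series is assumed to start at 00:00 of the first day.

```python
def aggregate_daily(hourly_rain, end_hour=24):
    """Aggregate hourly depths to daily totals over 24 h windows.

    end_hour=24 sums midnight to midnight (correct aggregation);
    end_hour=9 assigns to each day the 24 h preceding 09:00 of that
    day (shifted aggregation). Hours before the start of the series
    are treated as missing, and hours after the last window end are
    not assigned to any day.
    """
    n_days = len(hourly_rain) // 24
    daily = []
    for day in range(n_days):
        end = 24 * day + end_hour
        start = max(end - 24, 0)
        daily.append(sum(hourly_rain[start:end]))
    return daily


# Hypothetical 3-day series with a 6 h storm of 10 mm/h
# from 06:00 to 12:00 of the second day (hours 30 to 35).
hourly = [0.0] * 72
for hour in range(30, 36):
    hourly[hour] = 10.0

print(aggregate_daily(hourly))              # correct aggregation
print(aggregate_daily(hourly, end_hour=9))  # shifted aggregation
```

Here the correct aggregation assigns the whole 60 mm to the second day, while the shifted one splits it into 30 mm on the second day and 30 mm on the third, so the same storm appears as a longer and less intense daily event.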
Overall, the presence of reporting errors in landslide-triggering instants yields underestimated thresholds, making them less suitable for setting up landslide early warning systems, as they can lead to a high number of false alarms, generating distrust in the populations that are expected to benefit from their implementation. Similar effects have been found as a consequence of rainfall measurement uncertainty on thresholds (Nikolopoulos et al., 2014). These two sources of errors, which are always present in observed datasets, are alone enough to generate an uncertainty in threshold assessment that is of significant magnitude. These results bring us to the conclusion that the uncertainty inherent in the available data can jeopardize the possibility of finding a physically based rationale underlying the variability of empirical landslide-triggering thresholds across different sites. In other words, with the quality of currently available data, attempts to relate thresholds to climate and other regional characteristics can be very difficult. An improvement in landslide and rainfall monitoring, e.g. through rainfall, soil moisture and landslide satellite data, as well as landslide data crowdsourcing (Guzzetti et al., 2012; Strozzi et al., 2013; Wan et al., 2014), may be a step forward for overcoming these problems. Once accurate rainfall-landslide data are available, standardized methodologies must be implemented to derive the thresholds in order to allow their comparison and to link their variability to site-specific landslide susceptibility factors.
Special issue statement. This article is part of the special issue "Landslide early warning systems: monitoring systems, rainfall thresholds, warning models, performance evaluation and risk perception". It is not associated with a conference.
Acknowledgements. David J. Peres was supported by a post-doctoral contract on "Studio dei processi idrologici relative a frane superficiali in un contesto di cambiamenti climatici" (Analysis of landslide hydrological processes in a changing climate), at the University of Catania. Part of the work was developed during his three-month stay as a visiting researcher at the Water Resources Section of TUDelft. The authors thank the two anonymous reviewers for their comments, which helped to considerably improve the paper.
Edited by: Luca Piciullo
Reviewed by: two anonymous referees

Figure 1. Location of the Peloritani Mountains area in Sicily, Italy, and of the Fiumedinisi rain gauge.

Figure 2. Diagram illustrating the simulation of uncertainty in triggering instants likely present in landslide inventories built from newspapers or similar sources. The black numbered circles indicate the reporting scenarios (RS), each of which may induce a random error e_i = t'_i − t_i (reported minus actual instant) in landslide-triggering instants. In particular, a landslide that occurs within the observers' day is reported at day D and attributed to the end of the same day (small delay reporting scenario, RS1) or to its beginning (anticipated reporting scenario, RS3). It can also be reported at day D + 1 and then attributed to the end of it (large delay reporting scenario, RS2). These scenarios can be described in terms of two parameters: T_O, the ending hour of the observers' day, and T_A, the triggering instant, referred to 00:00 of day D, assumed by an analyst who interprets the newspaper-like information.

Figure 3. Aggregation of rainfall data from hourly to daily timescale: daily rainfall depths on the top row result from correct aggregation; those on the bottom row are from shifted aggregation, as occurs for the Italian hydrological bulletins (Annali Idrologici). The shift is due to manual collection of data in the early decades of operation of the monitoring network; it is still maintained, in spite of the installation of automatic rain gauges, to preserve homogeneity of the entire historical time series.

Figure 4. Sketch illustrating the algorithm for the identification of triggering and non-triggering rainfall events, and the related parameters s_min and u_min. When a landslide is triggered in a dry period, it is attributed to the whole event preceding it; otherwise, only the part of the event preceding the landslide-triggering instant is considered. For non-triggering rainfall (the first one in the diagram), duration and intensity are computed considering the entire rainfall event.
probability is equivalent to the ROC-based precision (PRE):

Figure 5. Scatter plot, in the double-logarithmic rainfall duration-intensity plane, of triggering and non-triggering events for hourly data and separation algorithm parameters u_min = 24 h, s_min = 0.2 mm. Thresholds correspond to the maximum performance in terms of true skill statistic. The plots show outcomes relative to (a) reference RS0 and (b-d) various erroneous reporting scenarios (RS1, RS2, RS3).

Figure 6. Scatter plot, in the double-logarithmic rainfall duration-intensity plane, of triggering and non-triggering events for daily data and separation algorithm parameters u_min = 1 day, s_min = 0 mm. Thresholds correspond to the maximum performance in terms of true skill statistic. The plots show outcomes relative to (a) reference RS0 and (b-d) various erroneous reporting scenarios (RS1, RS2, RS3).

Figure 7. Scatter plot, in the double-logarithmic rainfall duration-intensity plane, of triggering and non-triggering events for daily data with aggregation shift as in the Italian rainfall databases. Separation algorithm parameters are u_min = 1 day, s_min = 0 mm. Thresholds correspond to the maximum performance in terms of true skill statistic. The plots show outcomes relative to (a) reference RS0 and (b-d) various erroneous reporting scenarios (RS1, RS2, RS3).

Figure 8. Comparison of thresholds, calibrated in the various scenarios and event identification parameters, with the correct hourly dataset. Thresholds determined with (a) hourly and (b) daily data (both correct and with aggregation shift) are distinguished. Correct thresholds are relative to the following event identification parameters: u_min = 24 h, s_min = 0.2 mm, and u_min = 1 day, s_min = 0 mm, for hourly and daily data, respectively. These plots are representative of how thresholds calibrated with uncertain information on triggering rainfall may perform in early warning systems that use high-quality rainfall and landslide monitoring.

Table 2. Some characteristics of the ideal Monte Carlo simulation dataset.

Table 3. Some rainfall event identification algorithms found in the literature.

Table 7. Threshold calibration results for all simulations, when antecedent precipitation memory is present (τ_M = 2.75 days).