We present a new method to generate spatially coherent river discharge peaks over multiple river basins, which can be used for continental event-based probabilistic flood risk assessment. We first extract extreme events from river discharge time series data over a large set of locations by applying new peak identification and peak-matching methods. Then we describe these events using the discharge peak at each location while accounting for the fact that the events do not affect all locations. Lastly we fit the state-of-the-art multivariate extreme value distribution to the discharge peaks and generate from the fitted model a large catalogue of spatially coherent synthetic event descriptors. We demonstrate the capability of this approach in capturing the statistical dependence over all considered locations. We also discuss the limitations of this approach and investigate the sensitivity of the outcome to various model parameters.

The authors' copyright for this publication is transferred to HR Wallingford, Deltares and GFZ.

Flood events cause a large amount of damage worldwide

Widespread flooding can potentially cause a large amount of damage in a short time
window. Continental events and, for instance, maximum probable damage are of
interest. In particular, the (re)insurance industry wants to know the chance
of a widespread portfolio of assets being affected in a short time window

River discharge waves may cause the exceedance of bankfull conditions or may cause dykes to fail. They are dynamic, i.e. they show wave-like behaviour. Travel times of discharge waves in large river basins can be long, i.e. time lags between discharge peaks at different locations can be large. With large travel times, a new discharge wave may be generated upstream while the previous discharge wave has not yet reached the river mouth. Furthermore, discharge waves in river basins are triggered by atmospheric events that may span multiple river basins. Finally, discharge waves in different river basins may be related to a single atmospheric event but do not occur at the same time, since catchments have different response times. With an increasing spatial domain, dynamic events start overlapping in time and merge into a space–time continuum. For a continental flood risk assessment (FRA), the challenge arises of how to define observed continental river discharge events and how to simulate synthetic continental river discharge events while retaining the observed statistical properties in space (spatial dependence/coherence).

We distinguish between two groups of event identification methods: methods based on time blocks and methods based on dynamic events. Using blocks, events are defined within fixed time windows and described by their statistical properties, e.g. annual maximum discharge. The main advantage of the blocks method is its simplicity, allowing statistical properties to be rapidly captured. Dynamic events are defined as events with spatially varying time windows, which are based on the discharge values. As described above, for large spatial domains small dynamic events at different locations may overlap in time and form one single long-lasting spatio-temporal event. Hence, a practical definition of dynamic space–time windows is required.
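As an illustration of the blocks method described above, the following minimal sketch extracts annual maximum discharge at a single location. It assumes a calendar year as the block; the function name `annual_maxima` and the sample values are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from datetime import date, timedelta


def annual_maxima(dates, discharge):
    """Return {year: maximum discharge} for one location.

    Assumption: the fixed time block is the calendar year.
    """
    maxima = defaultdict(lambda: float("-inf"))
    for day, q in zip(dates, discharge):
        maxima[day.year] = max(maxima[day.year], q)
    return dict(maxima)


# Hypothetical daily discharge series that straddles two calendar years
start = date(2000, 12, 30)
dates = [start + timedelta(days=i) for i in range(5)]
discharge = [120.0, 310.0, 95.0, 480.0, 150.0]

block_maxima = annual_maxima(dates, discharge)
```

A fixed block such as this is simple to compute, but it cannot follow a wave whose peaks at distant locations fall in different blocks, which is the motivation for the dynamic event definition.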

We analysed pan-European discharge waves in the space–time continuum, which are characterised by significant time lags between peaks at distant locations. We applied a new method of dynamic event identification where we aimed to capture discharge events in each major European river basin, after which we used a block-based time window method to merge them into spatially coherent, pan-European events. We described the pan-European discharge events by their peaks, with which we parameterised a stochastic event-based generator of event descriptors. Using the generator, we simulated synthetic descriptor sets, after which we compared the statistical properties of the synthetic sets to those of the observed sets. Finally, we discussed the main limitations of the methodology and the choice of parameter settings.

We used a gridded discharge reanalysis data set covering major river networks
in Europe, which was obtained with the well-established LISFLOOD model

In order to keep the computational costs reasonable, the network was reduced
to the major streams and tributaries. This means that, although the input
data were 2-dimensional in space (

In this study there were two objectives: first, to capture the spatial dependence structure between peaks of discharge events at different locations spread throughout Europe (OBJ1) and, second, to generate a large catalogue containing synthetic discharge peaks, filling up the observed distributions while retaining the observed dependence structure (OBJ2).

The applied framework comprises three steps. First, events are identified in the observed data. Second, the observed events are described, providing a matrix of observed descriptors. Third, multivariate statistics is applied to generate a large matrix of coherent synthetic descriptors.

The network of major European rivers and a subset of 298 representative locations.

The framework for the generation of synthetic peak sets consisted of three consecutive steps; see Fig.

We considered the following as the key features for the quality of the generated catalogue of synthetic event descriptors. First, it should contain descriptions of a much larger variety of hypothetical (synthetic) events than the events identified in the observed data (KF1). Second, the dependence structure of the synthetic catalogue needs to agree with that of the observed catalogue, since the catalogue of observed event descriptors should be a likely subset of the synthetic catalogue (KF2).

When using the popular peaks-over-threshold method (POT) per location, all
events below a particular threshold are dropped. This is appropriate for
event identification only when events show a homogeneous
extremeness per location. However, when studying discharge waves moving
through the river network by extremeness per location, a heterogeneous
behaviour can be expected. Relatively extreme events upstream may become less
extreme while moving downstream when the lower part of the river basin is not
activated. Or, in contrast, relatively non-extreme events at different
upstream branches can generate a relatively extreme event at confluences
downstream due to wave superposition. To address the heterogeneity, we
developed a new noise removal algorithm to capture local events, which manages
to eliminate small local peaks that are part of a bigger event (noise) while
retaining small events that may be spatially connected to larger events
upstream or downstream. This is a key feature of the wave tracking, which
will be introduced in Sect.

The procedure of NR is as follows. First, all local minima

Define a series

Either calculate the NR value window

Find

Define a series

Either calculate the NR time window

Find

We set the NR value window fraction relatively low

River discharge waves propagate through the network in downstream direction,
introducing time lags between the moments the waves pass at different
locations. Time lags are difficult to estimate, because the celerity of river
discharge waves can be highly non-linear. The wave celerity is a function of
the hydraulic depth and changes in a non-linear way when overbank flow occurs
and floodplains become inundated. When grouping local events into events that span multiple locations, time lags are
typically addressed using time windows. The gridded data set used in this
study allowed us to try a new method of combining local events to river basin
events, which we refer to as “wave tracking”. Each location in the river
network is physically connected to its neighbouring locations, which allows
waves to be tracked throughout the entire river network. Wave tracking is
robust to non-linearities in the wave celerity, and therefore it allows us to
better address time lags, so that, when we compare peaks at different
locations in Sect.

To track river discharge waves, we applied the following procedure. First, we
separated local events by applying NR to time series at every location in the
river network, where of each local event we retained the day of the peaks

Precipitation events, which are the main driving source of river discharge events, span different river basins. Therefore, large discharge events in adjacent river basins are likely to be correlated. To account for this correlation, we had to define events that included discharge waves across different river basins (in this study pan-European events). Since discharge waves do not span different river basins (by definition), such events should be connected to each other in a different way. Discharge waves in different basins are not synchronised, which adds complexity. In order to obtain a method to construct pan-European events, which on the one hand considers discharge waves in river basins and on the other hand accounts for trans-basin dependence, we propose a combined approach of wave tracking and global time windows.

The following procedure was adopted. First, we set up subsequent global time
windows with a length of

Daily snapshots of a pan-European event with a large spatial extent. Date format is yyyy-mm-dd.

We aimed to describe the pan-European events by their peak discharge at 298
representative locations on the river network. However, the pan-European
events did not yield discharge peaks at all representative locations for each
event, i.e. the observed descriptor matrix had gaps. To be able to capture the
spatial dependence structure in Sect.

We applied the following procedure. At locations where an event occurred, we extracted the discharge peak. Where no event occurred (36 % of the entries in the observed descriptor matrix), we filled the gaps using auxiliary values. Per representative location (i.e. column-wise), we set up a number of local time windows in between the peaks of identified events, with the number of windows equal to the number of gaps between those respective peaks. Within each of these local time windows, we selected the maximum value as the auxiliary value. This procedure resulted in a (complete) observed descriptor matrix.
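The gap-filling step for a single representative location can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function name `fill_gaps`, the splitting of the interval between two identified peaks into equal-length windows, and the use of the series start and end as bounds for leading and trailing gaps are all assumptions.

```python
def fill_gaps(series, peak_days):
    """Return one descriptor per event for a single location.

    series:    daily discharge values at the location.
    peak_days: per event, the day index of the local peak, or None where
               the event did not occur at this location (a gap).
    Assumption: the days between two identified peaks are split into
    equal-length local windows, one per gap; a window that ends up empty
    (fewer days than gaps) yields None.
    """
    n = len(peak_days)
    out = [None if d is None else series[d] for d in peak_days]
    i = 0
    while i < n:
        if peak_days[i] is not None:
            i += 1
            continue
        # Find the run of consecutive gaps: events i .. j-1
        j = i
        while j < n and peak_days[j] is None:
            j += 1
        left = peak_days[i - 1] + 1 if i > 0 else 0       # first day after previous peak
        right = peak_days[j] if j < n else len(series)    # exclusive upper bound
        gaps, span = j - i, right - left
        for g in range(gaps):
            lo = left + (g * span) // gaps
            hi = left + ((g + 1) * span) // gaps
            window = series[lo:hi]
            out[i + g] = max(window) if window else None  # auxiliary value
        i = j
    return out


# Hypothetical example: four events, the second and third are gaps here
series = [5, 9, 4, 3, 6, 2, 7, 8, 12, 5]
peak_days = [1, None, None, 8]
descriptors = fill_gaps(series, peak_days)
```

In this example the six days between the two identified peaks are split into two windows of three days each, and the maximum of each window fills the corresponding gap.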

Figure

Correlation of descriptors at all representative locations vs. descriptors at Vienna (black dot).

In order to align with the corresponding literature in statistical models for
multivariate extreme values, in Sect.

We fitted generalised Pareto distributions (GPDs)

To be able to capture the dependence between sets of descriptors (i.e. rows
in the observed descriptor matrix), we started by transforming the marginals
to the uniform space. This transformation is applied in many other analyses,
e.g. copulas

The dependence structure of the non-extreme part was captured using a
non-parametric, multivariate kernel density model with Gaussian kernels. We
transformed the (entire) uniform marginals to the normal space, with the mean

To capture the dependence of the extreme part we chose the model of

To fit HT04, we transformed the (entire) uniform marginals to the Laplace
space

We split the observed uniform descriptor matrix into a non-extreme part
and an extreme part. Each row in which not a single descriptor exceeded an
extremal simulation threshold

Using multivariate extreme value analysis, we extended the observed
descriptor matrix with synthetic data, obtaining a (large) synthetic
descriptor matrix. The patterns in the larger synthetic descriptor matrix had
to match the patterns found in the smaller observed descriptor matrix. We
focused on two main patterns: marginal distributions (a column-wise pattern)
and dependence structure (a row-wise pattern). Respecting the fitted marginal
distributions while simultaneously retaining the dependence structure is
challenging, and no method satisfies both objectives perfectly. We chose to
respect the distributions fitted to the observed marginals, for which we
transformed the synthetic marginals to follow the corresponding observed
distributions, as described in Sect.

To further investigate the dependence structure, Fig.

Figure

Observed (purple) vs. synthetic (yellow) descriptors at three locations. In the diagonal, distributions of observed and synthetic descriptors per location are compared using box plots. Below the diagonal, pairwise scatter plots are displayed. Above the diagonal, pairwise correlations are displayed.
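A pairwise rank correlation of the kind used to compare observed and synthetic descriptors can be computed as in the sketch below. It uses only the Python standard library; the helper names (`average_ranks`, `pearson`, `spearman`, `spearman_matrix`) and the sample matrix are hypothetical, and ties are handled with average ranks as an assumption.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_matrix(descriptors):
    """descriptors: rows are events, columns are locations."""
    cols = list(zip(*descriptors))
    return [[spearman(a, b) for b in cols] for a in cols]


# Hypothetical descriptor matrix: 3 events (rows) at 3 locations (columns)
observed = [[1.0, 10.0, 3.0], [2.0, 20.0, 2.0], [3.0, 30.0, 1.0]]
rho = spearman_matrix(observed)
```

Computing the same matrix for the observed and the synthetic descriptor sets and comparing them entry by entry gives a simple check on whether the dependence structure is retained.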

Spatial correlation of the observed descriptors vs. the synthetic
descriptors, summarised by pairwise Spearman correlation. Panel

Following up on the general check for correlations between the entire
distributions of descriptor sets, we specifically checked if we managed to capture the
tail-end correlations. Figure

Spatial extremal correlation of the observed descriptors vs. the
synthetic descriptors. For a selection of high quantiles we counted the
fraction

Historically, observations have been made at specific locations, e.g.
discharge gauge stations at certain locations along rivers. Therefore, most
event identification methods are designed for local frequency analysis of
discharge waves, starting with the identification of local events, i.e.
events at certain locations, based on temporal dynamics

In Sect.

Figure

Sensitivity of

In Sect.

Figure

Sensitivity of

A recent, more comprehensive study of the sources of uncertainty in a
probabilistic flood risk model was provided by

The generated synthetic descriptor catalogue can be used to drive an
event-based chain of models, which may cascade from a hydraulic model of the
river network coupled with an inundation model to damage and/or life safety
models. To drive an inundation model, synthetic discharge events have to be
reconstructed from the synthetic descriptor sets in the catalogue, which
corresponds to what would be step 4 in Fig.

However, these difficulties are not specific to this analysis but apply to all analyses in which an event-based approach is combined with descriptors per location. The catalogue of synthetic discharge event descriptors was provided at 298 locations for a synthetic period of 10 000 years (with stationary climate conditions). Both the number of locations and the number of synthetic years can be expanded to provide a more detailed coverage. This catalogue will be used to generate discharge hydrographs to drive a pan-European inundation model for continental, event-based flood risk assessment.

We used a new “noise removal” and “wave tracking” method with which we successfully identified discharge waves in all major European river basins. Using global time windows, we clustered these river basin events into pan-European events. With a mixture multivariate dependence model, we managed to capture the dependence structure between peaks of daily discharge at 298 different locations on the river network of major European rivers. We created a catalogue of spatially coherent synthetic event descriptors, containing 10 000 years of synthetic discharge peaks with a dependence structure that is similar to that in the observed data, thereby showing spatial coherence. This catalogue is a starting point for the exploration of the range of possible scenarios of pan-European flooding and associated probabilities, which is the foundation of flood risk assessment.

This research was applied to a modelled data set that is not publicly available. The focus of this paper is on the methodology, which we think is generic and will work on similar data sets.

The video
supplement (

DD did the research. He developed the main ideas brought forward in this paper and wrote the R code to obtain results and figures. YL was involved in all aspects of the research, with a specific focus on the statistical section. He contributed significantly to the writing of this paper. BG devised the idea of applying the current methodology for joint discharge peaks to a continental domain. FD provided general feedback and in particular provided the method used to check the extremal dependence. SV provided general feedback, helped improve the manuscript's positioning and significantly helped clarify the applied methodology.

The authors declare that they have no conflict of interest.

This article is part of the special issue “Global- and continental-scale risk assessment for natural hazards: methods and practice”. It is a result of the European Geosciences Union General Assembly 2018, Vienna, Austria, 8–13 April 2018.

This research is part of the System-Risk project. We thank the European Joint Research Centre for providing the European discharge data set. We thank Dominik Paprotny, Brenden Jongman and Hessel Winsemius for their critical remarks, which increased the quality of the manuscript.

This research has been supported by Horizon 2020 (SYSTEM-RISK, grant no. 676027).

This paper was edited by Hessel Winsemius and reviewed by Dominik Paprotny and Brenden Jongman.