The purpose of this study is to propose the Bayesian network (BN) model to estimate flood peaks from atmospheric ensemble forecasts (AEFs). The Weather Research and Forecasting (WRF) model was used to simulate historic storms using five cumulus parameterization schemes. The BN model was trained to compute flood peak forecasts from AEFs and hydrological pre-conditions. The mean absolute relative error was calculated as 0.076 for validation data. An artificial neural network (ANN) was applied for the same problem but showed inferior performance with a mean absolute relative error of 0.39. It seems that BN is less sensitive to small data sets, thus it is more suited for flood peak forecasting than ANN.
Floods are the most threatening natural disaster across the world (Hénonin et al., 2010). Studies show that over 80 % of the cities of Iran are at the risk of flooding (Chitsaz and Banihabib, 2015). Flood warning is an efficient way to reduce the flood damage. However, many flood forecasting systems in the world rely on observed rainfall, and thus the lead time of these systems is often short for small basins (Banihabib and Arabi, 2016). Numerical weather prediction (NWP) models can be used to increase the lead time of flood warning by using in advance forecasts of rainfall. Although the combination of NWP and hydrological models can significantly increase the flood warning lead time rather than using observed rainfall, the deterministic weather prediction does not reflect the existing uncertainties. Thus, in the last decades, many operative and research on flood forecasting systems around the world are increasingly employing ensembles of NWPs instead of single deterministic forecasts, which have considerable uncertainties (Goodarzi et al., 2019). Ensemble methods are considered to be an effective way to estimate the probability of future states of the atmosphere by addressing uncertainties present in initial conditions and in model approximations (Tennant et al., 2007). Various approaches have been developed to produce atmospheric ensemble forecasts including perturbing the initial conditions, perturbing the input parameters of the model, using multi-model ensembles and using different parameterization schemes (Yang et al., 2012).
One of the most important parameterization schemes is the cumulus parameterization. NWP models often use cumulus parameterization schemes (CPSs) to consider the effects of cumulus clouds which are not represented in modelling as they are much smaller than the model grid size (Pennelly et al., 2014). Common CPSs are presented in Table 1.
Common cumulus parameterization schemes.
Kerkhoven et al. (2006) compared various CPSs for a summer monsoon in east China and found that the Kain–Fritsch scheme is the best scheme at simulating moderate rainfall depths. Pennelly et al. (2014) applied the Weather Research and Forecasting (WRF) model with diverse cumulus parameterization schemes for three flood events in Alberta, Canada, and they showed that the Kain–Fritsch and explicit cumulus parameterization schemes were the most accurate for simulating the rainfall. Other studies indicated that ensemble forecasting is promising for predicting heavy rainfall (Deb et al., 2008; El Afandi, 2013; Li et al., 2014).
Ensemble meteorological forecasting is widely coupled with a hydrological model to predict stream flow ensembles. Li et al. (2017) coupled the WRF model with a distributed hydrological model for flood forecasting in a large watershed in southern China. The results suggest that the simulated floods are rational and could benefit the flood management communities due to their longer lead time. Rogelis and Werner (2018) assessed the potential of NWP models for flood early warning in tropical mountainous watersheds. The results showed that the streamflow forecasts resulted from a hydrological model forced by post-processed rainfall using the WRF, and added value to the flood early warning systems.
Only few case studies report how flood hydrographs derived from atmospheric ensemble forecasts (AEFs) can be converted into warning decisions during a flood event. Li et al. (2017) and Abebe and Price (2005) used the exceedance of critical thresholds. Dietrich et al. (2009a) used the quantile of the predicted flow ensemble. Yang et al. (2016) integrated ensemble rainfall forecasts, rainfall thresholds and a real time data assimilation method. Leandro et al. (2019) reduced the ensemble to the upper and lower range of the uncertainty band. Other concepts of deriving a single (deterministic type) warning indicator from ensembles are weighting of ensemble members, e.g., averaging by Bayesian model average (Raftery et al., 2005), by machine learning (Doycheva et al., 2017) or by reduction of members to create a multi-model sub-ensemble (Dietrich et al., 2009b).
According to previous studies, converting the ensemble forecasts into warnings and also deriving a single warning indicator from ensembles are not yet adequately considered and remain a challenging question in ensemble-based flood warning. The main objective of this study is to propose the Bayesian network (BN) model to estimate the flood peak from a meteorological ensemble forecast without employing a hydrological model. BN has been widely used by researchers in many water resources fields. Applications of BN in water resources can be found in studies of Mediero et al. (2007), Sharma and Goyal (2016) and Shin et al. (2016). Phan et al. (2016) reviewed 111 BN applications in water resources management but only four were in the domain of river flow, five were in operational decision making context and none in operational flood warning. BN application in ensemble flood forecasting has not been reported yet to our best knowledge.
In previous studies, meteorological ensemble forecasts are coupled with a hydrological model to predict a set of flood hydrographs with different peak discharge. Ensemble decision making according to a range of possible flood peaks is a challenging issue especially in case of equal likelihood of each ensemble member. In the present study, the hydrological model is replaced by a Bayesian network for deriving a single warning indicator from atmospheric ensemble forecasts.
The purpose of the present study is therefore to predict the flood peak addressing the uncertainties and the probability of occurrence of each ensemble member. Floods are rare extreme events that occur with low frequency in the studied area. Thus, one of the problems in flood modelling is small data size. In the present study, we try to deal with small data size by using Bayesian network, which is less sensitive to small data size (Zhang and Bivens, 2007). As a case study, flood peaks were forecasted in a relatively small mountainous basin, Kan basin, Tehran, Iran. The Weather Research and Forecasting (WRF) model was used to simulate 14 historic precipitation events using five different cumulus parameterization schemes. Then atmospheric ensemble forecasts were coupled to the BN to estimate the flood magnitude for an ensemble forecasting, from which flood warnings could be derived. Forecasting performance of the BN was compared with the results obtained from an artificial neural network (ANN), which is a widely used data-based model.
The case study of this research is Kan basin, Tehran, Iran with an area of
197 km
Precipitation and streamflow data.
Location of study area, rainfall and flow stations.
Flow chart of the flood forecast approach in this research.
The Weather Research and Forecasting (WRF) model was used to simulate 14 historic heavy precipitation events that caused floods in the study area. In this study, WRF version 3.8 was employed with three domains and 1 h temporal resolution. The horizontal resolutions of the domains are 45, 15 and 5 km, respectively. Figure 3 shows the WRF domain setup using an interactive nested domain inside the parent domain. The outer (the coarsest) domain covers Iran, the middle domain covers the northern part of Iran and the inner domain covers the study area, and only the meteorological information from this domain was used for forecasting of flooding in the study basin.
The NCEP Global Forecast System (GFS) final analysis (FNL) data were used as the initial conditions of the WRF. The model settings were based on the Noah land surface model (Chen and Dudhia, 2001), the Rapid Radiative Transfer Model (RRTM) longwave radiation scheme (Mlawer et al., 1997), the Dudhia shortwave radiation model (Dudhia, 1989), the Yonsei University (YSU) planetary boundary layer scheme (Hong et al., 2006) and the WRF Single-Moment (WSM) three-class microphysics scheme (Hong et al., 2004). Because of the importance of cumulus parameterization for hydrological purpose, an ensemble was created by using five cumulus schemes including KF, BMJ, GR3D, MSKF and GDE cumulus scheme. The atmospheric ensemble forecasts were fed into the Bayesian network to estimate flood peak flow.
WRF domain setup using an interactive nested domain inside the parent domain.
This study proposes a probabilistic model to generate the flood forecasts
and to estimate the flood magnitude based on Bayesian networks (BN) for an
ensemble forecasting. BNs are a class of probabilistic graphical models
composed by a set of random variables and directed acyclic graphs (DAGs) to
show the potential dependence between variables (Scutari, 2017). The node at
the start of an arrow is a casual or preceding event that is called the parent
node, and the node at the head is an outcome event that is called the child node.
Each node is labelled with a conditional probability table (CPT) based on
prior information or statistically observed correlations that show the
strengths of the influences of the parent nodes on the child node. In
general, assuming random variables with domain size
The goal is to calculate the posterior conditional probability distribution
of each of the possible unobserved causes given the observed evidence, i.e.
However, in practice we are often able to obtain only the converse
conditional probability distribution of observing evidence given the cause,
An example of a graphical Bayesian network.
The graph containing nodes and arrows is called BN structure (BS). Learning a Bayesian network includes two aspects: structure learning and parameter learning.
Our proposed ensemble forecasting using a BN model has the following four
steps:
selecting relevant variables and spatial units, creating training data set for the model, learning the model using the HUGIN software (version 8.3) and evaluating the performance and accuracy of the model. NCEP (GFS During the above-mentioned period, a small number of actual flood events
occurred in the study area, since the basin is located in a semi-arid
region. There is a lack of flood data because of flood damage to
hydrometry equipment in some floods.
In the present study, the flood peak is the response variable that is
influenced by some predictor variables including ensemble rainfall
forecasts, base flow of the river and antecedent soil moisture. Base flow of
the river is the normal day to day discharge. Antecedent recharge flow was
used as the base flow of the river. The catchment's antecedent soil moisture
represents the relative wetness prior to the flood event and can have an
important influence on flood response. Because of the lack of soil moisture
data in the Kan basin, antecedent rainfall was used to represent the soil
moisture. Antecedent rainfall is the
total precipitation amount that occurred in the 24 h before the start
of the event. This study was performed on 14 historical storms. It should be
noted that approx. 70 % of the available data (10 storm events) are
allocated for training and the remaining (4 storm events) data are used for
validation. The data sample is relatively small due to the following
reasons.
Considering the relatively small sample size, we proposed using the BN that
is less sensitive to small data set size in comparison with ANN. Some
advantages of BN are as follows.
Artificial neural networks (ANN) are used as an alternative of statistical models in different aspects including clustering analysis, estimation, sample recognition etc. (Mammadov et al., 2005). An ANN model is basically an engineering method of biological neurons. It is constructed by input, output and hidden layers. ANN consist of a large number of simple processing elements, which are interconnected with each other and also layered (Sharma et al., 2012).
Typically, there are four distinct steps in developing an ANN model. The
first step is data transformation or scaling. The input and output variables
are first normalized linearly in the range of 0 and 1 using the following
equation:
The second step is the network architecture definition in which the number of hidden layers, the number of neurons in each layer and the connectivity between the neurons are determined. The number of neurons and hidden layers is problem dependent and is estimated by the trial and error technique or expert experience. A synaptic weight is allocated to each link to represent the relative connectivity strength of two nodes at both ends in predicting the input–output relationship (Raju et al., 2011). A typical ANN architecture is presented in Fig. 5. In this study, the output from the model is the flood peak, and the input variables are atmospheric ensemble forecasts, base flow of the river and antecedent rainfall. The third step is using a learning algorithm to train the network with known data. There are several learning algorithms. In the present study, the most widely used feed-forward error back-propagation algorithm was used for training because of the good performance of this algorithm in previous studies (Raju et al., 2011; Banihabib et al., 2015; ASCE, 2000; Sarkar and Kumar, 2012). The success of an ANN application depends on the quality and also the quantity of the available data (Cheng et al., 2017). The final step is the validation, in which the performance of the trained ANN model is evaluated using statistical criteria (Sarkar and Kumar, 2012).
Typical ANN architecture.
In the present study, mean absolute relative error (MARE), mean relative
bias error (MRBE) and the regression coefficient (
In this section, the comparison between the observed and predicted precipitation obtained from the WRF model is addressed. As mentioned earlier, the WRF model was used to simulate 14 historic precipitation events, and the results for some events are presented here. Figure 6 illustrates the predicted cumulative rainfall and the observed cumulative rainfall for these events. In general, the results show that the WRF model was able to capture the heavy rainfall events. The uncertainties in the predicted rainfall lead to a large spread of the ensemble members, and this is why the uncertainty in rainfall forecasting becomes important.
The ensemble precipitation illustrates that both overestimation and underestimation of precipitation occurs using various schemes. Overestimation is very noticeable for the early hours of forecasting, while for the last period of the event underestimation occurs in some schemes.
From the case study, the results of precipitation forecast using different cumulus schemes by the WRF model can be significantly different. Therefore, it is necessary to forecast precipitation by implementing various physics schemes, especially different microphysical schemes. Furthermore, it can be inferred that the difference between observed and predicted rainfall is mainly caused by the initial condition in the NWP models, thus the atmospheric ensemble forecasts can be produced also by perturbing the initial conditions.
The ensemble forecasted precipitation and the observed cumulative precipitation.
The atmospheric ensemble forecasts were fed into the BN to estimate flood
peak flow. Ten various models were developed using various combinations of
predictors. In all of the combinations, flood-peak discharge is the
predicting variable. Table 3 shows the accuracy of the model for different
combinations of predictors to compare the performance of the prediction. The
performance of the model was evaluated by MARE and
According to Table 3, the best results were obtained for combination no. 5.
The proposed structure of this combination is composed of eight nodes as
shown in Table 4. Atmospheric ensemble forecasts, base flow of the river
and antecedent rainfall are the parent nodes, and flood peak is the child
node. It can also be seen that the base flow is influenced by antecedent
rainfall. The mean absolute relative error was calculated at 0.076 for the
validation data set in the combination no. 5. The coefficient of
determination (
Moreover, Bayesian cluster analysis could also provide probabilistic results for flood early warning, but since the data sample is relatively small in this study, cluster analysis cannot be achieved. This method can be also tested in basins with sufficient historical hydrological data in future works.
Performance of the Bayesian network for different combinations of predictor variables.
The cause–effect relationships among the variables in the proposed structure of the Bayesian network.
The performance of the BN model is compared with the results obtained from an ANN model as a benchmark. The comparison is conducted using the same data set for training and validation. These results are presented in Sect. 3.3.
The first step in developing an ANN model is to determine the input and output variables. The output of the model is the magnitude of flood peak discharge. The input variables are the same as those used for the BN with the best performing combination of predictor variables (Table 3, combination no. 5). The feed-forward error back-propagation algorithm has been employed as the training algorithm in this study. A difficult task in working with ANN is the selection of parameters such as the number of hidden nodes. There is no established algorithm until now to determine how many hidden nodes are required to approximate any given function. Here, we use the common trial and error method to choose the number of hidden nodes, which are varied from two to six according to previous studies (Banihabib et al., 2015). Error index is usually used to select the best performance of the network model compared to observed data. The accuracy of the model for different numbers of nodes in the hidden layer is presented in Table 5. It was found that four hidden nodes give the best results. The mean absolute relative error (MARE) was calculated as 0.39 for the validation data set while this index was calculated 0.076 in BN. The comparison shows that BN offers better accuracy. Although our data set was relatively small, the result of BN model was accurate enough. Therefore, it seems that BN is less sensitive to small data set size, so it is more suited for rare events such as floods, where the available data are limited due to the high return period of such events.
MARE and
This study proposed a probabilistic model to address the uncertainties of flood forecasts using the Bayesian networks (BNs) and to estimate the flood peak in an ensemble flood forecasting. This is the first attempt to use BN in ensemble flood forecasting. The Weather Research and Forecasting (WRF) model was used to simulate some historic precipitation rainfall events using five various cumulus parameterization schemes. The results showed that there is no considerable decrease in accuracy by deleting the Multi-scale Kain–Fritsch scheme, thus it can be concluded that is the least accurate cumulus scheme. It also was found that Kain–Fritsch is the most efficient cumulus parameterization scheme. Atmospheric ensemble forecasts were coupled with the Bayesian network to estimate the flood magnitude in an ensemble forecasting. Results of the BN are compared with the results obtained from an artificial neural network as a widely used model to show the performance of BN. The comparison is conducted using the same data set for validation and training. The results showed that the BN is an efficient method for flood forecasting based on ensemble rainfall forecasts and offers better accuracy than ANN. We showed that BN is less sensitive to small data set size in comparison with other models, thus it is more suited for rare events such as floods. The results of this study indicate that BN might be a suitable tool for a fast computation of peak flow and flood warnings from numerical ensemble weather predictions. Our study is a proof of concept at the current stage that flood warnings can be done by evaluating hydrological pre-conditions and meteorological ensembles by a trained BN instead of a hydrological model. However, further studies are required to confirm the applicability of BN. The present study was conducted with a lead time of 1 d before the observed event in a small basin. Future studies may test BN for other catchments and for larger lead times.
For this study, we used the software HUGIN Educational, version 8.5
(
LG had a role in the conceptualization, formal analysis, investigation, methodology and writing of the original manuscript draft. MEB performed the conceptualization, supervision and validation. AR and JD were involved in the conceptualization and advising. All authors were involved in the writing, review and editing processes.
The authors declare that they have no conflict of interest.
This article is part of the special issue “Hydroclimatic extremes and impacts at catchment to regional scales”. It is not associated with a conference.
The authors are thankful for the constructive comments of the reviewers, which helped to improve our paper. The Institute of Hydrology and Water Resources Management supported the first author during her sabbatical.
The University of Tehran provided financial support for the first author during her sabbatical. The publication of this article was funded by the open-access fund of Leibniz Universität Hannover.
This paper was edited by Fernando Domínguez-Castro and reviewed by two anonymous referees.