Estimation of insurance premiums for coverage against natural disaster risk: an application of Bayesian Inference

This study applies Bayesian Inference to estimate flood risk for 53 dyke ring areas in the Netherlands, and focuses particularly on the data scarcity and extreme behaviour of catastrophe risk. The probability density curves of flood damage are estimated through Monte Carlo simulations. Based on these results, flood insurance premiums are estimated using two different practical methods that each account in different ways for an insurer’s risk aversion and the dispersion rate of loss data. This study is of practical relevance because insurers have been considering the introduction of flood insurance in the Netherlands, which is currently not generally available.


Introduction
The potential impacts of flooding in the Netherlands are large because high values of economic assets and many people are situated in low-lying areas exposed to flooding (Klijn et al., 2007;Aerts and Botzen, 2011;de Moel and Aerts, 2011). Although flood protection standards are high, new flood risk management strategies are currently being discussed to accommodate projected trends that increase flood risk, such as climate change and socio-economic growth (Kabat et al., 2005). An example is flood insurance, and since flood risk is not generally covered by insurance in the Netherlands, questions have been raised among insurers and policy makers as to whether flood insurance is feasible (Botzen and van den Bergh, 2008).
One of the issues related to flood insurance in the Netherlands is the uncertainty related to both the probability and the consequences of extreme (low probability) flood events (Froot, 2001). Insurers are reluctant to offer insurance coverage for a risk that is ambiguous, and for which sufficiently accurate premiums cannot be priced through actuarial calculations (Jaffee and Russell, 1997;Kunreuther and Michel-Kerjan, 2007). Moreover, insurers are risk-averse to catastrophe risk, which implies that they refuse coverage or are only willing to offer catastrophe insurance if premiums are sufficiently above the expected loss (Duncan and Myers, 2000). Standard statistical methods and tools for flood risk estimation use historical data and other empirical information (Behrens et al., 2004). This is often difficult due to a lack of empirical data, and the limited loss data available are not always reliable and consistent, so they inaccurately represent actual damage (Grossi and Kunreuther, 2005).
Several studies have applied flood assessment and hydrological models to estimate (future) flood damage and its probability for the Netherlands (Vrijling, 2001;Jonkman et al., 2003Jonkman et al., , 2008Van der Most and Wehrung, 2005;Bouwer et al., 2010). Two relevant studies are "Veiligheid Nederland in Kaart" (VNK), also known as the FLORIS study (TAW, 2000;Wouters, 2005) and "Aandacht voor Veiligheid" (AVV) (Aerts et al., 2008;Aerts and Botzen, 2011). Both studies provide estimates of current and future flood probabilities and damage under various scenarios of climate change and economic development. A shortcoming of these studies is that they only provide a single estimate of the flood probability and potential flood damage, and not the complete probability density function of damage. Moreover, because extreme events such as catastrophe damage generally follow an asymmetric distribution process with a fat right-tail, the losses located in this part of the damage distribution need to be included in risk estimates and insurance premiums. The average risk estimates from the VNK and AVV projects may therefore lead to either "upwards" or "downwards" biased risk and premium estimates. As catastrophe risks are generally assumed to follow a fat-tail distribution process, and risk estimation has to rely on limited empirical data, it is essential to take these two aspects into account in the risk assessment process. Bayesian Inference (BI) is a useful statistical tool for assessing risk, especially in applications with a lack of historical information on risk, as is the case for low-probability floods in the Netherlands (Coles and Powell, 1996;Cooley et al., 2007). BI can be used to update the probability estimate for a hypothesis (the "prior beliefs", i.e. based on AVV flood risk information) with additional empirical information (the "likelihood", i.e. based on VNK flood risk information), especially when it is unreliable to make statistical inference based on only a small amount of homogeneous data, such as either VNK or AVV flood risk information. The updated outcome (the posterior statistics) can be used to simulate hazard events and fit a risk distribution, i.e. to estimate a new probability density function for flood damage. The BI method used in this study deals with data scarcity and provides insights into uncertainties in the estimated flood damage.
Under the assumption that flood risk follows a Pareto distribution with a right fat-tail, this study aims to apply Bayesian techniques combining the VNK and AVV data to estimate the tail parameter of the flood damage distribution. In the first phase, the tail parameter for flood damage is estimated using BI, in which the prior belief is assumed to originate from a Gamma conjugate 1 family while the observations are assumed to follow a Pareto distribution. Subsequently, in the second phase, flood damage is simulated based on the estimated tail parameter, and the corresponding probability density functions are fitted for all 53 areas in the Netherlands which are exposed to flooding ("dyke ring areas"). Kunreuther et al. (2009) propose that premiums that reflect risks are an important condition for designing a natural disaster insurance system that is financially viable and provides adequate incentives for risk mitigation. Although we realize that flood insurance premiums in practice may not be fully differentiated with respect to flood risk, our analysis provides insights into the level of flood insurance premiums as if the condition of risk-based premiums were applied in the Netherlands. Using the loss information derived from the damage simulation and density functions, the risk-based insurance premiums (Actuarial-equivalence (AE amount)) will be estimated for all 53 dyke ring areas. These premiums include an extra surcharge for insurer's aversion to catastrophe risk, as proposed by Kaas et al. (2004). Finally, the AE premiums are compared with the Empirical method proposed 1 Conjugacy can be defined as follows: if F is a class of sampling density functions p(x|θ ) and P is a class of prior distributions for θ , then the class p is conjugate for F if p(θ |x) ∈ p for all P (.|θ) ∈ F and p(.) ∈ P (Juneja et al., 2006). by Kunreuther et al. (2011), which takes a modest rate for insurer's aversion to (fat-tailed) catastrophe risk into account.
The remainder of this paper is structured as follows. Section 2 describes the data and the statistical methods used to estimate the parameters of the probability density functions of flood damage, and the two methods used to calculate insurance premiums. Section 3 presents the results of the flood risk estimates and flood insurance premiums, and compares these results with existing studies. Section 4 provides discusses our findings and Sect. 5 concludes. Figure 1 shows the overall research methodology described in detail from Sect. 2.1 to Sect. 2.4. In Sect. 2.1, the available data from AVV and VNK are described, and, if necessary, further processed and adapted for use. In Sect. 2.2, the BI is introduced and applied to estimate the tail (shape) parameter of the flood damage distribution under the assumption that flood damage follows a Pareto distribution process. Next, in Sect. 2.3, a Monte Carlo simulation of flood damage is performed using the updated tail parameters from Sect. 2.2, and subsequently probability density curves of flood damage are fitted for all 53 dyke ring areas. These sections explain in detail the probability distributions that are used in these steps. Finally, in Sect. 2.4, the resulting simulated flood damage information is used for deriving flood insurance premiums. Premiums are estimated using two different methods that respectively use the variance of risk (derived from Sect. 2.3) in a different way, and take a different account of the insurer's risk aversion to catastrophic flood risk.

Data
The low-lying areas in the Netherlands are divided into 53 dyke ring areas. Many low-lying lands (which are often called "polders") have been reclaimed from former lakes. Each dyke ring area has its own closed flood protection system of dykes, dams, and sluices that protect it from floods caused by rivers and the sea. A dyke ring is an individual administrative unit under the Water Embankment Act of 1995, which guarantees a particular level of protection against flood risk for each dyke ring area (Aerts and Botzen, 2011). For instance, a dyke ring with a safety standard of 1/1250 (a flood "return period" of once in 1250 yr) 2 has been constructed in such a way that it may withstand a flood with a probability of 1/1250. Figure 2 shows a map of the Netherlands that depicts the 53 dyke ring areas and their safety standards, which range between 1/10 000, and 1/1250. The data 2 The return or recurrence period is an estimate of the average time internal between two occurrences of an event, in particular, of a catastrophe, such as a flood, an earthquake or a peak river discharge. The probabilities provided by the VNK and AVV projects are an indication of the approximated return period of a dyke breach. 6 (derived from Sect. 2.3) in a different way, and take different account of the insurer's risk aversion to catastrophic flood risk.

Data
The low-lying areas in the Netherlands are divided in 53 dyke-ring areas. Many low-lying lands (which are often called 'polders') have been reclaimed from former lakes. Each dyke-ring area has its own closed flood protection system of dykes, dams, and sluices that protect it from floods caused by rivers and the sea. A dyke-ring is an individual administrative unit under the Water Embankment Act of 1995, which guarantees a particular level of protection against flood risk for each dyke-ring area (Aerts and Botzen, 2011 used in this paper originate from two main studies, the AVV and VNK projects, on flood risk in the Netherlands. The VNK project provides current flood risks, and the results have been reported by Wouters (2005), and by Bouwer et al. (2009) Table 1). VNK estimated detailed flood damage according to various flood scenarios for dyke-rings 7, 14 and 36 (Fig. 2), while a more global approach was used to estimate flood probabilities and potential flood damage for the other dyke ring areas. The study included the assessment of flood probability and damage from dyke failure mechanisms, hydraulic pressure, and multiple dyke breach scenarios (Wouters, 2005). The AVV study provides insights into the potential effects of climate and socio-economic change on flood risk over a long time horizon, as compared with the current situation. The AVV project estimates current flood risk and future flood risk was simulated using (long-term) land use and climate change scenarios of increased river discharges and sea level rise (Aerts et al., 2008;Aerts and Botzen, 2011). Appendix A provides all the data from the AVV and VNK projects, which have been used as input for the Bayesian model and flood damage simulations. The flood damage estimates from both projects are used for the BI method, while the AVV probabilities are used for the estimation of annual flood risk and premiums. Throughout this paper, the AVV flood damage data represents prior knowledge about the probability distribution of flood damage within the Bayesian framework, and the VNK flood damage data represents the flood damage likelihood (See Fig. 1). Table 1 provides the basic information provided by these two main studies on flood risk in the Netherlands for three representative dyke ring areas. Dyke ring areas 7 and 36 are representative for most of the dyke ring areas that have flood protection levels between 1/4000 and 1/1250 per year. The dyke ring Zuid Holland (along with Noord Holland) is one of the two dyke-rings with the lowest flood probability in the Netherlands (1/10 000). This dyke ring is located along the densely populated coastline, and has a high concentration of economic assets.
Data was processed (for details, see Appendix B) to derive minimum and maximum damage per dyke ring area, which are used in the flood damage simulations (Sects. 2.2 and 2.3). The assumption of a minimum and maximum amount of flood damage per dyke ring area is consistent with realty. If there is a flood event in the Netherlands, then there will always be a minimum amount of damage, while the maximum theoretical damage cannot exceed the total economic value that is exposed to flooding within a dyke ring area. The AVV and VNK data provide expected damage estimates for 740 Y. Paudel et al.: Estimation of insurance premiums for coverage against natural disaster risk 7 Figure 2 shows a map of the Netherlands that depicts the 53 dyke-ring areas and their safety standards, which range between 1/10,000, and 1/1,250. The data used in this paper originate from two main studies, the AVV and VNK projects, on flood-risk in the Netherlands. The VNK project provides current flood risks, and the results have been reported by Wouters (2005), and by Bouwer et al. (2009) Table 1). VNK estimated detailed flood damage according to various flood 3 The return or recurrence period is an estimate of the average time internal between two occurrences of an event, in particular, of a catastrophe, such as a flood, an earthquake or a peak river discharge. The probabilities provided by the VNK and AVV projects are an indication of the approximated return period of a dike breach.

Applying Bayesian Inference for estimation of the tail parameter
Assume θ is an unknown parameter − which reflects the state of our knowledge about the flood damage data before we observed this data − defined as an element of a space which denotes all possible states of an outcome, also called the parameter space. When experimenters aim to obtain more information about this unknown parameter, the flood damage observations, x 1 , . . . . . . x n , per flood return period n are assumed to be independent and identically distributed from a Pareto process where the process scale parameter (β) is assumed to be known, and the process tail (shape) parameterθ is unknown (see Sect. 2.2). BI is a statistical technique which attempts to estimate parameter θ by combining the prior beliefs (this means that we have some idea about it before we have seen the data) with the information observed from experiments or practice. This Bayesian technique can be represented by the Bayes theorem: where p (θ) on θ ∈ is the prior distribution which includes our prior knowledge about θ (the AVV information); p(x|θ ) is the sampling or data distribution; and p(x) is the marginal distribution of x. The observed data (the VNK information) combined with the prior knowledge is given by the likelihood function L (θ | x 1 , . . . . . . x n ) = p(x|θ), which can be regarded as a density with respect to x, where x is fixed to a particular value (the observation). The normalized product in Eq.
(1), which is the marginal distribution of x that can be represented as p (x) = p (θ)L (θ | x) dθ, integrates over all possible values of θ for given x. Because p (x) is a scalar value and not a function, we can omit the denominator in Eq.
(1). The non-normalized 3 posterior distribution can be given as one of proportionality: The main steps in BI to derive the posterior density (the updated shape parameter for the flood damage density) for θ (Eq. 1) are: Step 1. Formulate the likelihood or data model for the observations (the VNK damage data) and the corresponding likelihood function for the unknown parameters of the flood damage distribution; Step 2. Specify the prior density distribution (the AVV damage data) to quantify the uncertainty about the values of the unknown parameters identified in Step 1; Step 3. Combine the likelihood function from Step 1 with the prior distribution from Step 2 to determine the posterior distribution. The posterior distribution quantifies the joint probabilities of the values of unknown parameters of the flood damage distribution using the information from observations; Step 4. Calculate the updated parameters of the probability density function of flood damage for each of the 53 dyke ring areas from the posterior distribution by repeating the previous steps. The mean of the parameter (the parameter of interest) can be easily derived from the posterior distribution by calculating the expectation.
We have made two main assumptions in this study to apply BI. First, it is assumed that flood loss data are independent and identically Pareto-distributed (bounded with a lower and upper threshold) continuous random numbers (See Eq. 3 in Step 1), with an unknown shape and the known scale parameter (see Appendix C). Second, for computational convenience we take the prior and posterior models from a conjugate family to estimate the shape-parameter. Conjugate in our case means that the prior and posterior distributions are from the same distribution family, which is the case in this research. To briefly elaborate on this topic: even though the Bayesian theorem is mathematically simple, due the normalizing factor (the denominator in Eq. 1) it can be a difficult task to find an analytical or numerical solution. Because the Bayesian theorem is a product of the prior and the likelihood functions, it is not always guaranteed that this product can be integrated over the relevant domain. One way to avoid this problem is by using conjugate priors. Based on this concept one can derive pairs of likelihood functions and prior distributions with appropriate mathematical properties that result in tractable closed-form solutions to the integrals (Arnold and Press, 1983). Since conjugate priors have computational advantages, we have chosen pairs of the likelihood function and the prior distribution from an informative 4 conjugate family, as will be explained below.
Step 1: Data model and the likelihood function We first apply the Pareto method as it is useful for modelling low-probability high-impact risks, because the total aggregated damage is to a large extent determined by large losses in the right-tail of the density, which are also called "righttail risks" (Hsieh, 2004). For this application, we assume that the VNK flood loss data (the data model) x 1 , . . . . . . x n are an independent and identically distributed Pareto random loss variable per flood return period that is defined with two main parameters: namely, the process scale x m , which is the minimum of x, and the process shape θ > 1. It is assumed that 742 Y. Paudel et al.: Estimation of insurance premiums for coverage against natural disaster risk the first parameter is known and that the shape parameter is unknown, which we aim to update using BI in this study. The Pareto distribution can be written as (Davis, 2001) p where θ > 1. 5 The likelihood function associated with the unknown shape parameter is (Bermudez and Kotz, 2010) The sufficient convenient statistics 6 are based on the number of observations, n, and the data product Step 2: Prior density Once the data model has been specified, the prior density function for the unknown model parameter needs to be specified. The prior density describes our beliefs about the uncertainty about the model parameters, without incorporating the information from the observations. Since it is assumed that the model data follows a Pareto density with known scale and unknown shape parameters, the conjugate prior p(θ ) is proportional to the Gamma density function. This Gamma function of θ is defined by the hyper-parameters a (shape) and b (scale) and is given by (Fink, 1997): where a and b are, respectively, the shape and scale estimates of the hyper-parameters, which are derived from the AVV flood damage estimates (see Appendix C).
Step 3: Posterior density Because we have chosen a prior density that is conjugate for the likelihood, the posterior density consists of a combination of Eqs. (4) and (5), and follows the same density as the prior, namely: p (θ | x) = gamma (α, β) = L(θ|X) · p (θ) (Vilar-Zanon and Lozano-Colomer, 2007). Substituting Eqs. (4) 5 In practice, in case of sufficient data, the shape parameter of the data model can be estimated numerically from the VNK estimates with the formula θ = 2· The scale parameter can be estimated with x m (scale) = inf(X), with X > m. (See Appendix C for more details). 6 A sufficient statistic has the property of sufficiency in terms of the related statistical model and its unknown parameters. and (5) in Eq. (2) yields the following specification of the posterior flood damage density (see Appendix D): where α = a + n and β = (Arnold et al., 1998).
Step 4: Estimation of the parameters of interest From the posterior distribution function specified in Step 3, the parameter of interest (the mean) of the posterior distribution of flood damage is given as follows 7 : To estimate this parameter of interestθ, we first need to simulate the posterior distribution. As discussed by Gelman (2004), this can be simulated in different ways. Because the prior distribution in this paper is assumed to be from a conjugate family, we opt for a direct simulation method, which implies the drawing of random numbers from the target (i.e. posterior) distribution (Gelman et al., 2004) 8 .

Simulation of flood damage data and the fitting of probability density curves of flood damage
Once the updated tail parameters are estimated (the parameter of interest in Eq. 7), they are used, along with the updated scaling parameters for minimum and maximum damage (see Appendix B), to simulate flood damage for all 53 dyke ring areas for 250 000 flood return periods, as well as to estimate the corresponding probability density curves of flood damage. Assuming that the flood damage distribution follows the Pareto distribution as in Sect. 2.2, the probability density for each of the 53 dyke ring areas can be simulated with the following model: where f (x) is a bounded Pareto distribution (see Appendix E for more details) with shape parameterθ = E [P (θ | x)], the lower bound scale parameter x l = min (x) and the upperbound scale parameter x u = max?(x), which are estimated from the VNK data for dyke ring areas 7, 14 and 36 (for more details see Appendices A and B) (Kerman and Gelman, 2007) 9 . These models were implemented in the statistical software Matlab version 2012a to perform a Monte Carlo analysis of flood damage for each dyke ring area for 250 000 flood return periods. The resulting data are used to construct probability density curves for flood damage and to estimate average flood damage over all flood return periods for each of the 53 dyke ring areas. The expected AE average amount for flood damage (the average overall flood return periods) can be represented with the standard expectancy expres-

Estimation of flood insurance premiums
Since floods are rare events, the estimates for annual flood insurance premiums can be approximated by the annual expected value of flood damage. Therefore, based on the simulated probability density curves, we can estimate average flood insurance premiums for each dyke ring area. A more refined premium differentiation on the household level is not possible, given the available information on flood risk. Hence, the premiums are partially risk-based, since they differ per dyke ring area, but not per individual insurance policy.
Since floods are rare events, the estimates for annual flood insurance premiums can be approximated by the annual expected value of flood damage, and should consider the level of insurer's risk aversion towards the extreme nature of the risk. This risk aversion is reflected as a surcharge on the premium above the expected value of the loss. This surcharge depends on the variability (variance or standard deviation) of the expected flood damage. A higher risk variance implies a higher probable maximum loss, which leads to an extra premium surcharge. In general, insurers charge a higher premium if the variability of losses is greater, because a relatively high variability indicates a high likelihood of suffering very large losses for which large cash reserves or reinsurance coverage is needed, which is costly for the insurer. This surcharge has the effect to increase the cash surplus of the insurer which protects the insurer against the possibility of insolvency (Kunreuther et al., 2011). We apply two different methods for estimating premiums that address the impact of the insurer's risk aversion attitude towards the ex-treme (catastrophic) nature of flood risk. The first method, AE, provides premium estimates based on the Modern Actuarial Risk Theory discussed in Kaas et al. (2004). The main emphasis of this method is on the extreme nature of damage, because it takes the full loss variance into account in the premium calculation. Moreover, a moderate degree of insurer's risk aversion rate is included in the premium, which is quite common for non-life insurance products, but may provide an underestimation in an application to heavy-tailed catastrophe losses, like those due to floods. The premium amount π to be paid by the policyholder for insurance coverage for risk X, given the exponential utility function u (x) = −αe −αw with the parameter of insurer risk aversion α= 0.005, is given by (for details, see Kaas et al., 2004) where π (W ) is the amount of premium necessary to insure property value W ; E [X] is the loss expectancy that can be deduced from the adjusted Pareto distribution (Eq. 8), σ 2 is the variance of the same adjusted Pareto distribu- is the insurer's risk aversion coefficient, which is in this case equal to α, and w stands for the amount to be insured; and u(.) is the utility function of the insurer. Equation (8) The second method is what we call the Empirical method, which has been proposed by Kunreuther et al. (2011). This method includes a surcharge of the standard deviation on the premium, which has been derived from an extensive empirical analysis of catastrophe insurance premiums in the USA for a period of 24 yr (see Chapt. 7 in Kunreuther et al., 2009). This method is a modified version of Eq. (9), and it takes the insurer's risk aversion toward the extreme nature of risk into account by making the premium dependent on the standard deviation (SD) of damage. According to the Empirical method, the premium can be calculated as follows: where σ is the standard deviation of loss; and δ is the Empirical insurer's risk aversion rate for catastrophe risk. The coefficient δ is equal to 0.55 (Kunreuther et al., 2011).

Results
Descriptive statistics of the probability density functions for flood damage are provided for all 53 dyke ring areas (see Table 2). Columns 2 and 3 in Table 2 show, respectively, the estimated minimum and maximum flood damage amounts; columns 4, 5 and 6 are, respectively, the expected flood damage, the standard deviation of the simulated flood damage and the parameter of interest. Although differences between dyke ring areas in the shape parameters in column 6 appear to be small, it should be noted that the distribution function used is sensitive to such small differences. Column 4 in Table 2 shows that expected flood damage amounts differ significantly between dyke ring areas. The expected flood damage provided by VNK and AVV (see Appendix A) is significantly higher compared with our estimates. The reason for this is that the damage estimates by AVV and VNK are expected flood damage amounts that are associated with what are called an "exceedance flood probability" of a very severe flood, which do not include damage amounts that fall below this extreme level. Our expected flood damage amounts are lower than the AVV and VNK expected flood damage amounts, because our estimates are based on the full probability density functions of damage between the specified minimum and maximum amounts, which include several small damage amounts. Due to space limitations, it is unfeasible to discuss the detailed results of all 53 dyke ring areas, which is why the results for three representative dyke ring areas -the Noordoostpolder (7), Zuid Holland (14), and Land van Heusden/de Maaskant (36) -are discussed here in more detail.
Figures 3, 4 and 5 show, respectively, the resulting probability densities of flood damage for the three selected dyke ring areas. The corresponding statistics are provided in Table 3 below. The three markers in the figures indicate some representative data percentiles (the 50 per cent, the mean, and the 97.5 per cent percentiles). The loss densities are truncated on the left and right sides, in accordance with the estimates of minimum and maximum flood damage (see Appendix B). The probability of observing a damage amount is depicted on the left vertical axis, and flood frequencies for 250 000 flood return periods are shown on the right vertical axis, while the damage amounts are shown on the horizontal axis in millions of euros. The frequency densities show that the majority of loss observations in each dyke ring area are concentrated on the left-side of the curve, while every frequency curve has a long fat right-tail, which indicates a high dispersion of the loss data. As an illustration, the statistical mean of all three probability density functions of flood damage are located around the 67.9 per cent data percentile, which indicates that the loss data behave asymmetrically. This is consistent with our selection of the Pareto distribution which is fat-tailed and asymmetrical and corresponds with practical experience that flood damage is an extreme event. Table 3 summarizes the descriptive statistics of the simulated flood damage for the three selected dyke ring areas, and the AVV and VNK flood damage estimates. The coefficient of variability 10 (also called R 2 ) is about 68 per cent, which suggests that the simulated flood damage has a high variance. Since the corresponding skewness 11 for all three 10 The R 2 is a normalized measure of the dispersion of a probability distribution, which is defined as the ratio of the standard deviation to the statistical mean. 11 The skewness measures the asymmetry of a distribution or sample data relative to the standard normal distribution, which has a curves is less than 2 and positive, the tail on the right side is longer compared with the left side, and the bulk of the loss values lie to the left of the mean. Furthermore, the losses on both sides are truncated with the maximum and the minimum damage. This is consistent with the fact that flood damage cannot be infinitely large and justifies the data truncation. The kurtosis 12 , which indicates the peakedness of a density function, is about 8, which is higher than the kurtosis of 3 of the standard normal density. The positive kurtosis shows that the flood damage amounts are peaked and not flat compared with the standard normal density. Table 4 shows the number of houses (column 2) per dyke ring area; the Empirical, AE and AVV estimates of flood insurance premiums (respectively, in columns 3, 4 and 5) and their ratios (columns 6 and 7). The premiums that have been estimated with the Empirical and AE methods (see Sect. 2.4) are compared with the premiums that Aerts and Botzen (2011) have estimated for all dyke-rings. This comparison is of interest, since Aerts and Botzen (2011) have estimated the flood insurance premiums using only the AVV data of a single estimate of the flood probability and potential damage per dyke ring area (premium = probability*damage), while the Empirical and AE estimates are based on the mean damage that are estimated from the complete probability density of flood damage that has been derived with BI, and account for the insurer's risk aversion to the catastrophe risk (see Sect. 2.4). The data are presented in descending order with respect to the ratio of the AE and AVV premiums.

Flood insurance premiums for all dyke ring areas
The annual Empirical premium estimates (column 3) take the standard deviation of damage and the insurer's risk aversion rate for catastrophe risk into consideration by means of a surcharge on the expected flood risk that has been derived from actual insurance markets (Kunreuther et al., 2011). The Empirical premiums are generally close to 70 per cent of the AVV premiums (column 6), which indicates that the Empirical method results in a scaling of the AVV premiums. The empirical premiums are lower, even though these include a surcharge for the rate of risk aversion which is not accounted for in the AVV premium estimate by Aerts and Botzen (2011). The mean damages per dyke ring used as input for the AE premiums are significantly lower compared with the AVV mean damages (see Table A1 in Appendix A), while the AE premiums are higher for some dyke ring areas. This can be explained by the premium surcharge of the risk aversion rate which depends on the risk variance. The AE skewness of zero, and any data that has an asymmetric distribution has a skewness that differs from zero. 12 The kurtosis measures whether the data are peaked or flat with respect to the normal distribution. The kurtosis of the standard normal distribution is 3 (in the case of "excess kurtosis", it is 0); a positive kurtosis indicates a peaked distribution; and a negative kurtosis implies a flat distribution.   premiums (column 4) include a surcharge on the expected flood risk that is based on the full loss variance of the flood damage density (instead of the SD in the Empirical method), which results in substantial differences compared with the AVV estimates for some dyke ring areas. The differences between the AE and AVV premiums are largest for the dyke ring areas with a high expected damage and correspondingly high variance (e.g. dyke-rings 14, 16, and 43).

Discussion
The estimates of flood insurance premiums in Table 4 will be discussed with respect to three main aspects: the main differences between the premiums; how the BI method contributes to these findings; and the main implications of the results for insurers.

Main differences between the flood insurance premiums
The estimated flood damage densities per dyke ring area lie at the core of the estimations of the flood insurance premiums. Overall, the average flood damage per dyke ring, obtained through Bayesian statistical modelling, is lower if compared with the expected flood damage estimates obtained in the AVV and VNK projects (Wouters, 2005;Aerts et al., 2008). This can be attributed to using the full probability distribution from our BI approach as compared with, for example, AVV-based premiums that only use a single probability and  an extreme flood scenario with high flood damage as a basis. Furthermore, other factors, such as the choice of the density functions of the prior and likelihood information modelling, parameter uncertainty, and the Monte Carlo simulations used to fit the loss-probability curves, might also partially contribute to the difference in results of flood damage estimates. Along with the differences in flood damage estimates, there are also significant differences in the premiums that are estimated using the AE and Empirical methods. For instance, the Empirical method emphasizes the insurer's risk aversion attitude to catastrophe risk by adding a surcharge to the expected flood risk in the premium estimate based on the standard deviation, while this is not applied in the AVV-based premiums (column 5 of Table 4). The Empirical premiums are approximately 70 per cent of the AVV premiums, which appears to be approximately constant for all dyke ring areas. This implies that the impact of using the loss standard deviation as a surcharge, on top of the risk-based premiums, is not very large if the Empirical method is used. In contrast, the AE method for calculating premiums adds a riskaverse surcharge to the expected flood risk that is based on the loss variance, which results in much higher premiums (up to 178 per cent) for some dyke ring areas compared with the Empirical and the AVV methods. Hence, these large differences, which particularly occur in those dyke ring areas with a large amount of expected damage (the first 8 dyke-rings in Table 4), can be explained by the corresponding large variances of flood damage. Such a large surcharge does not occur when the standard deviation is used for modelling insurer's risk aversion as the Empirical method does. The surcharge Table 4. Results of annual flood insurance premiums per homeowner per dyke ring, according to the Empirical method, the Actuarial Equivalence (AE) method, and the AVV method. on the AE premium is smaller for the last 45 dyke ring areas, shown in Table 4. This is caused by the higher weight that the AE method places on the flood damage variance, which is smaller for the last 45 dyke ring areas. In agreement with Friedman (1974), the AE premium estimates confirm that insurers are considerably risk-averse to damage with a high loss variance, and they see this type of risk either as uninsurable or as a gamble that needs a significantly high expected return.
It should be noted that Kunreuther et al. (2011) calculated the risk aversion surcharge, used in the Empirical method, based on historical surcharge information that US insurers have charged for providing coverage against hurricane damage. However, this surcharge may not completely reflect risk aversion to extreme flood events in the Netherlands, which can have a catastrophic character and result in very high losses which could ruin the insurer. A higher surcharge for risk aversion to the high amounts at stake may be applied for such events. This, for example, is done in the AE method, but we have no empirical data specific for the Netherlands on which this surcharge could be based, since empirical estimates of insurer's risk aversion to providing coverage for flood risk in the Netherlands are not available.

Bayesian Inference (BI) method
For several reasons, it can be argued that the BI method applied in this study is more suitable for estimating flood insurance premiums in the Netherlands compared with the methods that only use a single estimate of the flood probability and potential damage (the AVV and the VNK projects). First, BI provides statistical estimates of flood risk that take its stochastic and extreme nature into account by deriving the complete probability density of flood damage, and modelling the tail of this density (Bayarri and Berger, 2004). Second, BI is a suitable method for representing probabilistic relationships between different sources of information, such as the AVV and VNK data (Heckerman, 2008). However, in some BI applications there may be concerns about the reliability of the prior and likelihood data sources, and how this influences the results (Raftery et al., 1997;Malakoff, 1999;Hájek, 2007;Gelman, 2008;Chaudhuri and Ghosh, 2011). For example, the inclusion of prior information does not always lead to better results, especially when it is based on subjective beliefs. To overcome this issue, only objective data are used in this study that share similar statistical features as input. Third, the damage estimations in this study are based on extensive data simulations, which enable more detailed statistical information to be provided for the flood damage assessments. Simulation is the only way to incorporate probabilistic scenarios in the estimation of risk of low-probability floods, where there is little historical information available on such floods (Juneja et al., 2006). In contrast, VNK and AVV, as well as other flood damage studies, do not consider the probabilistic nature of risk and cannot provide any statistical inferences for damage beyond a certain level. Fourth, the BI procedure applied in this study allowed for the derivation of the full probability density of flood damage, its mean and standard deviation, which are all important inputs for the calculation of flood insurance premiums. Even though simulation provides useful information that is needed to estimate flood insurance premiums, it also has its limitations. For example, a simulation attempts to mimic the damage of a flood event based on known facts and assumptions by means of a conceptual computational environment, which results in uncertainties (Robert and Casella, 2011). Large uncertainties associated with rare events are, in our application, somewhat narrowed by truncating the loss data at the best estimates of minimum and maximum flood losses per dyke ring area.

Implications for insurers
From the findings of this study it becomes clear that insurance for flood risk is a complex product to price because of the extremely low flood frequency that entails large uncertainties. This study is the first in-depth study of the pricing of flood insurance in the Netherlands that uses the full probability density of flood damage in all 53 dyke-areas. Therefore, it provides a useful basis for insurers who are considering introducing flood insurance in the Netherlands. Our premium estimates show that flood insurance premiums can be considerably above the expected value of the flood loss in some dyke ring areas because of the risk aversion of the insurer for the catastrophic nature of flood risk. Because the risks located on the right-tail of the damage density are much more expensive to insure compared with the risks of lower damage on the left side, insurers may be reluctant to provide insurance for extreme flood losses in some high-risk areas unless they can charge sufficiently high premiums. Nevertheless, our estimated flood insurance premiums are lower than household willingness-to-pay (WTP) for flood insurance in most dyke ring areas. Botzen and van den Bergh (2012) estimate that average individual WTP for flood insurance in the current situation of flood risk is about C250 per year, which is higher than our estimated flood insurance premiums in most dyke ring areas (Botzen and van den Bergh, 2012). However, actual flood insurance premiums are in practice likely to be higher than the premiums provided in this paper due to administrative costs and a profit margin for insurers, which are not included in our estimate.
Our study follows the proposal by Kunreuther et al. (2009) to determine flood insurance premiums on the basis of estimate of actual flood risks. Nevertheless, we realize that in practice flood insurance premiums may not be fully differentiated with respect to actual flood risk, for example, because bundled coverage is provided or because it entails costs for insurers to determine and charge different premiums for every specific policy. Our analysis provides insights into the level of flood insurance premiums as if they were riskbased and assesses flood risks and flood insurance premiums for the Netherlands on a dyke ring level, which provides a relatively simple basis for premium differentiation. Overall, the estimated flood insurance premiums show large differences between dyke ring areas in the Netherlands, which is mainly due to the difference in dyke failure probabilities between these areas. This suggests that premiums should be differentiated at least on a dyke ring level if an insurance system with risk-based premiums were to be introduced. Insurance costs would differ considerably between the different low-lying areas in the Netherlands if flood insurance premiums were to reflect risk.

Conclusions and recommendations
This study has applied Bayesian Inference to assess the stochastic nature of flood risk and provide estimates of the probability density of flood damage for all 53 dyke ring areas in the Netherlands. Subsequently, these probability densities of flood damage have been used to estimate flood insurance premiums for these areas. While previous studies have derived a single estimate of the flood probability and expected flood damage for the low-lying areas in the Netherlands, our study has estimated the full probability density of flood damage, which allows for a more accurate estimation of flood insurance premiums. In particular, the premiums estimated in this study account for the insurer's risk aversion to the extreme nature of flood risk. This study is of practical relevance for insurers who are considering introducing flood insurance in the Netherlands.
The methodological process followed in this paper to estimate premiums for damage emerging from rare events, such as catastrophic and man-made disasters, appears to be of great relevance, as it is able to cope with the lack of empirical evidence on the corresponding expected damage. Using a practical example, this paper showed that the widespread uncertainties about flooding should be included in premium calculations by taking into account the relevant risk indicators, such as risk variance and insurer's aversion against catastrophe risk insurance. Furthermore, as we notice from the premium results, the choice of a particular method appears to make a significant difference for their levels. Therefore, it is important that the method used to estimate insurance premiums should correctly represent the real-world problem, and thus reflect the true nature of the corresponding risk. Data pre-processing with respect to consistency, reliability, and completeness is a vital part in the risk-estimation process because, regardless of the type of method used, the soundness of results can, for a large part, be assigned to input. Usually, it is assumed that the consequences of catastrophic events are unlimited. However, in practice the damage is usually limited, and lies between two extremes. In such cases, the range of possible outcomes of rare events can be somewhat narrowed if unnecessary and unrealistic information is excluded in models by truncating loss data at below a predetermined threshold, which results in more realistic premiums.
Further in-depth research is necessary to explore and analyse different aspects of Bayesian techniques tailored for rare events. This study has provided insights into uncertainty of estimated flood damage, while it should be acknowledged that another important source of uncertainty is the flood frequency. In this respect, future research could focus on obtaining better insights into uncertainties of the real probability of dyke failure. Furthermore, as it allows the integration of expert judgment and other third party information, it would be advisable to refine the prior assessment process carefully by integrating subjective information that may be of great value. Controversy arises because prior information is generally assumed to be subjective, and can have a significant impact on the final results. However, this can partly be compensated with a cross-validation of the information, as long as it is properly carried out. The risk aversion rate used for the premium calculation should reflect the actual risk in the dyke ring areas rather than those estimated for different areas. Therefore, more research will be necessary on insurer's risk aversion to catastrophe risk in the context of flood risk in the Netherlands. Table A1 provides the flood risk estimates from the VNK and AVV projects per dyke ring area which have been used as input data in the Bayesian model. Column 2 provides the flood return period per dyke ring area and columns 3 and 4 show, respectively, the expected flood damage that corresponds to these return periods.

Estimation of hyper-parameters and sufficient statistics
The estimation of hyper-parameters needs to be conducted in two steps because the estimation of hyper-parameters a and b of the prior distribution is based on information about θ, while θ itself needs to be estimated from the flood damage data x (from AVV). First, we estimate the weighted average of θ, and create several data points using flood damage information from AVV as follows: θ = n i=1 x i·w i ? σ x i·w i 2 , where n i=1 x i·w i is the weighted average of our prior beliefs about flood damage estimated from AVV flood damage data, with n i=1 w i = 1, i = 1, 2, 3, which are the three dyke ring areas for which detailed damage information (i.e. expected, minimum, and maximum amounts) is available; and σ x i·w i is the standard deviation of damage estimated from the minimum, maximum, and expected damage (see Appendix B).
Second, based on the information about θ obtained in the first step, the necessary hyper-parameters can be estimated as a = n i=1 θ i·w i ? σ θ 2 and b = 1.

Deriving the posterior distribution from the Pareto likelihood and Gamma prior
The posterior density is derived from the prior and likelihood functions (Arnold, 1998): p (θ | x) ∝ L (θ | X) · p(θ) . . . . . . . . . . . . , where p (θ) is a Gamma distribution with two parameters α and β (see Sect. 2.2 about the prior distribution). The likelihood function for damage observation is given by L (θ|x) = θ n x θn written in a shorter form as follows: ( ̂ ) where, ̂ are respectively, the shape parameter (obtained from Equation 1.6) and two parameters (the lower and upper boundaries). Equation E1, which is used to simulate flood d each of the 53 dyke-ring areas using the bounded Pareto distribution, can be rewritten as: where U is a uniformly distributed random number between 0 and 1.

Appendix E The bounded Pareto distribution
Equation (E1) gives the formula of a bounded Pareto distribution (Weisz and Brown, 2001), and Fig. E1 provides a conceptual sketch of such a distribution, which is bounded with a minimum and maximum amount of flood damage.
Flood damage can be simulated by drawing random numbers from Eq. (E1). Equation (8) can be written in a shorter form as follows: Flood damage (x) = f −1 x;θ, x l , x u . . . . . .
where,θ, x l , x u are respectively, the shape parameter (obtained from Eq. 7) and two scale parameters (the lower and upper boundaries). Equation (E1), which is used to simulate