Process-based modelling to evaluate simulated groundwater levels and frequencies in a Chalk catchment in south-western England. Natural

. Chalk aquifers are an important source of drinking water in the UK. Due to their properties, they are particularly vulnerable to groundwater-related hazards like ﬂoods and droughts. Understanding and predicting groundwater levels is therefore important for effective and safe water management. Chalk is known for its high porosity and, due to its dis-solvability, exposed to karstiﬁcation and strong subsurface heterogeneity. To cope with the karstic heterogeneity and limited data availability, specialised modelling approaches are required that balance model complexity and data availability. In this study, we present a novel approach to evaluate simulated groundwater level frequencies derived from a semi-distributed karst model that represents subsurface heterogeneity by distribution functions. Simulated groundwater storages are transferred into groundwater levels using evidence from different observations wells. Using a percentile approach we can assess the number of days exceeding or falling below selected groundwater level percentiles. Firstly, we evaluate the performance of the model when simulating groundwater level time series using a spilt sample test and parameter identiﬁability analysis. Secondly, we apply a split sample test to the simulated groundwater level percentiles to explore the performance in predicting groundwater level exceedances. We show that the model provides robust simulations of discharge and groundwater levels at three observation wells at a test site in a chalk-dominated catchment in south-western England. The second split sample test also indicates that the percentile approach is able to reliably predict groundwater level exceedances across all considered timescales up to their 75th percentile. However, when looking at the 90th percentile, it only provides acceptable predic-tions for long time periods and it fails when the 95th percentile of groundwater exceedance levels is considered. By modifying the historic forcings of our model according to expected future climate changes, we create simple climate scenarios and we show that the projected climate changes may lead to generally lower groundwater levels and a reduction of exceedances of high groundwater level percentiles.


Introduction
The English Chalk aquifer region extends over large parts of south-western England and is an important water resource aquifer, providing about 55 % of all groundwater-abstracted drinking water in the UK (Lloyd, 1993).As a carbonate rock the English Chalk is exposed to karstification, i.e. chemical weathering (Ford and Williams, 2007), resulting in particular surface and subsurface features such as dolines, river sinks, caves and conduits (Goldscheider and Drew, 2007;Gutiérrez et al., 2014;Stevanovic, 2015).Consequently, karstification also produces strong hydrological subsurface heterogeneity (Bakalowicz, 2005).The interplay between diffuse and concentrated infiltration and recharge processes, as well as fast flow through karstic conduits and diffuse matrix flow, results in complex flow and storage dynamics (Hartmann et al., 2014a).Even though Chalk has a tendency for less intense karstification, for instance compared to limestone, its karstic behaviour has increasingly been recognised (Maurice et al., 2006(Maurice et al., , 2012;;Fitzpatrick, 2011).

S. Brenner et al.: Process-based Chalk groundwater modelling
Apart from the high water quality, favourable infiltration and storage dynamics which make chalk aquifers a preferred source of drinking water in the UK, their karstic behaviour also increases the risk of fast drainage of their storages by karstic conduit flow during dry years.This also increases the risk of groundwater flooding as a result of fast responses of groundwater levels to intense rainfalls due to fast infiltration and groundwater recharge processes.Groundwater flooding, i.e. when groundwater levels emerge at the ground surface due to intense rainfall (Macdonald et al., 2008), tend to be more severe in areas of permeable outcrop like the English Chalk (Macdonald et al., 2012) as also experienced repeatedly in other karst areas in Europe (Parise, 2003(Parise, , 2010;;Bonacci et al., 2006;Jourde et al., 2007;Gutiérrez, 2010;Naughton et al., 2012;Parise et al., 2015).Groundwater drought indices tend to be more related to recharge conditions in Cretaceous Chalk aquifers than in granular aquifers (Bloomfield and Marchant, 2013).Due to the fast transfer of water from the soil surface to the main groundwater system, chalk aquifers tend to be more sensitive to external changes, as for instance shown by Jackson et al. (2015), who found significant groundwater level declines in 4 out of 7 chalk boreholes in a UK-wide study using historic groundwater level observations.Climate projections suggest that the UK will experience increasing temperatures, with less rainfall during the summer but warmer and wetter winters (Jenkins et al., 2008).This may stress these groundwater resources and increase the risk of groundwater droughts and potentially winter groundwater flooding.For those reasons, assessment of potential future changes in groundwater dynamics, concerning groundwater droughts, median groundwater levels as well as groundwater flooding is broadly recommended and is the subject of current research around the world (Naughton et al., 2012(Naughton et al., , 2015;;Jackson et al., 2015;von Freyberg et al., 2015;Jimenez-Martinez et al., 2016;Moutahir et al., 2017;Perrone and Jasechko, 2017).
However, present approaches mostly rely on statistical distribution functions to express groundwater dynamics and groundwater level exceedance probabilities (e.g.Bloomfield et al., 2015;Kumar et al., 2016) and it is questionable whether the shapes of these distribution functions remain the same when climate or land use change.Physics-based hydrological simulation models that incorporate hydrological processes in a relatively high detail can be considered to potentially provide the most reliable predictions, especially under a changing environment.However, there are considerable limitations in obtaining the necessary information to estimate the structure and the model parameters, especially for subsurface processes, and this inevitably increases modelling uncertainties (Perrin et al., 2003;Beven, 2006).
The definition of appropriate model structures and parameters from limited information becomes problematic when modelling karst aquifers.In order to achieve acceptable simulation performance they have to include representations of karstic heterogeneity in their structures.Distributed karst modelling approaches are able to simulate groundwater levels on a spatial grid but their data requirements mostly limit them to theoretical studies (e.g.Birk et al., 2006;Reimann et al., 2011) or well explored study sites (e.g.Hill et al., 2010;Jackson et al., 2011;Oehlmann et al., 2015).Lumped karst modelling approaches consider physical processes on the scale of the entire karst system.Although they are strongly simplified, they can include karst peculiarities such as different conduit and matrix systems (Maloszewski et al., 2002;Geyer et al., 2008;Fleury et al., 2009).Since they are easy to implement and do not require spatial information, they are widely used in karst modelling (Jukić and Denić-Jukić, 2009).Simple rainfall-run-off models with more than 5-6 parameters are often assumed to end up in equifinality (Wheater et al., 1986;Jakeman and Hornberger, 1993;Ye et al., 1997); i.e. their parameters lose their identifiability (Wagener et al., 2002;Beven, 2006).For that reason, recent research took advantage of auxiliary data, such as water quality data or tracer experiments (Hartmann et al., 2013b;Oehlmann et al., 2015).These studies showed that adding such information allows the necessary model parameters to be identified, therefore enabling the model to reflect the relevant processes.
Up to now, most lumped karst models have been applied for rainfall-run-off simulations.Groundwater levels were simulated in quite a few studies (Adams et al., 2010;Ladouche et al., 2014;Jimenez-Martinez et al., 2016) but mostly relied on very simple representations of karst hydrological processes and disregarding the scale discrepancy between borehole (point scale) and modelling domain (catchment scale) at which they were applied.
In this study, we present a novel approach to predict and evaluate groundwater level frequencies in chalk-dominated catchments.It uses a previously developed semi-distributed process-based model (VarKarst, Hartmann et al., 2013b) that we further developed to simulate groundwater levels.To assess groundwater level frequencies we formulated a percentile of a groundwater-based approach that quantifies the probability of exceeding or falling below selected groundwater levels.We exemplify and evaluate our new approach on a Chalk catchment in south-western England that had to cope with several flooding events in the past.Finally we apply the approach to simple climate scenarios that we create by modifying our historic model forcings to show how changes in evapotranspiration and precipitation can affect groundwater level frequencies.

Study site and data availability
Located in West Dorset in the south-west of England the river Frome drains a rural catchment with an area approximately 414 km 2 (Fig. 1).The catchment elevation varies from over 200 m a.s.l.(a.s.l.) in the north-west to sea level in the south-east.The topography is very flat with a mean slope of 3.9 % and a mean height of approximately 111 m a.s.l.The climate can be defined as oceanic with mild winters and warm summers (Dorset County Council, 2009).Howden (2006) characterised the Frome as highly groundwater dominated.During the summer months, discharge of the Frome is typically very low, hardly reaching 5 m 3 s −1 (Brunner et al., 2010).The geology is predominated by the Cretaceous Chalk outcrop which underlays around 65 % of the catchment.The headwaters of the Frome include outcrops of the Upper Greensand, often overlain by the rather impermeable Zig Zag Chalk (Howden, 2006).The middle reaches of the Frome traverse the Cretaceous Chalk outcrop followed by Palaeogene strata in the lower reaches, eventually draining into Poole Harbour.The major aquifer Chalk appears mainly unconfined.However, in the lower reaches it is overlain by Palaeogene strata, resulting in confined aquifer conditions.The region around the Frome catchment is known for the highest density of solution features in the UK (Edmonds, 1983) which can be mainly observed in the interfluve between the Frome and Piddle (Adams et al., 2003).Loams over chalk, shallow silts, deep loamy, sandy and shallow clays constitute the primary types of soils occurring in the study area (Brunner et al., 2010).The soils of the upper parts of the catchment are mainly shallow and well drained (NRA, 1995).In the middle and lower reaches the soils are becoming more sandy and acidic due to waterlogged conditions caused by either groundwater or winter flooding (NRA, 1995;Brunner et al., 2010).Due to its geological setting, the area is prone to groundwater flooding.It has occurred several times at different locations, for example in Maiden Newton during winter 2000/2001 (Environment Agency, 2012) and in Winterbourne Abbas during summer 2012 (Bennett, 2013).

Methodology
In order to consider karstic process behaviour in our simulations we use the process-based karst model VarKarst introduced by Hartmann et al. (2013b).VarKarst includes the karstic heterogeneity and the complex behaviour of karst processes using distribution functions that represent the variability of soil, epikarst and groundwater and was applied successfully at different karst regions over Europe (Hartmann et al., 2013a(Hartmann et al., , 2014b(Hartmann et al., , 2016)).We use a simple linear relationship that takes into account effective porosities and base level of the groundwater wells (see Eq. 1) enabling the model to simulate groundwater levels based on the groundwater storage in VarKarst.Finally, a newly developed evaluation approach is used by transferring simulated groundwater level time series into groundwater level frequency distributions and comparing them to observed behaviour at a number of monitored wells.

The model
The VarKarst model operates on a daily time step.Similarly to other karst models, it distinguishes between three subroutines representing the soil system, the epikarst system and the groundwater system but it also includes their spatial variability, which is expressed by distribution functions that are applied to a set of N = 15 model compartments (Fig. 2).Pareto functions as distribution functions have shown to perform best in previous work (Hartmann et al., 2013a, b), as well as the number of 15 model compartments (Hartmann et al., 2012).Including the spatial variability of subsurface properties in this manner, the VarKarst model can be seen as a hybrid or semi-distributed model.All relevant model parameters are provided in Table 1.For a detailed description of VarKarst see the appendix or Hartmann et al. (2013b).
The model was driven by two input time series (precipitation and potential evapotranspiration, PET), and the 13 variable model parameters (see Table 2) were calibrated and evaluated by four observed time series (discharge and the three boreholes,; see Sect.3.3).Similarly to Kuczera and Mroczkowski (1998) we use a simple linear homogeneous relationship which translates the groundwater storage (mm) into a groundwater level (m a.s.l.): The related parameters are h gw (m) and p gw (-).h gw is the difference between the base of the contributing groundwater storage (that is simulated by the model) and the base of the well that is used for calibration and evaluation.p gw represents the average porosity of the rock that is intersected by the well.

Data availability
The daily discharge data for gauge East Stoke were obtained from the Centre for Ecology & Hydrology (CEH, http://nrfa.ceh.ac.uk/) and dates back to the 1960s.The borehole data were provided by the Environment Agency (EA) and obtained via the University of Bristol.The total data used for modelling in this study can be seen in Table 1.The three boreholes (Ashton Farm, Ridgeway and Black House) comprised high-resolution raw data which had been collected at 15 min intervals.For further analysis, the data were aggregated to daily time averages.The potential evapotran-spiration has a strong annual cycle.Since most recent data from 2009 to 2012 was missing, representative PET years were calculated on the basis of the last 50 years.Climate projections were obtained from the UK Climate Projections User Interface (UKCP09 UI, http://ukclimateprojections-ui.metoffice.gov.uk/).For more information about the UKCP see Murphy et al. (2009).

Model calibration and evaluation
We use the shuffled complex evolution method (SCEM) for our calibration, which is based on the Metropolis-Hastings algorithm (Metropolis et al., 1953;Hastings, 1970) and the shuffled complex evolution algorithm (SCE, Duan et al., 1992).The Metropolis-Hastings algorithm uses a formal likelihood measure and calculates the ratio of the posterior probability densities of a "candidate" parameter set that is drawn from a proposal distribution and a given parameter set.If this ratio is larger than or equal to a number randomly drawn from a uniform distribution between 0 and 1, the candidate parameter set is accepted.This procedure is repeated for a large number of iterations.If the proposal distribution is properly chosen, the Markov chain will rapidly explore the parameter space and it will converge to the target distribution of interest (Vrugt et al., 2003).In the SCEM algorithm, candidate parameter sets are drawn from a self-adapting proposal distribution for each of a predefined number of clusters.Again a random number [0, 1] is used to accept or discard candidate parameter sets.The SCEM algorithm was applied in default mode using the Gelman-Rubin convergence criteria (Vrugt et al., 2003).In our study, we use the Kling-Gupta efficiency (KGE; Gupta et al., 2009) as the objective function, which can be regarded as an informal likelihood measure or more generally as a monotonically increasing performance metric of model skill (Smith et al., 2008).It was chosen by trial and error, comparing the simulation performances during calibration and validation obtained with different objective functions (RMSE and others).We found that we obtain the most robust results with the KGE.To decide whether to accept or discard a parameter set, we compare the KGEs of the candidate and the given parameter sets.This procedure was already applied in various studies (Engeland et al., 2005;Blasone et al., 2008;McMillan and Clark, 2009) and is possible if the error functions monotonically increase with improved performance.We achieved this in the SCEM algorithm by defining KGE as with r as the linear correlation coefficient between simulations and observations, and σ s σ o and µ s µ o as the means and SDs of simulations and observations respectively.The posterior parameter distributions derived from SCEM provide information about the identifiability of the parame- ters.The more they differ from a uniform posterior distribution the higher the identifiability of a model parameter.We present different calibration distributions to show the use of auxiliary data for parameter identifiability.Parameter ranges were chosen following previous experience with the VarKarst model (Hartmann et al., 2013a(Hartmann et al., , b, 2014b(Hartmann et al., , 2016)).Besides the quantitative measure of efficiency, a split sample test (Klemeš, 1986) was carried out.Our data covered precipitation, evapotranspiration, discharge and groundwater levels from 2000 to the end of 2012.We calibrated the model for the period 2008-2012 and used the period 2003-2007 for validation.We chose this reversed order to be able to include the information on three boreholes that was only available for 2008-2012.Three years were used as warm-up for each of calibration and validation.During calibration, the most appropriate of the N = 15 groundwater compartments that represent each groundwater well was found by choosing the compartment with the best correlation to the groundwater dynamics of the well.
This procedure was repeated for each well and each Monte Carlo run and finally provided the three model compartment numbers that produce the best simulations of groundwater levels at the three operation wells and the best catchment discharge according to our selected weighting scheme.During calibration, we used a weighting scheme which was found by trial and error, as we stepwise added borehole data to our discharge observations.Discharge and the borehole at Ashton Farm were both weighted as one-third as Ashton farm is located in the lower parts within the catchment, while the other two boreholes were located at higher elevation at the catchment's edge and weighted as one-sixth each.In order to explore the contribution of the different observed discharge and groundwater time series during the calibration, we use SCEM to derive the posterior parameter distributions using (1) the final weighting scheme, (2) only discharge, (3) only Ashton farm, and (4) only the other two boreholes (equally weighted).Posterior parameter distributions are plotted as cumulative distributions.The more parameters that show sensitivity, the more information is contained in the selected calibration scheme.

The percentile approach
Even though the VarKarst model includes spatial variability of system properties by its distribution functions, its semidistributed structure does not allow for an explicit consideration of the locations of groundwater wells.Its model structure allowed for an acceptable and stable simulation of groundwater level time series of the three wells (see Sect. 4.1), but for groundwater management, frequency distributions of groundwater levels calculated over the timescale of interest are commonly preferred.For that reason we introduced a groundwater level percentile-based approach.Other than Westerberg et al. (2016), who transferred discharge time series into signatures derived from flow duration curves, we calibrate directly with the discharge and groundwater time series in order to evaluate the performance of our approach for selected time periods (see evaluation below).Similarly to the calculation of standardised precipitation or groundwater indices (e.g.Lloyd-Hughes and Saunders, 2002;Bloomfield and Marchant, 2013), we create cumulative frequency distributions of observed groundwater levels and the simulated groundwater levels from the previously evaluated model.Now, the exceedance probability or percentile for a selected observed groundwater level (for instance, the groundwater level above which groundwater flooding can be expected) can be used to define the corresponding simulated groundwater level, and the number of days exceeding or falling below the chosen groundwater level can directly be extracted from the frequency distributions (Fig. 3).Note that this procedure is performed after the model is calibrated and validated with KGE as described in the previous subsection.We avoided a calibration directly to the flow percentiles, as temporal information would have been removed, which would have resulted in a lower prediction performance of the model.
As the approach is meant to be applied in combination with climate change scenarios, we perform an evaluation on multiple timescales and flow percentiles.We assess the 5th, 10th, 25th, 50th, 75th, 90th and 95th percentiles on temporal resolutions of years, seasons, months, weeks and days.The deviation between modelled and observed number of exceedance days of these different percentiles is quantified by the mean absolute deviation (MAD) between simulated exceedances (SE) and observed exceedances (OE): where x stands for the timescale (years, months, weeks, days) and p is the respective percentile.To better compare the deviations for different percentiles we normalise the MAD to a percentage of mean absolute deviation (PAD) with the total number of days of the chosen timescale: where dp x is a normalising constant standing for the total number of days of the respective timescale and percentile.For example, if we take the timescale of months and the 75th percentile of exceedances we get a dp x of (100-75) % ×(365.25/12)days.To evaluate the prediction performance of the approach, percentiles are derived from the daily data of the calibration period and then applied to the validation period, similarly to the split sample test in Sect.3.3.In this way we are able to evaluate our model over different thresholds and in terms of temporal resolution.

Establishment of simple climate scenarios and assessment of groundwater level frequency distributions
Given the model performance assessment above, we use our approach to assess future changes of groundwater level frequencies at our study site.We derive projections of future precipitation and potential evapotranspiration by manipulating our observed "baseline" climate data.We extract distributional samples of percentage changes of precipitation and evaporation from the UK probabilistic projections of climate change over land (UKCP09) for (1) a low-emission scenario and (2) a high-emission scenario for the time period of 2070-2099.This enables us to capture, in a pragmatic and computationally efficient approach, for the two emission scenarios the general range of changes for the most pertinent variables that we think will most impact changes to monthly seasonal GW responses.We focus on projected median delta values for change in mean temperature ( • C) and precipitation (%), as well as the respective 25th and 75th percentile from the probabilistic projections, and apply them to our input data.For our model input we transfer projected temperatures into evapotranspiration via the Thornthwaite equation (Thornthwaite, 1948).In this way, we obtain 3×3 projections (3× precipitation and 3× evapotranspiration) for each of the emission scenarios that also address the uncertainty associated with the projections.The resulting simulations will pro-vide an estimate of possible future changes of groundwater level frequencies for the two emission scenarios including an assessment of their uncertainty.

Model calibration and evaluation
Table 2 shows the optimised parameter values as well as the model performance.The simulation of the discharge shows KGE values of 0.73 and 0.58 in the calibration and validation periods respectively.The borehole simulations show high KGE values and only slight deteriorations in the validation period.The parameters are located well within their predefined ranges.Mean soil storage V mean, S and mean epikarst storage V mean, E are 2015.6 and 1011.7 mm respectively.The porosity parameter at Ashton Farm is the highest, followed by the borehole at Black House.Ridgeway shows the smallest porosity value.For Ashton Farm and Blackhouse the calibration chose the groundwater storage compartment 7, while for Ridgeway it chose the compartment number 8. Figure 4 plots the observations against simulations for the calibration and validation periods.Modelled discharge generally matches the seasonal behaviour of the observations.However, some low-flow peaks are not depicted well in the simulation.When looking at the groundwater levels, the simulation of Ashton Farm appears to be most adequate.However, there are considerable periods when differences from the observations can be found for all wells.Simulations at Ridgeway and Black House show moderate performance in capturing peak groundwater levels.Notably, the simulation at Black House is slightly better in the validation period.The cumulative parameter distributions derived by SCEM indicate that the model parameters were easily identifiable when we use all available data (Fig. 5), while some parameters remain difficult to identify when only parts of the available data were used for calibration.Here identifiability of parameters is simply the extent to which the cumulative parameter distributions span the sampled parameter limits, where highly constrained parameter distributions are classed as identifiable.For instance, when only discharge was used for calibration (green lines), the parameters related to groundwater (porosity p GW and groundwater level offset h) happen to be unidentifiable.In addition, the groundwater parameters are only identifiable when their respective time series is considered (i.e. the yellow and blue lines at p GW, A and h GW, A ).In turn, the epikarst storage V mean, E is not identifiable when only the groundwater well data are used (yellow and red lines).We also note, as we would expect, that the final cumulative parameter distributions occur in different parts of the parameter space depending on the combination of performance metrics from different observations.

The percentile approach
When simulated peak values of groundwater levels are compared to the observations, we find a rather moderate agreement.Using the percentile approach we find different thresholds that exceed our selected groundwater level percentiles.This is elaborated for the 90th percentile of simulated and observed groundwater levels of Ashton farm (Fig. 6).
Table 3 shows the mean observed and modelled exceedances of all selected thresholds (the 5th, 10th, 25th, 50th, 75th, 90th and 95th percentiles) at all temporal resolutions in the validation period.By comparing matches in the number days of exceedance we evaluate our model with different percentiles and timescales.The left value is the mean absolute deviation (MAD) and the right value is the percentage of absolute deviation (PAD).We can see that the higher the percentile, the larger the deviation between observed and mod-  elled exceedances.The same is true for the PAD when moving from lower to higher temporal resolutions.The MAD decreases with higher temporal resolution.

Impact of simulated climate changes on groundwater level distributions
The results of applying the two climate projections to the model can be found at Table 4 and in Fig. 7.They display the mean model outputs (Qsim, AET) and mean exceedances per year, calculated on the basis of our modelled time series.Both emission scenarios (low and high) lead to increased modelled actual evapotranspiration and to decreased discharge simulations.In addition, both emission scenarios show a substantial reduction in exceedances of high percentiles.We also find that the standard error of the exceedances and non-exceedances of high-emission scenario tends to be higher than the standard error of the low-emission scenario.

Reliability of the simulations
A decrease in the simulation performance in the validation period is normally to be expected because there is always a tendency to compensate for structural limitations and observational uncertainties during the calibration.The low decrease in model performance from 11 % (groundwater prediction at Black House, KGE GW, B ) to 21 % (discharge prediction, KGE Q ) during the validation period indicates a certain robustness of the calibrated parameters and is comparable to split sample tests in other studies (e.g.Parajka et al., 2007;Perrin et al., 2001), although we have to acknowledge that for other applications a higher degree of robustness may be required.In addition, it is corroborated by their generally mainly high identifiability derived by SCEM for the final calibration scheme that used all four available observed discharge and groundwater level time series.By applying the shuffled complex evolution Metropolis algorithm and stepwise increasing the calibration data (only discharge, only groundwater, all together), we show that dis- Table 4. Model output and (non-)exceedances of percentiles in the reference period and the two scenarios (borehole: Ashton Farm, time period 2070-2099).
Scenario Qsim AET 5th 10th 25th 50th 75th 90th 95th mm a −1 mm a −1 non exc a −1 non exc a −1 non exc a −1 exc a −1 exc a −1 exc a −1 exc a charge data alone, as well as groundwater data alone, do not provide enough information to identify all of our model parameters, as the posteriors of some of the model parameters remain close to a uniform distribution.This is similar to the work of Schoups and Vrugt (2010), who found unidentifiable parameter values with their models calibrating only against discharge.The different calibration schemes visualised in the cumulative parameter distributions show that initially unidentifiable parameters become identifiable when the related time series is considered.Using all information, all model parameters are identifiable, which is in accordance with preceding research that showed the usefulness of multi-objective approaches.For instance, Kuczera and Mroczkowski (1998) demonstrated that a combination of groundwater and discharge observations can reduce parameter uncertainty.As we were mostly focussing on the difference among the calibration steps with increasing data, the use of KGE as an informal likelihood measure seems justifiable.
A look at the parameter values reveals an adequate reflection of reality.However, V mean, S and V mean, E are quite high considering that initial ranges for these parameters were 0-250/0-500 mm (Hartmann et al., 2013a, c).As previous studies took place in fairly dry catchments, the ranges were extended substantially to deal with the wetter climate in southern England.A high a SE indicates high variability in soil and epikarst thicknesses, favouring lateral karstic flow concentration (Ford and Williams, 2007).Butler et al. (2012) notes that the unsaturated zone of the Chalk is highly variable, ranging from almost zero near the rivers to over 100 m in interfluves.
Additionally, the mean epikarst storage coefficient K mean, E is quite low, indicating fast water transport from the epikarst to the groundwater storage which is in accordance to other studies (e.g.Aquilina et al., 2006).The value of parameter a f sep indicates that a significant part of the recharge is diffuse.A moderately high conduit storage coefficient K C and a high a GW indicate that there is a significant contribution of slow pathways by the matrix system.A rather low value that is sensitive to K C was found when calibrating only by discharge operations, indicating some interactions of K C with other model parameters (Saltelli et al., 2008).This is in accordance with the findings by Jones and Cooper (1998) as well as by Reeves (1979), who reported 30 and 10-20 % of the recharge occurring through (macro-) fissures in Chalk catchments.Although groundwater flow in the chalk is dominated by the matrix, given antecedent wet conditions, fracture flow can increase significantly (Lee et al., 2006;Ireson and Butler, 2011;Butler et al., 2012).Overall, a split-sample test, parameter identifiability analysis, realistic values of parameters and plausible simulation results provide a strong indication of reliable model functioning.

Performance of the percentile approach
Based on the idea of the standardised precipitation or groundwater indices (Lloyd-Hughes and Saunders, 2002;Bloomfield and Marchant, 2013) our percentile approach permits the performance of the model to be improved to reflect observed groundwater level exceedances.It yields acceptable performance for years to days up to the 90th percentile.A reduction of precision with the timescale is obvious but in an acceptable order of magnitude when the validation period is considered.Although deviations are considerable both in the calibration and validation periods, they are stable demonstrating certain robustness but also the limitations of our approach.Although the variable model structure of the VarKarst model was shown to provide more realistic results than commonly used lumped models (Hartmann et al., 2013b) it still simplifies a karst system's natural complexity.This can be seen in the simulated time series at Ashton Farm and Black House, which indicate an overestimation of high levels and an underestimation of low levels.The reason for this behaviour might be due to the modelling assumption of a constant vertical porosity, despite the knowledge that there can be a strongly non-linear relation between chalk transmissivity and depth.Several studies acknowledge that hydraulic conductivity in the Chalk follows a non-linear decreasing trend with depth (Allen et al., 1997;Wheater et al., 2007;Butler et al., 2009).This is mainly attributed to the decrease in fractures because of the increasing overburden and absence of water level fluctuations (Williams et al., 2006;Butler et al., 2012).Hydraulic conductivities in the Chalk can span several orders of magnitude (Butler et al., 2009) and are particularly enhanced at the zone of water table fluctuations (Williams et al., 2006).In addition, cross-flows occurring in the aquifer can lead to complicated system responses in the Chalk (Butler et al., 2009).For the sake of a parsimonious model structure, these characteristics were omitted in this study but their future consideration could help to improve the simulations if information about the depth profile of permeability is available.A decrease in performance was also found for standardised indices that use probability distributions instead of a simulation model (Vicente-Serrano et al., 2012;Núñez et al., 2014;Van Lanen et al., 2016).To im-prove the approach's reliability for higher groundwater level percentiles, a model calibration that is more focussed on the high groundwater level percentiles may be a promising direction.A consideration of the time spans above the 90th percentile will allow for a better simulation quality.This could be further evaluated by using different percentile weighting schemes and stepwise increasing the weight on the target percentile.

Applicability and transferability of our approach
We prepared two scenarios by manipulating our input data using probabilistic projections of annual changes in precipitation and potential evaporation at 2070-2099 for a lowand a high-emission scenario.This may neglect some of the changes on climate patterns predicted by climate projections but it is based on local and real meteorological values of the reference period, therefore avoiding problems that arise when historic and climate projection data show pronounced mismatches during their overlapping periods.Our results revealed that both scenarios lead to fewer exceedances over higher percentiles and more non-exceedances of lower percentiles, indicating a higher risk of groundwater drought at our study site.However, one problem that arises from our approach is that we do not consider changes in the seasonal patterns of our input variable, for example the increase in winter precipitation.If this increase was considered, the results would probably yield more exceedances of higher percentiles, as for instance found by Jimenez-Martinez et al. (2016).The purpose of the simple climate scenarios was to provide an application example of the new methodology, which is rather hypothetical considering the large uncertainties of current climate projections.We believe that our nine realisations are sufficient to show that different possible future changes have a non-linear impact on groundwater level frequencies.Although quite simplistic, our results are qualitatively in accordance with previous studies indicating increased occurrence of droughts in the UK (Burke et al., 2010;Prudhomme et al., 2014).The risk of drought occurrences might increase depending on the magnitude of change in evapotranspiration.However, more research and the application of more elaborated scenarios are necessary to completely understand the consequences of the change in groundwater frequency patterns in the UK chalk regions.
As the VarKarst model is a process-based model that includes the relevant characteristics of karst systems over range of climatic settings (Hartmann et al., 2013b), our approach can to some extent be used to assess future changes of groundwater level distributions and also be applied in other regions.This may bring some advantage concerning approaches that used transfer functions (Jimenez-Martinez et al., 2016) or regression models (Adams et al., 2010) for estimating groundwater levels if enough data for model calibration and evaluation are available.
As has been noted by Cobby et al. (2009), the likelihood and depth of groundwater inundations is one of the major challenges for future research of groundwater flooding.Since it is a lumped approach it may provide, after Butler et al. (2012), "a good indication of the likelihood of groundwater flooding, but do[es] not indicate where the flooding will take place".A spatial determination of the groundwater table such as that in Upton and Jackson (2011) would be possible but only in catchments where the borehole network is extensive.Thereby, the possibility to model several boreholes with one single calibration, due to compartment structure in VarKarst, might be also an advantage.Butler et al. (2012) noted that the parameterisation of the unsaturated zone is a major difficulty in the Chalk region.Since this study also struggles with the porosity, future work should take a closer look at this subject.

Conclusions
We used an existing process-based lumped karst model to simulate groundwater levels in a chalk catchment in southwestern England.Groundwater levels were simulated by translating the modelled groundwater storage into groundwater levels with a simple linear relationship.To evaluate our approach we analysed the agreement of observed and simulated groundwater level exceedances for different percentiles.Finally, a simple scenario analysis was undertaken to investigate the potential future changes of groundwater level frequencies that affect the risk of groundwater flooding as well as the risk of groundwater droughts.The model performance for discharge and the groundwater levels was satisfying and showed the general adequacy of the model for simulating groundwater levels in the chalk.It also revealed shortcomings concerning higher groundwater levels.This was corroborated by the percentile approach that showed a robust performance up to the 90th percentile.A scenario analysis using UKCP projections on expected regional climate changes showed that expected changes may lead to an increased occurrence of low groundwater levels due to increasing actual evaporation.Overall, our study shows that semi-distributed process-based modelling can be a valuable tool for simulating and predicting groundwater frequencies in Chalk regions where information is too limited for the application of distributed models.Here, a thorough model evaluation is essential for obtaining reliable and consistent results.In order to obtain more reliable results we recommend collecting more data about the hydrogeological properties of our study site to improve the structure of our model regarding the porosity and the unsaturated zone.In addition, longer time series and an adapted calibration approach which, in particular, emphasises the > 90th percentiles of groundwater levels could significantly improve our simulations.In addition we propose applying the method to other catchments to test the transferability of our approach and to quantify the variability of climate change impacts over a wide range of Chalk catchments across the UK.
Data availability.The sources of all underlying data are described in Table 1.All precipitation, discharge and potential evaporation data are freely available from the CEH website.Groundwater level observations and climate delta values can be accessed via request at EA and UKCP, respectively.

Appendix A
Within the VarKarst model, the parameter V mean, S (mm) and the distribution coefficient a SE (-) define the variation of soil storage capacities across the N model compartments.They are used to calculate the soil storage capacity V S,i (mm) for every compartment i by Eqs. ( 3) and (4) in Table A1.We apply the same distribution coefficient a SE when we derive the epikarst storage distribution by the mean epikarst depth V mean, E (mm) (Eqs.6 and 7 in Table A1).We determine actual evapotranspiration from each soil compartment.E act,i is calculated by reducing potential evapotranspiration using the Thornthwaite equation (Thornthwaite, 1948) and the soil saturation deficit (Eq. 1 in Table A1).Surface run-off is found by the excess of soil and epikarst storage of the previous model compartment (Eq. 2 in Table A1).With surface runoff and actual evapotranspiration, the stored water volume at each soil compartment V Soil,i (mm) can be calculated by simply applying water balance.
The recharge from the soil to the epikarst R Epi,i (mm) is calculated by the excess of the soil storage (Eq. 5 in Table A1), while the epikarst outflow follows a linear storage assumption (Eqs.8 and 9 in Table A1).Again, water balance allows the stored water V Epi,i (mm) to be determined at each time step t and each epikarst compartment i.The downward flux from the epikarst considers diffuse (R diff,i , mm) and concentrated groundwater recharge (R conc,i , mm) component that are found by a variable separation factor f C,i (-) and a distribution coefficient a f (-) (Eqs.10, 11 and 12 in Table A1).The diffuse component recharges the groundwater compartments beneath the respective epikarst layers (i = 1. ..N − 1).The concentrated component flows laterally to compartment i = N and therefore recharges the conduit system.
Similarly to the epikarst compartment, variable groundwater storage coefficients K GW,i (d) are calculated (Eq.15 in Table A1) and applied to calculate the discharges of the matrix system (Eq.13 in Table A1) and the conduit system (Eq.14 in Table A1), which together sum up to the entire discharge of the system (Eq.15 in Table A1).Knowing groundwater recharge and groundwater discharge for each model compartment i again allows the stored volume of water to be determined within the groundwater compartment V GW,i at time step t, which is used to simulate the groundwater levels (Eq. 1 in Sect.3.1).

Figure 1 .
Figure 1.Location map with an overview of the Frome catchment.

Figure 3 .
Figure 3. Schematic description of the percentile approach.

Figure 4 .
Figure 4. Modelled discharge (m 3 s −1 ) and groundwater levels (m a.s.l.) at the boreholes Ashton Farm, Ridgeway and Black House.

Figure 5 .
Figure 5. Cumulative parameter distributions (blue) of all model parameters; strong deviation from the 1 : 1 (dark grey) indicates good identifiability.

Figure 6 .
Figure 6.Illustration of the percentile approach.Time series of the observed (grey dots) and modelled (green line) groundwater levels at Ashton Farm.The dotted lines represent the respective 90th percentile.

Figure 7 .
Figure 7. Mean model input (mm a −1 ), mean modelled output (mm a −1 ) and mean (non-)exceeded percentiles (number a −1 ) in the reference period and both scenarios (borehole: Ashton Farm; future period: 2070-2099).The circles indicate the spread among the nine realisations for each of the two scenarios.

Table 1 .
All available data used in the study.

Table 2 .
Model parameters, descriptions, ranges and optimised values.

Table 3 .
Deviations of simulated to observed exceedances of different percentiles in the validation period (borehole: Ashton Farm).The left value is the mean absolute deviation MAD (d), the right value is the deviation percentage PAD (%). −1