Numerical rainfall simulation with different spatial and temporal evenness by using a WRF multiphysics ensemble

The Weather Research and Forecasting (WRF) model is used in this study to simulate six storm events in two semi-humid catchments of northern China. The six storm events are classified into four types based on the rainfall evenness in the spatial and temporal dimensions. Two microphysics, two planetary boundary layers (PBL) and three cumulus parameterizations are combined to develop an ensemble containing 16 members for rainfall generation. The WRF model performs the best for type 1 events with relatively even distributions of rainfall in both space and time. The average relative error (ARE) for the cumulative rainfall amount is 15.82 %. For the spatial rainfall simulation, the lowest root mean square error (RMSE) is found with event II (0.4007), which has the most even spatial distribution, and for the temporal simulation the lowest RMSE is found with event I (1.0218), which has the most even temporal distribution. The most difficult to reproduce are found to be the very convective storms with uneven spatiotemporal distributions (type 4 event), and the average relative error for the cumulative rainfall amounts is up to 66.37 %. The RMSE results of event III, with the most uneven spatial and temporal distribution, are 0.9688 for the spatial simulation and 2.5327 for the temporal simulation, which are much higher than the other storms. The general performance of the current WRF physical parameterizations is discussed. The Betts–Miller–Janjic (BMJ) scheme is found to be unsuitable for rainfall simulation in the study sites. For type 1, 2 and 4 storms, member 4 performs the best. For type 3 storms, members 5 and 7 are the better choice. More guidance is provided for choosing among the physical parameterizations for accurate rainfall simulations of different storm types in the study area.


Introduction
Precipitation is a crucial element in the hydrological cycle at regional or global scales.With the characteristics of high intensity, short duration, uneven distribution and sudden occurrence, the precipitation easily causes floods, with a high peak in semi-humid regions, which is tricky for forecasting accurately (Nikolopoulo et al., 2010).The quantitative precipitation forecast (QPF) is an effective method to avoid flood disasters and help flood risk management (Kryza et al., 2013).With the development of computer technology and atmospheric physics, numerical weather prediction (NWP) has become an efficient method for QPF (Yang et al., 2012).
As the latest-generation mesoscale NWP system, the Weather Research and Forecasting (WRF) model can apply to the regions across scales from tens of meters to thousands of kilometers.Not only the rainfall quantity but also the spatial and temporal patterns of rainfall can be captured by the WRF model with high resolution.Though it has been confirmed by many studies that the WRF model performs better than the fifth-generation Penn State/NCAR (National Center for Atmospheric Research) Mesoscale Model (MM5), rainfall is still one of the most difficult variables to simulate and predict (Collischonn et al., 2005;Bruno et al., 2014;Lee et al., 2015).Because of the complicated processes of storm formation and development, the WRF model provides Published by Copernicus Publications on behalf of the European Geosciences Union.
various physical parameterizations to be applied in different cases.Each physical parameterization emphasizes on different physical processes and has its unique structure and complexity, which may have great influence on the rainfall simulations.That is why numerous sensitivity studies of the WRF parameterizations are carried out in different regions of the world (Klein et al., 2015).Three categories of the parameterizations have been mostly discussed and identified as the main influencing factors for rainfall simulation, i.e., microphysics, planetary boundary layer (PBL) and cumulus parameterizations.Different physical parameterizations are found to be efficient for different rainfall events in different regions (Jankov et al., 2011;Madala et al., 2014;Pennelly et al., 2014).
It is an increasingly difficult task to determine the optimal combination of physical parameterizations due to the development of the WRF model with more and more choices of parameterizations.Although many studies show that the best physical parameterization combination can be determined by many simulations for a certain rainfall event, it is difficult to tell the characteristics of the future rainfall events for realtime rainfall prediction.In order to consider the uncertainties associated with the selection of physical parameterizations, it has become a common method to use the ensemble in numerical rainfall prediction (Evans et al., 2011).Flaounas et al. (2011) studied an ensemble with six members over West Africa, which was produced by two PBL and three cumulus parameterizations.An ensemble containing 18 members was investigated in the south-central United States, which was created by three microphysics, three PBL and two cumulus parameterizations (Jankov et al., 2005).And an ensemble with 36 members was tested for a series of rainfall events at the south-east coast of Australia, which contained two PBL, two cumulus, three microphysics and three radiation parameterizations (Flaounas et al., 2011).These studies show that no single physical parameterization combination performs the best for all rainfall events.
In this study, 16 physical parameterization combinations are designed from two microphysics, Purdue-Lin (Lin) and WRF Single-Moment 6 (WSM6), two PBLs, Yonsei University (YSU) and Mellor-Yamada-Janjic (MYJ), and three cumulus parameterizations, Kain-Fritsch (KF), Grell-Devenyi (GD) and Betts-Miller-Janjic (BMJ).Lin is a sophisticated parameterization which contains five classes of hydrometeors, and it is suitable for high-resolution simulations (Lin et al., 1983).WSM6 reveals an improvement in the high cloud amount and surface precipitation, which adds graupel microphysics based on the works of Lin et al. (1983) and Rutledge and Hobbs (1983).MYJ PBL is appropriate for all stable or slightly unstable flows (Janjic, 1994).YSU PBL improves the performance of intense convection based on the Medium Range Forecast (MRF) PBL (Hong et al., 2006).KF is a classic cumulus parameterization and has been used successfully for years in many scientific institutions (Kain, 2004).GD is an ensemble cumulus parameterization and can be used in high resolution models (Grell and Freitas, 2014).BMJ can adjust instabilities in the environment by generating deep convection and has been used extensively throughout the globe (Janjic, 2000).
Two medium sized catchments, the Fuping and Zijingguan, are chosen as the study sites, which are respectively located in the south and the north reaches of the Daqinghe catchment in North China.With the characteristics of high intensity, short duration, uneven distribution and sudden occurrence, the storm events in the study sites are representative for the semi-humid region with temperate continental monsoon climates.The aim of this study is to determine the potential performance of the WRF model for different types of storm events in semi-humid regions.Six storm events are chosen from the study sites and classified into four different types based on the rainfall evenness in the spatial and the temporal dimensions.The 16 designed combinations of physical parameterizations are treated as the ensemble for rainfall simulation, and the results regarding both the cumulative rainfall amounts and the spatiotemporal patterns are verified.

WRF model configuration and designed physical ensemble
Version 3.6 of the WRF model is used in this study.WRF is a fully compressible, nonhydrostatic, meteorological model, and it features physics, numerics, advanced dynamics and data assimilation.The model manual (Skamarock and Klemp, 2008) shows more detailed information of the WRF model.Two-way nesting is allowed for the communication between multiple domains at different grid resolutions, and three nested domains are centered over the Fuping and Zijingguan catchments respectively.In general, high-resolution rainfall products downscaled by the WRF model are more appropriate to be used as the input of the hydrological models (Cardoso et al., 2013;Chambon et al., 2014).Therefore, horizontal grid spacing of the WRF innermost domain is set to be 1 km, and the downscaling radio is set to be 1 : 3 (Givati et al., 2012;Yang et al., 2012).The center of the domain is at lat 39  1.There are 40 vertical levels for three domains, and the top level is set at 50 hPa (Aligo et al., 2009;Qie et al., 2014).The WRF model is initialized from the six-hourly global analysis data provided by the 1 • × 1 • grids of the NCEP (National Centers for Environmental Prediction) Final (FNL) operational model.The integration step of WRF follows the "6 × dx" rule where dx is the grid spacing, and the integration step is 6 s for the in- nermost domain (Skamarock and Klemp, 2008).The time step of the WRF model output is set to 1 h.The spin-up period is necessary for the WRF model to develop the smaller scale convective features, and the widely used lengths are 6 h (Givati et al., 2012), 12 h (Hu et al., 2010) and 24 h (Wang et al., 2012).Different spin-up lengths were tried for the six storm events in this study, whereas the results did not show obvious differences regarding the simulated rainfall.In order to improve the calculation efficiency for further hydrological use (i.e., flood warning), a 6 h period is chosen to spin up the model.That is to say, the start of the model integration is 6 h earlier than the storm start time, and the end time of the model integration is consistent with the storm end time.
The setting of the WRF model is very important before it is used to simulate the meteorological factors, especially the physical parameterizations.As shown by Table 1, a WRF physical ensemble is constructed by combining different choices of the physical parameterizations to simulate the storm events in the study areas.The selection of the parameterizations is based on their good performance in semihumid regions of China (Givati et al., 2012;Qie et al., 2014;Di et al., 2015).In order to learn the physical parameterizations more comprehensively, the different complexity and mechanisms are also considered.WSM6 is the most complex in the series of WSM schemes, which is revised based on Lin (Hong and Lim, 2006).YSU is a non-local closure scheme, while MYJ is a local closure scheme (Evans et al., 2011).The KF is a simple cloud model which can be triggered when air parcel temperature at its lifting condensation level is larger than the environmental air (Pennelly et al., 2014).The GD can run effectively within each high resolution grid (Grell and Freitas, 2014).The BMJ scheme is more suitable for convective weather because it can adjust the model profile of temperature and moisture (Janjic, 2000).Some studies have indicated that the cumulus parameterizations may be invalid with fine horizontal resolutions, while the threshold of the resolution is unknown (Argüeso et al., 2011;Evans et al., 2011;Pei et al., 2014).Many studies use cumulus parameterizations with about 1 km resolution for weather simulation.For example, Shepherd et al. (2016) explored the effect of simulation for tropical cyclones by four cumulus parameterizations, including KF, BMJ, G-3 and TD, with the nested domains 1.33, 4 and 12 km.Remesan et al. (2015) studied the WRF model sensitivity to the choice of parameterizations: 4 nested domains (1, 3, 9 and 27 km) are used, and the cumulus parameterizations of GD, BMJ, KF1 and KF2 are investigated.In order to make the study more rigorous, members 13, 14, 15 and 16 are also tested and compared with the members containing cumulus parameterizations.Many studies indicate that the simulation of precipitation is insensitive to the land surface model (LSM) and short-and long-wave radiation parameterizations, so Noah for LSM, the RRTM and Dudhia schemes for long wave and shortwave radiation are used in this study, which are most frequently applied to precipitation simulation (Guo et al., 2014;Chen et al., 2014).3 Storm events and evaluation statistics  2 shows the duration and accumulative rainfall amounts of the six storm events.The six storm events are categorized into four types based on the rainfall evenness of the spatiotemporal distribution (Liu et al., 2012).The variation coefficient C v is used to evaluate the uneven level: (1) For the spatial distribution, x i is the 24 h rainfall accumulation at rain gauge i, and x is the average of x i ; N is the number of rain gauges.For the temporal distribution, x i is the hourly areal rainfall at time i, and x is the average of x i ; N is the number of hours.
The higher C v is, the more uneven the rainfall is.In order to learn the spatial and temporal evenness of the rainfall in the two catchments, both spatial and temporal C v of the storm events from 1985 to 2015 are calculated.In reality, rainfall in northern China is much more uneven than the south, and it is impossible to find absolute even rainfall in both space and time.Therefore, we chose a threshold of 5 %, which is also considered in other statistical analyses in the same area, as the critical value to separate even and uneven rainfall events.With the threshold, we found the two critical values of 0.4 for the spatial C v and 0.6 for the temporal C v .That is to say, the storm events with a spatial C v below 0.4 or with a temporal C v below 1.0 account for 5 % of the total storm events from 1985 to 2015 in the study area.Table 3 shows the spatial and temporal C v of observations for the six storm events.Storm type 1 is characterized by even spatiotemporal distribution of rainfall.For storm type 2, rainfall is even for spatial distribution, but the temporal distribution is uneven.Storm type 3 and type 4 are characterized by an uneven distribution of rainfall in both space and time, while the rainfall of type 4 is highly concentrated in space and time.Due to the temperate continental monsoon climate in the study sites, there is no storm event with even rainfall and continuous in time but unevenly distributed in space.

Verification indices for rainfall simulations
For evaluating the accuracy of rainfall simulation, both the accumulated areal rainfall and the spatiotemporal distribution of the rainfall are important.The accumulated areal rainfall is evaluated by the relative error (RE): where P is the simulated value, which is the average value of all the grids inside the study area, and Q is the observed value, which is calculated by the Thiessen polygon method based on the observations of the rain gauges (Sivapalan and Blöschl, 1998;Jarvis et al., 2013).The spatial and temporal distributions of the rainfall are evaluated by a two-dimensional verification scheme.Both in spatial and temporal dimensions, some categorical and continuous indices are selected and calculated (Liu et al., 2012).The categorical verification indices are chosen as the probability of detection (POD), the frequency bias index (FBI), the false alarm ratio (FAR) and the critical success index (CSI).The calculation of the categorical indices depends on whether it rains or not, as shown in Table 4.It should be mentioned that the insignificant precipitation (less than 0.1 mm h −1 ) is regarded as no rain.For verification in the spatial dimension, the comparison is made between the observations of the rain gauges and the simulations of the WRF model at each time step i, and then the average values are calculated by the categorical indices at all the time steps for the final results.As shown by the Eqs.( 3)-( 6), N is the total number of time steps of the WRF model output, which is 24 in this study.For the temporal dimension, the time series data of simulation and observation are used to calculate the four indices at each rain gauge i, then the average values are calculated by the indices at all the rain gauges for the final results.This time N is the number of the rain gauges of the Fuping and Zijingguan catchments respectively in Eqs. ( 3 For the four categorical indices, POD indicates the percentage of correct simulation for the observed rainfall.FBI shows whether the WRF model has a tendency to overestimate (FBI > 1) or underestimate (FBI < 1) rainfall occurrences, while FBI cannot show closeness of the simulation and the observation.FAR represents the ratio of false alarms, and CSI indicates the percentage of correct simulation between the simulated and observed rainfall.The perfect scores of POD, FBI, FAR and CSI are 1, 1, 0 and 1, respectively.Besides the categorical indices, three continuous indices including the root mean square error (RMSE), the mean bias error (MBE) and the standard deviation (SD) are adopted for a more quantitative evaluation of the simulated rainfall distributions in space and time.The calculations of the three continuous indices are expressed by Eqs. ( 7)-( 9).
For the spatial dimension, P j and Q j are the simulation and observation of 24 h rainfall accumulations at each rain gauge j .M is the number of the rain gauges, which is 8 for the Fuping catchment and 11 for the Zijingguan catchment.For the temporal dimension, P j and Q j are the average areal rainfall simulation and observation at each time step j .This time M is 24, which represents the number of the time steps.The final values of the three indices represent the mean magnitude of error, the average cumulative error and the variation of the simulation error of MBE, respectively.The perfect score of all the three indices is 0. In order to compare the simulations for different storm events, the final values of the three continuous indices in both two dimensions are represented as percentages of the corresponding average observations.

Simulations of the 24 h rainfall accumulations
The simulation results of the cumulative rainfall amounts from the 16 members of the physical ensemble are shown in Table 5 and ranked according to REs.Members 5, 4 and 2 rank in the top three for event I (storm type 1), with relatively lower REs.For type 2 events, members 4 and 12 show more stable performances, ranking in the top five for both events II and VI.For type 3 events, members 5 and 7 are better choices, with top 5 rankings for events IV and V.The top four members for event III (type 4) are members 4, 2, 16 and 3.It can be seen that the performances of the 16 members are quite distinct for different types of storm events.In addition, the difference among the 16 members varies significantly for a certain storm event.For example, the difference of REs for member 8 (18.44 %) and member 9 (−37.69%) reaches up to 56.13 % for event I.While for event V, the largest difference of RE among all the 16 members is only 9.10 %.There are great uncertainties for the simulation of the different storm events using the WRF model with different combinations of the physical parameterizations.It's hard to tell which parameterization combination is the best, but only to find the one with the best general performance.In this study, member 4 could be the best choice considering its stable top rankings for storm types 1, 2 and 4, while members 9, 10, 11 and 12 have a worse performance for storm types 1 and 4. For type 3 events, members 5 and 7 are better choices.However, in real-time rainfall prediction, there is a necessity to use a physical ensemble since it is always tricky to tell the exact characteristics of the future storm before it happens, and the use of a determined combination of parameterizations which performs generally well cannot always lead to the best results.According to Table 5, the four members without cumulus parameterization have a quite different performance for different events.For example, member 15 performs the best for event IV; nevertheless, it performs the worst for event V. Comparing the members containing cumulus parameterization, members 13, 14, 15 and 16 have no significant advantages or significant disadvantages for rainfall simulation.Taking event I as an example, the best one (member 16) of the 4 members without cumulus parameterization ranks 4th out of the 16 members, whereas the worst one (member 13) ranks 12th.However, few members without cumulus parameterization rank in the top four, which means that it is necessary to use cumulus parameterization for the simulation of rainfall accumulation.
In order to measure the magnitude of error for different storm types, all the REs use absolute values in the following analysis to calculate the average relative error (ARE) of the 16 members of the physical ensemble.The AREs of the 16 members for the four storm types are shown in Table 6.It's interesting to note that the ranking of the model performance is type 1 > type 2 > type 3 > type 4, from the best to the worst.It means that the WRF model performs best for the storm events with even spatiotemporal distribution, while the type of storm events with highly uneven spatiotemporal distribution is hard for WRF to handle.The cumulative curves of the simulated and observed rainfall for the six storm events are shown in Fig. 3. Except for event I, the cumulative curves of the members are all below the observed ones for the other storm events.The shapes of 16 simulated cumulative curves are consistent with the observed ones for events I, II and VI (type 1 and type 2 events), indicating that the simulated rainfall occurrences always keep step with the observations.While for events IV, V and III (type 3 and type 4 events), the simulated starting and ending times of the rainfall durations are quite different from the observations.It can be determined that type 1 and type 2 events have even rainfall distributions in space, while the spatial rainfall is unevenly distributed in space for type 3 and type 4 events.It seems that storms with rainfall evenly distributed in space tend to have better simulation results in the temporal patterns of rainfall accumulations.

Simulations of the spatial rainfall distributions
In order to compare the simulation results of the different storm types in detail, seven verification indices are first calculated to evaluate the simulated rainfall distributions in space.Figures 4 and 5 respectively show the values of the categor- ical indices and continuous indices for the six storm events with the 16 members of the physical ensemble.
It can be seen in Fig. 4 that PODs of storm types 1 and 2 (events I, II and VI) are all above 0.70 for the 16 members, which means that the events with even distributions regarding the rainfall occurrences in space can be accurately simulated.For the other two storm types, event IV, with relatively lower C v , performs better than events V and III.However, PODs of in space because of the high FARs (near 1.0).Storm type 3 outperforms storm type 2, with relatively lower FARs.CSI can be considered as a comprehensive description of accuracy.Storm type 1, with the highest CSIs, performs the best of all the 16 members, while CSIs of storm type 4 are all close to zero, showing that the simulation results are unreliable.CSIs of the other two storm types have few differences as a whole, but the index values are a little bit higher for events with more evenly distributed rainfall in space.
Figure 5 shows that the values of RMSE have great change in different members for a certain event.RMSE is always regarded as the key quantitative index to estimate errors.Storm event II, with the lowest C v , always has the lowest RMSE for the 16 members, which means that the WRF model per-forms the best for storm event II in simulating the spatial rainfall distributions.Except for members 1 and 4, event III has the highest RMSE, and the values of eight members exceed 100 %.For the other four events, there is little difference between RMSEs in the 16 members.The MBE index contains the directions of errors, but in Fig. 5   bers, which indicates that there are always some members performing better than the 4 members without cumulus parameterization.It is helpful to use appropriate cumulus parameterization for the simulation of the spatial rainfall distribution.
The average values of the 16 members for all the seven indices are calculated to quantitatively analyze the performance of the WRF model in spatial dimension for the four storm types.As shown in Table 7, the value of POD for storm type 1 is higher than storm types 3 and 4. In addition, the value of CSI for storm type 1 is the highest, and the value of FAR is the lowest in the four storm types.The lower values of RMSE and MBE for storm type 1 also indicate that the WRF model performs well for storm type 1.The simulations of type 3 events are worse than type 2 events, showing lower POD and higher RMSE values, though the FARs of the type 2 events are a little higher than type 3 events.The lowest POD and CSI and the highest FAR and RMSE can be found with storm type 4, which indicates that the WRF model can hardly capture this kind of storm accurately in space.Since the index of RMSE shows the actual magnitude of errors without canceling out the positive and negative errors, a correlation analysis is further carried out between RMSE and the spatial evenness indicator C v .It's interesting to find that RMSE and C v have a good linear relationship and the correlation coefficient of the linear regression (R 2 ) can reach up to 0.8899 (shown by Fig. 6).This means that the WRF simulation error increases with the increase of the spatial rainfall unevenness in the study sites.

Simulations of the temporal rainfall patterns
The seven indices are also calculated in the temporal dimension to evaluate the simulated rainfall patterns in time.The values are respectively shown in Figs.7 and 8.In Fig. 7, PODs of storm types 1 and 2 are all above 0.70 and much higher than storm types 3 and 4 in the 16 members.It indicates that storm types 1 and 2 can be accurately simulated with regards to the rainfall occurrence in the temporal dimension, while the WRF model fails with storm type 4, with all PODs of the 16 members close to 0. For FBI, the scores of events I and IV are nearly perfect, but the other four events show tendencies of overestimating the rainfall occurrences in time, especially event VI.The lowest FAR values are also found with storm type 1, with all the values less than 0.20 in the 16 members.Storm type 4 has the highest FARs, which are close to 1.0 in some members.Based on the FAR index, the ranking of the WRF performance in simulating temporal rainfall occurrences is type 1 > type 3 > type 2 > type 4, from the best to the worst.In the 16 members, CSIs of storm type 1 are always the highest, while CSIs of storm type 4 are always the lowest.It should be mentioned that the CSI is 0 in members 7, 11, 14 and 15 in storm event III, indicating a bad simulation of the temporal rainfall occurrences for this type 4 event.
In Fig. 8, type 1 event has the lowest RMSEs of the 16 members, but the values are nearly 100 %.Type 4 event has the highest RMSEs, which are all above 250 %.The other two types of storm events also have high RMSE values between 100 and 180 %.We can say that the WRF model cannot perform well in simulating the temporal rainfall patterns for all the storm types.Storm type 1 has the lowest MBEs, and the MBE values of storm types 3 and 4 are relatively higher than storm type 2 in most members.All SDs are above 100 % in the 16 members for the six events, with the lowest values found with event II.From Figs. 7 and 8, the same as the conclusions in the spatial dimension, most values of the indices for members 13, 14, 15 and 16 are in the range of the values for the other 12 members, which indicates that there are always some members performing better than the 4 members without cumulus parameterization.It is also nec-essary to use cumulus parameterization for the simulation of the temporal rainfall distribution.
The average ensemble values for the seven indices are also calculated for evaluating the performance of the WRF model in simulating the temporal rainfall patterns.The results are shown in Table 8.The values of POD and CSI for storm type 1 are the highest, and the values of FAR and RMSE are the lowest in the four storm types, which indicate that the WRF model performs best for storm type 1.The model performs the worst for storm type 4, with the lowest POD and CSI and the highest FAR and RMSE.In general, the simulation results of the temporal rainfall patterns are unsatisfactory for all the four storm types.The linear relationship between RMSE and the temporal C v is also significant and the correlation coefficient of linear regression (R 2 ) is 0.7524 (shown by Fig. 9).It indicates that the simulation error also increases with the increase of the rainfall unevenness in the temporal dimension.

Discussion
In this study, the performances of 16 WRF physical members are estimated firstly by AREs for cumulative rainfall amounts and then by a two-dimensional verification scheme for spatiotemporal rainfall distributions.According to the spatiotemporal evenness, six storm events are classified into four storm types.Storm type 1 has a two-dimensional evenness of rainfall which is even in the spatiotemporal distribution.The WRF model performs best for simulating this storm type, not only for the cumulative rainfall amounts but also for the spatiotemporal distributions.Storm type 2 is only even in space, and the simulation results from the WRF ensemble are better than storm types 3 and 4.But compared with type 1, the cumulative rainfall amounts of type 2 events are seriously underestimated.Storm types 3 and 4 are both uneven in spatiotemporal distribution, and the unevenness is especially remarkable for type 4 events.The simulations of the WRF model are unsatisfactory for the spatiotemporal patterns of the two storm types.The simulation results of type 4 events are the worst among the four storm types.Some of the members even miss the whole storm duration in space and time.It is interesting to find that the WRF model tends to underestimate the rainfall amounts except for storm type 1.With more events being investigated in the study sites, the general simulation errors of the WRF model can be determined by statistical analysis, which can help build a correction model to further improve the rainfall products of the WRF model.
For rainfall forecast operation, it is hard to identify the storm type before the storm occurs.Therefore it is important to determine the physical parameterizations which generally perform well.According to the REs of the 16 members for the six storm events shown in Table 5, the AREs of the six storm events for one certain member are cal- The members containing GD perform better than members with BMJ while worse than members with KF.The range of the AREs is 42.32-44.53%.The members without cumulus parameterization also perform better than members with BMJ while worse than members with KF, and the range of the AREs is 39.55-49.16%.That is to say, the cumulus parameterizations have a significant effect on the performance of the WRF model and BMJ performs the worst in the three cumulus parameterizations.Janjic (2000) indicated that BMJ performed poorly in accurately reproducing the range and the intensity of the low-level jet.The strong ability of BMJ in simulating the upward transportation of vapor always results in underestimation of the rainfall amount.That is the main reason why BMJ is not a good choice in the study area.Additionally, it is necessary to use cumulus parameterization for the simulation of the rainfall accumulation and spatiotemporal rainfall distribution in the study area.However, the threshold of the horizontal resolution needs to be further discussed to determine whether to use the cumulus parameterization.
The uncertainties of the rainfall processes affect the choice of the physical parameterizations in a certain area.It is necessary to select the most appropriate physical parameterizations to design the physical ensemble for rainfall simulation and prediction.In this study, the 16 members of the physical ensemble are constituted from two microphysics, two PBL and three cumulus parameterizations, which are proven to be appropriate and widely used in the neighboring areas of the study sites (Hong et al., 2006;Miao et al., 2011;Pan et al., 2014).With the development of the WRF model, more sophisticated and realistic physical parameterizations could be developed and should be tested in the study area.
The verification of the WRF model has always been recognized as a worthy issue to be explored.In this study, a verification method which can estimate the rainfall simula- Ultimately, the main goal of rainfall forecasts is to obtain efficient flood forecasts.The peak flood, flood peak appearance time and flood process are all significantly influenced by the rainfall accumulations and the spatiotemporal distribution of the rainfall (Schellekens et al., 2011;Cane et al., 2013;Fan et al., 2015).Event V, which occurred on 21 July 2012, has caused the greatest flood during the past 10 years in Jing-Jin-Ji (Beijing-Tianjin-Hebei) area and received widespread attention in China.The 24 h rainfall accumulation was 155.43 mm in the Zijingguan catchment, and the peak flow reached 2580 m 3 s −1 at the catchment outlet.In such cases, accurate rainfall simulations and predictions can greatly help flood warnings.However, to analyze the usefulness of the WRF simulations to flood warning, the rainfallrunoff transformation processes should be further considered.This will involve many uncertainties, such as the choice of the rainfall-runoff model, the data used for model calibration and the involvement of a real-time updating scheme, which also has a considerable impact on the accuracy of the flood forecasting results.The exploration of different parameterizations for flood warning purposes is an important issue and worth discussing in further study.

Conclusion
In this study, the FNL data from NCAR provide the initial and boundary conditions for the WRF model, which is used for rainfall simulation of six representative storm events with a duration of 24 h in the Fuping and Zijingguan catchments, located in the south and the north reaches of the Daqinghe basin in semi-humid areas of North China.Two microphysics, two PBL and three cumulus parameterizations are selected to develop the 16 members of the physical ensemble of the WRF model.Both the cumulative amount and the spatiotemporal patterns of the simulated rainfall are analyzed and verified.The relative error is used to evaluate the 24 h accumulated areal rainfall.The spatial rainfall distributions and temporal rainfall patterns are verified by a two-dimensional verification scheme including four categorical and three continuous indices.The six storm events are classified into four types based on the spatiotemporal evenness of the rainfall.In general, the ranking of the average model performance for different storm types is type 1 > type 2 > type 3 > type 4, from the best to the worst, depending on both the cumulative rainfall amounts and the spatiotemporal rainfall patterns.A negative correlation is found between the simulation error and the rainfall evenness in both spatial and temporal dimensions.Storm events with more evenly distributed rainfall tend to have better simulation results in space and time.In addition, for the small catchment scale, accumulated areal rainfall is more important than the spatiotemporal rainfall distributions.According to the REs of rainfall accumulations, member 4 is the better choice for storm types 1, 2 and 4, while members 9, 10, 11 and 12 have the worse performance for storm types 1 and 4. For type 3 events, members 5 and 7 are the better choices.It provides a reference for choosing the optimal ensemble in the study area for different storm types.
This study provides a reference for ensemble simulation of different rainfall types in semi-humid areas of China in the WRF model.However, the simulated rainfall has relatively large errors, and the simulation results of the temporal rainfall patterns are always unreliable, especially the results of events III and V, which cannot be used directly in hydrological studies.Data assimilation has been proven to be an effective method in improving the rainfall simulation results of the WRF model by many studies (Ha and Lee, 2012;Liu et al., 2012;Routray et al., 2012).Data assimilation can ingest various sources of observations (surface observed data, radar data, satellite data and sounding data) into the WRF model products and then use the respective error statistics to update and correct the WRF model products (Wan and Xu, 2011;Ha et al., 2014;Xie et al., 2016).More studies should be carried out in the study sites with the assistance of data assimilation so that the rainfall products from the WRF model can be further improved.

Figure 1 .
Figure 1.The nested domains and the orography of the Fuping catchment and Zijingguan catchment.

3. 1
Study area and storm events The Fuping and Zijingguan catchments are the study areas, which respectively belong to the south and north reaches of the Daqinghe catchment, located in northern China with semi-humid climatic conditions.The drainage area of Fuping (from lat 39 • 22 to 38 • 47 N and from long 113 • 40 to 114 • 18 E) is 2210 km 2 , and the area of Zijingguan (from lat 39 • 13 to 39 • 40 N and from long 114 • 28 to 115 • 11 E) is 1760 km 2 (shown by Fig. 2).The average annual rainfall is about 600 mm at the study sites, and the majority of rain focuses in the flood season.As shown by Fig. 2, there are eight rain gauges in the Fuping catchment and 11 rain gauges in the Zijingguan catchment.The observed hourly rainfall data from rain gauges are treated as the ground truth.Six 24 h storm events are selected from the 10 recent years (2006 to 2015) with the respective rainfall characteristics of the study sites.The encounter between the western pacific subtropical high and the cold vortex of westerlies and the strong upward motion caused by Taihang Mountains are the main factors of rain formation in the study area, while the six storm events have quite different spatial and temporal evenness.Table

Figure 2 .
Figure 2. The location of the Daqinghe catchment in northern China (light shading) and the locations of the two study sites in the Daqinghe catchment.

Figure 3 .Figure 4 .
Figure 3. Cumulative curves of the observed and simulated areal rainfall for the six storm events.

Figure 5 .
Figure 5. Spatial values of the three continuous indices for different storm events with the 16 members of the physical ensemble.

Figure 6 .
Figure 6.The relationship between RMSE and Cv in the spatial dimension.

Figure 7 .
Figure 7. Temporal values of the four categorical indices for different storm events with the 16 members of the physical ensemble.

Figure 8 .
Figure 8. Temporal values of the three continuous indices for different storm events with the 16 members of the physical ensemble.
• 04 15 N and long 113 • 59 26 E, and the nested domain sizes are 252 × 234, 144 × 126 and 96 × 84 km 2 for the Fuping catchments.The center of the domain is lat 39 • 25 59 N and long 114 • 46 01 E, and the nested domain sizes are 216 × 198, 108 × 90 and 72 × 42 km 2 for the Zijingguan catchment.The nested domains and the orography of the two catchment are shown in Fig.

Table 1 .
The constitution of the WRF physical ensemble.

Table 2 .
Durations and rainfall accumulations of the six selected 24 h storm events.

Table 3 .
Spatial and temporal C v of the observed rainfall for the six storm events.

Table 4 .
Rain-no rain contingency table for the WRF simulation against observation.

Table 5 .
Rankings of the 16 members of the physical ensemble according to RE (%) of the simulated rainfall accumulations for the storm events.

Table 6 .
AREs of the 16 members of the physical ensemble for the four types of storm events (%).
absolute values of MBE are used.Storm type 1 has the lowest MBEs of the 16 members, and the MBEs of storm types 3 and 4 are higher than storm type 2. The values of SD also show variations for a certain storm type in different members.As a whole, SD and RMSE have similar patterns for different types of storm events.From Figs.4 and 5, it can be easily determined that few values of the indices for members 13, 14, 15 and 16 are out of the range of the values for the other 12 mem- www.nat-hazards-earth-syst-sci.net/17/563/2017/Nat.Hazards Earth Syst.Sci., 17, 563-579, 2017 572 J. Tian et al.: Ensemble rainfall simulation for different type events

Table 7 .
Average index values of the 16 members of the physical ensemble for the simulations of the spatial rainfall distributions.

Table 8 .
Average index values of the 16 members of the physical ensemble for the simulations of the temporal rainfall patterns.the spatial and the temporal dimension is used.It is assumed that the observations from rain gauges are accurate and representative for the two study sites.However, it brings uncertainties to use point-based observations to evaluate grid-based simulations.More grid-based observational data should be involved to improve the reliability of evaluation, especially those from weather radar and remote sensing.