Verification of Pre-Monsoon Temperature Forecasts over India during 2016 with focus on Heat Wave Prediction

Operational medium-range weather forecasting based on Numerical Weather Prediction (NWP) models is now complemented by forecast products from Ensemble Prediction Systems (EPS). The ensemble approach has been recognized as a useful tool for medium-range forecasting and is now finding its place in forecasting extreme events. Here we investigate extreme events (heat waves) using a high-resolution NWP model and its ensemble counterpart, in conjunction with classical statistical scores for verification. With the advent of climate-change-related studies in the recent past, the rise in extreme events and their plausible socio-economic effects have increased the need for forecasting and verification of extremes. Applying traditional verification scores and associated methods to both the deterministic and the ensemble forecasts, we examine the performance of the ensemble-based approach relative to the traditional deterministic method. The results indicate an appreciable competence of the ensemble forecast in detecting extreme events as compared to the deterministic forecast. Locations of the events are also better captured by the ensemble forecast. Further, it is found that the EPS smooths out unexpectedly high signals, which reduces false alarms and thus makes it more reliable than the deterministic forecast.


Introduction
Reliable weather forecasting plays a pivotal role in our everyday activities. Over the years, NWP systems have been employed to serve this purpose. While NWP models have demonstrated improved forecasting capability in general, accurate prediction of severe weather and extreme events remains a challenge. Severe weather events (thunderstorms, cloudbursts, heat waves, cold waves, etc.) usually involve strong non-linear interactions, often between small-scale features in the atmosphere (Legg and Mylne, 2004), for example the development of deep convection and thunderstorms in the tropics. These small-scale interactions are difficult to predict accurately (Meehl et al., 2001), and a small deviation in them can lead to completely different results as the forecast evolves (Lorenz, 1969). The inherent uncertainty in weather and climate forecasts can be handled well by employing ensemble-based forecasting (Buizza et al., 2005). EPSs (Mureau et al., 1993; Molteni et al., 1996; Toth and Kalnay, 1997) were first introduced in the 1990s in an effort to quantify the uncertainty caused by synoptic-scale baroclinic instabilities in medium-range weather forecasting (Legg and Mylne, 2004). Ensemble forecasting has emerged as the practical way of estimating forecast uncertainty and making probabilistic forecasts. Based on multiple perturbed initial conditions, the ensemble approach samples the errors in the initial conditions to estimate the forecast uncertainty (the spread in the member forecasts). The ensemble mean shows marked improvement in skill over the deterministic forecast after a short lead time. The new EPS at NCMRWF is now running operationally. This global medium-range weather forecasting system has been adopted from the UK Met Office (Sarkar et al., 2016).
Generally, model and ensemble forecast applications, and their verification, address prevalent events, with limited focus on rare extreme weather events. This is the first time that the EPS output from this model has been used to study heat wave events over India. A heat wave is considered only if the maximum temperature of a station reaches at least 40°C for the plains and at least 30°C for hilly regions. Based on departure from normal, a station is declared to have heat wave conditions if the departure from normal is 4.5°C to 6.4°C, and severe heat wave conditions if the departure from normal is >6.4°C. In terms of the actual maximum temperature, a station is under a heat wave when the actual maximum temperature is ≥45°C and under a severe heat wave when the maximum temperature is >47°C. There has been increasing interest in predicting such extremes, namely the heat wave and cold wave events in India, owing to the associated loss of life. An increasing number of extreme temperature events over India has been documented in several recent studies (Alexander et al., 2005; Kothawale et al., 2010; Hartmann et al., 2013; Rohini et al., 2016). Mehdi and Dehkale (2016), in a climatological study of heat and cold waves, show that over the Indian sub-continent between 1969 and 2013 cold and heat wave events were more frequent over the Indo-Gangetic plains of India. In another study carried out for the whole of South Asia, Sheik et al. (2015) reported that warm extremes have become more common and cold extremes less common.
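The station-level criteria above can be written as a small decision function. This is only a sketch of the thresholds quoted in the text: the function name, the `hilly` flag and the rule that the more severe of the two criteria wins are assumptions made for illustration, not an operational definition.

```python
LEVELS = ("none", "heat wave", "severe heat wave")


def classify_heat_wave(tmax_c, normal_tmax_c, hilly=False):
    """Classify one station-day from Tmax and its climatological normal (deg C)."""
    level = 0

    # Criterion based on the actual maximum temperature.
    if tmax_c > 47.0:
        level = 2
    elif tmax_c >= 45.0:
        level = 1

    # Criterion based on departure from normal, considered only once the
    # base threshold (40 deg C plains, 30 deg C hilly regions) is reached.
    base = 30.0 if hilly else 40.0
    if tmax_c >= base:
        departure = tmax_c - normal_tmax_c
        if departure > 6.4:
            level = max(level, 2)
        elif departure >= 4.5:
            level = max(level, 1)

    # Assumption: when both criteria apply, report the more severe one.
    return LEVELS[level]
```

For example, a plains station reporting 41°C against a normal of 36°C has a departure of 5°C and would be labelled "heat wave", while 39°C would not qualify however large the departure.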
Global temperatures exhibited a warming trend of about 0.85°C between 1880 and 2012 due to anthropogenic activities (IPCC, 2013; Rohini et al., 2016). Similar trends were also observed in India, with annual surface air temperature rising during the 20th century. This is evident from the detailed study presented in Kothawale et al. (2010), based on data from 1901-2007. The study shows that the Indian annual mean, maximum and minimum temperatures increased significantly, by 0.51, 0.71 and 0.27°C per 100 years respectively, during 1901-2007. However, accelerated warming was observed during 1971-2007, mainly due to the last decade, 1998-2007. The study also highlights that the mean temperature during the pre-monsoon season (March-May) shows an increasing trend of 0.42°C per 100 years. On the other hand, the IPCC report (2013) reiterated "unequivocal" evidence of an increasing global warming trend, which could be associated with variations in the climate system. This indicates a need to understand heat wave events on weather and climatic scales. This paper attempts to demonstrate the capability and strength of predicting such events using both ensemble and deterministic forecasts. It investigates the most recent heat wave events during the summer months of March, April and May (MAM) 2016 in India, and considers two case studies to demonstrate the strengths and weaknesses of the EPS approach in predicting such extreme events.
With these factors in mind, temperature (both minimum and maximum) forms a vital component of weather and climate studies, which are becoming increasingly important and challenging. Reliable projections of such changes in our weather and climate are critical for adaptation and mitigation planning by the agencies involved. This knowledge would undoubtedly be useful to the general public and to society.
Testing for the reliability of the NWP model results is efficiently done by the forecast verification methods. Forecast verification plays an important role in addressing two main questions: How good is a forecast? And how much confidence can we have in it?
Verification using statistical scores is a well-established method and is adopted in this study. However, not all scores lead to the same conclusion, which makes it challenging to decide how much confidence can be placed in a model. The score type is chosen according to the statistical characteristics of the variable being verified. Not all scores are equally effective in describing a variable, so selecting the most suitable score type is both a choice and a challenge. The set of verification scores used here is listed and briefly discussed in the next section.
In this paper, we investigate the utility of the ensemble prediction system relative to the deterministic forecast in studying extreme events such as heat waves. This is the first documented study of the recent heat wave events over India to be verified using both the deterministic and the ensemble forecasts. The paper discusses the capabilities and limitations of an EPS. It also provides some important insights into the use of the ensemble forecast over the deterministic forecast in predicting extreme events such as heat waves. However, this study cannot offer a complete discussion of the efficiency of the EPS in general, as the work examines a narrow range of phenomena over a limited region.
The paper begins with a brief explanation of the observed temperature (Tmax and Tmin) data sets, the model description and the methodology used. It then proceeds to the results section, which covers two case studies from the recent heat wave events in India, followed by the verification results, and ends with the discussion and conclusions.
Recently, the India Meteorological Department (IMD) has developed a high-resolution daily gridded temperature dataset at 0.5° x 0.5° resolution. The data processing procedure has been well documented (Srivastava et al., 2009). IMD has compiled, digitized, quality controlled and archived these data at the National Data Centre (NDC). Based on maximum data availability, some stations were subjected to quality control checks, such as rejecting values exceeding known extreme values, cases where the minimum temperature is greater than the maximum temperature, and identical temperature values on many consecutive days. After these quality checks, 395 stations were selected for further development of the gridded data. IMD used measurements at these selected stations and interpolated the data onto grids with a modified version of Shepard's angular distance weighting algorithm (Shepard, 1968). In this study, we have used IMD's real-time daily gridded (Shepard, 1968; Piper and Stewart, 1996; New et al., 2000; Kiktev et al., 2003;
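The station quality checks listed above can be sketched as a boolean mask over a daily series. The limits used here (`tmax_limit`, `tmin_limit`, `max_run`) are illustrative assumptions, not the values applied by IMD, and the function name is hypothetical.

```python
import numpy as np


def quality_control_mask(tmax, tmin, tmax_limit=52.0, tmin_limit=-30.0, max_run=5):
    """Return a boolean array, True where a daily station record passes all checks.

    All thresholds are placeholder assumptions for illustration.
    """
    tmax = np.asarray(tmax, dtype=float)
    tmin = np.asarray(tmin, dtype=float)

    # Missing values fail automatically.
    ok = ~np.isnan(tmax) & ~np.isnan(tmin)
    # Reject values beyond the assumed known extremes.
    ok &= (tmax <= tmax_limit) & (tmin >= tmin_limit)
    # Reject days where the minimum exceeds the maximum.
    ok &= tmin <= tmax

    # Reject runs of identical Tmax longer than max_run consecutive days.
    run = 1
    for i in range(1, tmax.size):
        run = run + 1 if tmax[i] == tmax[i - 1] else 1
        if run > max_run:
            ok[i - run + 1 : i + 1] = False
    return ok
```

Only the Tmax series is checked for repeated values in this sketch; the same loop could be applied to Tmin.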

NCMRWF Unified Model (NCUM)
The Unified Model system operational at NCMRWF consists of an Observation Processing System (OPS 30.1), a four-dimensional variational data assimilation system (VAR 30.1) and the Unified Model (UM 8.5). The analysis system makes use of various conventional and satellite observations. The analysis produced by this data assimilation system has been used as the initial condition for the daily operational high-resolution (N768L70) global NCUM 10-day forecast since January 2016. The horizontal resolution of the NCUM system is 17 km, and it has 70 vertical levels extending from the surface to 80 km height. The NCUM forecast temperature (Tmax and Tmin) data have been interpolated to 0.5° x 0.5° resolution using bilinear interpolation to match the resolution and grid of the observed data.
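The bilinear interpolation step mentioned above can be illustrated with a NumPy-only sketch for regular latitude-longitude grids. It assumes ascending coordinate axes and a field indexed as `field[lat, lon]`; the function name is hypothetical, and this is not the regridding code used operationally.

```python
import numpy as np


def bilinear_regrid(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate field[lat, lon] onto the target lat/lon axes.

    Assumes both source axes are strictly ascending. Target points outside
    the source grid are linearly extrapolated from the nearest cell.
    """
    field = np.asarray(field, dtype=float)
    src_lat = np.asarray(src_lat, dtype=float)
    src_lon = np.asarray(src_lon, dtype=float)
    dst_lat = np.asarray(dst_lat, dtype=float)
    dst_lon = np.asarray(dst_lon, dtype=float)

    # Index of the lower corner of the source cell containing each target point.
    i = np.clip(np.searchsorted(src_lat, dst_lat) - 1, 0, src_lat.size - 2)
    j = np.clip(np.searchsorted(src_lon, dst_lon) - 1, 0, src_lon.size - 2)

    # Fractional position inside the cell (0 at the lower corner, 1 at the upper).
    wy = ((dst_lat - src_lat[i]) / (src_lat[i + 1] - src_lat[i]))[:, None]
    wx = ((dst_lon - src_lon[j]) / (src_lon[j + 1] - src_lon[j]))[None, :]

    # Values at the four corners of each cell.
    f00 = field[np.ix_(i, j)]
    f01 = field[np.ix_(i, j + 1)]
    f10 = field[np.ix_(i + 1, j)]
    f11 = field[np.ix_(i + 1, j + 1)]

    return (f00 * (1 - wy) * (1 - wx) + f01 * (1 - wy) * wx
            + f10 * wy * (1 - wx) + f11 * wy * wx)
```

Because bilinear interpolation averages neighbouring values, it can slightly damp sharp local maxima, which is worth keeping in mind when the regridded field is compared against fixed thresholds such as 40°C.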

NCMRWF Ensemble Prediction System (NEPS)
NEPS is a global medium-range ensemble forecasting system adapted from the UK Met Office MOGREPS system (Bowler et al., 2008). The configuration consists of four assimilation cycles corresponding to 00Z, 06Z, 12Z and 18Z, and 10-day forecasts are made using the 00Z initial condition. The N400L70 forecast model consists of 800 x 600 grid points in the horizontal and has 70 vertical levels. The horizontal resolution of the model is approximately 33 km in the mid-latitudes.
The 10-day control forecast run starts from the operational NCUM (N768L70) analysis, and 44 ensemble members start from different perturbed initial conditions consistent with the uncertainty in the initial conditions. The initial perturbations are generated using the Ensemble Transform Kalman Filter (ETKF) method (Bishop et al., 2001). Uncertainty in the forecasting model is taken into account by making small random variations to the model and by using a stochastic kinetic energy backscatter scheme (Tennant et al., 2010).
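Given member forecasts such as those described above, the ensemble mean, the spread and a simple exceedance probability can be computed as follows. This is a minimal sketch: the array layout (members along the first axis) and the function names are assumptions made for illustration.

```python
import numpy as np


def ensemble_mean_and_spread(members):
    """Return (mean, spread) over axis 0 of an array shaped (n_members, ...).

    Spread is taken as the sample standard deviation across members.
    """
    members = np.asarray(members, dtype=float)
    mean = members.mean(axis=0)
    spread = members.std(axis=0, ddof=1)
    return mean, spread


def exceedance_probability(members, threshold=40.0):
    """Fraction of members forecasting a value at or above the threshold."""
    members = np.asarray(members, dtype=float)
    return (members >= threshold).mean(axis=0)
```

Averaging across members damps isolated high values in individual members, which is one way to picture the smoothing effect of the ensemble mean referred to in the text.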

Verification Metrics
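Several scores are available for the categorical verification of ensemble forecasts. In the current study, we have used the probability of detection (POD), false alarm ratio (FAR), equitable threat score (ETS), Hanssen-Kuipers score (HK) and symmetric extremal dependence index (SEDI). A brief description of these scores is presented here.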

POD Score or the Hit Rate (H):
POD answers the question, "What fraction of the observed "yes" events were correctly forecast?" It is sensitive to hits but ignores false alarms, and it is very sensitive to the climatological frequency of the event. It is good for rare events, but it can be artificially improved by issuing more "yes" forecasts to increase the number of hits. Its value ranges from 0 to 1; for perfectly forecast events, POD = 1.
Categorical (yes/no) occurrences are well suited for verification using POD, FAR, the Heidke skill score, the equitable threat score and the H-K statistic. To take advantage of these scores for our continuous variable, temperature (maximum and minimum), we categorize it using the temperature ranges 30-32, 32-34, 34-36, 36-38, 38-40 and 40-42 °C.
ETS: Also known as the Gilbert skill score, the ETS describes how well the forecast "yes" events agree with the observed "yes" events, after accounting for hits that occur by chance. The score ranges from -1/3 to 1; 0 indicates no skill and 1 denotes perfect skill. The score expresses the fraction of observed and/or forecast events that were predicted correctly, adjusted for random hits:
$$\mathrm{ETS} = \frac{\mathrm{hits} - \mathrm{hits}_{\mathrm{random}}}{\mathrm{hits} + \mathrm{misses} + \mathrm{false\ alarms} - \mathrm{hits}_{\mathrm{random}}}$$
where
$$\mathrm{hits}_{\mathrm{random}} = \frac{(\mathrm{hits} + \mathrm{misses})\,(\mathrm{hits} + \mathrm{false\ alarms})}{\mathrm{total}}$$
SEDI: It expresses the association between forecast and observed rare events. It ranges between -1 and 1, where the perfect score is 1. The score converges to (2X - 1) as the event frequency approaches 0, where "X" denotes the variable that specifies the hit rate's convergence to 0 for the rarer events. SEDI is not influenced by the base rate; for skilful forecasts the SEDI score approaches 1.
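The scores named in this section can all be computed from a 2x2 contingency table. The sketch below uses the standard textbook definitions; the formulas for FAR, HK and SEDI are not spelled out in the text above and are assumed here, as are the function names and the choice of half-open temperature bins (lower edge included, upper edge excluded).

```python
import math


def contingency_counts(forecast, observed, lower, upper):
    """Count hits, misses, false alarms and correct negatives for one bin.

    An event is "yes" when lower <= value < upper (bin convention assumed).
    """
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast, observed):
        f_yes = lower <= f < upper
        o_yes = lower <= o < upper
        if f_yes and o_yes:
            hits += 1
        elif o_yes:
            misses += 1
        elif f_yes:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def categorical_scores(hits, misses, false_alarms, correct_negatives):
    """Return POD, FAR, ETS, HK and SEDI for a 2x2 contingency table.

    Assumes every denominator is non-zero and 0 < POD, POFD < 1 (needed for SEDI).
    """
    total = hits + misses + false_alarms + correct_negatives
    pod = hits / (hits + misses)                                # hit rate H
    far = false_alarms / (hits + false_alarms)                  # false alarm ratio
    pofd = false_alarms / (false_alarms + correct_negatives)    # false alarm rate F
    hits_random = (hits + misses) * (hits + false_alarms) / total
    ets = (hits - hits_random) / (hits + misses + false_alarms - hits_random)
    hk = pod - pofd
    ln_f, ln_h = math.log(pofd), math.log(pod)
    ln_1f, ln_1h = math.log(1 - pofd), math.log(1 - pod)
    sedi = (ln_f - ln_h - ln_1f + ln_1h) / (ln_f + ln_h + ln_1f + ln_1h)
    return {"POD": pod, "FAR": far, "ETS": ets, "HK": hk, "SEDI": sedi}
```

In practice the counts would be accumulated over all grid points and days for each temperature range and lead time before the scores are computed; bins with no observed or no forecast events need separate handling because several denominators vanish.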

Results and Discussions:
Traditionally, the performance of a forecast model is determined by a variety of statistical measures and scores, which offer an effective way to quantify a model's capability. Before moving on to such methods, we begin by looking at the ensemble-based and deterministic forecasts (on a daily basis) over the three hot summer months in India, March, April and May, and compare them with the observations. The models run operationally and provide forecasts out to 10 days every day. The verification is confined to MAM 2016, over six different temperature thresholds. From the spatial map (Figure 2), the frequency of observed maximum temperature Tmax ≥ 40°C is highest (more than 70 counts) over Maharashtra and adjoining regions over the entire period of MAM 2016, which is picked up by both the deterministic and the ensemble forecasts. However, the deterministic forecast shows a wider spread of high frequencies over MP, UP, Bihar, Odisha, AP and adjoining states from day-1 to day-9. As the forecast lead time increases from day-1 to day-9, the heat wave frequency increases from central India towards north and east India. Consequently, a higher number of heat wave extremes was predicted by NCUM over east UP, Bihar, West Bengal, Odisha, Jharkhand, Chhattisgarh and AP. On the other hand, the NEPS prediction (Figure 3) for day-1 to day-9 is much more subdued than the NCUM forecasts. However, both models, NCUM and NEPS, predict heat waves more frequently over the above-mentioned regions.
Comparatively, the ensemble-based model NEPS is performing better (spatially) for the extremes of heatwave events than the NCUM over most of the Indian states up to day-9.
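The frequency maps discussed above amount to counting, at each grid point, the number of days on which Tmax reaches the threshold. A minimal sketch, assuming the daily fields are stacked as `(n_days, n_lat, n_lon)` (the function name and array layout are illustrative):

```python
import numpy as np


def exceedance_frequency(tmax_daily, threshold=40.0):
    """Count days with Tmax >= threshold at each grid point.

    tmax_daily has shape (n_days, n_lat, n_lon); missing values (NaN)
    never count as exceedances.
    """
    tmax_daily = np.asarray(tmax_daily, dtype=float)
    return (tmax_daily >= threshold).sum(axis=0)
```

Applying the same function to the observed grids and to the forecasts at a fixed lead time gives count maps that can be differenced to show where a model over-predicts or under-predicts hot days.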

Casualties reported during MAM-2016
The prevailing heat wave over India claimed more than 500 lives. Heatwave

Synoptic features associated with Heat waves during 2016
The panels in Figure

Case-II Heat waves on 21st May 2016
Severe heat wave conditions developed and intensified over parts of northwest India during the entire third week
Interestingly, contrary to expectation, the forecast score does not decline with lead time. This indicates that NEPS performs better and that its prediction skill remains quasi-constant throughout the 10-day lead time (Figure 9). Similar observations can be made from the ETS plots (Figure 10). The most obvious finding to emerge from the box-and-whisker plots of the ETS scores is the better performance of the ensemble-based forecast (NEPS) compared with the deterministic forecast (NCUM). This result is consistent with earlier documented findings. At all the Tmax thresholds (between 30 and 42°C), the NEPS mean lies above the NCUM mean. The same holds for all the illustrated forecasts (Day-1, 3, 5, 7 and 9). The lower quartile (25%) of the ensemble-based scores is similar to or slightly above that of the deterministic forecast, whereas the upper quartile (75%) for NEPS is markedly higher than that for NCUM.
This finding raises an intriguing question regarding the difference in the characteristic distribution of both NEPS and NCUM forecasts. This result also advocates better performance of the ensemble based forecast over the deterministic forecast.
Importantly, the ensemble-based forecast produces fewer false alarms than its counterpart, NCUM, especially in the high-temperature range. In the low-temperature range, between 30 and 32°C, NEPS has a low FAR score (where 0 denotes the perfect score) for the Day-1 and Day-3 forecasts, and a comparatively higher score for the Day-5, Day-7 and Day-9 forecasts (Figure 11).
POD: The probability of detection of the ensemble-based forecast is higher than that of the deterministic forecast at all lead times and at all temperature thresholds, except for the Day-1 forecast in the 40-42°C range, where NCUM performs better (Figure 12).
SEDI: At higher temperature ranges, representing rare events, the difference in performance between NEPS and NCUM can be clearly seen in the SEDI score plot (Figure 13). There is a considerable difference between the performance of the two techniques for extreme events lying between 40 and 42°C on all the days.
NEPS demonstrates higher skill than NCUM in predicting the heat wave events. The heat wave prediction skill is best seen in the Day-5 forecast, with the NEPS SEDI score range encompassing the value 0.7. Monthly scores are listed in Table 3.
A consistent result attained from the NEPS and NCUM verification demonstrates the better skill of the ensemble forecasts compared to the deterministic forecast for the considered cases.

Summary and Conclusions:
Unless the atmosphere is in a highly predictable state, we should not expect an ensemble to forecast extreme events with a high probability (Legg and Mylne, 2004). This is due to the small scale non-linear interactions involved in a model (NWP).
Some of the predicted interactions could be climatologically extreme and are hence more difficult to predict. A small variation in the intensity, timing and position of such anomalies could lead to a large difference in how their prediction grows in time. Thus, despite the efficiency of the EPS over the deterministic forecast in detecting extreme events, we should be extremely careful in declaring it locally, as explained above. The ensemble mean is relatively better than the deterministic forecast in predicting the extremes of heat wave events over the Indian states up to day-9.
1) The ensemble forecast provides appreciable forecasts on all days and is most reliable after the Day-2 forecast. This characteristic is more pronounced for extreme events than for less extreme events, for which the ensemble forecast after Day-2 is less reliable, as can be seen from the FAR and POD scores at the lower thresholds. This suggests that the performance of the EPS differs between thresholds: if it performs well at higher thresholds, it does not necessarily perform equally well at lower thresholds. Thus, we need to understand the model performance over all the ranges of interest and, based on those verification results, employ the ensemble forecast accordingly for operational purposes.
2) Our forecasts were obtained for the current summer season in India (MAM). Since severe events are rare in nature, the sample size for the ensemble forecast is limited, which poses a challenge for robust forecast verification.
Despite the caveats involved, the ensemble forecast has been shown to predict heat waves several days ahead of the event, as discussed in the results. Severe heat waves (>40°C) can be reliably predicted from Day-2 onwards, with fewer false alarms than the deterministic forecast, as observed here. This could be explained by the inherent smoothing characteristic of the ensemble-based prediction, in contrast to the deterministic one, which in our case shows a warm bias.
3) The comparatively low efficiency of the ensemble-based prediction on shorter time scales (< Day-2) suggests that the ensemble prediction may need a longer time for perturbation growth. This would be an important aspect to consider in the future evolution of ensemble-based forecasting. If this hypothesis is true, then for short-range forecasts the ensemble-based prediction could lag behind other methods such as moist SV optimization (Coutinho et al., 2004) and the ETKF (12, 13). However, for medium-range forecasts and for extreme events such as heat waves, the ensemble-based approach proves to be one of the most economical and effective tools.
For the present study, data from the two models are available only from 2016. Real-time ensemble-based forecasts using NEPS started in November 2015 at NCMRWF. For a robust and conclusive result, the study needs to be based on a larger number of cases. This will be carried out in future.
The gridded Tmax and Tmin data discussed in this paper are derived from station observations. It is indeed likely that some of the station extremes are smoothed out in the gridded data. It should also be noted that the station data network is sparse (395 stations) and often has missing values. The gridded data field provides continuous and gap-free data to work with.
Extreme events like heat waves are rare in nature, and here we have provided a general view of two particular heat wave events (11 April and 21 May). From our experience, as well as from the forecasts for the days following the heat wave events, we can state that the skill in predicting an event when the initial conditions carry no indication of severity is comparatively lower than when the signature is present in the initial conditions. Even before the event, there is some signature of it, as can be seen in Figures 5, 6, 7 and 8. The overall warm conditions are well predicted, but the events are better predicted at closer lead times. The same can be seen in the box-and-whisker plots for ETS (and in the rest of the score plots as well). For instance, the skill of NEPS does not fall drastically from Day-2 to Day-7 and thus reflects reasonable skill. Overall, NEPS in particular has good skill in predicting the extreme events and is relatively robust.
Thanks are due to the anonymous reviewers for their comments and suggestions, which have helped in revising the manuscript.