Evaluating flood potential with GRACE in the United States

Reager and Famiglietti (2009) proposed an index, Reager’s Flood Potential Index (RFPI), for early large-scale flood risk monitoring using the Terrestrial Water Storage Anomaly (TWSA) product derived from the Gravity Recovery and Climate Experiment (GRACE). We evaluated the efficacy of the RFPI for flood risk assessment over the continental USA using multi-year flood observation data from 2003 to 2012 from the US Geological Survey and the Dartmouth Flood Observatory. In general, we found good agreement between the RFPI flood risks and the observed floods on regional and even local scales. RFPI demonstrated skill in predicting large-area, long-duration floods, especially during the summer season.


Introduction
Among natural disasters, floods rank first in terms of the total number of people affected and monetary losses (Center for Research on the Epidemiology of Disasters, 2013). Intensive precipitation events increased during the final decades of the 20th century (Groisman et al., 2005; Alexander et al., 2006; Trenberth, 2011) and are expected to intensify further in the future (Groisman, 2012). In response, many countries have developed flood alert systems, such as the European Flood Alert System (Bartholmes et al., 2009) and the US National Weather Service Automated Flood Warning System (Scawthorn, 1999). While most of these systems rely on a dense network of gauging stations, over 95 % of all deaths and a significant portion of the economic losses caused by floods occur in developing countries, where ground-based flood monitoring and management programs are still inefficient and the costs of building control infrastructure such as dams, weirs, embankments and gauging stations can be prohibitive (Tariq, 2011). These problems were demonstrated well during the 2010 flood disaster in Pakistan (Larkin, 2010), where the deficiencies in flood monitoring and the ensuing lack of information led to coordination chaos (Hagen, 2011) and contributed to an estimated economic loss of USD 35 billion.
To compensate for or complement ground-based observations, flood monitoring has increasingly relied on products obtained with space-borne sensors such as NASA's AMSR-E (Hossain and Anagnostou, 2004), Quick Scatterometer (QuikSCAT) (Brakenridge et al., 2003), Spinning Enhanced Visible and InfraRed Imager (SEVIRI) (Proud et al., 2011) and Moderate Resolution Imaging Spectroradiometer (MODIS) (Brakenridge and Anderson, 2006). Among the remote sensing products that have been used for flood monitoring, data from the Gravity Recovery and Climate Experiment (GRACE) (Adam, 2002; Chen et al., 2004) are unique in that changes in the amount of terrestrial water can be measured directly. Reager and Famiglietti (2009) proposed the Flood Potential Index (RFPI) to estimate flood risks worldwide based on the GRACE Total Water Storage Anomaly (TWSA) and precipitation records. A qualitative comparison of RFPI with a record of observed floods from the Dartmouth Flood Observatory (DFO) data set suggested that the proposed RFPI product is useful for flood risk assessment in most regions (Reager and Famiglietti, 2009), yet no quantitative validation was reported. This leads to the main objective of our study: to evaluate the skill of RFPI for flood forecasting over the continental USA, where floods are routinely monitored.

Flood Potential Index
Following the methodology proposed by Reager and Famiglietti (2009), we computed monthly 2003-2012 RFPI for the continental USA using the GRACE TWSA product. For each grid cell, the maximum water storage capacity of the soil $S_{\mathrm{MAX}}$, i.e., the amount of water the soil can hold (Reager and Famiglietti, 2009), was estimated as the maximum of the GRACE water storage anomaly from 2003 to 2012. The storage deficit ($S_{\mathrm{DEF}}$) represents the amount of water that the storage can accommodate before reaching $S_{\mathrm{MAX}}$. The storage deficit was calculated for each cell for each month:

$$S_{\mathrm{DEF}}(t) = S_{\mathrm{MAX}} - S(t-1), \qquad (1)$$

where $S(t-1)$ represents the saturation condition of the soil in the previous month. $S_{\mathrm{DEF}}$ tells how much additional water a particular area can hold before reaching the maximum capacity and is calculated using the data from the previous month, establishing a potential for forecasting. Examples of $S$, $S_{\mathrm{MAX}}$ and $S_{\mathrm{DEF}}$ estimated for an arbitrary grid cell (52.5° N, 117.5° W) are shown in Fig. 1. Normally, $S_{\mathrm{DEF}}$ is low during months with high precipitation and high during months with low precipitation. Following Reager and Famiglietti (2009), the flood potential ($F$) for the month $t$ was calculated as

$$F(t) = P_{\mathrm{MON}}(t) - S_{\mathrm{DEF}}(t), \qquad (2)$$

where $P_{\mathrm{MON}}(t)$ is monthly precipitation. The flood potential can be interpreted as the amount of water in excess of the potential water storage. A combination of low $S_{\mathrm{DEF}}$ and high precipitation for the previous month would indicate a high probability of flooding in the current month. Further, RFPI is computed by normalizing the flood potential:

$$\mathrm{RFPI}(t) = \frac{F(t)}{\max[F(t)]}, \qquad (3)$$

where the maximum of flood potential $\max[F(t)]$ is computed for each cell of the grid. The values of RFPI vary from −∞ to 1, with positive values indicating that water input from precipitation is above the mean water storage and should be interpreted as a potential risk for flooding. For validation of the RFPI skill we converted the index values to dichotomous events, where all positive values represent flood potential and all negative values represent absence of the risk. The computed hindcast was validated against the USGS and DFO flood occurrence data, rasterized to a 1° × 1° grid of geographical latitude and longitude.
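The three steps above (storage deficit, flood potential, normalization) can be sketched with NumPy as follows. This is only an illustrative sketch, not the code used in the study: the function name `compute_rfpi`, the `(time, lat, lon)` array layout and the handling of cells whose maximum flood potential is not positive are assumptions made here, and storage and precipitation are assumed to share the same units.

```python
import numpy as np

def compute_rfpi(twsa, precip):
    """Return (s_def, flood_potential, rfpi) for arrays shaped (time, lat, lon).

    twsa   : monthly storage anomaly S(t) (hypothetical input array)
    precip : monthly precipitation P_MON(t), same units as twsa
    The first month has no predecessor, so its outputs are NaN.
    """
    twsa = np.asarray(twsa, dtype=float)
    precip = np.asarray(precip, dtype=float)

    # S_MAX: per-cell maximum of the storage anomaly over the whole record.
    s_max = np.nanmax(twsa, axis=0)

    # S_DEF(t) = S_MAX - S(t-1): deficit relative to last month's storage.
    s_def = np.full_like(twsa, np.nan)
    s_def[1:] = s_max - twsa[:-1]

    # F(t) = P_MON(t) - S_DEF(t): precipitation in excess of free storage.
    flood_potential = precip - s_def

    # RFPI(t) = F(t) / max_t F(t), normalized separately for each cell.
    f_max = np.nanmax(flood_potential, axis=0)
    # Assumption: cells whose maximum F is not positive are left undefined,
    # since dividing by a non-positive maximum would flip the sign of the index.
    f_max = np.where(f_max > 0, f_max, np.nan)
    rfpi = flood_potential / f_max
    return s_def, flood_potential, rfpi
```

The dichotomous hindcast described in the text would then be `flood_flag = rfpi > 0`, with undefined (NaN) cells evaluating to no flag.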
Monthly data from January 2003 to August 2012 of the GRACE RL05 TWSA product (Adam, 2002) from the CSR processing center (http://grace.jpl.nasa.gov) and the CPC Merged Analysis of Precipitation (CMAP) (Xie and Arkin, 1997) were used to compute RFPI. Both data sets are gridded at 1° × 1°. The scaling grid recommended by the GRACE Tellus data portal (Swenson, 2012) was applied to the GRACE data to account for the attenuation of small-scale surface mass variations (Velicogna and Wahr, 2006).

Flood observation data
For validation, flood events reported in two observational flood data sets were used: (1) DFO (Brakenridge and Anderson, 2006) and (2) the US Geological Survey (USGS) Retrieve Summary of Recent Flood and High Flow Conditions (Hirsch and Costa, 2004). The two data sets differ substantially in that the DFO is derived from news and governmental sources and hence mainly refers to large floods in more densely populated regions, whereas the USGS reports are based on in situ stream gauges. In addition, the DFO data start in 1985, but the USGS data are available only since October 2007. DFO classifies an event as a large flood in cases of significant damage to structures or agriculture, loss of human life and/or long duration. The DFO data were downloaded as a GIS vector data set providing an outline of the area affected by a flood with such attributes as flood dates, duration, fatalities and primary country of flooding. The data were further screened for quality control. For example, in several instances in 2006 and 2009 a mismatch was found between the geographical coordinates assigned to a flood and the primary country of flooding; these events were excluded from our analysis. Finally, vector maps of DFO flood events were rasterized to 1° × 1° grids. Note that since the DFO data are mainly based on media reports, they are expected to be biased towards the more densely populated regions and/or regions of interest.
The USGS deploys 9044 gauging stations in the continental USA (Fig. 2) for flood monitoring. Each station reports a flood, defined as a flow overtopping the natural or artificial banks, on a daily basis. A flood is further categorized as minor, moderate or major, and the number of days in a month in each flood category is also reported. Because a significant difference in spatial scale exists between the GRACE RFPI data and the USGS gauge-based flood reports, the USGS data from the individual gauge stations were generalized on a 1° × 1° grid. First, to ensure statistical significance, all grid cells containing fewer than five USGS gauging stations were excluded from the analysis (Fig. 2). For the grid cells with at least five stream gauging stations, gauge reports from all the stations within the cell were combined into a monthly flood coefficient $X$:

$$X = \frac{D_{\mathrm{mi}} + 5\,D_{\mathrm{mo}} + 50\,D_{\mathrm{ma}}}{N}, \qquad (4)$$

where $N$ represents the total number of stations within a cell; $D_{\mathrm{mi}}$, $D_{\mathrm{mo}}$ and $D_{\mathrm{ma}}$ are the total numbers of days on which the stations within the cell reported minor, moderate or major floods, respectively. Note that Eq. (4) accounts for flood duration, geographical extent and flood stage. Analyzing several events from the DFO database and the corresponding $X$ coefficient estimated from Eq. (4), we found that areas with cells flagged as flooded with $X$ greater than 0.5 agreed well with the DFO flood report (Fig. 3). To ensure compatibility between the DFO and USGS generalized flood data, we tested multiple critical values for $X$ and found that using $X$ = 0.5 as an indicator for large flooding minimizes disagreement between the DFO and USGS flooded area observations. Note that the critical value $X$ = 0.5 could mean that 50 % of the gauges reported a minor flood for 1 day in a given month, or 10 % of the gauges reported a moderate flood for 1 day, or 1 % of the gauges reported a major flood for 1 day.
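As a small worked illustration of the flood coefficient, the sketch below combines station-day counts into $X$ for one grid cell and month. The weights 1, 5 and 50 are an assumption inferred from the worked example in the text (50 % of gauges with a minor flood, 10 % with a moderate flood, or 1 % with a major flood for 1 day each giving $X$ = 0.5); the function name and argument names are hypothetical.

```python
def flood_coefficient(n_stations, d_minor, d_moderate, d_major,
                      weights=(1.0, 5.0, 50.0)):
    """Monthly flood coefficient X for one grid cell.

    n_stations : number of gauging stations in the cell (N)
    d_minor, d_moderate, d_major : station-days with minor, moderate
        and major floods in the month (D_mi, D_mo, D_ma)
    weights : assumed stage weights, inferred from the worked example
    """
    if n_stations < 5:
        # Cells with fewer than five stations are excluded from the analysis.
        raise ValueError("cell has fewer than five gauging stations")
    w_mi, w_mo, w_ma = weights
    return (w_mi * d_minor + w_mo * d_moderate + w_ma * d_major) / n_stations


# Three situations from the text; each should sit at the critical value 0.5.
print(flood_coefficient(100, 50, 0, 0))   # 50 % of gauges, minor flood, 1 day
print(flood_coefficient(100, 0, 10, 0))   # 10 % of gauges, moderate flood, 1 day
print(flood_coefficient(100, 0, 0, 1))    # 1 % of gauges, major flood, 1 day
```

Because the coefficient sums station-days before dividing by $N$, a short flood seen by many gauges and a long flood seen by a few can yield the same value, which is how a single number can reflect duration, extent and stage together.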

Forecasting skill assessment
Forecasting skill is an overall measure of how well previous forecasts were associated with previous observations (Murphy and Winkler, 1997). A receiver operating characteristic (ROC) (Fawcett, 2006) is commonly used as a method for testing the performance of a continuous index (such as RFPI) against binary observational data (e.g., flood or no flood). It uses a binary classifier that maps the index values below and above a certain threshold τ to the non-occurrence and occurrence of an event, respectively. Since the exact RFPI threshold value is unknown a priori, the ROC analysis is performed for a range of possible RFPI threshold values. For each threshold, a pair of true positive rate (TPR) and false positive rate (FPR) was generated by constructing a contingency table (Table 1). The TPR, also called the hit rate or the probability of detection, is the relative number of times an event was predicted when it actually occurred; the FPR, sometimes referred to as the false alarm rate, is the relative number of times the event was predicted when it did not occur. A ROC curve plots TPR vs. FPR for different thresholds (Fig. 4), while the 1 : 1 line represents a random guess. AUC (area under the curve) is the area beneath the ROC curve. Since the 1 : 1 line corresponds to a random guess, AUC = 0.5 indicates no skill and AUC > 0.5 indicates better than random skill. Morrison (2005) suggested that AUC > 0.7 indicates strong predictive skill; in practice, AUC values of 0.6, 0.7, 0.8 and 0.9 are frequently used as the thresholds for fair, satisfactory, good and excellent predictive skill. On the ROC plot, the optimal RFPI threshold value τ corresponds to the location on the ROC curve that is closest to the (0; 1) point (Fig. 4).
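The ROC procedure described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: a flood is taken as predicted when the index exceeds τ, the AUC is integrated with the trapezoidal rule after adding the (0, 0) and (1, 1) end points, and the function name `roc_analysis` is hypothetical. Both flood and no-flood observations must be present, otherwise the rates are undefined.

```python
import numpy as np

def roc_analysis(index, observed, thresholds):
    """Return (fpr, tpr, auc, best_tau) for a continuous index.

    index      : index values (e.g. RFPI), any shape
    observed   : matching binary observations (True = flood)
    thresholds : candidate threshold values tau
    """
    index = np.asarray(index, dtype=float).ravel()
    observed = np.asarray(observed, dtype=bool).ravel()
    thresholds = np.asarray(thresholds, dtype=float)

    tpr = np.empty(thresholds.size)
    fpr = np.empty(thresholds.size)
    for i, tau in enumerate(thresholds):
        predicted = index > tau                  # assumed decision rule
        tp = np.sum(predicted & observed)        # hits
        fn = np.sum(~predicted & observed)       # misses
        fp = np.sum(predicted & ~observed)       # false alarms
        tn = np.sum(~predicted & ~observed)      # correct negatives
        tpr[i] = tp / (tp + fn)
        fpr[i] = fp / (fp + tn)

    # Order the points along the curve and close it at (0, 0) and (1, 1).
    order = np.lexsort((tpr, fpr))
    x = np.concatenate(([0.0], fpr[order], [1.0]))
    y = np.concatenate(([0.0], tpr[order], [1.0]))
    auc = float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

    # Optimal threshold: ROC point closest to the perfect corner (0, 1).
    best = np.argmin(np.hypot(fpr, 1.0 - tpr))
    return fpr, tpr, auc, float(thresholds[best])
```

A typical call would pass the flattened RFPI hindcast and the rasterized flood mask for the same cells and months, with thresholds spanning the observed index range, for example `np.arange(-1.0, 1.0, 0.1)`.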

Results
A satisfactory to good agreement was found between the RFPI and the observed floods from both DFO and filtered USGS data (i.e., X > 0.5) for the continental USA; AUC = 0.75 for the RFPI vs. DFO 2003-2012 flood observations and AUC = 0.72 for the RFPI vs. USGS 2007-2012 flood observations (Fig. 5). The slightly better skill in the RFPI vs. DFO comparison is probably due to the bias in DFO flood observations towards high-damage and large-scale floods. The optimal RFPI threshold values are τ = −0.4 for the DFO and τ = −0.3 for the USGS comparison.
The validation against the USGS data also demonstrated the ability of the RFPI to estimate flood risks at a watershed level in large flat areas (Fig. 6), e.g., the Great Plains region, with AUC consistently exceeding the 0.7 satisfactory predictive skill level. However, we found that over the mountainous and coastal regions the RFPI has a limited ability for flood monitoring (Fig. 6). The resulting ROC curves have different shapes, and the optimal RFPI threshold varies between −0.4 and 0.1 for different watersheds. The RFPI skill also varies with the seasons (Fig. 7). For example, in the larger Mississippi watershed (consisting of the Upper and Lower Mississippi, Missouri, Ohio, Tennessee and Arkansas-White-Red watersheds; see Fig. 6), AUC is 0.67 in the winter period and 0.78 in the summer.
As a case study, we now examine a widespread flood that occurred in the northeastern USA due to a series of heavy rain events during March and April 2007 (Fig. 8). Precipitation in the region had dropped steadily during the winter of 2006-2007. Surprisingly, the soil moisture deficit ($S_{\mathrm{DEF}}$) had also decreased during the same period, probably due to melting of the accumulated snow (Fig. 8c). The sudden increase in precipitation in March caused the RFPI (blue squares in Fig. 8a) to show positive values, indicating that the amount of precipitation had exceeded the storage capacity and that the region was at risk of flooding. The continued increase in precipitation during April caused regional flooding (green polygon in Fig. 8a). The area showing positive RFPI values predicted in March agrees well with the actual flood extent as reported by DFO for the month of April. Also, notice that the RFPI estimated in April indicates a much more extended area subject to potential flooding (Fig. 8b). Had the heavy precipitation continued in May, the flooding would have been much more damaging and would have affected a much wider area than that reported by DFO.

Discussion
We found that the RFPI has a satisfactory or good predictive skill for flood monitoring (AUC ∼ 0.6-0.7), at both continental and watershed scales in the USA. It works particularly well for large river basins, such as the Mississippi River basin, that are located over flat areas. Its predictive skill is significantly higher in the summer period, when floods are mainly caused by heavy rainfall. During the winter season in colder regions, RFPI skill is relatively low, as this index is sensitive to precipitation, while winter floods are primarily caused by ice jams and snowmelt. This, in turn, affects RFPI skill during the spring months. Modification of the method to include snowmelt is likely to improve its predictive skill. While we have tested the RFPI in the USA, where a dense network of flood gauges has been established, a potentially greater use of this method is in developing countries, where, due to inadequate monitoring capability, floods tend to cause significant damage and the greatest loss of life. Also, floods in developing countries, as found through the DFO database, are mainly caused by heavy rainfall events, for which the RFPI seems to perform well in predicting flood potential. Therefore, to further evaluate its applicability, we examined the Juba-Shabelle river basin, a 783 000 km² watershed shared between Somalia and Ethiopia. Similar to Fig. 8, we examined a flood caused by a heavy precipitation event over the basin in October 2006, which ranked as the most damaging flood in Eastern Africa in 50 years. In Ethiopia, over 150 people died and over 122 500 were displaced; in Somalia, over 80 people died and over 299 000 were displaced (DFO database). We found increasing RFPI in the Juba-Shabelle watershed 1 month prior to the flood (Fig. 9a) and during the month of the flood (Fig. 9b), both predictions agreeing well with the actual flood extent reported by DFO. The time series of the water storage deficit generated over the watershed (Fig. 9c) shows a significant decrease of nearly 3 cm in the available water storage capacity in September, 1 month before the damaging flood. Based on this preliminary analysis, we speculate that developing countries with sparse or inadequate flood monitoring networks are potential beneficiaries of this approach.
The GRACE-based RFPI has limitations. The coarse temporal (monthly) and spatial (ca. 100 km) resolutions of GRACE data make the index unsuitable for forecasting local high-intensity events such as flash floods. Nevertheless, it has a unique ability to monitor water storage within a region and, if combined with precipitation forecasting, could further increase warning lead time from 1 month to probably 1.5-2 months. For comparison, a more advanced flood warning system, such as EFAS, can generate probabilistic flood alerts with a lead time of up to 10 days (https://www.efas.eu/user-information.html). Another improvement could be to combine the RFPI with higher spatial and temporal resolution remotely sensed data, such as MODIS products used in the Center for Research on the Epidemiology of Disasters (http://www.cred.be/) flood monitoring system.

Figure 1. Variations of total water storage anomaly (TWSA, green line), monthly precipitation ($P_{\mathrm{MON}}$, blue line) and water storage deficit ($S_{\mathrm{DEF}}$, yellow line) at the grid cell (52.5° N, 117.5° W) during the study period.

Figure 2. The distribution of the USGS stream gauging stations (green dots). The blue squares indicate the 1° × 1° grid cells that contain fewer than five stations and were excluded from the study.

Figure 4. An example showing the ROC curve estimated for the Tennessee watershed using different RFPI thresholds. The optimal value of the classifier threshold τ for this watershed is 0.1, corresponding to the point on the ROC curve that is closest to the (0; 1) point. The dashed line represents a random guess, which has an area under the curve (AUC) of 0.5, whereas a predictive index such as RFPI has an AUC > 0.5.

Figure 6. The RFPI predictive skill is evaluated by comparison with USGS-reported floods using ROC curves and AUC values for each of the major watersheds. The colors of the ROC curves match the colors of the delineated watersheds. The Rio Grande and California watersheds (in white) were excluded due to the low number of floods. The watersheds that have RFPI AUC values less than 0.7 are shown in grey (Lower Colorado, Texas Gulf, Great Basin, Great Lakes, mid-Atlantic and Pacific Northwest) and are not shown in the comparison.

Figure 8. The 2007 flood in the northeastern USA. (a) Grid cells with positive RFPI values in March, 1 month before the flooding; (b) grid cells with positive RFPI values in April, the flooding month; (c) the average storage deficit (blue line) and precipitation (yellow line) over the northeastern USA from October 2006 to September 2007. In (a) and (b), the DFO-reported flood area is shown as a green polygon.

Figure 9. The 2006 flood in the Juba-Shabelle river basin. (a) Grid cells with positive RFPI values in September, 1 month before the flooding; (b) grid cells with positive RFPI values in October, the flooding month; (c) the average storage deficit (blue line) and precipitation (yellow line) over the Juba-Shabelle river basin from January 2006 to December 2006. In (a) and (b), the DFO-reported flood area is shown as a green polygon.

Table 1. Schematic contingency table for categorical forecasts of a binary event.