Development of super-ensemble techniques for ocean analyses: the Mediterranean Sea case

A super-ensemble methodology is proposed to improve the quality of short-term ocean analyses for sea surface temperature (SST) in the Mediterranean Sea. The methodology consists of a multiple linear regression technique applied to a multi-physics multi-model super-ensemble (MMSE) data set: a collection of different operational forecasting analyses together with ad hoc simulations created by modifying selected numerical model parameterizations. A new linear regression algorithm based on empirical orthogonal function filtering techniques is shown to be efficient in preventing overfitting problems, although the best performance is achieved when a simple spatial filter is applied after the linear regression. Our results show that the MMSE methodology improves the ocean analysis SST estimates with respect to the best ensemble member (BEM) and that the performance depends on the selection of an unbiased operator and on the training period length. The quality of the MMSE data set has the largest impact on the MMSE analysis root mean square error (RMSE) evaluated with respect to observed satellite SST. The MMSE analysis estimates are also affected by the training period length, with the longest period leading to the smoothest estimates. Finally, the lowest RMSE analysis estimates result from the following: a 15-day training period, an overconfident MMSE data set (a subset with the higher-quality ensemble members) and the least-squares algorithm being filtered a posteriori.


Introduction
The limiting factors to short-term ocean forecasting predictability are the uncertainties in ocean initial conditions, atmospheric forcing (Pinardi et al., 2011) and lateral boundary conditions, together with numerical model representation errors and numerical inaccuracies. To assess and control these uncertainties, an ensemble approach can be used, as shown, for example, by Kalnay and Ham (1989), where the simple ensemble mean is shown to have a smaller root mean square error (RMSE) than each contributing member. The assumption that different models may have complementary forecasting and analysis skills emerged from the pioneering work of Lorenz (1963), in which the notion of an ensemble forecast was first described, obtained by combining all the members' performances. Most common ensemble forecasts come from a single model running with a set of perturbed initial, lateral or vertical boundary conditions. Hence the implicit hypothesis is that forecast errors arise from inaccurate initial/boundary conditions, while the model is considered to be perfect. Accounting for the model error was the first step in multi-model ensemble forecasting. Feddersen et al. (1999) reported that low ensemble spread is likely to be produced by correlated models; hence only a set of different models is expected to reduce the model systematic error. Shukla et al. (2000) proposed a combination of member predictions with similar forecast skills in order to further reduce
the posterior forecast error calculated from the multi-model ensemble. The optimal combination of several model outputs (each with its own strengths and weaknesses) that can sample the forecast uncertainty space is the underlying idea behind super-ensemble (SE) estimates. In this paper we use the heuristic SE concept of Krishnamurti et al. (1999), where different and independent model forecast members are merged using a multiple linear regression algorithm. This method led to a reduced forecast RMSE for the 850 hPa meridional wind and hurricane tracking in the Northern Hemisphere. This first work provided the basis for the successive developments focused on the multi-model super-ensemble (MMSE) approach (Evans et al., 2000; Krishnamurti et al., 2000; Stensrud et al., 2000). While ensemble techniques are routinely used in operational weather forecasting (Toth and Kalnay, 1997; Stephenson and Doblas-Reyes, 2000), SE and MMSE approaches have mostly been applied in seasonal studies (Krishnamurti et al., 1999; Stefanova and Krishnamurti, 2002; Pavan and Doblas-Reyes, 2000; Kharin and Zwiers, 2002). Only preliminary work has been carried out in ocean forecasting. The work of Rixen et al. (2009) is a reference for temperature predictions; simulated ocean trajectory SE methods are described in Vandenbulcke et al. (2009); and Lenartz et al. (2010) present a SE technique with a Kalman filter to adjust three-dimensional model weights. Rixen et al.
(2008) introduced the concept of a hyper-ensemble, which combines atmospheric and oceanographic model outputs. In this paper we develop a new MMSE method to estimate sea surface temperature (SST), as this is an important product of ocean analysis systems with multiple users. Accurate knowledge of SST is fundamental both for climate studies and meteorological forecasting; therefore increasing the accuracy of SST analyses is crucial for the uptake of operational products. A MMSE data set is constructed to sample the major error sources for the SST forecast, and a new linear regression algorithm is developed and calibrated. A discussion of the current state of the art of SE techniques, highlighting the innovative methodology proposed in this paper, can be found in Sect. 2. After a description of the multi-model data set in Sect. 3, a comprehensive explanation of the super-ensemble technique is reported in Sect. 4. Sensitivity studies on SE algorithm choices are proposed in Sect. 5. Finally, conclusions are drawn in Sect. 6.

Methods used in the literature

The basic idea discussed in Krishnamurti's work is that each model may carry a somewhat different representation of the forecast processes, so an appropriate combination can reduce biases in space and time. In his work an unbiased linear combination of the available models, optimal (in the least-squares sense) with respect to observations during a training period of a priori chosen length, reduces the RMSE of the prediction of the south-north component of the wind at 850 hPa (averaged between 50 and 120° E). Krishnamurti's SE could handle both Asian monsoon precipitation simulations and hurricane track-intensity forecasts. In his approach all observations have equal importance, so Lenartz et al.
(2010) applied this method to ocean wave forecasting, introducing a way to change the importance given to the observations using data assimilation techniques (a Kalman filter and a particle filter) adapted to the super-ensemble paradigm. With this technique the regression weights change on a timescale corresponding to their natural characteristic time, discarding older information automatically, and the rate of change is determined by the joint uncertainties of the weights, models and observations. Rixen et al. (2008) demonstrated, in a very limited area, that SE methods outperform the individual models on several error measures. Further skill improvements can be found by applying dynamic, non-Gaussian and regularized filters.

Multi-model multi-physics data set
The MMSE data set is the collection of daily mean outputs from five operational analysis systems in the Mediterranean Sea and four outputs from the same operational forecasting model run with different physical parameterization choices. The study period runs from 1 January to 31 December 2008. The differences between the MMSE members are mainly due to the different numerical schemes, data assimilation schemes and model physical parameterizations used. Optimally interpolated satellite SST observations (OI-SST) (Marullo et al., 2007) are used as the truth estimator, and the model outputs are compared with the satellite OI-SST to assess their quality. The main characteristics of the MMSE members are listed in Table 1, while a more detailed description of the originating analysis and forecasting systems can be found in Appendix A. Our aim is to estimate the most accurate daily SST for a 10-day analysis period that takes place after a training period defined in the past. The resulting MMSE estimate is also called a posterior analysis. The similarities and differences between the MMSE members and the OI-SST data set are quantified in terms of the anomaly correlation coefficient (ACC), RMSE and normalized standard deviation (SD). These statistical scores are listed in Table 2 for each MMSE member for the whole of 2008, with the seasonal cycle removed. Mercator-V0 best reproduces the SD of the observations, while the INGV-SYS4a3 (Istituto Nazionale di Geofisica e Vulcanologia) analysis has the highest ACC and lowest RMSE. Thus hereafter INGV-SYS4a3 will be called the best ensemble member (BEM).
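The three scores used above can be computed straightforwardly. The sketch below is a minimal illustration (not the authors' code), assuming the fields have been co-located and flattened over valid sea points, with a climatology supplied to form the anomalies for the ACC:

```python
import numpy as np

def skill_scores(model_sst, obs_sst, clim_sst):
    """RMSE, anomaly correlation coefficient (ACC) and normalized
    standard deviation between model SST and observed OI-SST.
    Inputs are 1-D arrays over co-located valid grid points."""
    rmse = np.sqrt(np.mean((model_sst - obs_sst) ** 2))
    ma = model_sst - clim_sst          # model anomaly
    oa = obs_sst - clim_sst            # observed anomaly
    acc = np.sum(ma * oa) / np.sqrt(np.sum(ma ** 2) * np.sum(oa ** 2))
    nsd = np.std(model_sst) / np.std(obs_sst)
    return rmse, acc, nsd
```

A perfect member scores RMSE = 0, ACC = 1 and normalized SD = 1, which is a quick sanity check on any implementation.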

The truth estimator choice
So far, as discussed in Sect. 2, the SE paradigm has been successfully applied only in small portions of the Mediterranean Sea (the Ligurian Sea or the Adriatic), usually concurrently with ocean cruises, where a very dense observational data set can be used to calibrate the SE procedures, but no inference is drawn for the entire Mediterranean Sea. The SE approach can be tested in a regional model using a robust and reliable truth estimator that covers the same degrees of freedom of the system. Satellite data such as OI-SST or delayed-time SSH can be optimally interpolated to obtain 2-D maps. The alternative to remote-sensing information would be to compare the models with in situ observations, which are sparse in space and time. In general only a dozen Argo floats are drifting in and sending information from the sea. The mooring buoys are intermittent in time and affected by high representativeness error, since coastal model output is still less reliable than in the open ocean. For all these reasons the choice of OI-SST as the truth is straightforward. It must be pointed out that none of the models employed has assimilated the OI-SST. There is only flux relaxation through SST nudging for the INGV members and Mercator, while HCMR (Hellenic Centre for Marine Research) and Mercator assimilate the real-time GOS AVHRR (Global Ocean Satellite Advanced Very High Resolution Radiometer) SST (Casey et al., 2010).

Super-ensemble methodology
Our SE methodology is based on Krishnamurti et al. (1999).
Let us call S1_t the SE estimate of a model state variable and F_{i,t} the model state at time t for the ith model. Let us define two different periods, the training and the analysis period, the former preceding the target analysis period. The S1_t estimator is then defined as

S1_t = \overline{O} + \sum_{i=1}^{N} a_i (F_{i,t} - \overline{F}_i),    (1)

where \overline{F}_i and \overline{O} are the time means over the training period, as defined in Appendix B, a_i are the regression coefficients, N is the number of SE members and M is the number of training period days. The regression is unbiased because the time mean of the data set is removed and only model field anomalies are used. The regression coefficients are computed as a classical multilinear ordinary least-squares problem. Let us define the covariances of the model ensemble members as

C_{ij} = \sum_{t=1}^{M} (F_{i,t} - \overline{F}_i)(F_{j,t} - \overline{F}_j)    (2)

and the covariance between observations and model anomalies as

b_i = \sum_{t=1}^{M} (O_t - \overline{O})(F_{i,t} - \overline{F}_i).    (3)

The regression coefficients are then written as

a = C^{-1} b.    (4)

Yun et al. (2003) reported a skill improvement in the SE algorithm when the seasonal signal is removed prior to the regression procedure. The second SE method, called S2_t, uses the same regression algorithm of Eq. (4) but with the seasonal cycle and the training period time mean subtracted; the definition of this new unbiased estimator is presented in Appendix B. Kharin and Zwiers (2002) suggested that the poor performance of MMSE algorithms is due to overfitting, i.e. biased estimates of the regression coefficients: not all the model members and the observations used in the estimation are really independent, and the matrix inversion in Eq. (4) is close to singular. In order to reduce overfitting, several methods have been proposed. Following Boria et al. (2014), who used a spatial filter to reduce overfitting in ecological niche models, we developed a new method, called S3, which filters the S2 estimates with a simple spatial median filter with a radius of 15 km. This value is also related to the first baroclinic Rossby radius of deformation in the Mediterranean Sea (e.g., Robinson et al., 1987; Pinardi and Masetti, 2000).
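As a concrete illustration of the least-squares weight computation referenced above (Eq. 4), the following sketch (an assumed discretization, not the operational code) builds the normal equations from member and observation anomalies over the training period and applies the weights to a new day:

```python
import numpy as np

def train_se(F, O):
    """Train the unbiased SE regression.
    F: (N, M, P) member fields, O: (M, P) observations,
    with N members, M training days, P grid points.
    Returns member means, observation mean and weights a."""
    Fm = F.mean(axis=1)                      # member training means
    Om = O.mean(axis=0)                      # observation training mean
    Fa = F - Fm[:, None, :]                  # member anomalies
    Oa = O - Om                              # observation anomalies
    # Covariances summed over training days and grid points
    C = np.einsum('imp,jmp->ij', Fa, Fa)     # member covariance matrix
    b = np.einsum('mp,imp->i', Oa, Fa)       # obs-member covariance
    a = np.linalg.solve(C, b)                # regression weights
    return Fm, Om, a

def se_estimate(F_t, Fm, Om, a):
    """Posterior SE analysis for one day; F_t: (N, P)."""
    return Om + np.tensordot(a, F_t - Fm, axes=1)
```

When the observations are, by construction, an exact linear combination of the member anomalies, the solver recovers the combination weights exactly, which is a useful unit test for any implementation of this scheme.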
The EOF regression coefficient technique defined by von Storch and Navarra (1995) offers an alternative way of performing the regression, ensuring more uncorrelated variables. In our formulation the F_{i,t} are decomposed into horizontal empirical orthogonal function (EOF) modes, the singular vectors of a data matrix which contains the training period model and observed fields. Thus we form a state variable vector that contains the model and observation anomalies for the training period; here the variables O'_t and F'_{i,t} indicate anomalies with respect to the seasonal signal and the training period mean. Decomposing the data matrix into horizontal EOFs, called eof_k(x, y), and temporal amplitudes, we can write the least-squares solution of Eq. (4) for the amplitudes of the spatial EOFs. The O'_t and F'_{i,t} fields are projected onto the retained eof_k(x, y) to obtain the amplitudes

o_k(t) = \langle O'_t, eof_k \rangle,   f_{i,k}(t) = \langle F'_{i,t}, eof_k \rangle.

The regression coefficients are now written for each EOF component k as

a_k = C_k^{-1} b_k,    (9)

where C_k and b_k are the member covariances and the observation-member covariances computed from the amplitudes. A new SE estimate, S4, is then defined as

S4_t = \overline{O} + \sum_k \left( \sum_{i=1}^{N} a_{i,k} f_{i,k}(t) \right) eof_k(x, y).

The different statistical regression algorithms are summarized in Table 3. A flow chart (Fig. 1) is provided to help the reader follow the logical path of the described procedures.
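A minimal sketch of the EOF-filtered variant (our reading of the S4 construction; the exact operational formulation may differ) truncates the EOF basis of the stacked training anomalies at a chosen variance fraction and then solves one small regression per retained mode:

```python
import numpy as np

def eof_regression(Fa, Oa, var_frac=0.995):
    """EOF-filtered SE regression.
    Fa: (N, M, P) member anomalies, Oa: (M, P) observation anomalies."""
    N, M, P = Fa.shape
    X = np.concatenate([Oa, Fa.reshape(N * M, P)], axis=0)  # data matrix
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    var = np.cumsum(s ** 2) / np.sum(s ** 2)
    K = int(np.searchsorted(var, var_frac)) + 1   # retained EOFs
    E = Vt[:K]                                    # eof_k(x, y), (K, P)
    o = Oa @ E.T                                  # obs amplitudes (M, K)
    f = Fa @ E.T                                  # member amplitudes (N, M, K)
    a = np.empty((K, N))
    for k in range(K):                            # one regression per mode
        C = f[:, :, k] @ f[:, :, k].T
        b = f[:, :, k] @ o[:, k]
        a[k] = np.linalg.solve(C, b)
    return E, a

def s4_estimate(F_anom_t, E, a):
    """Filtered posterior anomaly for one day; F_anom_t: (N, P)."""
    f_t = F_anom_t @ E.T                          # (N, K)
    return np.einsum('kn,nk,kp->p', a, f_t, E)
```

The truncation is what provides the regularization: modes carrying the last fraction of the variance, which are the ones most prone to overfitting, never enter the regression.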

Classical SE method experiments
In order to find the minimum possible training period length, a simple experiment was performed using the observations as one of the ensemble members in the training period. This test can be considered as the maximum skill that could be achieved with a MMSE approach, and it is also a way to check the coefficient estimates. For a training period of 15 days, all the regression coefficients are 0 except for the weight of the observational member, which is correctly retrieved as 1. Trimming the data set (removing members), we noticed that when the number of training period days (M) is less than the number of ensemble members involved (N), in our case 9 (Table 1), the algorithm fails, giving incorrect values for the coefficients. Hence the minimum training period length must be such that M > N; in our case any training period of at least 10 days will work. However, to add robustness to the regression algorithm, we set 15 days as the minimum length of the learning period. On the basis of a 15-day training period, Fig. 2 shows the S1 and S2 posterior estimates for the first day of the test analysis period. The S1 and S2 reconstructed SST fields are very noisy compared to the observations, and both are clearly worse than the BEM estimate.
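The M > N condition can be checked directly: with fewer training days than members, the member covariance matrix is rank deficient and the weights are not identifiable. This is a single-grid-point sketch (in the paper the covariance also sums over grid points, which relaxes the condition in practice):

```python
import numpy as np

def covariance_rank(n_members, n_days, seed=0):
    """Rank of the member covariance matrix C = Fa Fa^T built from
    random member anomalies at a single grid point."""
    rng = np.random.default_rng(seed)
    Fa = rng.standard_normal((n_members, n_days))
    return int(np.linalg.matrix_rank(Fa @ Fa.T))

# With 9 members, 5 training days leave C singular; 15 days do not.
print(covariance_rank(9, 5), covariance_rank(9, 15))
```

A singular C means the normal-equations solve either fails or returns arbitrary coefficients, which matches the incorrect weights observed in the trimming experiment above.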
The two estimates at the end of the test analysis period are shown in Fig. 3, where the overfitting problem is even more evident. The field noise can be reduced by lengthening the training period to 35 days, as shown in Figs. 4 and 5. Both S1 and S2 predictions show a reduction in the warm bias with respect to the BEM in the eastern Mediterranean (Figs. 2 and 3). However, S1 does not show any improvement in terms of RMSE and ACC (data not shown). Even neglecting the overfitting that affects SE predictions trained with a learning period longer than 35 days, we consider long training periods to be out of the scope of our research, since we are focused on a potential operational approach. In order to examine the effect of the specific MMSE members on the S1 estimation performance, we create three different MMSE data sets (Table 4). Data set A corresponds to an overconfident (Weigel et al., 2008) data set, i.e. correlated ensemble members. Data set B, which is well dispersed, contains the best and the worst ensemble members together with other correlated members, i.e. members with similar RMSE, SD and ACC (see Table 4), while data set C, which is badly dispersed, contains the worst members together with correlated members. To quantify the differences between the three data sets, a bias indicator d is estimated. This value corresponds to the domain-averaged SST difference:

d = \langle F_{i,t} - O_t \rangle,    (11)

where the angle brackets indicate the domain average. If d is close to 0, the data set bias is small, while a positive (negative) d means a positive (negative) SST bias.
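The bias indicator in Eq. (11) is just a masked domain average; a minimal sketch follows (the sea-mask handling is our assumption, since land points should not enter the average):

```python
import numpy as np

def bias_indicator(sst_est, sst_obs, sea_mask):
    """Domain-averaged SST difference d = <SST_est - SST_obs>,
    averaged over sea points only."""
    diff = sst_est - sst_obs
    return float(diff[sea_mask].mean())
```

Computed per member and per day, the resulting values can be binned into histograms like those of Fig. 6.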
Figure 6 shows the distributions of d for the different MMSE data sets. The distributions are bimodal, with a first maximum around 0 °C and a second around 0.05 °C, which means that the data set statistically tends to overestimate the spatial mean SST. From these distributions we can also observe that data set A has the smallest bias, because its d = 0 °C peak is larger than its peak at 0.05 °C, while for data set C, which is constructed from badly dispersed model ensemble members, the estimate enhances the positive bias.

Table 5.
RMSE mean value throughout the analysis period for the full data set (see Table 1) and the three data sets of Table 4, as a function of the training period length, for S1 and S2.

New SE method experiments
In order to reduce the overfitting of the SE estimate, we show here the results of the S3 and S4 algorithms. Both proposed methodologies are applied to the overconfident data set (data set A) with a 15-day training period. In S3, the 15 km value was found by means of sensitivity studies, applying a circular filter at each point of the domain. Figure 10 shows the RMSE as a function of the chosen filter radius.
We see that with a short radius the filtering has no influence, while with a radius longer than 15 km the fields become too smooth and the performance degrades. In S4, the number of retained EOFs changes for each experiment and is chosen to account for 99.5 % of the system variance. Figure 11 shows the number of retained EOFs as a function of season and for different training period lengths. The minimum number of retained EOFs is 46 with a 15-day training period, while the maximum is 164, obtained with 35 days. As expected, the number of retained EOFs increases when we extend the training period, with some variability during the year; the minimum is usually found in summer. The S3 and S4 posterior estimates are shown in Figs. 12 and 13 for the first day of the test analysis period. Both estimates are much smoother than the equivalent S1 and S2 estimates in Fig. 2, and S3 also seems to be less biased with respect to the observations. A map of the differences between the truth and the SE estimates highlights the better performance of S3 compared to S4 for the whole test analysis period (not shown). Error statistics for the various methods were computed for all of 2008; we again produced a 10-day analysis from the overconfident training data set A every 4 days, with a training period varying from 15 to 35 days. The RMSE is shown in Fig. 14. The best SE method is S3, which has about half the RMSE of the BEM for the whole of 2008. This is because the filtering acts as a smoother, keeping the large-scale bias small, while the EOFs do not control the bias at large scales. Following Murphy (1993), we evaluate the ACC in order to assess the "consistency" of the proposed methodologies, which had a constant ACC irrespective of the chosen training period. However, the BEM was the most "consistent" member. This means that, although our SE can be used as a statistical tool, physical constraints are needed in order to obtain more consistent maps. Nevertheless, it should be highlighted that S1 and S2 are even less consistent than the worst-contributing member. Bias skills were also evaluated for the proposed methodologies; however, as expected from their construction, all the SE estimates were unbiased (Fig. 16).
Thus no inference can be drawn from the bias skills.
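The S3 filtering step discussed above can be sketched as a median filter with a circular footprint. This plain-NumPy version is an illustration, not the production code (scipy.ndimage.median_filter with a disk footprint would be the idiomatic route), and it assumes a regular grid with known spacing so that the 15 km radius converts to pixels:

```python
import numpy as np

def median_filter_disk(field, radius_km, dx_km):
    """Spatial median filter with a circular footprint.
    field: 2-D array; radius_km: filter radius; dx_km: grid spacing."""
    r = max(1, int(round(radius_km / dx_km)))     # radius in pixels
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    disk = (yy ** 2 + xx ** 2) <= r ** 2          # circular footprint
    ny, nx = field.shape
    out = np.empty_like(field)
    pad = np.pad(field, r, mode='edge')           # replicate edges
    for j in range(ny):
        for i in range(nx):
            win = pad[j:j + 2 * r + 1, i:i + 2 * r + 1]
            out[j, i] = np.median(win[disk])
    return out
```

The median (rather than a mean) is what suppresses the isolated grid-scale spikes produced by overfitted weights while leaving smooth large-scale structure, and hence the large-scale bias, essentially untouched.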

Conclusions
We developed a multi-model multi-physics super-ensemble methodology to estimate the best SST from different oceanic analysis systems. Several regression algorithms were analysed for a test period and for the whole of 2008. We examined the different conditions under which the MMSE estimate outperforms the BEM of the generating ensemble. The target was to obtain 10-day posterior analyses using a training period in the past for the regression algorithm and to generate the lowest bias and RMSE for the MMSE estimates. The results show that the ensemble size, the quality and type of members, and the training period length are all important elements of the MMSE methodology and require careful calibration. Almost 2000 posterior analyses were produced for 2008 with different training periods. The classical SE approach, as proposed by Krishnamurti et al. (1999), here called the S1 estimate, produced noisy SST fields without improving on the BEM. An initial improvement to S1, named S2, is obtained by subtracting the seasonal signal from the ensemble members and using the unbiased estimator. This leads to a strong reduction in RMSE (more than 20 %), but the resulting field is noisy compared to the observations. This is the well-known overfitting problem of the technique, described in Kharin and Zwiers (2002). A further modification of S2 using a simple spatial filter, named S3, gives lower RMSE values than the BEM for the entire 10-day analysis period. A new methodology based on EOFs, named S4, also reduces the RMSE. However, S3 outperforms S4 and could represent a practical technique for operational oceanographic analyses for up to 10 days on the basis of the previous 15 days of analyses. One could wonder whether the proposed combination of models can offer interesting skill below the sea surface as well. Unfortunately, the sparseness of subsurface observations makes it very hard to envisage a horizontal EOF method that would allow us to correctly exploit and spread the information coming from the observations themselves. We must remark that this is only a starting point. MMSE techniques for ocean state estimation problems require further study before optimal methods can be found.
In this paper we show that with a rather limited but overconfident data set (i.e. one whose contributing ensemble members have low bias) the RMSE of the analysis can be improved. This posterior value-added estimate could, for example, be used to produce a more accurate MMSE analysis data set.
Future developments could involve the addition of physical constraints during the regression, considering for example cross correlations with the atmospheric forcing. MMSE should also be applied to the ocean forecast problem instead of the analysis problem. The difference for MMSE forecast estimates is that atmospheric forecast uncertainties are not contained in the training period analyses, and the size of the ensemble required could increase considerably, as could the complexity of the estimation problem.

Appendix A: Data set description
The analysis systems that generated the ensemble members used in this paper are briefly described below:

-SYS3a2: a system composed of the OPA 8.2 numerical code implemented in the Mediterranean Sea (Tonani et al., 2008) and the 3DVAR assimilation scheme (Dobricic and Pinardi, 2008).
-SYS4a3: uses NEMO 2.3 (Oddo et al., 2009) (Pham et al., 1998), and the observations assimilated are in situ temperature and salinity profiles from the Coriolis database, together with SST and along-track SLA. Further details are described in Brasseur et al. (2005).
-Mercator-V1 (PSY2V4R1): the numerical code is based on the NEMO 3.1 version, implemented in the North Atlantic and Mediterranean Sea with a horizontal resolution of 1/12° and 50 vertical levels. The real-time system was initialized in October 2006 from a 3-D climatology of temperature and salinity (Levitus et al., 2005), providing analyses and forecasts from 2010 to 2013. The code includes several improvements in the model configuration, such as open-boundary conditions (from the global system) and higher-frequency atmospheric forcing (every 3 h). The data assimilation scheme is similar to the previous version, plus a bias correction based on the 3DVAR assimilation scheme (Lellouche et al., 2013).

-HCMR: Hellenic Centre for Marine Research (Korres et al., 2009). The Mediterranean Sea model is based on the Princeton Ocean Model (POM) code, a primitive equation 3-D model using the Mellor-Yamada 2.5 turbulence closure scheme. The error covariance matrix is approximated with 60 EOF modes (correction directions), where the first 18 (the most dominant ones) are evolved with the model dynamics while the rest are kept invariant in time. The localization technique adopted for the Mediterranean Sea forecasting system is explained in Korres et al. (2010). The method localizes the covariance matrix by neglecting observations beyond a cut-off radius, selected upon sensitivity studies to be equal to 200 km.
-NEMO multi-physics: the same as the SYS4a3 NEMO 2.3 code but without assimilation and with different model physical parameterizations.
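The observation cut-off localization mentioned for the HCMR SEEK filter can be sketched as follows. This is a hard cut-off for illustration only; the operational scheme may taper the weights with distance instead (an assumption on our part):

```python
import numpy as np

def select_local_obs(obs_xy_km, point_xy_km, cutoff_km=200.0):
    """Indices of observations within the cut-off radius of a grid
    point, given positions in km on a local plane."""
    d = np.hypot(obs_xy_km[:, 0] - point_xy_km[0],
                 obs_xy_km[:, 1] - point_xy_km[1])
    return np.where(d <= cutoff_km)[0]
```

Discarding distant observations keeps spurious long-range covariances, which are poorly estimated from a small ensemble, from contaminating the local analysis increment.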

Appendix B: Algorithm time averages and projection on EOFs
Here we show how the observed and model fields are decomposed into different temporal signals. Let O(x, y, t) be the daily OI-SST and F(x, y, t) one of the model members' daily mean SST. We can always decompose each signal into a seasonal mean, a training period mean and an anomaly. Consider the two time average operators

\overline{f}(x, y)^S = (1/q) \sum_{t=1}^{q} f(x, y, t),    (B1)

\overline{f}(x, y)^{TR} = (1/M) \sum_{t=1}^{M} f(x, y, t),    (B2)

where q is the number of days in the month of the year, with the seasonal value evaluated over the long time series from 2001 to 2007, and M is the number of training days. We then write

O(x, y, t) = \overline{O}(x, y)^S + \overline{O}(x, y)^{TR} + O'(x, y, t),
F(x, y, t) = \overline{F}(x, y)^S + \overline{F}(x, y)^{TR} + F'(x, y, t).    (B3)

The last terms on the right of Eq. (B3) are the anomaly terms used in Eqs. (4) and (9).
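The decomposition in Eqs. (B1)-(B3) can be verified numerically. In this sketch the seasonal mean is taken from the same short series rather than from the 2001-2007 climatology (a simplification), which does not change the algebra:

```python
import numpy as np

rng = np.random.default_rng(1)
O = rng.standard_normal((30, 100)) + 15.0   # 30 days x 100 grid points
O_seas = O.mean(axis=0)                     # seasonal mean (simplified)
resid = O - O_seas
O_tr = resid[:15].mean(axis=0)              # training period mean (M = 15)
O_anom = resid - O_tr                       # anomaly used in the regression
print(np.allclose(O_seas + O_tr + O_anom, O))      # exact decomposition
print(np.allclose(O_anom[:15].mean(axis=0), 0.0))  # zero mean over training
```

The second check is the property that makes the regression unbiased: the anomalies entering Eq. (4) average to zero over the training period by construction.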

Figure 1 .
Figure 1. Flow chart of the methodologies developed in the paper.

Figure 2 .
Figure 2. MMSE estimates for the first day of the test period (25 April 2008) using a training period of 15 days: S1-SST (top left) and the corresponding S2-SST estimate (top right); SST from satellite (bottom left) and best ensemble member SST (bottom right).

Figure 3 .
Figure 3. MMSE estimates for the last day of the test period (4 May 2008): S1-SST (top left) and the corresponding S2-SST estimate (top right); SST from satellite (bottom left) and best ensemble member SST (bottom right).

Figure 4 .
Figure 4. MMSE estimates for the first day of the test period (25 April 2008) using a training period of 35 days: S1-SST (left panel) and the corresponding S2-SST estimate (right panel).

Figure 5 .
Figure 5. MMSE estimates with a 35-day training period for the last day of the test period (4 May 2008): S1-SST (left panel) and the corresponding S2-SST estimate (right panel).

Figure 7
Figure 7 presents the histogram of d for the S1 estimates on the first day of analysis for the whole of 2008 as a function of the full and subsampled MMSE data sets. It is evident that all the MMSE data sets give a S1 distribution peak around d = 0 °C, with a strong reduction in the distribution width. This means that the algorithm is capable of neglecting the information from biased members. The small S1 bias remains the same until the 5th day of the analysis period (not shown), after which the performance of the algorithm starts to deteriorate. On the last day, the histogram is nearly flat (Fig. 8). This means that unbiased analyses can be produced for up to 5 days with a 15-day training period, no matter which MMSE data set is used. With respect to RMSE, different MMSE data sets and training period lengths give different results already on day 1 of the analysis period, as shown in Table 5, both for the S1 and S2 posterior analysis estimates. It is now evident that the overconfident data set and the longest training period (35 days) produce on average the lowest RMSE values during the analysis period. In conclusion, the Krishnamurti et al. (1999) method can be applied relatively successfully to the oceanic multi-model state estimation case, using at least a 14-day (and up to 35-day) training period and with only a five-member ensemble data set if the quality of the chosen members is high. However, the S1 and S2 estimates are both affected by noise, and only a modification of the regression method will lead to a low-RMSE, noiseless posterior analysis estimate. We want to emphasize that these performances are due to the multi-model multi-physics choice.
Table 5 reports the RMSE mean value as a function of the training period length, for S1 and S2:

Full data set: 1.59, 1.07, 0.86, 0.79, 0.68
Data set A: 1.19, 0.96, 0.77, 0.65, 0.62, 0.56
Data set B: 1.21, 0.98, 0.81, 0.67, 0.64, 0.57
Data set C: 1.30, 1.05, 0.86, 0.71, 0.68, 0.59

Here for the first time we can assess the SE prediction impact in terms of the data set composition. Our research activity begins by studying the characteristics that a data set should fulfil in terms of the spread of the ensemble and the mean bias of each member. Only a multi-model multi-physics data set can satisfy all the requirements. We can support this inference with a set of two subsamples: subsample D, multi-model (MM): INGV-SYS3a2, INGV-SYS4a3, Mercator-V0, Mercator-V1 and HCMR; subsample E, multi-model multi-physics (MM-MP): INGV-SYS3a2, INGV-SYS4a3, Mercator-V1, HCMR and INGV MP1. From Fig. 9 one can clearly see the improvements to the SE brought about by substituting one member with a simulation with similar performances.

Figure 6 .
Figure 6. Distributions of d in Eq. (11) for the full data set (a), the overconfident data set A (b), the well-dispersed data set B (c) and the badly dispersed data set C (d). The bin width is 0.05 °C. The area under each curve equals the total number of models per day in the year 2008.

Figure 7 .
Figure 7. The effect of multi-model composition on the distributions of d for the full data set (a), the overconfident data set A (b), the well-dispersed data set B (c) and the badly dispersed data set C (d). The bin width is 0.05 °C. The effect of the multi-model combination of the proposed subsamples on the SE estimates is valid for the 1st day of the test period with a training period of 14 days.

Figure 8 .
Figure 8. The effect of multi-model composition on the distributions of d for the full data set (a), the overconfident data set A (b), the well-dispersed data set B (c) and the badly dispersed data set C (d). The bin width is 0.05 °C. The effect of the multi-model combination of the proposed subsamples on the SE estimates is valid for the 10th (last) day of the test period with a training period of 14 days.

Figure 9 .
Figure 9. Domain average (over the Mediterranean) and time mean over the year 2008 of the RMSE for a 15-day training period for the overconfident data set A.

Figure 10 .
Figure 10. Domain average (over the Mediterranean) and time mean over the year 2008 of the RMSE of the S3 estimates, trained on the overconfident data set A for 15 days, for different circular filter radii.

Figure 11 .
Figure 11. Histogram of the number of retained EOFs; the ordinate shows the length of the training period, and the colour bar is proportional to the day of the experiment.

Figure 12 .
Figure 12. S3 and S4 estimates for the first day of the test period (25 April 2008) using a training period of 15 days: S3-SST (left panel, a) and the corresponding S4-SST estimate (right panel, b).

Figure 13 .
Figure 13. S3 and S4 estimates valid for the last day of the test period (4 May 2008) using a training period of 15 days: S3-SST (left panel, a) and the corresponding S4-SST estimate (right panel, b).

Figure 14 .
Figure 14. Domain average (over the Mediterranean) and time mean over the year 2008 of the SE prediction RMSE for the overconfident data set A. SE predictions trained for 15 days. Error bars denote the standard deviation of the RMSE during the year.

Figure 15 .
Figure 15. Spatial average over the Mediterranean Sea and time mean over 2008 of the SE prediction ACC for the overconfident data set A. SE predictions trained for 15 days.

Figure 16 .
Figure 16. Spatial average over the Mediterranean Sea and time mean over 2008 of the SE prediction bias for the overconfident data set A. SE predictions trained for 15 days. Error bars denote the standard deviation of the bias during the year.
The model has a bottom-following vertical sigma coordinate system, a free surface and a split-mode time step. Potential temperature, salinity, velocity and surface elevation are prognostic variables. The model has a horizontal resolution of 1/10° and 25 sigma layers along the vertical, with a logarithmic distribution near the surface and the bottom. The model includes a parameterization of the main Mediterranean rivers, while the inflow/outflow at the Dardanelles is treated with open-boundary techniques. The Mediterranean model is forced with hourly surface fluxes of momentum, heat and water provided by the POSEIDON-ETA high-resolution (1/20°) regional atmospheric model (Papadopoulos and Katsafados, 2009), issuing forecasts 5 days ahead. The assimilation system for the Mediterranean Sea hydrodynamic model is very similar to the one presented in Korres et al. (2010). It is based on the singular evolutive extended Kalman (SEEK) filter with covariance localization and partial evolution of the correction directions.

Table 1 .
Multi-physics multi-model SE members: model and data assimilation characteristics. The columns list the most significant differences between the models in terms of code and model physical parameterizations.
www.nat-hazards-earth-syst-sci.net/16/1807/2016/, Nat. Hazards Earth Syst. Sci., 16, 1807-1819, 2016

Table 3 .
Nomenclature and characteristics of the four MMSE algorithms used.

Table 4 .
MMSE data sets; members are detailed in Table 1: data set A (overconfident), data set B (well dispersed) and data set C (badly dispersed).

Table 6 .
ACC mean value throughout the analysis period for data set A (see Table 1) as a function of the training period length for the proposed SE methodologies.