The use of genetic programming to develop a predictor of swash excursion on sandy beaches

We use genetic programming (GP), a type of machine learning (ML) approach, to predict total and infragravity swash excursion using previously published data sets that have been used extensively in swash prediction studies. Data from three previously published works, covering a range of new conditions, are added to extend the range of measured swash conditions. Using this newly compiled data set we demonstrate that a ML approach can reduce prediction errors compared to well-established parameterizations and therefore may improve coastal hazard assessment (e.g. coastal inundation). Predictors obtained using GP can also be physically sound and can replicate the functionality and dependencies of previously published formulas. Overall, we show that ML techniques are capable of both improving predictability (compared to classical regression approaches) and providing physical insight into coastal processes.


Introduction
Wave runup is the final expression of waves travelling from deep to shallow water and is directly associated with coastal hazards such as flooding and erosion. Wave runup height can be defined from the water level elevation time series at the shoreline, η(t), as the sum of two distinct components: the wave setup (the temporal mean of the time series, 〈η〉, relative to the still water level) and the swash, S(t) (the vertical fluctuation of the water level around the wave setup). Understanding and predicting swash characteristics is critical for researchers seeking to understand the dynamics of fluid motions and sediment transport in the swash zone (e.g., Elfrink and Baldock, 2002; Masselink and Puleo, 2006), and for managers and practitioners addressing hazard setbacks, risk and coastal vulnerability (e.g., Bosom and Jimenez, 2011; Vousdoukas et al., 2012). Wave runup (and therefore swash excursion) is a key component in evaluating inundation hazards and vulnerability to storm impacts (e.g., Bosom and Jiménez, 2011; Stockdon et al., 2007; Serafin et al., 2017). Stockdon et al. (2007) found that wave action accounted for about 48 % of the maximum total water level during two hurricanes along the USA coast. The problem of accurately predicting wave runup and swash on sandy beaches has been a research topic for over 50 years, but we still struggle to provide reliable quantitative predictions.
The first predictors of these phenomena were developed in the context of coastal structures (Miche, 1951; Hunt, 1959), and the formulas proposed were usually developed for steep slopes and under the assumption that runup motions reflect the standing component of the incident wave field. Overall these formulas suggested a dependence of the uprush elevation on wave steepness and structure slope. A variety of predictors have since been developed for vertical runup (R) and swash (S) on sandy beaches (e.g. Guza and Thornton, 1982; Holman and Sallenger, 1985; Holman, 1986; Ruessink et al., 1998; Stockdon et al., 2006), with details of the parameterizations depending on different combinations of deep-water significant wave height (H0), deep-water wave length (L0) and beach slope (β). Guza and Thornton (1982) proposed a linear relationship between the significant runup (Rs) and H0:

Rs = K H0,     (1)

where K is an empirical constant (approximately 0.7). Guza and Thornton (1982) also first distinguished between infragravity and incident swash components, indicating that the swash component related to low frequencies (infragravity, SIG) depends only on significant wave height (therefore excluding the beach slope), while the incident component can saturate as a result of the dissipative processes occurring in the surf zone. Their findings were later confirmed by several other studies, although different dependencies on environmental parameters were suggested (e.g. Holman and Sallenger, 1985; Ruessink et al., 1998). Holman and Sallenger (1985), studying an intermediate to reflective beach (Duck, North Carolina, USA), described R as:

R = C ξ H0,     (2)

where C is a constant, H0/L0 is the wave steepness and

ξ = β / √(H0/L0),     (3)

where β is the foreshore beach slope and ξ is the surf similarity index, which is also often used for beach classification: beaches are classified as dissipative for values of ξ < 0.23, reflective for ξ > 1, and intermediate between the two (Short, 1999).
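The surf similarity index and the Short (1999) classification thresholds quoted above can be computed directly. The following Python sketch illustrates Eq. (3) and the classification; the function names are ours, not from the original studies:

```python
import math

def iribarren(beta, H0, L0):
    """Surf similarity (Iribarren) number: xi = beta / sqrt(H0 / L0)."""
    return beta / math.sqrt(H0 / L0)

def classify_beach(xi):
    """Beach-state thresholds after Short (1999), as quoted in the text."""
    if xi < 0.23:
        return "dissipative"
    if xi > 1.0:
        return "reflective"
    return "intermediate"
```

For example, a gentle slope (β = 0.01) with 2 m waves on a 100 m wave length yields ξ ≈ 0.07, a dissipative beach.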
Stockdon et al. (2006) used 10 experiments from different locations to generate new parameterizations of wave runup on natural beaches. The 2 % exceedance value of wave runup was defined as:

R2% = 1.1 (〈η〉 + S/2),     (4)

where 〈η〉 is the maximum setup elevation and S is the total swash, defined as:

S = √(Sinc² + SIG²),     (5)

where Sinc and SIG are the incident and infragravity components of swash. Stockdon et al. (2006) used regression techniques to obtain relationships for Sinc and SIG:

Sinc = 0.75 β √(H0 L0)     (6)

and

SIG = 0.06 √(H0 L0).     (7)

Stockdon et al. (2006) is the most commonly used empirical parameterization of runup but, as can be noted by comparing Eqs. (6) and (7), the beach slope is missing from the predictor of the infragravity component of swash. The dependency (or not) of SIG on beach slope is a topic that has been debated but not resolved: some authors (e.g., Ruessink et al., 1998) have indicated that infragravity swash is independent of the beach slope, while a weak dependence on beach slope has instead been reported by others (e.g., Ruggiero et al., 2004). Cohn and Ruggiero (2016) suggested a bathymetric control on the infragravity swash component through 1D and 2D numerical simulations performed using XBeach (where the incident swash contribution is excluded) and compared them with previous formulas (Ruggiero et al., 2001; Stockdon et al., 2006) and field data on dissipative beaches. They suggested that beach morphology (> -2 m MSL) influences the infragravity component of runup more than the nearshore morphology (< -2 m MSL) and indicated that including the foreshore beach slope in the formulation of SIG improves predictability. Overall, it remains unclear if and when SIG depends on beach slope.
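The Stockdon et al. (2006) parameterization can be coded in a few lines. The Python sketch below assumes the deep-water wave length is computed from the peak period via linear theory, L0 = g Tp² / (2π), which is the usual convention; the setup coefficient 0.35 β √(H0 L0) is Stockdon et al.'s setup relation:

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)

def stockdon_2006(H0, Tp, beta):
    """R2% runup after Stockdon et al. (2006).

    H0:   deep-water significant wave height (m)
    Tp:   peak period (s); L0 = g*Tp^2 / (2*pi) by linear theory
    beta: foreshore beach slope
    Returns (R2%, total swash S, incident swash, infragravity swash), all in m.
    """
    L0 = G * Tp**2 / (2.0 * math.pi)
    setup = 0.35 * beta * math.sqrt(H0 * L0)   # maximum setup elevation
    S_inc = 0.75 * beta * math.sqrt(H0 * L0)   # incident swash
    S_ig = 0.06 * math.sqrt(H0 * L0)           # infragravity swash (no slope)
    S = math.sqrt(S_inc**2 + S_ig**2)          # total swash
    R2 = 1.1 * (setup + S / 2.0)               # 2% exceedance runup
    return R2, S, S_inc, S_ig
```

For a mild slope (β = 0.05) with H0 = 2 m and Tp = 10 s, the infragravity term exceeds the incident term, consistent with the saturation arguments above.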
In addition, the similarity in the temporal scales of wave setup and infragravity motions could be a confounding factor in measurements. Finally, a number of other studies have proposed predictors that introduce additional parameters to account for the cross-shore wind component and the tidal range (Vousdoukas et al., 2012), the presence of nearshore sandbars (Cox et al., 2013) or the mean sediment grain size in the case of gravel beaches (Poate et al., 2016). The abovementioned empirical runup formulas have been developed primarily with classic regression approaches (e.g., Ruessink et al., 1998; Ruggiero et al., 2001; Stockdon et al., 2006; Vousdoukas et al., 2012).
Because of the importance of accurate predictions of swash excursion, the predictors provided by Stockdon et al. (2006) have been tested by various authors on beaches ranging from reflective to dissipative (e.g., Vousdoukas et al., 2012; Cohn and Ruggiero, 2016; Atkinson et al., 2017). Predictions using Stockdon et al. (2006) are certainly sound (especially considering the task of generating a universal formula for vertical swash excursion), even though differences between measurements and predictions, possibly associated with local conditions, are inevitably found. More importantly, the regression approach over multiple datasets first proposed by Stockdon et al. (2006) paves the way for our working hypothesis: can powerful data-driven techniques be used to provide robust, reliable and realistic predictions of swash excursion?
When enough data exist, Machine Learning (ML) is a viable approach to regression problems. ML is a subdiscipline of computer science focused on techniques that allow computers to find insightful relationships between the variables involved in swash processes, learning at each iteration (algorithm training and validation) from the provided dataset. A key goal of ML is to develop predictors that are generalizable (able to describe the physical process beyond the training dataset itself). Many different data-driven techniques fall under the purview of Machine Learning (e.g., decision trees, artificial neural networks, Bayesian networks, and evolutionary computation), all of which have shown applicability in coastal settings (e.g., Pape et al., 2007; Knaapen and Hulscher, 2002; Dickson and Perry, 2015; Yates and Le Cozannet, 2012).
Previous Machine Learning work has focused on predicting runup and swash, but only for engineered structures, impermeable slopes, and/or laboratory experiments (e.g., Kazeminezhad and Etemad-Shahidi, 2015; Bonakdar and Etemad-Shahidi, 2011; Bakhtyar et al., 2008; Abolfathi et al., 2016) and not on natural beaches, apart from Vousdoukas et al. (2011), who used artificial neural networks (ANNs) to predict the shoreline contour elevation (which includes the wave runup) on a natural beach in Portugal. In this study we focus on the use of an evolutionary technique, Genetic Programming (GP), to solve the symbolic regression problem of developing new, optimized swash predictors.
In this contribution we first develop a swash excursion predictor using the original dataset of Stockdon et al. (2006), one of the most comprehensive studies in this area of research. In addition, we use data from Guedes et al. (2013), Guedes et al. (2011, 2012), and Senechal et al. (2011) to broaden the parameter space and to test the new swash equations.
The data used in this work cover a broad range of swash excursions, including extreme wave conditions (maximum H0 = 6.4 m in Senechal et al., 2011). High swash excursions, generated by extreme storms, are of particular interest when studying coastal hazards because they relate to flooding, beach and dune erosion (Bosom and Jimenez, 2011; Stockdon et al., 2007). The new ML-derived results are also compared to the most widely used predictors from Stockdon et al. (2006). Finally, we discuss the physical interpretation of the GP predictors and how we can use ML to gain knowledge of the physical processes related to the infragravity swash component.

Data
This work is based on two published video image-derived runup datasets - 13 field experiments in total. The first dataset (referred to here as the "original dataset") is composed of 491 swash measurements from 10 experiments aggregated by Stockdon et al. (2006). The second dataset (referred to here as the "new dataset") consists of 145 swash measurements compiled for this work from three experiments performed by Guedes et al. (2013), Guedes et al. (2011), and Senechal et al. (2011).
The compiled dataset of total swash is plotted in Fig. 1. The compilation of a large dataset deriving from 13 different experiments requires merging data collected using different techniques and equipment. Details of each experiment can be found in the original references. Looking at the environmental forcing conditions, Figure 1 shows that the original and new datasets cover similar ranges of beach slope, while they differ in significant wave height (the new dataset includes wave heights over 6 m) and peak period (the original dataset includes more short-period waves). Both datasets include recordings of infragravity swash (SIG; m), total swash (S; m), beach slope (β) and associated offshore wave characteristics: significant wave height (H0) and peak period (Tp). From these measurements the offshore significant wave length (L0), wave steepness (H0/L0), and Iribarren number (ξ) are calculated. Experiments are located in North America, Europe and Oceania and cover a large range of environmental conditions (see Table 1 and Fig. 1).
Both datasets include all beach types, from dissipative to reflective.

Methodology
The large amount of data available (636 field swash records), including multidimensional variables, supports the feasibility of a ML approach. The data cover a wide range of environmental conditions (including extreme storms) and beach types, ensuring the applicability of our results to sandy beaches ranging from dissipative to reflective. We now outline the methods of the study. In Sect. 3.1 we present the supervised ML approach. In Sect. 3.2 we present the data pre-processing technique used to decide what data are shown to the ML algorithm. In Sect. 3.3 we discuss the techniques used to test the results from the ML algorithm against the testing data.

Genetic Programming
GP is a population-based machine learning approach based on evolutionary computation (Koza, 1992). The process of genetic programming can generally be divided into four steps: 1) an initial population of solutions to the problem is produced; for regression tasks such as developing a predictor for swash, the initial population of candidate solutions is in the form of equations (encoded as a tree or graph with a predefined mix of variables, operators and coefficients; Fig. 3). 2) The solutions are all compared to the training data to determine 'fitness' using a predefined error metric; 3) the best solutions, those that minimise the error, are retained and the worst solutions are discarded; 4) new solutions are created through 'evolutionary' rules (crossover via reproduction, and mutation) and are added to the population of retained solutions. Steps 2 through 4 are repeated until the algorithm is stopped. The search was stopped after the GP had evaluated 10^11 formulas, because the solutions had stabilized and no significant improvement in formula performance was found.
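The four-step loop described above can be illustrated with a deliberately minimal symbolic-regression sketch: a toy with a single input variable, a reduced operator set, and mutation-only evolution. This is illustrative only; the study itself used the Eureqa software, not this code:

```python
import random

# Candidate equations are nested tuples: ('+', left, right), ('x',), ('c', value).
OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b}

def rand_tree(depth=3):
    """Step 1: build a random candidate equation as an expression tree."""
    if depth == 0 or random.random() < 0.3:
        return ('x',) if random.random() < 0.5 else ('c', random.uniform(-1, 1))
    op = random.choice(list(OPS))
    return (op, rand_tree(depth - 1), rand_tree(depth - 1))

def evaluate(tree, x):
    if tree[0] == 'x':
        return x
    if tree[0] == 'c':
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def mse(tree, xs, ys):
    """Step 2: fitness of a candidate against the training data."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mutate(tree):
    """Step 4 (mutation only, for brevity): replace a random subtree."""
    if tree[0] in ('x', 'c') or random.random() < 0.2:
        return rand_tree(2)
    i = random.choice([1, 2])
    return tuple(mutate(s) if j == i else s for j, s in enumerate(tree))

def gp_search(xs, ys, pop=200, gens=40):
    population = [rand_tree() for _ in range(pop)]               # step 1
    for _ in range(gens):                                        # steps 2-4, repeated
        scored = sorted(population, key=lambda t: mse(t, xs, ys))
        keep = scored[: pop // 2]                                # step 3: discard worst
        population = keep + [mutate(random.choice(keep))
                             for _ in range(pop - len(keep))]
    return min(population, key=lambda t: mse(t, xs, ys))
```

A real GP (as in Eureqa) would add crossover between parent trees, a richer operator set, and multi-variable inputs, but the select-and-vary loop is the same.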
At the end of a routine, when the solutions have stabilized, a final population of solutions exists. A range of final solutions is given by the algorithm - more mathematically complex solutions (with more variables, operators, and coefficients) that minimize error are given alongside simpler, more parsimonious solutions with higher error. These solutions exist along a "Pareto front" that balances decreases in error against increasing solution complexity. Given a range of solutions with different error and complexity, we do not know of a perfect method for a user to determine the single best solution from the suite of final solutions - a user must decide on the solution according to different criteria: minimization of the error, computational time, physical meaning. In our work we adopted the criterion of minimization of the error, with an eye toward the ability to interpret physical meaning from the formulas. A compromise between error reduction (more complex predictors) and the ability of the predictors to generalize (predictive power on new data) should be found during the selection of a predictor. All genetic programming in this study is performed using the software "Eureqa" developed by Schmidt and Lipson (2009), which has successfully been used for a range of coastal problems (e.g., Goldstein et al., 2013; Tinoco et al., 2015). We search for predictors of total and infragravity swash elevation - ultimately searching for the best equations of the form S = f(H0, Tp, β) and SIG = f(H0, Tp, β). Note also that we performed some experiments searching for total and infragravity swash as a function of composite variables like wave steepness, wave power, and the Iribarren number. However, the predictors did not show improvement - also keep in mind that the GP can autonomously find these interrelationships between the basic parameters themselves, leading to the appearance of these composite variables in each optimization experiment.
In addition to the physical parameters, constants are included in the search, and the mathematical operations allowed to the GP are: addition (+), subtraction (-), multiplication (*), division (/), exponentiation (^) and square root (√). Predictors developed on the training subset are assessed on the validation subset using an error metric (also known as a fitness function). From the available metrics we selected the mean squared error (MSE):

MSE = (1/n) Σ (yi − f(xi))²,     (8)

where n is the number of samples, yi is the measured value, and f(xi) is the value predicted by the GP as a function of xi.
All selected formulas from the genetic programming routine are further optimized. First, the formulas are rearranged algebraically to ease interpretation by the user. Second, two coefficients of each selected formula are further optimized using a gradient descent algorithm in an iterative process.
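The coefficient-refinement step can be sketched as a plain gradient descent on the MSE with central-difference numerical derivatives. This is a sketch under our own assumptions (the exact optimizer settings used in the study are not documented here); `refine_coefficients` and its parameters are illustrative:

```python
def refine_coefficients(f, a, b, xs, ys, lr=0.02, steps=5000, h=1e-6):
    """Refine two coefficients of a fixed formula f(a, b, x) by gradient
    descent on the MSE, using central-difference numerical gradients."""
    def loss(a_, b_):
        return sum((f(a_, b_, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    for _ in range(steps):
        ga = (loss(a + h, b) - loss(a - h, b)) / (2 * h)  # dMSE/da
        gb = (loss(a, b + h) - loss(a, b - h)) / (2 * h)  # dMSE/db
        a, b = a - lr * ga, b - lr * gb
    return a, b
```

For a formula whose structure is already fixed by the GP, only these two scalars move, so a simple first-order method converges quickly.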

Training, Validation and Testing
In order to obtain generalizable predictors, it is necessary to train, validate and test any ML routine on distinct and nonoverlapping subsets of data (e.g., Domingos, 2012). There is no universal, optimal method to select enough data to explain the variability of the dataset while still retaining the most data to use for testing - recent work by Galelli et al. (2014) highlights that, even with the numerous input variable selection methods that have been proposed, there is no single best method for all typologies of environmental datasets or for all environmental models.
We adopt the maximum dissimilarity algorithm (MDA) as the selection routine (e.g., Camus et al., 2011), already successfully tested in other work on predictors developed by GP for physical problems (e.g., Goldstein and Coco, 2014). The MDA is a routine for the selection of the most dissimilar points in a given dataset. Each data point is a vector composed of all the variables of our dataset (H0, Tp, β, S, SIG), where each variable is normalized between 0 and 1. At each iteration (i = 1…n), the MDA finds the data point most different from the data selected in all previous iterations. Consequently the MDA selects a diverse set of data from the original 491 data points used by Stockdon et al. (2006). The operator must set the number of data points selected - we apply the MDA to select 150 data points (~30 % of the original dataset). We also ran the analysis using a subset of variables (not including the variables representing swash elevations), but no significant loss in prediction power of the predictors developed by the machine learning algorithm was observed. The data selected by the MDA are used as the training subset and we use the remaining data (~70 % of the original dataset, not selected) as the validation subset.
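A minimal version of the MDA described above is a greedy farthest-point selection on variables normalized to [0, 1]. The seeding rule (starting from the first record) and the Euclidean distance are our assumptions, as implementations vary:

```python
import math

def max_dissimilarity_select(data, n_select):
    """MDA sketch: greedily pick the record farthest (Euclidean distance on
    variables normalized to [0, 1]) from everything already selected."""
    dims = len(data[0])
    lo = [min(row[k] for row in data) for k in range(dims)]
    hi = [max(row[k] for row in data) for k in range(dims)]
    # normalize each variable to [0, 1]; constant columns map to 0
    norm = [[(row[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
             for k in range(dims)] for row in data]

    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    chosen = [0]  # seed with the first record (an assumption)
    # distance of each record to its nearest already-chosen record
    d = [dist(p, norm[0]) for p in norm]
    while len(chosen) < n_select:
        nxt = max(range(len(data)), key=lambda i: d[i])  # most dissimilar point
        chosen.append(nxt)
        d = [min(d[i], dist(norm[i], norm[nxt])) for i in range(len(data))]
    return chosen
```

Applied to the five-variable swash records, this yields a training subset that spans the extremes of the parameter space rather than clustering around typical conditions.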
The predictors developed by the GP using this training data are tested using the new dataset from Guedes et al. (2011), Guedes et al. (2013), and Senechal et al. (2011). This new dataset is completely independent from the training data and unknown to the GP algorithm, providing a test of the ability of the GP parameterization to generalise, even beyond the range of the training and validation data (Fig. 1). The performance of our predictors on the testing data is compared to the Stockdon et al. (2006) predictors using the error metrics in Sect. 3.3.

Error Evaluation
We use three different error metrics for the testing phase and for comparing our predictors with known predictors in the literature: the mean square error as defined in Eq. (8), the root mean square error:

RMSE = √( (1/n) Σ (yi − f(xi))² ),     (9)

and the maximum absolute error:

MaxErr = max |yi − f(xi)|,     (10)

where n is the number of samples, yi is the measured value, and f(xi) is the value predicted by the GP as a function of xi.
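The three metrics (Eqs. 8-10) can be computed together; `error_metrics` is an illustrative helper, not code from the study:

```python
import math

def error_metrics(y_obs, y_pred):
    """MSE (Eq. 8), RMSE (Eq. 9) and maximum absolute error (Eq. 10)."""
    n = len(y_obs)
    residuals = [p - o for o, p in zip(y_obs, y_pred)]
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    max_abs = max(abs(r) for r in residuals)
    return mse, rmse, max_abs
```

RMSE is reported in the same units as the swash elevation (m), which makes it the most directly interpretable of the three.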

Results of GP experiments
After ~10^11 formulas were evaluated, the solutions from the GP algorithm for both S and SIG follow a "Pareto front" where the error decreases (compared with the validation subset) as the size (or complexity) of the formula increases. Generally, extremely complicated predictors fit the training and validation datasets better than simpler predictors, but they may lose generalization power when tested on a separate testing dataset (overfitting). In other words, an overfitted predictor could represent the noise in the training and validation subsets instead of defining a general predictive rule (Dietterich, 1995), and therefore it will result in smaller training errors but higher testing errors. Several viable techniques exist for selecting the best solution to avoid overfitting, all meant to balance the fact that simpler solutions (the minimum description length) might risk losing more accurate information contained in more complex models (e.g., O`Neill et al., 2010). Picking a solution is a subjective task and relies on specific domain knowledge on the part of the user - here we focus on predictors with clear physical plausibility (avoiding predictors with physical nonsense, such as swash increasing as wave height decreases) and avoid predictors that are difficult to interpret (e.g., extremely nonlinear relationships, possibly a result of overfitting the training dataset). We also focus on two predictors for both S and SIG, evaluating a simpler and a more complex predictor to determine if the more complex expression warrants use when generalized to the testing dataset.
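Extracting the Pareto front from a pool of candidate (complexity, error) pairs, as discussed above, amounts to keeping the non-dominated solutions; a small illustrative sketch:

```python
def pareto_front(solutions):
    """Keep the non-dominated (complexity, error) pairs: solutions for which
    no other solution is at least as simple AND at least as accurate."""
    front = []
    for c, e in solutions:
        dominated = any(c2 <= c and e2 <= e and (c2 < c or e2 < e)
                        for c2, e2 in solutions)
        if not dominated:
            front.append((c, e))
    return sorted(front)
```

The final, subjective choice described in the text then happens along this front: moving right gains accuracy on the validation data at the cost of interpretability.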

Total Swash
Following the principle of error reduction and physical interpretability of the results, we select from the pool of candidate solutions available from the GP experiments two formulas for S: one simpler (Eq. 11) and one more elaborate (Eq. 12).
Note that the coefficients of both Eq. (11) and Eq. (12) are dimensional. Eq. (11) represents the best solution in terms of error reduction while maintaining physical interpretability. It also stands out for its simplicity and only weak nonlinearity - it looks similar to a multiple linear regression. In Eq. (12) the first and third terms depend exclusively on β, while the second term includes the contribution of the incident waves. The total swash in both GP predictors is related to the wave peak period (instead of the wave length), different from previous formulations (e.g. Stockdon et al., 2006; Holman and Sallenger, 1985). Recently, Poate et al. (2016) also used the wave peak period in their runup predictor for gravel beaches. The use of the peak period instead of the wave length has no influence on the physics of the predictor, but could allow users a more direct application of the formula.
Figure 4 displays a comparison of the performance of the swash predictors obtained through the ML approach (Fig. 4a, 4b) and Stockdon et al. (2006) (Fig. 4c) on the training and validation dataset. This does not constitute a test of the predictors, only a consistency check to see that the predictors are modelling the training and validation data appropriately. Overall Eq. (5) shows higher scatter over the whole original dataset (details on the errors can be found in Table 2 and Sect. 4.4). It is not clear why all formulas fail to successfully fit the data from Duck 82 and Delilah (especially for the largest swash elevations). The Stockdon et al. (2006) predictor shows scatter at larger total swash, while the GP predictors show a slight underprediction of swash elevation during large events. Stockdon et al. (2006), Eqs. (5), (6) and (7), has higher scatter than both GP predictors (Fig. 5), and considerably overestimates the swash measurements of Truc Vert (an intermediate beach under an extreme, highly energetic storm) and Ngarunui (a dissipative beach under mild wave conditions), while it underestimates the observations at the reflective beach of Tairua. Equations (11) and (12), from the GP routine, perform similarly (Fig. 5a, b).

Infragravity swash
The two formulas selected for describing SIG are Eq. (13) and the more complex Eq. (14). As in the case of S, the coefficients of Eq. (13) and Eq. (14) for SIG are dimensional. The reader should also note that both formulas depend on the beach slope, in contrast with Ruessink et al. (1998) and Stockdon et al. (2006) (Eq. 7 in this manuscript), but in agreement with other slope-inclusive predictors (Ruggiero et al., 2001, 2004). Eq. (14) represents the best solution in terms of error reduction while maintaining physical meaning, and Eq. (13) is a simpler predictor in which the contributions of beach slope and waves to infragravity swash remain separate. Both Eq. (13) and Eq. (14) have the same nonlinear term (with a slight difference in the coefficients) that describes the incoming waves. The threshold that flips this term from negative to positive is related to wave height and is probably an indication that for small waves the infragravity component is extremely limited (this term needs to be negative to compensate for other terms that depend only on beach slope and provide a constant contribution). The ML predictor not only suggests that the beach slope is important when predicting infragravity swash, but also indicates a nonlinear interaction between waves and beach morphology through the wave length (second term of Eq. 14).
Figure 6 displays a consistency check, highlighting the performance of the swash predictors obtained through the ML approach (Fig. 6a, 6b) and Stockdon et al. (2006) (Fig. 6c) on the training and validation dataset. It is not clear why all formulas provide less precise predictions for data from Duck 84 and Duck 82, but we note that these two experiments focused on intermediate to reflective conditions with relatively large wave conditions (Table 1). Generally the three formulas seem to perform similarly.
Some differences are found in dissipative settings (i.e., Agate and Terschelling) - predictions by Stockdon et al. (2006) tend to overestimate SIG compared to the GP predictors.

Figure 6: Observed versus predicted SIG by (a) GP Eq. (13), (b) GP Eq. (14) and (c) Stockdon et al. (2006) Eq. (7) on the original dataset (Stockdon et al., 2006). This is not a test of any predictor, only a consistency check - all data were shown to the GP algorithm and are the same data used to generate the linear regression in panel c.
The same difficulty in predicting swash excursion on a dissipative beach is observed at Ngarunui (Fig. 7), even though this experiment was performed under mild wave conditions (H0 = 0.6-1.26 m and Tp = 8.1-12.4 s, Table 1) compared to the experiments at Agate and Terschelling. Note that dissipative beaches are the ones where the infragravity motion has greater importance. Truc Vert also presents dissipative conditions in the swash zone, while the surf zone is intermediate (ξ up to 0.87, as reported by Senechal et al., 2011). For this experiment Eq. (13) and Eq. (7) (Fig. 7a, c) overestimate SIG, while Eq. (14) performs better for the dissipative beach Ngarunui, suggesting that it could be the most appropriate for SIG predictions. Overall the GP predictors perform better than the Stockdon et al. (2006) formulation for all the error metrics considered and for the new testing datasets (for both S and SIG). While for S the smaller predictor performs better than the more complex predictor, for SIG the errors decrease with increasing GP predictor size (Eq. 13 to Eq. 14).

Discussion
In this work we use data compiled by Stockdon et al. (2006) to build new predictors, through the use of GP, for both total and infragravity swash elevations. We then test the generalizability of these new predictors using new data (including some extreme conditions). This differs from many previous applications of ML in coastal settings in two ways. First, we test the ML-derived predictors on data collected from different settings (compared to the training data) - three beaches not included in the training data. Second, the testing data include events that are outside the data range of the training data - we are extrapolating the ML-derived predictor as a test of its generalizability. We do not assume a single criterion for the selection of the best predictors, but find a compromise between error reduction (on the testing dataset) and the physical interpretability of the results.
Results demonstrate that the GP predictors proposed in this work perform better than existing formulas and that ML can identify nonlinear relationships between the variables of this problem. Specifically, Eq. (14) introduces the dependence of SIG on the beach slope, and also its nonlinear relationship with the wave length. Furthermore, the smallest solutions for S found by the GP algorithm (not shown) display a simple linear dependence on H0 with a constant (identical to early formulations of wave runup, e.g. Guza and Thornton, 1982). More complex predictors add dependencies on the remaining variables, including a √(H0 L0) term (similar to Eq. 7 of this manuscript, from Stockdon et al., 2006) and the beach slope.
The GP algorithm found solutions for SIG that include the beach slope (β), a variable that is never excluded from predictors of further increasing size. Because the candidate solutions resulting from the GP experiments follow a "Pareto front" distribution, in which the fit improves (smaller MSE) as the size of the formula grows, the continuous inclusion of β in more complex predictors implies that including β in the SIG formulation reduces prediction error. The improvement of classic empirical techniques through innovation in data-driven methodologies has already been discussed (e.g. the case of depth-averaged velocities over model vegetation by Tinoco et al., 2015). Experiments based on GP have also highlighted a way to focus on and add dependencies in predictors describing coastal processes (e.g. grain size in the case of the prediction of ripple wavelength by Goldstein et al., 2013). The predictors proposed in this work perform well over a wide range of environmental conditions, including, as noted by Nicolae et al. (2016), the most energetic storm conditions (Truc Vert beach) recorded in the field and available in the literature. Furthermore, the work here demonstrates that ML-derived results, when physically plausible, may be generalizable beyond the limits of the training data, extrapolating to a novel, out-of-sample data set.
Looking at the limitations of the proposed models, the variables taken into account (H0, Tp, L0, β) are easily accessible but also oversimplify the processes that affect swash. For instance, we do not include the influence of the wave directional spread (Guza and Feddersen, 2012), the cross-shore wind component or the tidal range (Vousdoukas et al., 2012). However, in order to include these and other aspects (e.g., the role of underwater vegetation or nearshore bathymetry), it is necessary to perform more field experiments that record swash, runup and the other relevant variables. An additional limitation is that the swash formulas obtained in this study approach a nonzero value as wave height approaches zero. While this is physically incorrect, the data used in the analysis do not include the limit condition of 'no waves, no swash'. Consequently, even though the GP formulas obtained do not correctly predict the limit condition corresponding to a no-wave scenario, the predictions for both datasets have smaller errors compared to commonly used formulas. Generally, the results from machine learning techniques are strictly related to the range of the training and validation datasets (original dataset in Fig. 1). This work demonstrated that the predictors can sometimes be applied beyond that range (new dataset in Fig. 1). However, it is unknown how the predictors will perform in settings beyond those in the present work - future tests on new field data are therefore recommended. Furthermore, parameterizations always work better when free parameters are optimized to a given site using existing data, and this should be considered when proposing universal parameterizations.
Our results contribute to the discussion on the role of beach slope in the prediction of the infragravity component of swash. The GP algorithm found a dependence of SIG on beach slope, and the increasingly complicated formulas (i.e., more precise predictions) found by the GP all include beach slope as one of the predictive variables. This result is in line with studies such as Ruggiero et al. (2001, 2004) and in contrast with Stockdon et al. (2006), Senechal et al. (2011) and Ruessink et al. (1998). Although difficult to quantify and extremely simplified (this parameter, together with sediment diameter, should integrate the effect of the entire cross-shore profile), our results suggest that some parameter involving the beach profile should be considered when predicting runup characteristics.
Our results are relevant for a variety of applications where the errors related to empirical formulations obtained by classic regression techniques could be reduced. For instance, in the case of coastal hazards, the Stockdon et al. (2006) formulation for wave runup is used by Serafin and Ruggiero (2014) for their extreme total water level estimation and by Bosom and Jimenez (2011) in their framework for coastal hazard assessment. Accuracy in runup formulation has consequences for risk and vulnerability assessment, such as coastal management maps (De Muro et al., 2017; Perini et al., 2016), and for several other studies regarding sediment transport (Puleo et al., 2000) and swash zone hydrodynamics and morphodynamics (Puleo and Torres-Freyermuth, 2016).

Conclusions
Starting from a large dataset covering a wide range of swash, beach and wave field characteristics, we developed two new predictors for total and infragravity swash elevations using the machine learning technique of Genetic Programming. We tested and compared our new formulas with previously developed and widely accepted parameterizations of swash (e.g., Stockdon et al., 2006) using independent published datasets. The two GP predictors selected (one for total and one for infragravity swash) show better performance compared with the formulation of Stockdon et al. (2006) when evaluated on an independent dataset (unknown to the algorithm), which included an extreme, highly energetic wave storm, particularly relevant for coastal hazards. This work contributes to reducing the uncertainty in predicting swash excursion and consequently in assessing coastal vulnerability and hazards (e.g. inundation), which depend in part upon wave swash (Bosom and Jimenez, 2011). Better prediction of swash excursion could also inform retreat or accommodation strategies and integrated planning for the mitigation of coastal hazards. Furthermore, the GP results indicate that the beach slope influences the infragravity component of the swash - the GP predictors improve in performance when the beach slope is included. We therefore conclude that beach slope is a relevant parameter when predicting the infragravity component of the swash elevation, even though this is contrary to several previous studies (e.g., Stockdon et al., 2006; Ruessink et al., 1998; Senechal et al., 2011). ML, and specifically GP, can be a useful tool for data-rich problems, providing robust predictors and possibly also physical insight. The role and importance of the scientist is not reduced or substituted by the machine but instead enhanced by a powerful data analysis tool.