This study develops methods for estimating lightning climatologies on the
day

Generalized additive models (GAMs) are used to model both the probability of occurrence and the intensity of lightning. Additive effects are set up for altitude, day of the year (season) and geographical location (longitude/latitude). The performance of the models is verified by 6-fold cross-validation.

The altitude effect of the occurrence model suggests higher probabilities of lightning for locations at higher elevations. The seasonal effect peaks in mid-July. The spatial effect models several local features, but there is a pronounced minimum in the north-west and a clear maximum in the eastern part of Carinthia. The estimated effects of the intensity model reveal similar features, though they are not identical. The main difference is that the spatial effect varies more strongly than the analogous effect of the occurrence model.

A major asset of the introduced method is that the resulting climatological information varies smoothly over space, time and altitude. Thus, the climatology can serve as a useful tool in quantitative applications, e.g. risk assessment and weather prediction.

Severe weather, associated with thunderstorms and lightning, causes
fatalities, injuries and financial losses

Lightning is a transient, high-current (typically tens of kiloamperes)
electric discharge in the air with a typical length on the order of kilometres. The
lightning discharge in its entirety is usually termed a lightning
flash or just a flash. Each flash typically contains several
strokes, which are the basic elements of a lightning discharge

One possibility of harnessing the complete data set to produce a lightning
climatology in such a manner is using generalized additive models

In this study GAMs are applied to estimate a climatology of the probability
of lightning and a climatology of the expected numbers of flashes with a
spatial resolution of 1 km

A study investigating lightning data for the period 1992 to 2001

Other studies focus on lightning detected in the vicinity of the Alps: a
6-year analysis of lightning detection data over Germany reveals that the highest
activity is in the northern foothills of the Alps and during the summer months,
when the number of thunderstorm days goes up to 7.5 yr

The paper is structured as follows: the lightning detection data, the
region of interest, Carinthia, and the pre-processing of the data are
described in Sect.

In this study 6 years of data (2010–2015) from the ALDIS detection network

ALDIS is part of the European cooperation for lightning detection (EUCLID)

The region of interest is the state of Carinthia in the south of Austria at
the border with Italy and Slovenia. Carinthia extends 180 km in an east–west
direction and 80 km in a north–south direction. The elevation varies between
339 m and 3798 m a.m.s.l. (above mean sea level). To include elevation
as a covariate in the statistical model (Sect.

The lightning data, for May to August of the 6 years, are transferred to the
same 1 km

Altitude of Carinthia (m a.m.s.l.) averaged over
1 km

Daily frequency of 1 km

Empirical climatological probability of lightning for a day in July
in Carinthia on the 1 km

Figure

This section introduces the statistical models for estimating the
climatologies for lightning occurrence and lightning intensity. The aim of
the statistical model is to explain the response, i.e. the probability of
occurrence or counts of flashes, by appropriate spatio-temporal covariates,
i.e. logarithm of the altitude (logalt), day of the year
(doy) and geographical location (lon,lat). Since the
response might non-linearly depend on the covariates, we choose generalized
additive models (GAMs) as a statistical framework, for which a brief
introduction is presented in Sect.

It is assumed that the number of flashes detected within a cell and day is
generated by a random process

In Sect.

The main motivation for using a GAM is the possibility to estimate
(potentially) non-linear relationships between the response and the
covariates. In the following, the basic concept of GAMs is introduced for an
arbitrary parameter

The value of

Estimation of a GAM for such a large data set, i.e. 7 309 152 data points, is
feasible, e.g. via function

The first component models the probability of lightning Pr(

The second part is the truncated count component for the expected number of
flashes given lightning activity. We will refer to this component as
the intensity model. It is assumed that the positive counts of flashes within
a spatial cell and day follow a zero-truncated Poisson distribution with the PDF,

The family for modelling the zero-truncated Poisson distribution
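
As an illustration of the two-part structure described above, the following minimal sketch combines an occurrence probability with a zero-truncated Poisson count component. It uses the standard textbook form of the zero-truncated Poisson distribution; the function names and the symbols `pi` (probability of lightning) and `lam` (Poisson rate) are assumptions made for this example and are not taken from the study.

```python
import math


def ztp_pmf(k, lam):
    """Zero-truncated Poisson probability of observing k >= 1 flashes.

    Standard form: lam**k / ((exp(lam) - 1) * k!), evaluated on the log
    scale for numerical stability.
    """
    if k < 1:
        return 0.0
    log_p = k * math.log(lam) - math.lgamma(k + 1) - math.log(math.expm1(lam))
    return math.exp(log_p)


def ztp_mean(lam):
    """Expected number of flashes given that at least one flash occurred."""
    return lam / (1.0 - math.exp(-lam))


def hurdle_pmf(k, pi, lam):
    """Probability of k flashes in a cell and day under a hurdle model.

    pi  : probability that lightning occurs at all (occurrence part)
    lam : rate of the zero-truncated Poisson (intensity part)
    """
    if k == 0:
        return 1.0 - pi
    return pi * ztp_pmf(k, lam)


# Illustrative values only (not estimates from the study).
rel_freq = [hurdle_pmf(k, pi=0.05, lam=2.0) for k in range(0, 6)]
```

Under these assumptions, the relative frequencies of 0, 1, 2, ... flashes for a given cell and day follow directly from the two fitted quantities, and the unconditional expected count is `pi * ztp_mean(lam)`.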

In this section the verification procedures are briefly introduced, namely the cross-validation, the applied scores and the block-bootstrapping.

To ensure that the model is verified on independent data, we
applied a 6-fold cross-validation

The log-likelihood is applied as scoring function, which is also called
logarithmic score in the literature on proper scoring rules
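
The combination of cross-validation and the logarithmic score can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes that the six folds correspond to the six calendar years (the text does not state how folds are formed), it uses synthetic data, and it replaces the GAM by a constant climatological frequency as a placeholder model. All function and variable names are hypothetical.

```python
import numpy as np


def bernoulli_log_score(y, p, eps=1e-12):
    """Log-likelihood of binary outcomes y under predicted probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)  # guard against log(0)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log1p(-p)))


def leave_one_year_out_scores(y, years):
    """Out-of-sample log score per year, holding out one year at a time.

    The 'model' is only the mean occurrence frequency of the training
    years; a fitted GAM would take its place in a real application.
    """
    scores = {}
    for held_out in np.unique(years):
        test = years == held_out
        p_hat = y[~test].mean()  # fit on the remaining years
        scores[int(held_out)] = bernoulli_log_score(
            y[test], np.full(test.sum(), p_hat)
        )
    return scores


# Synthetic example: 6 years with 123 days each (May to August).
rng = np.random.default_rng(0)
years = np.repeat(np.arange(2010, 2016), 123)
y = rng.binomial(1, 0.05, size=years.size)
scores = leave_one_year_out_scores(y, years)
```

Higher (less negative) values indicate better out-of-sample performance; summing the per-fold scores gives a single number for comparing candidate models.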

To assess confidence intervals of the estimated parameters and effects,

This section presents the results of the statistical models and is structured
as follows: first, the non-linear effects of the occurrence model are
described in Sect.

The effects of the occurrence model on the scale of the additive
predictor.

The estimates of the effect of the occurrence model (Sect.

How the effects in Fig.

The second term

The temporal or seasonal effect

The spatial effect

Coordinates of the sample locations in Fig.

The effects of the intensity model on the scale of the additive
predictor. Labelling is analogous to Fig.

As the altitude is a function of longitude and latitude, one could ask whether it would be sufficient to take only the spatial effect into account, which implicitly contains the altitude, and to skip the explicit altitude effect. In general the presented method would be capable of modelling the influence of the altitude implicitly within the spatial effect. However, the topography of the region of interest is very complex. Thus, a spatial effect with many degrees of freedom would be required in order to account for the complex altitude field. As we know the altitude field, we can pass it to the GAM as an isolated effect. The altitude effect then contains only information associated with the altitude, while the remaining spatial variation is captured by the spatial term.
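
The separation of effects discussed above can be written schematically as an additive predictor with one smooth term per covariate. The notation below (intercept \beta_0 and smooth functions f_1, f_2, f_3) is an assumption chosen for illustration and may differ from the notation used in the model equation of this study; the covariate names are those introduced in the text.

```latex
\eta = \beta_0
     + f_1(\mathtt{logalt})
     + f_2(\mathtt{doy})
     + f_3(\mathtt{lon}, \mathtt{lat})
```

In this form, f_1 carries the altitude signal with a single univariate smooth, so that the bivariate term f_3 only has to represent the spatial variation that is not explained by altitude.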

The introduced model (Eq. 1) could also be extended by potentially non-linear functions of other covariates that are meaningful for a climatological assessment, e.g. surface roughness, slope and aspect of topography. However, in the present case, adding these covariates did not improve the model.

The non-linear effects of the intensity model (Sect.

The altitude effect

The seasonal effect

Climatological probability (expected values) of lightning in
Carinthia on the 1 km

The spatial effect

In order to illustrate how climatological information can be drawn from the
GAMs, two different kinds of applications are presented. First, maps show
spatial climatologies (Figs.

Climatological number of flashes (expected values) in Carinthia on
the 1 km

The spatial distribution of climatological probabilities of lightning
occurring in a cell for 20 July (close to the seasonal peak) varies from 1.8 to
6.5 % (Fig.

A comparison of Figs.

For the same day, 20 July, the expected number of flashes is depicted in
Fig.

In addition to the spatial information one can extract seasonal climatologies for
different locations (Fig.

Relative frequencies (%) of number of flashes (columns) for 20 July
of the sample locations (rows) in Fig.

Seasonal climatologies for sample locations, which are highlighted
in Fig.

The climatologies of the expected number of flashes are depicted in
Fig.

Finally, it is also possible to derive relative frequencies of the number of
flashes of a specific location and day of the year from the GAM. The relative
frequencies have been derived for the five sample locations
(Table

This section addresses two helpful points for end users. The first one is on
how to choose the cross-validation score in order to avoid overfitting of the
seasonal effect (technically speaking the selection of its smoothing
parameter

To illustrate the first point a subset of the large data set is
selected. We pick all data points in a 5

Figure

The reason for the distinct estimates lies in the dependence structure of the
data. For one cell the probability of detecting lightning on one day
given that lightning was detected on the previous day is 6.7 %. Spatial
dependence is much stronger. Provided that lightning occurs in one cell, the
probability of lightning occurring in the adjacent cell is 41 %. This strong
spatial dependence has a physical explanation. First, the preconditions
for thunderstorms and lightning vary much more strongly from
day to day than in the course of a single day. Second, thunderstorm systems,
i.e. multi-cell thunderstorms or super-cell thunderstorms, cover a large
area or even travel over a larger area

For this reason we recommend exploring the dependence structure of the data first and defining the cross-validation score according to this dependence structure.
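
One way to explore the dependence structure is to compute conditional relative frequencies directly from the gridded occurrence data. The sketch below assumes a boolean array `occ` of shape (days, rows, columns) that holds consecutive days of a single season, and it takes "the adjacent cell" to mean the eastern neighbour. Both choices, as well as the function names, are assumptions made for this example.

```python
import numpy as np


def temporal_dependence(occ):
    """P(lightning on day t+1 | lightning on day t) for the same cell.

    occ must contain consecutive days only; seasons of different years
    should be evaluated separately so that no pair spans a gap.
    """
    today, tomorrow = occ[:-1], occ[1:]
    return np.sum(today & tomorrow) / np.sum(today)


def spatial_dependence(occ):
    """P(lightning in the eastern neighbour | lightning in a cell), same day."""
    cell, neighbour = occ[:, :, :-1], occ[:, :, 1:]
    return np.sum(cell & neighbour) / np.sum(cell)


# Tiny hand-made example: 3 days, 1 row, 3 columns.
occ = np.zeros((3, 1, 3), dtype=bool)
occ[0, 0, 0] = True
occ[0, 0, 1] = True
occ[1, 0, 0] = True

p_time = temporal_dependence(occ)
p_space = spatial_dependence(occ)
```

If the spatial estimate is much larger than the temporal one, as reported above, cross-validation folds that split the data by day rather than by cell keep strongly dependent observations within the same fold.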

Local fits for the location E. Circles show empirical
estimates. For comparison the estimate of the full occurrence model is added
(dashed line).

Finally, we discuss how the introduced model (Eq.

This study presented how generalized additive models (GAMs)

A hurdle approach was employed to compute a climatology of the intensity of lightning in order to properly handle the large number of zeros in the data. Thus, two aspects of lightning are captured by the models: the probability of lightning occurring and the number of flashes detected within a grid cell. The effects of the two models are similar but not equal. In particular, the spatial effect of the intensity model varies more strongly than the corresponding effect of the occurrence model. For instance, local intensity maxima are triggered in the vicinity of radio towers.

In sum, the occurrence model and the count model took roughly 150 and
180 degrees of freedom, respectively. This is a relatively small number compared
to the degrees of freedom required by other methods. Counting and averaging
flashes with respect to a resolution of km

ALDIS data are available on request from ALDIS (aldis@ove.at) – fees may be charged.

Thorsten Simon, Georg J. Mayr and Achim Zeileis defined the scientific scope of this study. Thorsten Simon performed the statistical modelling, evaluation of the results and wrote the paper. Georg J. Mayr provided support on the meteorological analysis. Nikolaus Umlauf and Achim Zeileis contributed to the development of the statistical methods. Wolfgang Schulz and Gerhard Diendorfer were in charge of data quality and gave advice on the lightning-related references. All authors discussed the results and commented on the paper.

The authors declare that they have no conflict of interest.

We acknowledge the funding of this work by the Austrian Research Promotion
Agency (FFG) project