Modelling long-term fire occurrence factors in Spain by accounting for local variations with geographically weighted regression

Humans are responsible for most forest fires in Europe, but the anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25 yr period (1983–2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4 % of the total observations, while the ordinary least squares (OLS) regression model explained 53 % of the variation of the fire density patterns (adjusted R² = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related to agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53 % to 67 %, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4 % to 78.4 %, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19.
The results from GWR indicated a significant spatial variation in the local parameter estimates for all the variables and an important reduction of the autocorrelation in the residuals of the GW linear model. Despite the fitting improvement of local models, GW regression, more than an alternative to “global” or traditional regression modelling, seems to be a valuable complement to explore the nonstationary relationships between the response variable and the explanatory variables. The synergy of global and local modelling provides insights into fire management and policy and helps further our understanding of the fire problem over large areas while at the same time recognizing its local character.


Introduction
Human factors are critical to explain fire occurrence worldwide, but they are particularly relevant in European Mediterranean countries with a long fire history related to traditional farming activities, as is the case in Spain. It is estimated that more than 90 % of forest fires are caused by people in European Mediterranean countries (Leone et al., 2009; Vélez, 2009; FAO, 2007). Additionally, these areas have experienced important socioeconomic transformations over the last few decades, including land abandonment and/or higher tourist and urban pressures on the forest areas, which could imply higher ignition risk. Given the importance of the human risk, any improvement in the modelling and assessment of factors that drive human-made ignitions is critical for fire prevention, planning and management. Also, a better knowledge of the spatial patterns of fire occurrence and their relationships with underlying factors of human risk is necessary to target prevention efforts and make them more efficient.
It is also necessary to further improve the modelling techniques. In fire occurrence modelling, different statistical and regression modelling techniques have been applied at several temporal and spatial scales, in many cases assuming that the model parameters are valid and homogeneous for the entire study area from which the data were sampled or, alternatively, assuming that the model structure is spatially stationary, such as the examples in Syphard et al. (2008), Chuvieco et al. (2010), Vilar et al. (2010) and Kwak et al. (2012). However, when large geographical study areas are involved, it would be more reasonable to find varied rather than constant relationships. For instance, Koutsias et al. (2005, 2010), when modelling fire densities, observed that the explanatory power of classical linear regression increased considerably after assuming varying relationships instead of constant ones. Their analysis was developed at the provincial level (NUTS-3) across the European Mediterranean Basin countries (Portugal, Spain, southern France, Italy and Greece) using geographically weighted regression (GWR), thereby initiating the use of GWR in fire modelling studies.
Although interest in accounting for regional variations in wildfire occurrence factors has been shown recently in some studies (Moreira et al., 2009; Carmo et al., 2011; Gonzalez-Olabarria et al., 2011; Padilla and Vega-García, 2011; Nunes, 2012), except for Koutsias et al. (2005, 2010), this has only begun to be addressed very recently by using local geographically weighted regression (Tulbure et al., 2011; Poudyal et al., 2012; Avila-Flores et al., 2010; Sá et al., 2011; Rodrigues and De la Riva, 2012). In our study, similar to those of Koutsias et al. (2010) and Sá et al. (2011), GWR is considered as a complement to the "global" regression modelling approach, with which it is compared in order to better understand particular processes at the regional scale, but at the same time recognizing its own local characteristics and patterns (Fotheringham et al., 1996, 1997, 2002). The term "global" is used here to describe a model that refers to a homogeneous process in which the relationships being modelled are the same everywhere within the study area.

Objectives
The work presented here is an extension of previous research (Martínez et al., 2009) that showed how the rate of human-caused fires in Spain can be predicted and explained from socioeconomic and geographic variables, assuming spatially stationary processes. The overall objective of this new study is to check, in a quantitative way, whether these stationary models are adequate to properly explain and understand long-term fire occurrence patterns in a large study area such as the Spanish peninsular territory. In order to achieve this overall objective, three improvements have been implemented over the previous work.
The first concerns the predicted variable. Instead of modelling only the high versus low occurrence, in this paper we have addressed two aspects of fire occurrence: (i) fire presence/absence and (ii) fire density, using a longer historical time period (25 yr versus 13 yr) for both. For these two aspects we built two predictive "global" models at the national scale using two "classical" regression approaches: OLS linear regression to explain long-term fire density patterns and, complementarily, a binary logistic model to define the existing underlying factors behind fire presence and to better understand why in some of the municipalities no fires have been observed during the studied period. The terms "ordinary" and "classical" are used here to represent the default regression model in many statistical software packages, in contrast to other specific models like GWR.
The second innovative aspect of this work is the analysis of the spatial variations within the fire occurrence models to explore possible local characteristics and regional patterns. For this we used GWR, which assumes non-stationary relationships between the explanatory variables and fire occurrence. Given the large territory of Spain with important climatic and socioeconomic differences (for example between the northern and the southern regions, the Atlantic and Mediterranean areas, or between the mountains, the large plains and the river depressions), we hypothesize that some explanatory factors should show region-specific trends, deviating from global or national patterns. In addition, we assumed that a unique stationary model for Spain would be "notably influenced by the high fire occurrence of the Galicia region (northwest of the country). This area contains 11 % (850) of all municipalities in Spain, but 70 % (152 891) of the forest fires, and thus creates a spatial imbalance in the global model" (Martínez et al., 2009, p. 1251). These obvious premises have scarcely been tested in Spain using quantitative models. Therefore, to explore the spatial variability, we focus on the assessment (i) of variables presenting contradictory signs to the global coefficients, (ii) of areas where we observe unusually or unexpectedly high or low local coefficients, and (iii) of variables presenting a positive or negative influence in the model; finally, (iv) we try to deduce, if possible, the cause of those spatial patterns.
The third novelty of this paper with respect to our previous study (Martínez et al., 2009) was the consideration of previously missing explanatory environmental variables regarding climate, vegetation and topography. These structural environmental variables are essential to enable fire ignition and they are the basis on which the remaining socioeconomic, historical, land use and landscape variables interact. Besides, missing explanatory variables could be the cause of unexplained spatial variations in the model parameters (Fotheringham et al., 2002; Koutsias et al., 2012). For this reason we have specifically tried to consider all key explanatory factors, including the environmental ones.
The starting point of creating fire occurrence models is to identify the most critical factors and then define and gather datasets to generate quantitative models. We identified factors based on previous literature reviews about fire causes and fire modelling (Leone et al., 2003, 2009; Martínez et al., 2004, 2009; Vélez, 2009), where the theoretical or expected relationships between fires and each factor/variable in Spain and Mediterranean countries were explained. Then, each of those explanatory factors was measured as a numeric indicator (direct or surrogate) from available datasets.
However, model building in this study is not fully "conceptual" because the final variable selection is obtained by semiautomatic statistical techniques. Besides, this paper does not intend to build a "mechanistic" or "cause-effect" model that explains human fire occurrence in the different environments of Spain. For that objective it would be better to use other approaches, such as defining different environmental regions or study areas inside Spain, building specific models for each region, and subsequently comparing which variables are the most influential in each. Nor do we intend to analyze the statistical and spatial interactions between explanatory variables within the global models.

Study area and fire database
As the dependent variable, the number of human-caused fires occurring within a 25 yr period (1983–2007) was computed for each of the 7638 municipalities of the Spanish peninsula (487 000 km²) analyzed. This is another improvement over the previous study of Martínez et al. (2009), in which a 13-yr series from 1987 to 2000 was used. These data were obtained from the Spanish Forest Fire Report Database, one of the best and longest fire statistics in Europe (Leone et al., 2009). A total of 383 657 fire events were gathered and considered in the database, regardless of their size. A binary variable (fire/no fire) for each municipality was derived to develop logistic models, and a continuous variable (fire density, or the total number of fires in the period divided by the area of each municipality in km²) was estimated to build linear regression models, in this case selecting only the 6993 municipalities in which one or more fires were registered during the study period. A log transformation was applied so that the original fire density values (Fig. 1c) approximate a normal distribution (Fig. 1d); the original count data would otherwise be more appropriately modelled with Poisson or negative binomial models, depending on their variance-to-mean ratio (Cardille et al., 2001). The spatial distribution of both dependent variables is shown in Fig. 1a and b, revealing critical regions for fire occurrence, especially in the NW of the country, and also along the Mediterranean coast and in some mountain ranges in the centre.
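The derivation of the two dependent variables described above can be sketched as follows. This is only an illustration under stated assumptions: the function name and the sample municipalities are hypothetical, and the natural logarithm is assumed because the text does not state the base of the log transformation.

```python
import math

def response_variables(n_fires, area_km2):
    """Return (presence, density, log_density) for one municipality.

    presence    -- 1 if at least one fire was recorded, otherwise 0
    density     -- fires per km2 over the whole study period
    log_density -- natural log of density (assumed base), or None when no
                   fire occurred; such municipalities are left out of the
                   linear model
    """
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    presence = 1 if n_fires > 0 else 0
    density = n_fires / area_km2
    log_density = math.log(density) if n_fires > 0 else None
    return presence, density, log_density

# Hypothetical municipalities: (fires in the period, area in km2)
sample = [(0, 40.0), (12, 60.0), (250, 125.0)]
rows = [response_variables(n, a) for n, a in sample]
```

Municipalities without fires keep a valid presence value of 0 but have no log density, which mirrors the restriction of the linear models to the municipalities with one or more fires.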
In the previous study the dependent variable was defined as the cumulative number of fires in the studied period divided by the forest area of each municipality. Here, instead, the flammable land covers (both vegetation and crops) are considered as explanatory variables in order to analyze their influence and weight inside the models, and not as part of the dependent variable.

Independent variables
The independent variables used in the analysis initially comprised 29 socioeconomic and demographic indicators together with agricultural and land cover statistics compiled in Martínez et al. (2009). The identification of these variables was based on experts' interviews, analyses of fire reports and causality statistics, and an extensive literature review. Some of the factors could not be estimated directly or from surrogate variables, while some others were not available for all the regions. Table 1 in Martínez et al. (2009) lists these variables along with their theoretical relationships with fire ignition factors and the literature source, when available. Additionally, for the present study six new environmental variables were added referring to topographic characteristics (mean altitude and slope), climatic indicators (summer temperature and mean annual precipitation obtained from Ninyerola et al. (2005), using the available station data set with more than 15 and/or 20 yr), and forest vegetation statistics (total wildland area and the wildland area without tree cover, both obtained from the Forest Map of Spain, MFE50, developed between 1997 and 2006). Total wildland area included tree-covered areas (standing forest), shrublands and grasslands, and theoretically this variable is supposed to be more related to fire presence (binary model). Wildlands without tree cover only comprise shrublands and grasslands. We hypothesize that these types of areas are more strongly correlated with the fire density (linear model). All 35 variables were compiled and calculated at the municipality level for the peninsular territory of Spain, with the exception of the region of Navarre, and all of them were selected after checking for multi-collinearity, as also described in Martínez et al. (2009, p. 1244).

Global models using classical regression
Both predictive models, based on OLS and binary logistic regression, were calculated in SPSS using automatic stepwise forward procedures for variable selection in combination with manual modification (i.e. selection using the "introduce method"). All cases were checked for potential collinearity problems of the selected variables by calculating the correlation matrix and applying other common statistical tests such as tolerance coefficient, variance inflation factor (VIF) (Krebs et al., 2012) and eigenvalue analysis (SSTARS, 2012). The regression models were built using the standardized Z-scores for the dependent and independent variables. Additionally, the normal distribution of residuals and the lack of systematic patterns were checked for OLS. Clustering of over and/or under predictions is, for example, evidence that at least one key explanatory variable is missing. For these reasons we analyzed the histogram, the scatterplot, the normal Q–Q plot and the residual maps.
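As an illustration of the collinearity diagnostics mentioned above, the following sketch computes the VIF of each predictor as the corresponding diagonal element of the inverse correlation matrix, a standard identity equivalent to 1/(1 − R²) from regressing each predictor on the others. The tolerance coefficient is simply 1/VIF. The function names are hypothetical and this is not the SPSS procedure used in the study.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        z.append([(v - mean) / sd for v in col])  # standardized Z-scores
    p = len(columns)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1)
             for j in range(p)] for i in range(p)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("singular matrix: perfect collinearity")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[i][i] for i in range(len(columns))]

# Hypothetical predictors measured on six municipalities
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
vifs = vif([x1, x2])
tolerances = [1.0 / v for v in vifs]
```

With only two predictors both VIF values reduce to 1/(1 − r²), where r is their pairwise correlation, which gives a quick way to check the sketch by hand.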
To evaluate the influence of individual variables in the models, several criteria were computed and analyzed globally: (i) a simple calculation of the standardized coefficients according to the method of Menard (2010, p. 89); (ii) the t-statistic and its level of significance, or the Wald statistic in the case of logistic regression; (iii) the step at which the variable was input into the model; (iv) the change in R² when the variable was removed from the model (the greater the change, the more important the variable), or the change in the log-likelihood (−2 LL) in the case of logistic regression; and (v) the odds ratio, that is, the exponential of the logit coefficient B (Exp(B)), for the logistic model.

Local models using GWR
To overcome the assumption of stationarity we applied the GWR approach using the independent variables of global regression, both for the linear and the logistic model. All analyses were implemented within GWR 3.0.1 software for Windows (Fotheringham et al., 2002; Charlton et al., 2003) using both the adaptive (nearest neighbours) and the fixed (distance) kernel types, with the minimization of the corrected Akaike Information Criterion (AICc) being the criterion to determine the optimal bandwidth size of the kernel functions. This parameter (AICc) was also used to compare the global OLS or logistic model with the local GWR model. As a complement, the ANOVA tests the null hypothesis that the GWR model, in the linear approach, represents no improvement over the global OLS model.
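The core idea of GWR, a separate weighted regression at each location with weights that decay with distance, can be sketched for a single predictor as follows. This is a minimal illustration and not the GWR 3.0.1 implementation: the bisquare kernel is an assumption (the text does not name the kernel function), the data are a hypothetical transect, and bandwidth selection by AICc is omitted.

```python
import math

def bisquare_weights(dists, k):
    """Adaptive bisquare kernel; bandwidth h = distance to the k-th nearest
    point (the focal point itself counts as the first). Requires k >= 2."""
    h = sorted(dists)[k - 1]
    return [(1.0 - (d / h) ** 2) ** 2 if d < h else 0.0 for d in dists]

def local_fit(coords, x, y, i, k):
    """Weighted least squares of y on x, centred on observation i.

    Returns the local (intercept, slope) at that location.
    """
    cx, cy = coords[i]
    dists = [math.hypot(px - cx, py - cy) for px, py in coords]
    w = bisquare_weights(dists, k)
    sw = sum(w)
    mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
    my = sum(wj * yj for wj, yj in zip(w, y)) / sw
    sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
    sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical transect of 10 locations: the true slope is +2 in the west
# (locations 0-4) and -1 in the east (locations 5-9).
coords = [(float(j), 0.0) for j in range(10)]
x = [float(j % 2) for j in range(10)]
y = [2.0 * xj if j < 5 else -1.0 * xj for j, xj in enumerate(x)]
west = local_fit(coords, x, y, 0, 4)  # effectively uses locations 0-2
east = local_fit(coords, x, y, 9, 4)  # effectively uses locations 7-9
```

In this toy case the local slopes at the two ends of the transect recover the opposite signs that a single global regression would average away, which is the kind of sign reversal explored later through the local coefficient maps.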
The main output from GWR for each observation point is a set of parameter estimates (local coefficients for each independent variable) and associated diagnostics (standard errors, influence index, Cook's D statistics, local R² statistic, and local standard deviation) that can be visualized within a GIS environment (Charlton and Fotheringham, 2009). Detailed analysis of these maps allowed us to better understand and explore the spatial variability of the explanatory factors, as local R² values show the performance of the GWR model in different areas. Additionally, the GWR software includes two tests to determine whether the local parameter estimates are significantly non-stationary. Firstly, the variables might exhibit non-stationarity if the inter-quartile range (25 % and 75 % quartiles) of the GWR parameters is greater than ±1 standard deviation (SD) of the equivalent global OLS parameters (Fotheringham et al., 2002; Wang et al., 2005). Secondly, the significance of the spatial variability in the local parameter estimates can be examined by a Monte Carlo test, but only in the case of linear GWR, since this test is not available for logistic GWR.
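The first of these checks can be sketched as a simple comparison between the inter-quartile range of the local coefficients and the ±1 SD band of the global estimate, that is, twice its standard error. The function name and the sample values are hypothetical, and the quartile convention (the default of Python's `statistics.quantiles`) is an assumption that may differ slightly from the one used by the GWR software.

```python
import statistics

def looks_nonstationary(local_coefs, global_se):
    """Informal check: is the IQR of the local coefficients wider than
    the +/- 1 SD band (2 * standard error) of the global coefficient?"""
    q1, _, q3 = statistics.quantiles(local_coefs, n=4)
    return (q3 - q1) > 2.0 * global_se

# Hypothetical local coefficients of one variable across eight locations
local = [-0.4, -0.2, 0.0, 0.1, 0.3, 0.5, 0.6, 0.8]
varies = looks_nonstationary(local, global_se=0.1)
stable = looks_nonstationary(local, global_se=0.5)
```

The same spread of local coefficients is flagged when the global estimate is precise (small standard error) and not flagged when it is imprecise, which is why this comparison is only an informal indication and the Monte Carlo test is used where available.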
As with OLS regression, spatial autocorrelation statistics for the residuals of the models were estimated using Moran's I index of spatial autocorrelation. This made it possible to explore their spatial structure and identify whether GWR captured the spatial pattern of the residuals. If the residuals were autocorrelated, the results of the OLS regression analysis would violate one of the assumptions of OLS regression and the regression analysis would be unreliable. In the case of logistic regression, we computed the Average Nearest Neighbour Distance Index (ANND value) included in the Spatial Statistics toolbox of ArcGIS Desktop 10. With an index < 1 the pattern would tend towards clustering, while if > 1 the trend is toward dispersion or competition. The index ranges from 0 to 2.14 (ArcGIS Desktop 10 Help).
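For reference, Moran's I for a set of residuals and a spatial weights matrix can be computed as in the sketch below. The binary contiguity weights and the four-unit example are hypothetical; the study itself relied on spatial correlograms over distance lags rather than on this toy neighbourhood structure.

```python
def morans_i(values, weights):
    """Moran's I = (n / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i**2,
    where d_i are deviations from the mean and S0 is the sum of weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

# Four hypothetical units in a row; neighbours share a border
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
trend = morans_i([1.0, 2.0, 3.0, 4.0], w)          # similar neighbours
alternating = morans_i([1.0, -1.0, 1.0, -1.0], w)  # dissimilar neighbours
```

Positive values indicate that neighbouring residuals resemble each other (clustering of over- or under-predictions), while negative values indicate alternation between neighbours.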

Classical regression models
After collinearity analysis we decided not to introduce the variables "slope" and "population occupied in agriculture" into the regression procedure. Instead, we introduced the variable "agricultural areas but with significant areas of natural vegetation". The stepwise procedure for the binary logistic regression selected 9 significant variables for the final model, which successfully classified 76.4 % of the total observations using the estimated optimal cut-off point of 0.91, which corresponds to the intersection of the two lines in which sensitivity and specificity are equal (Vasconcelos et al., 2001). Among the nine explanatory variables identified as critical by the analysis, the most important variables were the forest surface, population decrease and forest-cultivated land interface. Mean annual precipitation and mean summer temperature were also relevant (Table 1). The spatial distribution of the residuals (over and under estimations) of the logistic model (Fig. 2) shows that the spatial pattern of the errors is not very clear because they are dispersed through different regions of the country. However, some areas were error-free, particularly in the north, northwest and some parts of the centre and west. Most of the errors are underestimations (Table 2) because it was more probable that at least one fire had occurred during the 25-yr period than that no fire had occurred at all. The Average Nearest Neighbour Distance Index (ANND) showed the residuals tended towards clustering (0.77) and the Z-score of −18.6 indicates there is less than a 1 % likelihood that this clustered pattern could be the result of a random process.
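The choice of a cut-off at the point where sensitivity and specificity are equal can be sketched as a grid search over candidate thresholds. The function names, the grid resolution and the sample probabilities are hypothetical; the sketch only illustrates the criterion and does not reproduce the reported cut-off of 0.91.

```python
def sens_spec(probs, labels, cutoff):
    """Sensitivity and specificity when predicting 'fire' for p >= cutoff."""
    tp = sum(1 for p, l in zip(probs, labels) if l == 1 and p >= cutoff)
    fn = sum(1 for p, l in zip(probs, labels) if l == 1 and p < cutoff)
    tn = sum(1 for p, l in zip(probs, labels) if l == 0 and p < cutoff)
    fp = sum(1 for p, l in zip(probs, labels) if l == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def balanced_cutoff(probs, labels, steps=100):
    """First cut-off on a regular grid minimising |sensitivity - specificity|."""
    best_gap, best_cut = None, None
    for s in range(steps + 1):
        cut = s / steps
        sens, spec = sens_spec(probs, labels, cut)
        gap = abs(sens - spec)
        if best_gap is None or gap < best_gap:
            best_gap, best_cut = gap, cut
    return best_cut

# Hypothetical fitted probabilities and observed fire presence (1) / absence (0)
probs = [0.2, 0.4, 0.6, 0.8]
labels = [0, 0, 1, 1]
cut = balanced_cutoff(probs, labels)
sens, spec = sens_spec(probs, labels, cut)
```

When one class is much more frequent than the other, as with fire presence here, this criterion tends to push the cut-off well away from 0.5, towards the prevalence of the majority class.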
In the case of OLS regression, the model selected 23 variables as significant using an automatic stepwise procedure. To simplify the model we selected the first nine most explanatory, plus three others in positions 11, 15 and 18 (decrease in number of owners of agrarian holdings, % owners of agrarian holdings > 55 yr, and density of agricultural machinery, respectively) that, in our opinion, included relevant aspects of agrarian structure, as reported by different regional studies. Consequently, the final model consisting of 12 variables (Table 3) explained 53 % of the variation of the dependent variable (adjusted R² = 0.53). Among these variables, mean annual precipitation, density of agricultural properties, mean altitude, population decrease and non tree-covered forest surfaces were the most explanatory. For the residuals of this OLS regression model, the Kolmogorov-Smirnov test value was low (0.028) but still significant (p = 0.000), showing that the residuals fit the normal curve poorly. However, as can be observed in the histogram (Fig. 3b), the residuals, with a mean value close to 0 and a SD of 0.99, approximate the shape of the normal curve acceptably well. The normal Q–Q plot (Fig. 3c) represents the expected values in a straight line when the data are normally distributed. In this case, the residuals fit properly except for low observed values (low fire densities). A clustered pattern can be observed in the distribution map of the residuals (Fig. 3d) in parts of the country, although there is no clear systematic pattern. The over-predicted cases (negative values) were more concentrated in some inland areas of the eastern part of the Iberian Peninsula. Under-prediction was more dispersed, with some areas especially clustered in the NW.
Fig. 2. Municipalities where prediction and observation data did not agree; either fire is predicted when it is not observed (overestimated) or fire is not predicted when it is observed (underestimated). Ordinary logistic regression on the right and GW logistic regression on the left.
Both models, logistic and OLS, are complex, with a high number of variables, and for some variables the effect of introducing them in the model (measured by the change in R² or the change in −2 LL) is very weak, as can be seen in Tables 1 and 3, although still significant. In any case, the high number of variables in the model was considered in agreement with the objective of identifying the factors that are more significant to explain fire risk, rather than obtaining parsimonious models with very few variables.

Geographically weighted regression models
The GWR results showed that local models based on GWR generally fit better than global models based on classical OLS or logistic regression, while the number of effective parameters increased considerably, from 10 to 29.7 in logistic, and from 13 to 63.03 or 158.34 for linear models. Based on the minimization of the corrected Akaike Information Criterion (AICc), the best fixed bandwidth size in the case of logistic GWR was 219 km (AICc = 3321.19). For the linear GWR, the best bandwidth size for the fixed mode was a distance of 154 km (AICc = 5898.6), while for the adaptive mode it was 5724 nearest neighbours (AICc = 6395.5). Models like this, with a high number of neighbours, tend to have a poor fit and present an oversmoothed pattern, as could be observed in the resulting maps. For that reason, we finally selected a kernel size of 1300 nearest neighbours, which showed a better fit (AICc = 5063.72) when trying to better capture the regional variations within the country, avoiding both over- and under-smoothing. In relation to the adaptive kernel, the manually chosen value of 1300 nearest neighbours approximately represents the number of municipalities of two contiguous average regions in Spain. Regarding the fixed kernel of 154 km defined automatically by the statistical criteria of the GWR software, this is also considered appropriate to capture regional variations, since the mean area of the Spanish regions (autonomous communities) is 34 524 km² (e.g. Catalonia or Extremadura), which corresponds to a circle of about 145 to 160 km radius.
Comparing the fit of the global and GWR models, the GWR logistic model, using a fixed bandwidth of 219 km, correctly classified 78.4 % of the observations compared to 76.4 % for the ordinary logistic regression. This improvement is not as high as expected, but it is significant, as the deviance (−2 LL) improved from 3431.2 to 3261.4 and the AICc from 3451.2 to 3321.19. The optimal cut-off point for the classification of this GW logistic model according to the graph of sensitivity versus specificity is 0.90. For the linear approach, the explanatory power increased from 53 % for the OLS model to 67 % in the case of the adaptive mode, using a bandwidth of 1300 nearest neighbours, and to 62 % in the case of the fixed mode, using a bandwidth of 154 km. The adaptive mode gave slightly better results, as indicated by the coefficient of determination, with an improvement of 14 percentage points, while in the case of logistic GWR the improvement in correct classification was only 2 percentage points. The AICc improved considerably using GWR (from 7440.3 to 5063.7). The F-value of the ANOVA test suggests that the GWR model is a significant improvement on the global OLS model in Spain, at a significance level below 0.01 (99 % confidence), for both fixed and adaptive models.
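The relationship between the deviance and the AICc values reported above can be checked with the generic small-sample formula AICc = −2 LL + 2k + 2k(k + 1)/(n − k − 1). The sketch below is only a plausibility check under stated assumptions: n = 7638 observations is assumed for the logistic models, k is taken as the reported number of (effective) parameters, and the GWR software may use a slightly different expression.

```python
def aicc(deviance, k, n):
    """Corrected AIC from the deviance (-2 LL), k parameters, n observations."""
    return deviance + 2.0 * k + 2.0 * k * (k + 1.0) / (n - k - 1.0)

N_OBS = 7638  # assumed number of municipalities in the logistic models

global_logistic = aicc(3431.2, 10, N_OBS)   # reported AICc: 3451.2
gw_logistic = aicc(3261.4, 29.7, N_OBS)     # reported AICc: 3321.19
improvement = global_logistic - gw_logistic
```

Under these assumptions the formula reproduces the reported global value closely and the GWR value to within a few tenths; the remaining gap is compatible with rounding of the published deviance and effective number of parameters. The drop in AICc remains large even after the penalty for the additional effective parameters of the local model.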
Figure 4 shows local R² values for GW linear and GW logistic models, indicating the areas where the predictions of the models are better. In both cases, best fits were found in the northwest and some eastern areas of the Mediterranean coast where there is usually high fire occurrence (see Fig. 1a). However, these maps are too oversmoothed to capture local variations, especially in the logistic GWR model. The trend of the residuals of the logistic model towards clustering did not significantly decrease from the ordinary model to the GWR model according to the Average Nearest Neighbour Distance Analysis (Table 2), and there was only a minor improvement, especially in the overestimation errors. Although in the GW logistic model there were fewer errors, the spatial distribution was very similar to the ordinary logistic model (Fig. 2). Also, analysis of the linear GW regression model residuals revealed similar characteristics to the global OLS model, with a mean value of 0.01 and a SD of 0.51, acceptably following the shape of the normal curve (Fig. 5b). The Kolmogorov-Smirnov test value was low (0.03) but still significant (p = 0.000), showing that the normal fit was poor. The residuals fitted properly except for low fire density values according to the normal Q–Q plot (Fig. 5c). However, the scatterplot was more compact along the tendency line and the standardized residual map (Fig. 5d) showed a more dispersed distribution through the study area in comparison to the OLS model (Fig. 3d), without any evident systematic pattern. These analyses indicated a slightly better performance of the GWR model.

Regional and local variations
The results of the Monte Carlo test on the local estimates indicated a significant spatial variation (at the 0.1 % significance level) in the local parameter estimates for all the variables of both linear GWR models (fixed and adaptive). In addition, all variables in the linear and logistic models showed evidence of spatial variability (non-stationarity) across the study area, since the inter-quartile range (25 % and 75 % quartiles) of the GWR parameters was greater than ±1 SD of the equivalent global parameters.
Local coefficient estimates for each explanatory variable are presented in Fig. 6 for logistic GWR, and in Fig. 7 for adaptive linear GWR. Negative coefficients are represented by cold colours (green to blue) and positive coefficients by warm colours (orange to red). The objective of these maps is to explore the spatial variability and to better understand the local and regional variations of the fire occurrence causal factors in Spain, which are developed in the Discussion section.

Spatial autocorrelation of residuals
Spatial correlograms of the residuals of the linear models (Fig. 8) show that there is significant spatial autocorrelation of the residuals of the OLS regression model up to a distance of 600 km, while for the residuals of the GWR model the autocorrelation has been reduced significantly but still exists at relatively short lag distances, up to approximately 110 km (Table 4). Less structured residuals have been observed in other studies dealing with GWR (Koutsias et al., 2010), indicating that although the method does not directly address spatial autocorrelation issues (Jetz et al., 2005), it provides a solution to the problem of spatially autocorrelated errors (Propastin and Kappas, 2008).

Discussion
In both regression modelling approaches there were important variables related to land and population abandonment, agrarian activities, or development processes, in addition to forest properties and climatic variables. However, only two variables, precipitation and population decrease, were common to the two approaches, indicating different underlying mechanisms for fire presence and for fire density at the community level. In this discussion we analyze the most important explanatory variables for each model and explore their spatial variations according to GWR local parameters (Figs. 6 and 7). Some variables presented high variability in explaining the dependent variable, occasionally even being contradictory to the global coefficients.

Driving factors of long-term fire presence
The percentage of wildland area was the most important factor to discriminate non fire-prone from fire-prone municipalities (defined as those in which at least one fire was observed during the 25-yr period studied). This is reasonable since the probability of fire ignition and spread was very low in places with a very low percentage of forest and natural cover, as fuels are very scarce or non-existent. The influence of this variable was higher in the south of the country, as observed in Fig. 6. Another important variable was forest-cultivated land interface (ICFSUP P), which is related to agricultural activities where fire is frequently used in arable and crop lands (Martínez et al., 2009; Ortega et al., 2012; Gonzalez-Olabarría et al., 2012), which found that the landscapes most vulnerable to fire were those with fine-grained forest-agriculture mixtures or mosaics, where the human-caused fires were more intense than in homogeneous and nonfragmented landscapes.
Fig. 6. Local coefficients for GW binary logistic model using a fixed bandwidth of 219 km. Negative coefficients are mapped with cold colours (green) and positive with warm colours (orange to red). Variable names and their descriptions are in Table 1.
Variables DIS 50 91 (population decrease between 1950 and 1991) and DIS SAU (decrease in agricultural area between 1989 and 1999) were positively correlated with the occurrence of at least one fire event. Both variables can be associated with abandonment of land and traditional activities and the movement of population from rural and mountainous areas to lowlands and urban areas. A consequence of land abandonment is fuel build-up. Instead, according to the positive correlations observed for these variables, in municipalities with population reduction and land abandonment, fires were expected in cases where the decrease is lower. Under this demographic and social context, areas maintaining a relatively higher agricultural population are more fire prone. This is an example of the contradictory types of relationships between the explanatory and response variables in wildfire occurrence modelling. According to the local coefficient maps, in the NW the occurrence of at least one fire is more closely associated with the population presence than with the rural exodus or land abandonment (DIS 50 91), while in the south, the presence of agricultural land (DIS SAU) is more influential (Fig. 6). Population and agricultural area decrease are also closely related to the population potential (POT DEN), a concept similar to population density or human presence, which is further associated with the probability of fire ignition and area burned. This has a positive influence in many studies (Cardille et al., 2001; Maingi and Henry, 2007; Romero-Calcerrada, 2008; Catry et al., 2009; Sebastian-Lopez et al., 2008; Martínez et al., 2009; Marques et al., 2011; Nunes, 2012) or a negative relationship for some areas in other studies (Narayanaraj and Wimberly, 2012; Sá et al., 2011). Additionally, the previously mentioned variables were also related to the CORINE land use class "agriculture but with significant areas of natural vegetation" (CL 21 PM), showing that fire occurrence was more likely in municipalities where agricultural and forest areas are intermixed, similar to what has been reported by Ortega et al. (2012). Recently, when trying to explain the extreme 2007 fires in the Greek Peloponnese, Koutsias et al. (2012) observed that the CORINE land cover category "agricultural land, highly interspersed with significant areas of natural vegetation" was the most affected by fire, reflecting the encroachment of natural vegetation in abandoned fields and also recent patterns of evolution in the wildland-rural interface where agricultural land is increasingly intermixed with natural vegetation.
Together with land abandonment and population decrease, the economic value of lands and forests was identified as a factor of human-caused fires due to a decreasing involvement in conservation and land management by the remaining rural population. In this sense, the NOGES PF variable in the model was positively correlated with fire occurrence. This variable measures the percentage of forest surface with less management, control and planning over time, which in Spain is the private forest land, land belonging to local authorities with free use, consortiums and neighbouring forests. All these kinds of properties generally have a worse conservation and protection status than national, regional or public forest. Local coefficients for this variable were positive on the Mediterranean coast and negative in the NW. Padilla and Vega-Garcia (2011) found that several variables related to forest ownership (private, public and communal areas) were significant for the northern ecoregions of Spain.
Finally, climatic variables were also found to be relevant factors to explain fire occurrence. Mean summer temperature and mean annual precipitation are important factors, especially in the warmer areas of the E and SE. Many studies (Syphard et al., 2008; Drever et al., 2008; Vilar et al., 2010; Padilla and Vega-García, 2011; Oliveira et al., 2012; Sá et al., 2011; Narayanaraj and Wimberly, 2012; Nunes, 2012) selected several climatic variables as very significant in their fire models: some related to precipitation, such as fire-season and off-season precipitation, precipitation seasonality, soil water storage and soil moisture anomaly, and others related to temperature, especially the maximum temperature in the driest season.

Driving factors of long-term fire density
Summer temperature was not a significant factor to explain fire density in linear regression, unlike in the logistic model (fire/no fire incidence). However, the mean annual precipitation was the most important factor to explain forest fire density (Table 3). Local coefficients for this variable were positive across almost the entire country, especially in the SE and some parts of the inland west, which may be related to the impact of rainfall on fuel availability, particularly in the dry SE regions of Spain. The exception was observed in the NE and central Pyrenees (negative local coefficients) where rainfall also occurs in the summer and therefore the fire season tends to be shorter, although this also happens in other parts of the country where positive coefficients are found. Oliveira et al. (2012) pointed out that the most important variables related with fire density distribution in the EUMed region were off-season precipitation (positive influence related to vegetation growth and fuel accumulation) and fire season precipitation, with a negative relationship limiting fire ignition and spread. Sá et al. (2011) indicate that in the drier areas of sub-Saharan Africa there is a positive relationship between fire incidence and soil water, which is important for vegetation growth.
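Local coefficients such as those discussed for precipitation come from fitting one weighted regression per location, so that the sign of a relationship can change across the study area. The following is a minimal sketch of that idea, not the implementation used in the study: it assumes an adaptive bisquare kernel whose bandwidth is the distance to the k-th nearest neighbour (the linear GW model in this paper uses 1300 nearest neighbours; the kernel shape, the function name and the synthetic data are illustrative assumptions).

```python
import numpy as np

def gwr_local_coefficients(X, y, coords, k):
    """Weighted least squares fitted separately at every location.

    Assumption: adaptive bisquare kernel, bandwidth = distance to the
    k-th nearest neighbour of each focal point.
    """
    X = np.column_stack([np.ones(len(y)), X])   # prepend an intercept column
    betas = np.empty((len(y), X.shape[1]))
    for i, focal in enumerate(coords):
        d = np.linalg.norm(coords - focal, axis=1)
        h = np.sort(d)[k]                        # index 0 is the focal point itself
        w = np.where(d < h, (1.0 - (d / h) ** 2) ** 2, 0.0)
        XtW = X.T * w                            # scale each observation by its weight
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas

# Synthetic data: the slope is +1 in the west and -1 in the east
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 2))
x = rng.normal(size=200)
y = np.where(coords[:, 0] < 50, 1.0, -1.0) * x
betas = gwr_local_coefficients(x, y, coords, k=20)
west_slope = betas[coords[:, 0] < 20, 1].mean()
east_slope = betas[coords[:, 0] > 80, 1].mean()
```

A single global regression on these synthetic data would report a slope near zero, whereas the local fits recover the opposite signs on either side, which is the kind of non-stationarity the local coefficient maps are meant to reveal.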
The density of agricultural properties (PAR SEXP) was positively related to fire occurrence, suggesting that highly partitioned agricultural properties increased the human-caused ignition risk. In combination with the variable "density of agricultural machinery" (MAQUIN D), this indicates that the higher the number of properties and machines, the more likely conflicts and negligence become. Fire is one of the preferred tools to eliminate stubble, weeds, field margins, hedges and shrubs, and to reclaim abandoned lands (Leone et al., 2003), especially in areas where agricultural parcel density is very high and irregularly distributed in space. In Spain, more than 20 % of the fires that occurred within the 25-yr period were caused by intentional or negligent agricultural burnings and other burnings of shrublands to regrow or maintain pastures for livestock, although the importance of these causes could be far greater and has actually been estimated at 45 % (Leone et al., 2009; Koutsias et al., 2010). In addition, as explained in Martínez et al. (2009, p. 1248), in many cases mechanization implies a willingness to obtain more space and land for cultivation, and fires are one of the tools to achieve it. Also, more intensive agricultural activity, promoted by mechanization over time, may increase the need to burn more stubble, agricultural residues and prunings, as well as the number of ignitions produced by accidental sparks deriving from engine operation. Similar agriculture related variables have also been used in other fire modelling studies (Sebastian-Lopez et al., 2008; Catry et al., 2009). According to the spatial distribution of the local coefficients, we identified areas where the expected direction in relation to high fire density exhibited opposite trends to the global model. This is the case of variable PAR SEXP in the Valencia Region and the northern plateau (especially in the "Ribera del Duero" region), where local negative coefficients were found. Both areas have high fragmentation of small-holdings both in irrigated and in rainfed arable land agriculture, but they have few forested areas and a landscape with less wildland-agrarian mosaics. However, these areas present high and positive coefficients for the agricultural machinery variable. Instead, in the eastern Cantabrian regions (Basque Country and Cantabria) and the Guadalquivir depression in the SW, variables MAQUIN D (agricultural machinery) and TIT 55 P (percentage of old owners of agrarian holdings) showed the opposite trends. In some of these humid Atlantic environments of the north, livestock and forestry are more important than agriculture. The southwestern Guadalquivir area presented one of the lowest indices in the number of agricultural machines compared with other irrigated areas of the country.
The mean municipality altitude (ALT MEAN) variable was the third most explanatory in the model. The global coefficient and most of the local coefficients throughout the country were negative, so higher fire densities were expected at lower altitudes, especially in the central part of the country and the eastern coast. However, we observed a positive influence across the entire S of the country and in the NW (Galicia). Coefficients were neutral (close to 0) in the N of Aragon and the central Pyrenees in the upper mountains. Other studies showed that elevation presents contrasting relationships with fire occurrence. Some studies found a positive influence (Catry et al., 2009; Marques et al., 2011) as a consequence of pastoral burns (renovation of pastures for livestock) or a higher frequency of lightning at higher altitudes (Vazquez and Moreno, 1998; Narayanaraj and Wimberly, 2012), while other studies found a negative correlation (Vasconcelos et al., 2001; Sebastian-Lopez et al., 2008; Gonzalez-Olabarría et al., 2012; Vilar et al., 2010; Padilla and Vega-Garcia, 2011), suggesting that lower elevations tend to be the more xeric places, with drier fuels and less productivity. However, fuel dry-out is probably a function of the temporal distribution of precipitation, and in the Mediterranean area it is very high in summer due to the seasonal drought. In addition, when altitude increases, the vegetation loading tends to decrease with more unburnable areas appearing (rocks, sparse vegetation, ice, etc.), although only over a certain height. Unlike for lightning-caused fires, Narayanaraj and Wimberly (2012) detected a negative association of elevation and slope with human-caused fires in a mountain area of Washington State. Similarly, Vilar et al. (2010) found a less intense land use at high elevations in the Madrid Region. In some regions of Spain, as in other parts of the world, population, roads and some land uses responsible for the higher number of ignitions are concentrated in coastal areas, decreasing with increasing elevation (Badia-Perpinya and Pallares-Barbera, 2006). The same conclusions about how topography reflects the locations of human activities in relation to fire ignitions are indicated for a region of China by Xu et al. (2006), showing that the anthropogenic factors are closely related to fires when altitudes of forests are lower than 900 m. However, at higher elevations their influence is much lower. In other studies the topography effect has been related with fires, using variables related with roughness or terrain shape index (Dickson et al., 2006; Nunes, 2012; Padilla and Vega-Garcia, 2011; Narayanaraj and Wimberly, 2012).
As in the logistic model, population decrease between 1950 and 1991 (DIS 50 91) was found to be a relevant explanatory variable. Besides, the inclusion in the model of DIS TIT (decrease of the number of owners of agrarian holdings 89-99) reinforces the idea of a relationship between land abandonment and rural exodus and a high fire risk (Hill et al., 2008; Nunes, 2012). The local coefficient maps of these two variables portrayed two patterns: (1) on one hand, in the east of the country, fires are related to population abandonment and rural exodus, with the resulting accumulation of fuels, but at the same time there is some maintenance of agricultural activities related to fires because the variable DIS TIT correlates positively with fire density; and (2) in contrast to this trend, in the western region, particularly in the NW, DIS 50 91 has a positive correlation with fires, indicating a further influence of the population presence on fires. However, in these western areas, DIS TIT correlates negatively with fires, so if agricultural land owners decline and agricultural activities are abandoned, fires tend to be more frequent, mainly because of greater fuel accumulation. In addition, in this NW area (Galicia), which is the most fire affected region, we found strong positive coefficients with the variable ageing agricultural population (TIT 55 P), because in this region the older population is more accustomed to using fire in farming work, as they did in their youth (Vélez, 2009). This variable also suggests the impact of the land abandonment process in increasing fire frequency. A similar trend was also observed by Nunes (2012) in Portugal, where the ageing index correlates negatively with the density of the population and is positively associated with agricultural land abandonment. The same process has been pointed out for the Mediterranean European Basin using the difference of the youth index between 1990 and 1960 (Koutsias et al., 2010) as a proxy.
Two of the variables relate fires to forestry and landscape features. Thus, more fires were found in landscapes with a large percentage of shrublands and grasslands (DESAR P), especially in the Cantabrian Mountains and N coast where there are numerous pastoral fires to create, maintain, or regrow pastures for livestock (Moreira et al., 2011). Many studies have confirmed that shrubland is one of the most fire affected land cover types (Nunes et al., 2005; Sebastian-López et al., 2008; Catry et al., 2009; Moreira et al., 2009; Bajocco and Ricotta, 2008; Nunes, 2012; Oliveira et al., 2012) due to a combination of factors: "a higher rate of fire spread, a larger frequency of ignitions (e.g. to create pastures) and a lower fire fighting priority" (Marques et al., 2011, p. 783). In sub-Saharan Africa the herbaceous vegetation proportion is the variable best related with fire incidence (Sá et al., 2012). On the other hand, higher fire density was found in Spanish landscapes with high fragmentation (FRAG7 × 7), especially in the three main river depressions where agriculture dominates (Duero, Ebro and Guadalquivir) and where there is less forest cover. Heterogeneous and interspersed patterns composed of spatially separated patches with different land uses present higher ignition frequencies (Ortega et al., 2012; Ruiz-Mirazo et al., 2012).
According to the spatial distribution of the local coefficients of these variables, it might be surprising to find high coefficients in areas where the values of these variables are low, such as in the southern part of Central Spain and in the inland mountains of the south, where a low population density, few population centres and scarce wildland-urban interfaces are found. However, these few places with higher densities of human activities seem to tend to bias the model and therefore seem to be decisive for the fire occurrence in those areas.

Conclusions
In this study we built two complementary models which cover the whole range of the human-caused fire occurrence in Spain during a 25-yr period. The first model tries to predict and explain fire densities, and the second fire presence/absence. The most influential variables for both models are related to agrarian activities, land abandonment, rural exodus and development processes. Additionally, specific traits of vegetation, climatology and topography have also been very important, since they affect the initial conditions enabling fire incidence. The inclusion of these environmental variables results in an improvement over the previous model (Martínez et al., 2009), on which this study is based.
Relevant differences between both models are found because only two explanatory variables are common: mean annual precipitation and population decrease. Potentially flammable land cover types (total wildland area and agricultural/forest interfaces and mosaics) and the mean summer temperature are the main specific variables for the fire presence model. Instead, agricultural fragmentation, elevation, shrublands and grasslands, along with human structures (roads, settlements, etc.) and other rural indicators are specific variables for the fire density model.
However, these stationary models and global regression approaches seem to be insufficient to appropriately explain the underlying fire factors, because all variables selected showed significant spatial variations at the regional or local scale according to the GWR model. Nevertheless, only some of them present, in fact, very high variability or contradictory relationships with the response variable and/or the global trends. For example, the density and fragmentation of agricultural plots has a negative relationship with fires in regions with low forest areas and less wildland-agrarian mosaics, as along the E coast in the Valencia Region and the eastern part of the northern plateau (Ribera del Duero), and both areas are characterized by having small-holdings of irrigated agriculture and also rain-fed arable lands. Also, precipitation, decrease in owners of agrarian holdings, population entities or the urban-forest interface present unexpectedly high regression coefficients in areas where those variables have low original values. Thus, although precipitation seems to be a very important factor to model fire densities in the driest areas of the country, it is not that relevant in other areas with more rainfall availability. Similarly, the percentage of forest and wildland area has a higher influence in the S of the country, which is drier and with less vegetation as compared to the N.
Nat. Hazards Earth Syst. Sci., 13, 311-327, 2013, www.nat-hazards-earth-syst-sci.net/13/311/2013/
Another interesting contrast is observed between the E and the W-NW. In the W-NW, population presence seems to have a further influence on fires, although at the same time important land abandonment processes are observed. In the E, instead, fires seem more directly related to population abandonment and rural exodus, but also to agricultural activities, though to a lesser degree. Finally, lower altitude seems more related with the fire density along the eastern coast and in the central part of the country, unlike the pattern observed in the S and NW where higher altitudes present more fire risk. In the upper mountains of the Central Pyrenees this relation is neutral. This analysis is another contribution to the field of fire management and fire risk assessment in the Mediterranean countries, which quantitatively and spatially demonstrated the importance of considering regional variations and local modelling as a complement to global and stationary models in order to better understand the fire problem over large study areas.

Fig. 1. Map and histograms of dependent variables. Spatial distribution of the fire density in the Spanish municipalities with more than 1 event (A). The log transformation was applied to convert the original values of the dependent variable (C) to an approximate normal distribution (D). The map (B) shows the spatial distribution of the municipalities without fires used for the binary logistic modelling.

Fig. 2. Municipalities where prediction and observation data did not agree; either fire is predicted when it is not observed (overestimated) or fire is not predicted when it is observed (underestimated). Ordinary logistic regression on the right and GW logistic regression on the left.
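The two kinds of disagreement mapped in Fig. 2 can be derived directly from the observed fire/no fire variable and the predicted probabilities. The sketch below is illustrative only: the 0.5 cut-off, the function name and the example values are assumptions, as the classification threshold is not given in this caption.

```python
import numpy as np

def misclassification_type(observed, predicted_prob, threshold=0.5):
    """Label each municipality as 'agree', 'overestimated' or 'underestimated'.

    The 0.5 threshold is an assumed cut-off for this illustration.
    """
    observed = np.asarray(observed, dtype=bool)
    predicted = np.asarray(predicted_prob, dtype=float) >= threshold
    labels = np.full(observed.shape, "agree", dtype=object)
    labels[predicted & ~observed] = "overestimated"   # fire predicted, none observed
    labels[~predicted & observed] = "underestimated"  # fire observed, none predicted
    return labels

# Hypothetical observations and model outputs for four municipalities
labels = misclassification_type([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1])
correctly_classified = float(np.mean(labels == "agree"))
```

The share of "agree" labels corresponds to the percentage of correctly classified observations reported for the logistic models.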

Fig. 3. Residual analysis of the OLS regression model: scatterplots between observed and predicted observations (A), histogram data plots of the standardized residuals (B), normal Q-Q plot of the standardized residuals (C) and map of the standardized residuals (D).

Fig. 5. Residual analysis of the GW linear regression model: scatterplots between observed and predicted observations (A), histogram data plots of the standardized residuals (B), normal Q-Q plot of the standardized residuals (C), and map of the standardized residuals (D).

Fig. 7. Local coefficients for adaptive GWR linear model using a bandwidth of 1300 nearest neighbours. Negative coefficients are mapped with cold colours (green to blue) and positive with warm colours (orange to red). Variable names and their descriptions are in Table 3.

Fig. 8. Spatial correlograms of the residuals of the OLS linear regression (left) and GWR regression modelling (right).

Table 1. Model parameters and sensitivity analysis for ordinary logistic model: ranking of influence of the input variables (the lower the ranks, the more important).
Notes: intercept (constant) = −3.887. Ranking criteria: (i) standardized coefficients; (ii) Wald statistic; (iii) step at which the variable was input into the model in a forward stepwise automatic procedure; (iv) change in log of likelihood (−2 LL) when the variable was removed from the model; (v) odds ratio or the exponential of the logit coefficient B (Exp (B)).
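To make the constant and the Exp (B) column concrete, the sketch below converts a logit into a probability and a coefficient into an odds ratio. Only the constant (−3.887) comes from the table notes; the coefficient value 0.5 and the function names are hypothetical, and the baseline probability is only meaningful for a municipality whose predictors all equal zero on the scale used to fit the model.

```python
import math

def fire_probability(intercept, coefficients, values):
    """Probability of at least one fire from a binary logistic model."""
    logit = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-logit))

def odds_ratio(coefficient):
    """Exp(B): multiplicative change in the odds per unit increase of a predictor."""
    return math.exp(coefficient)

# Baseline probability implied by the reported constant alone
baseline = fire_probability(-3.887, [], [])          # about 0.02

# Hypothetical predictor with B = 0.5, raised by one unit
raised = fire_probability(-3.887, [0.5], [1.0])
ratio_of_odds = (raised / (1 - raised)) / (baseline / (1 - baseline))
```

The ratio of odds computed from the two probabilities equals `odds_ratio(0.5)`, which is why Exp (B) can be read as the factor by which the odds of fire presence change for each unit of the variable.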

Table 2. Residual spatial performance of logistic models using the Average Nearest Neighbour Distance Analysis.
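The Average Nearest Neighbour statistic compares the observed mean distance between each point and its nearest neighbour with the distance expected under a random pattern of the same density. The following sketch computes that ratio for a set of points, for instance the centroids of misclassified municipalities; the function name, the inputs and the absence of any edge correction are assumptions of this illustration, not details taken from the study.

```python
import numpy as np

def average_nearest_neighbour_ratio(points, area):
    """Observed mean nearest-neighbour distance divided by its random expectation.

    Values below 1 suggest clustering, values above 1 suggest dispersion.
    """
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)                    # ignore distance to itself
    observed = dist.min(axis=1).mean()
    expected = 0.5 / np.sqrt(len(points) / area)      # expectation for a random pattern
    return observed / expected

# A regular 10 x 10 grid with unit spacing in an area of 100 square units
grid = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)
grid_ratio = average_nearest_neighbour_ratio(grid, area=100.0)
```

A regular grid gives a ratio well above 1 (dispersed), whereas residuals concentrated in a few regions would give a ratio below 1.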

Table 3. Model parameters and sensitivity analysis for ordinary linear regression model (OLS): ranking of influence of the input variables (the lower the ranks, the more important). Ranking criteria: (i) standardized coefficients; (ii) t statistic; (iii) step at which the variable was input into the model in a forward stepwise automatic procedure; (iv) change in the R2 when the variable was removed from the model.

Table 4. Moran's Index Summary for different band distances on the Linear GWR residuals.
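Moran's Index measures whether residuals at nearby locations resemble each other more than expected by chance, and evaluating it for several band distances gives the kind of summary reported here. The sketch below uses binary distance-band weights without row standardisation; the weighting scheme, the function name and the toy data are assumptions made for illustration.

```python
import numpy as np

def morans_i(values, coords, band):
    """Global Moran's I with binary weights for pairs closer than `band`."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    z = values - values.mean()                               # deviations from the mean
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = ((d > 0) & (d <= band)).astype(float)                # neighbours within the band
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)

# Toy residuals on a 4 x 4 grid: a checkerboard and a west-east gradient
grid = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
checkerboard = (-1.0) ** grid.sum(axis=1)
i_checker = morans_i(checkerboard, grid, band=1.0)           # -1: perfect negative autocorrelation
i_gradient = morans_i(grid[:, 0], grid, band=1.0)            # positive autocorrelation
```

Values close to zero at all band distances would indicate that the model has left little spatial structure in its residuals.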