Abstract: In the present study, daily values of four meteorological variables (air temperature, solar radiation, wind speed and relative humidity) were considered. The R2 values of the ANNs and SVMs models were 0.92 and 0.96, respectively, whereas the efficiencies of the ANNs and SVMs models were 0.83 and 0.91, respectively. Both the ANNs and SVMs approaches work well for the data set collected under greenhouse conditions, but the SVMs model performs better than the ANNs model.
INTRODUCTION
Nowadays, greenhouse cultivation plays a significant role in fresh vegetable production in Iran. Greenhouse farming, also known as protected cultivation, is one of the farming systems widely used to provide and maintain a controlled environment suitable for optimum crop production leading to maximum profits. This includes creating an environment suitable for working efficiency as well as for better crop growth (Donatelli et al., 2003). The main advantage of greenhouse farming is that production can be obtained throughout the year, which is not possible in open-field farming due to heavy rainfall and wind, especially in tropical regions (Allen et al., 1998). In addition, greenhouse technology can contribute in various ways to solving global issues such as shortages of artificial energy and water, environmental pollution and the instability of ecological systems. The irrigation system is one of the most important components affecting the yield and quality of agricultural products from greenhouse farming. However, more research is needed on irrigation management to establish an appropriate method for estimating crop evapotranspiration (ET), in order to avoid excess or deficit water application, soil salinity and groundwater contamination. Reference evapotranspiration (ET0) refers to the water removed from a unit ground area completely covered with a reference crop that is healthy, unstressed and amply supplied with water (Walter et al., 2002). ET0 is used to quantify evaporative demand within a region and to estimate crop ET when ET0 is multiplied by a crop coefficient (Kc) to account for differences between the grass and crop ET (Allen et al., 1998; Schuch and Burger, 1997). Operational software tools in the domains specified above require an estimate of ET0, and procedures to compute it have been repeatedly implemented in software applications. This is one of the reasons why there is an increasing demand for modular approaches in model development (Jones et al., 2001). 
Such a modular approach leads to the concept of encapsulating the solution of a modeling problem in a discrete, replaceable and interchangeable unit (Donatelli et al., 2004; Troya and Vallecillo, 2001). Reference evapotranspiration can be determined by lysimeter, but this method is very expensive and not easy to use. Therefore, ET0 can instead be estimated with equipment such as a pan, a reduced pan or an atmometer; the method based on class A pan evaporation is the most common because of its simplicity and relatively low cost. Many researchers have attempted to estimate evaporation through indirect methods using climatic variables, but some of these techniques require data that cannot be easily obtained (Rosenberry et al., 2007). Because the evaporation process is strongly nonlinear in nature, some researchers have emphasized the estimation of relatively accurate evaporation in the research greenhouse using modeling techniques (Bruton et al., 2000; Lindsey and Farnsworth, 1997; Xu and Singh, 1998). Sudheer et al. (2002) investigated the prediction of class A pan evaporation (PE) using a neural networks model built on proper combinations of observed climate variables such as temperature, relative humidity, sunshine duration and wind speed. Kisi (2006) used proper combinations of observed climatic variables such as air temperature, solar radiation, wind speed, pressure and relative humidity in a neuro-fuzzy model to estimate the daily PE. The aim of this study is to estimate daily pan evaporation using support vector machines and artificial neural networks under greenhouse conditions. 
Because pan evaporation data can be used in the greenhouse, where the pan method has been introduced as a suitable method, and because artificial neural networks and SVMs have been used outside the greenhouse (Bruton et al., 2000), this study evaluated whether these models can also be used inside the greenhouse and whether reference evapotranspiration can be estimated from climate data measured inside the greenhouse.
MATERIALS AND METHODS
Research Site and Operating Dates
The experiment was carried out from 23 September 2007 to 23 September 2008
at Isfahan University of Technology, Iran. The geometric characteristics of
the glasshouse are as follows: eaves height 3.5 m, ridge height 5 m, total width
10 m and length 20 m. The site is at an altitude of 1624.4 m, latitude 32° 42’ N
and longitude 51° 28’ E, with a mean annual precipitation of 134 mm, a mean
annual temperature of 17°C and a mean relative humidity of 38%. The glasshouse
was built with a North-South orientation and covered with a polyethylene film of 0.15
mm thickness. It was naturally ventilated with a single continuous roof vent,
and the lateral windows were kept open during daytime. To characterize the microclimate
inside the glasshouse, air temperature was measured by two Hobo pendant temperature/light
sensors and relative humidity was measured by an RH sensor. Incoming
solar radiation was measured by a luxmeter placed at the center of the glasshouse,
0.15 m above the floor. A handheld vane anemometer was used to measure daily
wind speed at 2 m above ground level, and a class A evaporation pan was placed
in the southern part of the greenhouse. The class A pan was constructed of galvanized
iron sheet, 1.21 m in diameter and 0.25 m in depth. The water depth was maintained
between 0.18 and 0.22 m, so the water level was 0.07 to 0.03 m below the rim, in order
to avoid large variations in water volume. The water level was measured daily using
a hook gauge with a resolution of 0.01 mm, which was placed in a stilling well. Evaporation
readings of the class A pan were taken every day between 7:00 and 8:00 a.m. The readings of
the class A pan installed inside the greenhouse were not directly influenced by rain,
since the glass cover protected the equipment from rain. Reference evapotranspiration
is calculated by the following equation (Abedi-Koupai and
Asadkazemi, 2006):
$$ET_0 = K_P \, E_{pan} \tag{1}$$
Where:
$ET_0$ = Reference evapotranspiration (mm)
$K_P$ = Pan coefficient
$E_{pan}$ = Pan evaporation (mm)
The KP value was assumed to be unity, as recommended by Fernandes et al. (2003) for greenhouse conditions.
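As a minimal sketch of Eq. 1, the conversion from daily pan readings to reference evapotranspiration can be written as below. The function name and the sample readings are hypothetical and serve only as an illustration; the default pan coefficient of 1.0 follows the assumption stated above for greenhouse conditions.

```python
def reference_et(pan_evaporation_mm, pan_coefficient=1.0):
    """Eq. 1: ET0 = KP * Epan, with both quantities in mm per day."""
    return pan_coefficient * pan_evaporation_mm


# Hypothetical daily class A pan readings in mm (not measured data).
daily_pan_mm = [3.1, 4.2, 2.8]
daily_et0_mm = [reference_et(e) for e in daily_pan_mm]
```

With KP equal to unity the series of ET0 values is identical to the pan series; a different KP would simply rescale it.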
Support Vector Machines
A support vector machine uses a linear model to separate the sample data
after a nonlinear mapping of the input vectors into a high-dimensional
feature space (Eslamian et al., 2008; Tay
and Cao, 2001). The linear model constructed in the new space can represent
a nonlinear decision boundary in the original space. SVM aims at finding a special
kind of linear model, the so-called optimal separating hyperplane. The training
points that are closest to the optimal separating hyperplane are called support
vectors, and they determine the decision boundaries. In the general case where the
data are not linearly separable, SVM uses nonlinear machines to find a hyperplane
that minimizes the number of errors on the training set. Consider a training
set $D = \{(x_i, y_i)\}_{i=1}^{N}$ with input
vectors $x_i = (x_i^1, \ldots, x_i^n)^T \in R^n$
and target labels $y_i \in \{-1, +1\}$.
The SVM binary classifier satisfies the following conditions:
$$y_i \left( w^T \varphi(x_i) + b \right) \geq 1, \quad i = 1, \ldots, N \tag{2}$$
where, w represents the weighting vector and b is the bias.
The nonlinear function $\varphi(\cdot): R^n \rightarrow R^{n_k}$ maps
the input vectors into a high-dimensional feature space. From Eq.
2, it can be seen that multiple solutions may separate the
training data points. From a generalization perspective, it is best to choose
two bounding hyperplanes at opposite sides of a separating hyperplane
$w^T \varphi(x) + b = 0$ with the largest margin $2/\|w\|_2$. However, most
classification problems are linearly non-separable. Therefore, it
is common to introduce slack variables $\xi_i$ to permit misclassification.
Thus the optimization problem becomes:
$$\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i \tag{3}$$
$$\text{subject to} \quad y_i \left( w^T \varphi(x_i) + b \right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, N \tag{4}$$
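where, C is the penalty parameter of the error term. The solution of the primal
problem is obtained after constructing the Lagrangian. The primal problem
can then be converted into the following QP problem:
$$\min_{\alpha} \; \frac{1}{2} \alpha^T Q \alpha - e^T \alpha \tag{5}$$
$$\text{subject to} \quad 0 \leq \alpha_i \leq C, \quad i = 1, \ldots, N, \qquad y^T \alpha = 0 \tag{6}$$
where, $\alpha_i$ are the Lagrange multipliers, e is the vector of all ones and
$Q_{ij} = y_i y_j \varphi(x_i)^T \varphi(x_j)$. Because of the large amount of computation involved, the inner product
is replaced with a kernel function that satisfies Mercer’s condition,
$K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$. Finally, we get a nonlinear decision
function in the primal space for the linearly non-separable case:
$$y(x) = \operatorname{sign} \left( \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b \right) \tag{7}$$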
Four common kernel function types of SVMs are given as follows:
• Linear kernel: $k(x_i, x_j) = x_i^T x_j$
• Polynomial kernel: $k(x_i, x_j) = (\gamma x_i^T x_j + r)^d$
• Radial basis kernel: $k(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$
• Sigmoid kernel: $k(x_i, x_j) = \tanh(\gamma x_i^T x_j + r)$
where, $d, r \in N$ and $\gamma \in R^+$ are constants (Eslamian et al., 2008).
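The four kernels listed above can be sketched in plain Python as follows. The function names and the default parameter values are illustrative assumptions, not settings taken from this study; input vectors are assumed to be equal-length sequences of numbers.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(xi, xj):
    return dot(xi, xj)


def polynomial_kernel(xi, xj, gamma=1.0, r=0.0, d=3):
    return (gamma * dot(xi, xj) + r) ** d


def rbf_kernel(xi, xj, gamma=1.0):
    # Squared Euclidean distance between the two vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(xi, xj, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(xi, xj) + r)
```

The radial basis kernel returns 1 for identical vectors and decays toward 0 as the vectors move apart, whereas the polynomial kernel is unbounded.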
Modeling for SVM
Model selection and parameter search play a crucial role in the performance of SVMs. However, there is so far no general guidance for the selection of the SVM kernel function and parameters. In general, the Radial Basis Function (RBF) is suggested for SVMs, for three reasons. First, the RBF kernel nonlinearly maps the samples into a high-dimensional space, so it can handle nonlinear problems. Furthermore, the linear kernel is a special case of the RBF, and the sigmoid kernel behaves like the RBF for certain parameters; however, it is not valid under some parameters. The second reason is the number of hyperparameters, which influences the complexity of model selection: the polynomial kernel has more parameters than the RBF kernel. Finally, the RBF function has fewer numerical difficulties. While RBF kernel values satisfy 0 < Kij ≤ 1, polynomial kernel values may go to infinity or zero when the degree is large. In addition, the polynomial kernel takes longer in the training stage and is reported to produce worse results than the RBF kernel in previous studies (Sudheer et al., 2002). The linear kernel SVM has no parameters to tune except for C. For the nonlinear SVM, there are additional parameters to tune, the kernel parameters γ. Improper selection of the penalty parameter C and the kernel parameters can cause overfitting or underfitting problems. Currently, several kinds of parameter search approaches are employed, such as cross-validation via parallel grid search, heuristic search and inference of model parameters within the Bayesian evidence framework (Sudheer et al., 2002). For medium-sized problems, cross-validation might be the most reliable way to select model parameters. In v-fold cross-validation, the training set is first divided into v subsets. In the ith (i = 1, 2, ..., v) iteration, the ith set (validation set) is used to estimate the performance of the classifier trained on the remaining (v-1) sets (training set). 
The performance is generally evaluated by a cost, e.g., classification accuracy or Mean Square Error (MSE). The final performance of the classifier is evaluated by the mean cost over the v folds. In the grid-search process, pairs of (C, γ) are tried and the one with the best cross-validation accuracy is picked. In this study, a grid search on (C, γ) using 10-fold cross-validation was preferred for the following reasons. Firstly, the cross-validation procedure can prevent the overfitting problem. Secondly, the computational time needed to find good parameters by grid search is not much more than that of the other methods. Furthermore, the grid search can be easily parallelized because each (C, γ) pair is independent, whereas the other methods are iterative processes that might be difficult to parallelize. The LIBSVM software was used to conduct the SVMs experiments. The overall procedure of modeling SVM is shown in Fig. 1.
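The v-fold cross-validation and grid-search procedure described above can be sketched as follows. This is not the LIBSVM implementation used in the study: `fit_and_cost` is a hypothetical user-supplied callable that trains a model on the training indices with the given (C, gamma) pair and returns a cost (for example MSE) on the validation indices. The interleaved fold assignment is also an assumption made for brevity.

```python
import itertools


def v_fold_indices(n_samples, v):
    """Split indices 0..n_samples-1 into v disjoint, nearly equal folds."""
    return [list(range(i, n_samples, v)) for i in range(v)]


def cross_validated_cost(n_samples, v, fit_and_cost, params):
    """Mean validation cost over the v folds for one parameter pair."""
    folds = v_fold_indices(n_samples, v)
    costs = []
    for i, validation in enumerate(folds):
        # Train on every fold except the ith, validate on the ith.
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        costs.append(fit_and_cost(training, validation, params))
    return sum(costs) / v


def grid_search(n_samples, v, fit_and_cost, c_grid, gamma_grid):
    """Return (best_cost, best_C, best_gamma) over all (C, gamma) pairs."""
    best = None
    for c, gamma in itertools.product(c_grid, gamma_grid):
        cost = cross_validated_cost(n_samples, v, fit_and_cost, (c, gamma))
        if best is None or cost < best[0]:
            best = (cost, c, gamma)
    return best
```

Because each (C, gamma) pair is evaluated independently inside `grid_search`, the loop body could be distributed across workers without changing the result.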
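The adequacy of the ANNs and SVMs evapotranspiration models was evaluated
by estimating the coefficient of determination (R2), defined based on the evapotranspiration
estimation errors as:
$$R^2 = \frac{E_0 - E}{E_0} \tag{8}$$
$$E_0 = \sum_{i=1}^{N} \left( E_{i(pan)} - E_{mean} \right)^2 \tag{9}$$
$$E = \sum_{i=1}^{N} \left( E_{i(pan)} - E_{i(simulated)} \right)^2 \tag{10}$$
where, $E_{i(pan)}$ and $E_{i(simulated)}$ are the daily pan evaporation
measurement and the model-simulated evaporation, respectively, and $E_{mean}$ is the mean of the measured pan evaporation.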
Fig. 1: | Overall procedure of modeling SVM |
Artificial Neural Networks
Artificial neural networks are a type of parallel computing structure in
which a number of processing units are linked together so that the computer’s
memory is distributed and information is passed in a parallel manner. A large
number of ANN architectures and algorithms have been developed so far, including multilayer
feedforward networks (Sudheer et al., 2002), self-organizing
feature maps, Hopfield networks, counterpropagation networks, radial basis function
networks and recurrent ANNs (Sudheer et al., 2002).
Of these networks, the most commonly used are feedforward networks and radial
basis function networks. Multi-layer feedforward networks have been found to
perform best in hydrological applications and, as such, they are by
far the most commonly used (Sudheer et al., 2002).
Attempting to choose between the different methods and to define which is superior
is likely to fail, as in most cases the choice should be application oriented.
It is preferable for every new application to test different types of ANNs rather
than to use a pre-selected one.
Feed-Forward Propagation Neural Networks (FFNN)
The most commonly used ANN is the three-layer feed-forward ANN. In the feed-forward
neural network architecture, there are layers and nodes at each layer. Each
node at the input and inner layers receives input values, processes them and passes the result to
the next layer. This process is governed by weights; a weight is the connection
strength between two nodes. The numbers of neurons in the input layer and the
output layer are determined by the numbers of input and output parameters, respectively.
In the present study, feed-forward artificial neural networks are used. The model is
shown in Fig. 2. In Fig. 2, i, j and k denote nodes in the input layer, hidden layer
and output layer, respectively. W is the weight of the connections between nodes. Subscripts specify
the connections between the nodes. For example, Wij is the weight
between nodes i and j. The term feed-forward means that a node connection only
exists from a node in the input layer to other nodes in the hidden layer or
from a node in the hidden layer to nodes in the output layer; the nodes
within a layer are not interconnected.
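A forward pass through such a three-layer network can be sketched as below, assuming log-sigmoid activations in the hidden and output layers. The bias terms and all numerical values are assumptions added for illustration; the text above describes only the weights. Here `w_ij[j][i]` holds the weight between input node i and hidden node j, and `w_jk[k][j]` the weight between hidden node j and output node k.

```python
import math


def log_sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Weighted sum of the inputs for each receiving node, then log-sigmoid."""
    return [
        log_sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def feed_forward(inputs, w_ij, b_j, w_jk, b_k):
    """Input layer -> hidden layer -> output layer, with no intra-layer links."""
    hidden = layer_forward(inputs, w_ij, b_j)
    return layer_forward(hidden, w_jk, b_k)


# Hypothetical example: 4 inputs, 5 hidden nodes, 1 output, all weights 0.1.
example_output = feed_forward(
    [0.5, 0.2, 0.8, 0.1],
    [[0.1] * 4 for _ in range(5)], [0.0] * 5,
    [[0.1] * 5], [0.0],
)
```

Since the output passes through a log-sigmoid, it always lies between 0 and 1, so target values would need to be scaled to that range before training.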
To evaluate the performance of these models in estimating daily ET0,
the predicted values were compared with the evapotranspiration measured by the class A pan
method using several performance criteria, including regression analysis,
the agreement index (D), Mean Absolute Error (MAE), maximum absolute error (MAXE)
and efficiency (EF). These criteria are defined as:
Fig. 2: | Feed-forward artificial neural networks with one layer |
(11) |
(12) |
(13) |
(14) |
(15) |
Where:
Oi | = | Measured value |
Ei | = | Predicted value |
Ō | = | Mean of the observed values |
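The following sketch computes four of the criteria named above from paired measured (O) and predicted (E) values. It assumes the commonly used definitions, namely Willmott's index of agreement for D and the Nash-Sutcliffe form for EF; the exact expressions used in this study may differ in detail, and the function names are hypothetical.

```python
def mean_absolute_error(observed, predicted):
    return sum(abs(o - e) for o, e in zip(observed, predicted)) / len(observed)


def max_absolute_error(observed, predicted):
    return max(abs(o - e) for o, e in zip(observed, predicted))


def efficiency(observed, predicted):
    """Assumed form: 1 - sum((O - E)^2) / sum((O - Omean)^2)."""
    o_mean = sum(observed) / len(observed)
    sse = sum((o - e) ** 2 for o, e in zip(observed, predicted))
    sst = sum((o - o_mean) ** 2 for o in observed)
    return 1.0 - sse / sst


def agreement_index(observed, predicted):
    """Assumed form: 1 - sum((O - E)^2) / sum((|E - Omean| + |O - Omean|)^2)."""
    o_mean = sum(observed) / len(observed)
    sse = sum((o - e) ** 2 for o, e in zip(observed, predicted))
    potential = sum(
        (abs(e - o_mean) + abs(o - o_mean)) ** 2
        for o, e in zip(observed, predicted)
    )
    return 1.0 - sse / potential
```

Under these definitions a perfect prediction gives MAE = MAXE = 0 and EF = D = 1, while EF can become negative when the model predicts worse than the observed mean.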
RESULTS AND DISCUSSION
Artificial Neural Network
In this study, the ANNs model was implemented with the NeuroSolutions software. Sixty
percent of the total data was randomly selected as training data, 20% of the total
data was randomly selected for testing and 20% was selected for cross validation.
The ANNs evaporation model uses four input variables (air temperature,
relative humidity, solar radiation and wind speed).
For the ANNs model, two hidden layers with five hidden neurons were selected after trial and cross validation, and log-sigmoid functions were used for the hidden and output layers.
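A random 60/20/20 partition of the daily records, as described above, might be sketched as follows. The function name and the fixed seed are assumptions for reproducibility; the study's software performs its own randomization, and the number of records used here is illustrative only.

```python
import random


def split_60_20_20(records, seed=0):
    """Shuffle the records and split them into training, testing and
    cross-validation subsets of roughly 60%, 20% and 20%."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = (60 * n) // 100  # integer arithmetic avoids float rounding
    n_test = (20 * n) // 100
    training = shuffled[:n_train]
    testing = shuffled[n_train:n_train + n_test]
    cross_validation = shuffled[n_train + n_test:]  # remainder
    return training, testing, cross_validation
```

Any records left over after integer division fall into the cross-validation subset, so no record is dropped.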
The Training Performance
In the NeuroSolutions software, 60% of the total data was randomly selected as training
data. This software does not require standardized inputs, so the training
data were used without transformation. Table 1 shows
a statistical analysis of the PE for the training performance. According to Table
1, the best network, trained for 2000 epochs, has MSE = 0.0067.
The Test Performance
The testing performance applied a cross-validation method in order to avoid
overfitting the data. With this method, the network is not trained on the
training data until the MSE reaches its minimum; instead, it is cross-validated
against the testing data at the end of each training pass. The correlation coefficient
and MSE values were used to judge the performance of the ANNs on the data. Actual and
predicted values of efficiency were also plotted. Table 2
shows a statistical analysis of the reference evapotranspiration for the testing
performance of ANNs. Table 2 shows that, for cross validation,
the values of D, EF and r-square (R2) were 0.93, 0.83 and 0.92,
respectively.
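The idea of halting training based on held-out error rather than on the training MSE can be sketched as a simple early-stopping rule. The `patience` parameter and the function name are assumptions introduced here; the stopping rule actually applied by the software is not described in the text.

```python
def best_epoch(validation_mse, patience=10):
    """Scan per-epoch validation MSE and stop once `patience` epochs have
    passed since the best value; return (epoch_index, best_mse)."""
    best_index, best_value = 0, validation_mse[0]
    for epoch, mse in enumerate(validation_mse):
        if mse < best_value:
            best_index, best_value = epoch, mse
        elif epoch - best_index >= patience:
            break  # no improvement for `patience` epochs
    return best_index, best_value
```

For example, with a hypothetical validation curve `[0.5, 0.3, 0.2, 0.25, 0.26, 0.1]` and `patience=2`, the scan stops at epoch 4 and reports epoch 2 as the best, even though a later epoch would have been better; a larger patience trades extra training time for a lower risk of stopping too early.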
Sensitivity of the Pan Evaporation to Meteorological Variables
Fig. 3 shows that the sensitivities of pan evaporation to increasing wind speed, air temperature
and solar radiation are 12.566, 7.577 and 5.968, respectively, while
the sensitivity to decreasing relative humidity is 2.251. Wind speed and
air temperature are therefore the variables to which pan evaporation is most sensitive.
Support Vector Machines
The D, EF and r-square (R2) values were used to judge the performance
of SVMs for the data set. One advantage of SVMs is the use of quadratic
optimization, which provides a global minimum, in contrast to the local minima
that a back propagation neural network may reach because of its non-linear optimization.
Both ANNs and SVMs were used to calculate the D, EF and r-square (R2)
using cross-validation and a percentage split method for the input data set
comprising different attributes. Table 3 shows a statistical
analysis of the reference evapotranspiration for the testing performance of
SVMs. Figures 4 and 5 show the actual
and predicted values of reference evapotranspiration by ANNs and SVMs, respectively,
and illustrate how closely the predicted values fit the measured values.
Table 1: | Statistical analysis of the pan evaporation for the training performance |
Table 2: | Statistical analysis of the reference evapotranspiration for testing performance |
Table 3: | Statistical analysis of the reference evapotranspiration for testing performance |
Fig. 3: | The sensitivity of the reference evapotranspiration to the four meteorological variables |
Fig. 4: | Predicted reference evapotranspiration using ANNs |
Fig. 5: | Predicted pan evaporation using SVM |
CONCLUSION
Comparison of the R2 and efficiency values also suggests an improved performance by both ANNs and SVMs. A possible reason for the better performance of both ANNs and SVMs may be that they have a larger number of user-defined parameters. Based on this study, both the ANNs and SVMs approaches work well for the data set used. The computational cost of the SVMs is significantly smaller than that of the ANNs algorithm. Based on the efficiency values, the SVMs approach works better than the ANNs model.