INTRODUCTION
The field of Artificial Neural Networks (ANN) has seen an explosion
of interest over the last decades, and ANNs are being applied across an
extraordinary range of problem domains. A number of ANN models have been
successfully applied to solving complex problems (Khamis et al., 2006; Lee,
2008). An ANN is a causality model that is widely used to solve complex problems.
The problem becomes more complex in a situation where information is incomplete.
Backpropagation networks in ANN may provide us with some solutions. Backpropagation
is the most widely used algorithm for training multilayer feedforward neural
networks (Rumelhart and McClelland, 1986). A backpropagation network consists of
three layers: the input layer, at least one intermediate hidden
layer and the output layer. In a backpropagation network, the connection
weights act in a one-way direction. Typically, units are connected in a
feedforward manner, with the input units fully connected to the hidden
layer and the hidden units fully connected to the output layer.
A backpropagation network consists of many elements, or neurons, that are
connected by communication channels, or connectors. These connectors carry
numeric data, arranged by a variety of means and organized into layers.
The network can perform a particular function when certain values are
assigned to the connections or weights between elements. To describe a
system, no model structure is assumed; instead, the network
is adjusted or trained so that a particular input leads to a specific
target output (Haykin, 1999; Bansel et al., 1993). The mathematical
model of a backpropagation network comprises a set of simple functions
linked together by weights. The network consists of a set of input x,
output units y and hidden units z, which link the inputs to outputs (Fig.
1). The hidden units extract useful information from inputs and use
them to predict the output. The type of neural network shown here is known
as the multilayer perceptron (Haykin, 1999).
A network with an input vector of elements x_{l} (l = 1, 2, ..., N_{i})
transmits each element through a connection that is multiplied by a weight,
w_{jl}, to give the net input of the hidden unit z_{j} (j = 1, 2, 3, ..., N_{k}):

Net_{j} = Σ_{l=1}^{N_{i}} w_{jl}x_{l} + w_{j0}

where, N_{k} is the number of hidden units and N_{i} is
the number of input units. The hidden units receive the weighted input
plus a bias (w_{j0}). A bias is simply a weight with a constant input
of 1 that serves as an added constant. These weighted sums are passed
through an activation function f, which produces:

z_{j} = f(Net_{j})
The activation functions are designed to accommodate the nonlinearity
in the input-output relationships. A common activation function is the
sigmoid or the hyperbolic tangent:

f(x) = 1/(1+e^{-x}) or f(x) = tanh(x)

The outputs from the hidden units pass through another layer of weighted
connections:

Net_{k} = Σ_{j=1}^{N_{k}} w_{kj}z_{j} + w_{k0}

and feed into another activation function F to produce the output y_{k} (k =
1, 2, 3, ..., N_{o}):

y_{k} = F(Net_{k})
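As an illustrative sketch only (the helper names and the choice of tanh for both f and F are our assumptions, not specified by the text), the two-layer forward pass described above can be written in plain Python:

```python
import math

def forward(x, w_hidden, w_out, f=math.tanh, F=math.tanh):
    # Each weight row stores the bias first: w[0] is the bias (a weight on a
    # constant input of 1), followed by the weights on the incoming units.
    z = [f(w[0] + sum(wl * xl for wl, xl in zip(w[1:], x)))
         for w in w_hidden]                      # hidden units z_j
    y = [F(w[0] + sum(wj * zj for wj, zj in zip(w[1:], z)))
         for w in w_out]                         # output units y_k
    return y

# Two inputs, two hidden units, one output (arbitrary example weights)
print(forward([0.5, -0.2],
              [[0.1, 0.4, -0.3], [0.0, 0.2, 0.7]],
              [[0.05, 1.0, -1.0]]))
```

With all weights and inputs zero, every net input and hence every output is zero, as expected for tanh.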
The weights are adjustable parameters of the network and are determined
from a set of data through the process of training (Lai, 1998; Zhang et
al., 2001). The training of a network is accomplished using an optimization
procedure (such as nonlinear least squares). The objective is to minimize
the Sum of Squares of the Error (SSE) between the measured and predicted
outputs. Because there are no assumptions about the functional form, or about
the distributions of the variables and errors of the model, an ANN model is
more flexible than standard statistical techniques (Faraway and Chatfield, 1998).
It allows for nonlinear relationships and complex classificatory equations.
Users do not need to specify as much detail about the functional form before
estimating the classification equation; instead, the data determine
the appropriate functional form (Limsombunchai et al., 2004;
Herrera et al., 2007).
In accordance with standard analytical practice, the sample was divided
on a random basis into two sets, namely the training set and the testing set,
which contain 80 and 20% of the total sample, respectively. To evaluate the
modeling accuracy, the correlation coefficient (r) and the Mean Squared Error
(MSE) were calculated. A model with a higher r and lower MSE was considered
relatively superior.
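A minimal sketch (the helper names below are ours; the text gives no code) of the random 80/20 split and the two evaluation measures:

```python
import random

def split_train_test(samples, train_frac=0.8, seed=0):
    # Random split into training (80%) and testing (20%) sets
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = int(train_frac * len(samples))
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]

def mse(actual, predicted):
    # Mean Squared Error between measured and predicted output
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def pearson_r(x, y):
    # Correlation coefficient r between two series
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)
```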
MATERIALS AND METHODS
In Malaysia, the short-term forecast of electricity demand is
generated by senior engineers at the Operation and System Planning Division
of Tenaga Nasional Berhad, the national power electricity company.
To arrive at hourly, daily and weekly forecasts, various factors
need to be taken into account (Ismail and Mahpol, 2007). Some of these factors
are the daily temperature, weather, day of the week, legal and religious
holidays, seasonal effects and human behaviour, such as whether people will
take a day off preceding or following a holiday to take advantage of
a long break (Nima, 2001).
This study proposes a modification of the backpropagation algorithm
to predict the Malaysian daily electricity load demand. Backpropagation
is the most widely used technique in the ANN literature and the easiest
to understand. Its learning and update procedure is intuitively appealing
because it is based on a relatively simple concept: if the network gives
the wrong answer, the weights are corrected so that the error is lessened
and, as a result, future responses of the network are more likely to be
correct (Klein and Rossin, 1999a; Mandal et al., 2007). When
the network is given an input, the updating of activation values propagates
forward from the input layer of processing units through each internal
layer to the output layer of processing units (Haykin, 1999). The output
units then provide the network's response. When the network corrects its
internal parameters, the correction mechanism starts with the output units
and propagates backward through each internal layer to the input
layer. The backpropagation network is a gradient descent method whose objective
is to minimize the mean squared error between the target values and the
network outputs (Mandal et al., 2007). Thus the Mean Squared Error
(MSE) function is defined as:

MSE = (1/2) Σ_{k}Σ_{j} (t_{kj} - o_{kj})^{2}
Where:
t_{kj} = Target value from output node (k) to hidden node (j)
o_{kj} = Network value from output node (k) to hidden node (j)
Fig. 2: Proposed error function of backpropagation model with activation function of 1/(1+e^{-2x})
The proposed error function for standard backpropagation is defined
implicitly as:
where, E_{k} = t_{k} - a_{k} and
E_{k} = Error at output unit k
t_{k} = Target value of output unit k
a_{k} = An activation of unit k
The proposed backpropagation error function is shown in Fig.
2; its error decreases rapidly compared to the MSE (Fig.
3), so the proposed error function will require fewer iterations for convergence.
The update of a weight in the standard backpropagation model is proportional
to the negative gradient of the error with respect to that weight, as shown:
Thus, in this case, the above relation in Eq. 1 can be rewritten as:
It is known that Net_{k} = Σ_{j} w_{kj}z_{j} + θ_{k}, where θ_{k} is a bias term;
thus, taking the partial derivative and substituting it into Eq. 8 gives:
Assume that a_{k} = f(Net_{k}). By using the chain rule, this gives:

Fig. 3: Mean square error of backpropagation model with activation function of 1/(1+e^{-x})
By taking the partial derivative of our first activation function, the sigmoid
function, and simplifying it by substituting in terms of a_{k}, this
gives:
The derivative of f(Net_{k}) in terms of a_{k} is given
by:
It is known that E_{k} = t_{k} - a_{k}; thus,
taking the partial derivative with respect to a_{k} gives:
In simplified form, Eq. 12 becomes:
By substituting Eq. 11 and 13 into Eq. 10, we obtain the improved error
signal of backpropagation for the output layer, together with an error signal
for the hidden layer of the modified backpropagation, which is the same as in
standard backpropagation:
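The key identity used in this derivation, that the derivative of the sigmoid f(x) = 1/(1+e^{-2x}) can be expressed in terms of the activation itself as 2a_{k}(1-a_{k}), can be checked numerically (a sketch of ours, not the paper's code):

```python
import math

def sigmoid2(x):
    # The sigmoid activation used in the text: f(x) = 1/(1 + e^(-2x))
    return 1.0 / (1.0 + math.exp(-2.0 * x))

def numeric_derivative(f, x, h=1e-6):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-2.0, -0.5, 0.0, 1.5):
    a = sigmoid2(x)
    # f'(x) expressed in terms of the activation a_k: 2 * a_k * (1 - a_k)
    assert abs(numeric_derivative(sigmoid2, x) - 2.0 * a * (1.0 - a)) < 1e-6
```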
ACTIVATION FUNCTION
Activation function: Inverse tangent, f(x) = tan^{-1}(x). If the arctangent activation function (Fig. 4) is used
instead of the sigmoid function, then we have another improved error function:
From Eq. 11, we make the same assumption as in the sigmoid case. By using the
chain rule, this gives the corresponding expression, and we know that
a_{k} = f(Net_{k}). For the proposed method, instead
of using the sigmoid function we use the arctangent function, tan^{-1}(x).
By taking the partial derivative of a_{k} and simplifying it by
substituting in terms of a_{k}, this gives:
From Eq. 16-18, we have:
The improved error signal of backpropagation using the inverse tangent is obtained
as:
The derivatives of the sigmoid and the inverse tangent are shown in Fig.
5. The inverse tangent has a higher derivative value in the saturation
region, when the output approaches ±1 (or 0 for the sigmoid function).
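The derivative relation underlying Fig. 5, d/dx tan^{-1}(x) = 1/(1+x^{2}) = cos^{2}(a_{k}) when a_{k} = tan^{-1}(x), can likewise be checked numerically (our own sketch):

```python
import math

def numeric_derivative(f, x, h=1e-6):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-3.0, -1.0, 0.0, 2.0):
    a = math.atan(x)  # a_k = tan^-1(x)
    # d/dx tan^-1(x) = 1/(1 + x^2), which equals cos^2(a_k)
    assert abs(numeric_derivative(math.atan, x) - math.cos(a) ** 2) < 1e-6
    assert abs(1.0 / (1.0 + x * x) - math.cos(a) ** 2) < 1e-12
```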
Activation function: Hyperbolic tangent, f(x) = tanh(x). Another function to be used is the hyperbolic tangent
function (Fig. 6, 7).

Fig. 4: Activation function of inverse tangent f(x) = tan^{-1}(x)

Fig. 5: Derivative of arctangent, y_{1} = cos^{2}(a_{k}), and sigmoid, y_{2} = 2a_{k}(1-a_{k})

Fig. 6: Activation function of tanh, f(x) = tanh(x)
The Kalman Error (Kalman and Kwasny, 1997) was defined as:

Fig. 7: First derivative of tanh, f(x) = tanh(x)
MODIFICATION OF IMPROVED ERROR WITH MSE
Activation function: Sigmoid, f(x) = 1/(1+e^{-2x}).
We modify the error function further by substituting the error term E_{k}
= t_{k} - a_{k} with the mean squared error in the improved
error function in Eq. 6:
Where:
E_{k} = Error at output unit k
t_{k} = Target value of output unit k
a_{k} = An activation of unit k
If n = 2, then the error signal becomes:
Activation function: Inverse tangent, f(x) = tan^{-1}(x)
If n = 2, then the error signal becomes:
Activation function: Hyperbolic tangent, f(x) = tanh(x)
If n = 2, then the error signal becomes:
RESULTS AND DISCUSSION
The variables used in this study to train and test the backpropagation
networks include the peak load demand; the mean, maximum
and minimum temperatures for Peninsular Malaysia; the previous day's load;
the load 7 days before (the same day of the previous week); and the load 6 days before.
The input neurons were selected based on a correlation analysis, which
shows the existence of a significant correlation between peak load, lagged
loads and temperature. Thus there are seven input signals to the neural
network, with one hidden layer and one output layer producing a
single output: the system peak load. In this study, various combinations
of networks were trained to find the optimum configuration. For comparison, we
selected networks with 10 and 20 hidden neurons. The performance of the proposed
error function was compared to the mean squared error from standard backpropagation.
Both algorithms were measured using the Mean Absolute Percentage Error (MAPE),
given as:

MAPE = (100/N) Σ_{i=1}^{N} |(a_{i} - p_{i})/a_{i}|
Table 1: Comparison of convergence speed between sigmoid and arctangent (hidden = 20)

Table 2: Comparison of convergence speed between sigmoid and arctangent (hidden = 10)

Table 3: Comparison of percentage error between sigmoid and arctangent (hidden = 20)
where, p_{i} is the predicted value, a_{i} is the
actual value and N is the number of patterns used for training and testing.
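The MAPE just defined can be computed directly (a small sketch of ours, following the definitions above):

```python
def mape(actual, predicted):
    # Mean absolute percentage error over the N patterns
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / n

# Example: two patterns, each off by 10%, give a MAPE of 10%
print(mape([100.0, 200.0], [90.0, 220.0]))
```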
Tables 1 and 2 show the convergence
rates of the improved error used by the backpropagation algorithm for the sigmoid
function and the arctangent function. For the chosen learning rates (η)
and momenta (α), it is clear that the improved error with the sigmoid function
converges faster than the improved error with the arctangent function.
This holds for all chosen learning rates (η) and momenta (α)
and for both 20 and 10 hidden nodes.
Table 3 shows the forecasting accuracy of the daily maximum
load using 20 hidden nodes and Table 4 shows the
forecasting accuracy using 10 hidden nodes. From these results, we find that
the inverse tangent function has a smaller learning error than the sigmoid
function for all chosen learning rates (η) and momenta (α).
When both networks were used for testing two months ahead, the percentage
errors of both functions were comparable, in the range of 5-8%.
Table 4: Comparison of percentage error between sigmoid and arctangent (hidden = 10)


Fig. 8: Forecasted load for arctangent and sigmoid functions (hidden = 20, η = 0.01, α = 0.9)
Figure 8 shows the forecasted load for the months of March and
April 2006. From the graph, both activation functions show little
difference in forecast accuracy. However, when the forecast values
were plotted, we can see that the sigmoid function follows the actual pattern
better than the arctangent function, even though the accuracies of both functions
are not quite satisfactory for forecasting purposes.
CONCLUSION
This study introduced improved error signals for backpropagation
using two activation functions, the sigmoid and the arctangent, in forecasting
the daily maximum electricity load. The improved error was tested on
daily maximum load data for the years 2006 to 2007 for Peninsular Malaysia.
The results show that the convergence rate of the improved error with the sigmoid
function is much faster than that of the improved error with the arctangent
function. The arctangent function should theoretically yield faster convergence,
as it has higher derivative values than the sigmoid function;
the results contradict this expectation. In terms of forecast
accuracy, both functions show similar results, with the sigmoid function following
the actual pattern better. From both results, it seems that the sigmoid function
with the improved error has a better convergence rate and better forecast ability,
but needs further refinement in forecast accuracy to be used
as a forecasting tool.
ACKNOWLEDGMENTS
This research was supported by IRPA Grant Vote No. 030106SF0177
and the Department of Mathematics, Faculty of Science, UTM. The authors
would like to thank lecturers and friends for many helpful ideas and discussions.
We also wish to thank TNB for providing the necessary data for this study.