INTRODUCTION
Assessing future changes in exchange rates has long been of interest to economists
and policy makers (Groen, 2005). The exchange rate is a principal conduit through
which monetary policy affects real activity and inflation. To keep inflation stable
at an appropriate level and economic activity at a high level, the monetary authority
must conduct monetary policy with confidence, and that confidence comes from a better
understanding of exchange rate movements (Panda and Narasimhan, 2007). Likewise, to
intervene efficiently in the foreign exchange market, policy makers in the central
bank must be well aware of exchange rate movements and their consequences (Panda and
Narasimhan, 2007). Multinational corporations, seeking a competitive advantage over
their rivals, are expanding into fast-growing emerging markets. Although these
corporations have benefited from economic growth in these regions, the recent
financial turmoil in the developing economies highlights the instability of these
growing economies and stresses firms' need to scrutinize foreign exchange rates more
closely. Many industrial leaders have echoed this view, calling for greater
transparency in the foreign exchange markets and better predictability of currency
movements (Chen and Leung, 2003). These are some of the reasons why monetary
authorities, policy makers and corporations might wish to forecast exchange rates
(De Grauwe et al., 1993).
It is now well documented that many economic time series are nonlinear, whereas the
ARIMA (AutoRegressive Integrated Moving Average) model assumes a linear correlation
structure among the time series values. ARIMA therefore cannot capture nonlinear
patterns, and linear approximations to complex real-world problems are not always
satisfactory. Nonparametric nonlinear models estimated by methods such as Artificial
Intelligence (AI) can fit a data set much better than linear models, and it has been
observed that linear models often forecast poorly, which limits their appeal in
applied settings (Racine, 2001). AI systems are widely accepted as a technology
offering an alternative way to tackle complex and ill-defined problems (Kalogirou,
2003). They can learn from examples, are fault tolerant in the sense that they can
handle noisy and incomplete data, can deal with nonlinear problems and, once trained,
can perform prediction and generalization at high speed (Kalogirou, 2003). They have
been used in diverse applications in control, robotics, pattern recognition,
forecasting, medicine, power systems, manufacturing, optimization, signal processing
and the social/psychological sciences. AI systems comprise areas such as expert
systems, ANNs, genetic algorithms, fuzzy logic and various hybrid systems that
combine two or more techniques (Kalogirou, 2003). Among these AI systems, according
to Haykin (1994), a neural network is a massively parallel distributed processor that
has a natural propensity for storing experiential knowledge and making it available
for use. The greatest advantage of a neural network is its ability to model complex
nonlinear relationships, like a black box, without a priori assumptions about the
nature of the relationship (Karayiannis and Venetsanopoulos, 1993).
Concerning the application of neural networks to economic time series forecasting,
the evidence has been mixed. For instance, Mark and Sul (2001) and Groen (2005b) use
panels of between 3 and 17 OECD countries, first to test for cointegration between
the exchange rate and monetary fundamentals and second to use this cointegrating
relationship to successfully predict exchange rates at horizons of three to four
years. Cifter (2008) proposed a wavelet network to investigate the effects of
international F/X markets on emerging market currencies, using the EUR/USD parity as
the input indicator and three emerging market currencies as the output indicators.
The wavelet networks showed that the effects of international F/X markets increase
with higher timescales and that the effects of the EUR/USD parity on the Turkish lira
are higher at the 9-16 day and 33-64 day scales. Fahimifard et al. (2009) compared
the ANFIS and ARIMA models for forecasting Iran's poultry retail price 1, 2 and 4
weeks ahead. Their study found that ANFIS outperforms ARIMA at all three horizons and
that ANFIS is effective in improving the accuracy of Iran's poultry retail price
forecasts.
In this study, we compare the accuracy of the Adaptive Neuro-Fuzzy Inference System (ANFIS) and the Artificial Neural Network (ANN) as nonlinear models with that of GARCH and ARIMA as linear models for forecasting the daily Iran Rial/US$ and Rial/€ exchange rates 2, 4 and 8 days ahead, using data collected from the Central Bank of Iran (CBI) website. The forecast evaluation criteria are R^{2}, MAD and RMSE.
MATERIALS AND METHODS
AutoRegressive Integrated Moving Average (ARIMA) model: Introduced by Box and
Jenkins (1978), the ARIMA model has been one of the most popular approaches to
linear time series forecasting over the last few decades. An ARIMA process is a
mathematical model used for forecasting. One of the attractive features of the
Box-Jenkins approach is that ARIMA processes are a very rich class of possible
models, so it is usually possible to find a process that provides an adequate
description of the data. The original Box-Jenkins modeling procedure involved an
iterative three-stage process of model selection, parameter estimation and model
checking. Recent explanations of the process (Makridakis et al., 1998) often add a
preliminary stage of data preparation and a final stage of model application (or
forecasting).
The ARIMA (p, d, q) model is as follows:
where y_{t} and e_{t} are the target value and random error at time t, respectively, φ_{i} (i = 1, 2, …, p) and θ_{j} (j = 1, 2, …, q) are model parameters, and p and q are integers often referred to as the orders of the autoregressive and moving average polynomials.
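To illustrate how a fitted model of this kind produces a forecast, the following Python sketch differences a series d times and then makes a one-step ARMA(p, q) prediction. It assumes the common Box-Jenkins form, in which the predicted value is the sum of φ_{i} times the past values minus the sum of θ_{j} times the past errors, with no constant term. That sign convention, the function names and all numbers are assumptions made for illustration and are not taken from this study.

```python
def difference(series, d=1):
    """Apply first differencing d times (the 'integrated' part of ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def arma_one_step(history, errors, phi, theta):
    """One-step-ahead ARMA(p, q) prediction.

    history, errors: past values and past errors, ordered oldest to newest.
    phi, theta: AR and MA parameters (phi[0] multiplies the most recent value).
    Assumed form: sum(phi_i * y_{t-i}) - sum(theta_j * e_{t-j}).
    """
    ar_part = sum(p * y for p, y in zip(phi, reversed(history)))
    ma_part = sum(t * e for t, e in zip(theta, reversed(errors)))
    return ar_part - ma_part


# Made-up exchange rate levels, for illustration only.
rates = [9000.0, 9010.0, 9025.0, 9020.0, 9040.0]
changes = difference(rates, d=1)  # d = 1: model the daily changes
predicted_change = arma_one_step(
    changes, errors=[0.0, 0.0, 0.0, 2.0], phi=[0.5], theta=[0.3]
)
forecast = rates[-1] + predicted_change  # undo the differencing
```

In practice the parameters would be estimated from the in-sample data rather than fixed by hand as they are here.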
Generalized AutoRegressive Conditional Heteroscedasticity (GARCH) model:
Heteroscedasticity is often associated with cross-sectional data, whereas time
series are usually studied in the context of homoscedastic processes. In analyses
of macroeconomic data, Engle (1982, 1983) and Cragg (1982) found evidence that, for
some kinds of data, the disturbance variances in time-series models were less stable
than usually assumed. Engle's results suggested that in models of inflation, large
and small forecast errors appeared to occur in clusters, suggesting a form of
heteroscedasticity in which the variance of the forecast error depends on the size
of the previous disturbance. He suggested the autoregressive, conditionally
heteroscedastic, or ARCH, model as an alternative to the usual time-series process.
More recent studies of financial markets suggest that the phenomenon is quite
common. The ARCH model has proven to be useful in studying the volatility of
inflation (Coulson and Robins, 1985), the term structure of interest rates (Engle
et al., 1985, 1987), the volatility of stock market returns and the behavior of
foreign exchange markets (Domowitz and Hakkio, 1985; Bollerslev and Ghysels, 1996),
to name but a few.
The simplest form of this model is the ARCH(1) model,
where u_{t} is distributed as standard normal. It follows that E[ε_{t} | x_{t}, ε_{t-1}] = 0, so that E[ε_{t} | x_{t}] = 0 and E[y_{t} | x_{t}] = β'x_{t}. Therefore, this model is a classical regression model.
The model of Generalized AutoRegressive Conditional Heteroscedasticity (GARCH) is defined as follows. The underlying regression is the usual one in (Eq. 2). Conditioned on an information set at time t, denoted Ψ_{t}, the distribution of the disturbance is assumed to be:
Where, the conditional variance is:
Define
and
Then
The model in Eq. 4 is a GARCH (p,q) model, where p refers to the order of the autoregressive part and q refers to the order of the moving average part.
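The way a conditional variance of this kind evolves can be sketched in Python for the lowest-order case. The sketch assumes the widely used GARCH(1,1) recursion, in which today's variance is a constant plus a multiple of yesterday's squared disturbance plus a multiple of yesterday's variance. The parameter names omega, alpha and beta and all numerical values are assumptions for illustration; they are not the notation or the estimates of this study. Setting beta to zero gives the ARCH(1) case.

```python
def garch11_variance(residuals, omega, alpha, beta, initial_variance):
    """Conditional variances for a GARCH(1,1) process.

    Assumed recursion (illustrative parameter names):
        var_t = omega + alpha * eps_{t-1}**2 + beta * var_{t-1}
    Returns one variance per residual; the first is initial_variance.
    """
    variances = [initial_variance]
    for eps in residuals[:-1]:
        variances.append(omega + alpha * eps ** 2 + beta * variances[-1])
    return variances


# Made-up disturbances: a large shock at t = 1 raises the variance at t = 2.
residuals = [1.0, -2.0, 0.5]
variances = garch11_variance(
    residuals, omega=0.1, alpha=0.2, beta=0.7, initial_variance=1.0
)
arch1_variances = garch11_variance(
    residuals, omega=0.1, alpha=0.2, beta=0.0, initial_variance=1.0
)
```

The jump in variance after the large second disturbance is the clustering of large and small forecast errors described above.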
Artificial Neural Network (ANN) model: Many economic time series are nonlinear,
whereas ARIMA and GARCH models assume a linear correlation structure among the time
series values. These models therefore cannot capture nonlinear patterns, and linear
approximations to complex real-world problems are not always satisfactory. The ANN
and ANFIS nonlinear models are therefore introduced as follows.
The major advantage of neural networks is their flexible nonlinear modeling
capability. With an ANN, there is no need to specify a particular model form.
Rather, the model is formed adaptively from the features present in the data
(Haoffi et al., 2007). This data-driven approach is suitable for many empirical
studies in which no theoretical guidance is available to suggest an appropriate
data generating process. The most common types of ANN models are shown in Fig. 1.
This study uses the feedforward backpropagation neural network, also known as the
MLP (Multilayer Perceptron) network, which is the neural network model most widely
used in time series forecasting because it is capable of resolving a wide variety
of problems (Sarle, 2002). An MLP network is made up of an input layer, an output
layer and one or more hidden layers of neurons. As Fig. 2 shows, each input is
weighted with an appropriate w. The sum of the weighted inputs and the bias forms
the input to the transfer function f. Neurons can use any differentiable transfer
function f to generate their output. In general, the transfer function introduces
a degree of nonlinearity that is valuable for most ANN applications and, ideally,
it should be continuous, differentiable and monotonic. Feedforward networks often
have one or more hidden layers of sigmoid neurons followed by an output layer of
linear neurons. Two stages may be considered in the MLP network: the running stage,
in which an input pattern is presented to the trained network and transmitted
through successive layers of neurons until it reaches an output, and the training
or learning stage, in which the weights or parameters of the network are
iteratively modified on the basis of a set of input-output patterns known as a
training set, in order to minimize the deviance or error between the output
obtained by the network and the user's desired output.

Fig. 1: Most common types of ANN models
Fig. 3: The scheme of adaptive neural fuzzy inference system
This is why MLP network learning is said to be supervised. The learning rule
commonly used in this type of network is the backpropagation algorithm, or gradient
descent method, developed and disseminated by Rumelhart et al. (1986). In this
study, we use the following three-layer feedforward network:
where F is the output function of the output layer unit, β_{0} is the bias unit (equal to 1), G is the output function of the hidden layer units j, γ_{kj} denotes the weight for the connection linking input k to hidden unit j, β_{j} is the weight of the output from hidden unit j in the output layer unit and x is the input vector.
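The running stage of such a three-layer network can be sketched in Python as follows. The sketch assumes a sigmoid function for G and the identity (a linear output neuron) for F, which matches the sigmoid hidden layer and linear output layer described for feedforward networks above. The separate hidden-unit biases, the argument names and the weights are assumptions for illustration and are not values from this study.

```python
import math


def sigmoid(z):
    """Logistic transfer function, assumed here for the hidden units (G)."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, gamma, hidden_bias, beta, beta0):
    """Forward pass of a single-hidden-layer network with a linear output (F).

    x:           input vector (e.g., lagged exchange rates)
    gamma[j][k]: weight linking input k to hidden unit j
    hidden_bias: one bias per hidden unit (an assumption of this sketch)
    beta[j]:     weight of hidden unit j in the output unit
    beta0:       output bias term
    """
    hidden = [
        sigmoid(bias + sum(w * xk for w, xk in zip(weights, x)))
        for weights, bias in zip(gamma, hidden_bias)
    ]
    return beta0 + sum(bj * hj for bj, hj in zip(beta, hidden))


# Two inputs, two hidden units; all numbers are made up.
output = mlp_forward(
    x=[0.0, 0.0],
    gamma=[[0.4, -0.2], [0.1, 0.3]],
    hidden_bias=[0.0, 0.0],
    beta=[1.0, -1.0],
    beta0=2.0,
)
```

Training would adjust gamma, beta and the biases to reduce the error between this output and the desired output over the training set.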
Adaptive Neuro-Fuzzy Inference System (ANFIS) model: Fuzzy logic is a form of
multi-valued logic derived from fuzzy set theory to deal with reasoning that is
approximate rather than precise. In contrast with binary sets, which have binary
logic (also known as crisp logic), fuzzy logic variables may have a membership
value that is not restricted to 0 or 1. Just as set membership values in fuzzy set
theory can range (inclusively) between 0 and 1, in fuzzy logic the degree of truth
of a statement can range between 0 and 1 and is not constrained to the two truth
values {true (1), false (0)} as in classic propositional logic (Von Altrock, 1995).
Given these advantages over the models described above, the ANFIS model is applied
in this study.
The ANFIS is a neuro-fuzzy system developed by Jang and Sun (1995). It has a
feedforward neural network structure in which each layer is a neuro-fuzzy system
component (Fig. 3). It simulates a TSK (Takagi-Sugeno-Kang) fuzzy rule of type 3,
in which the consequent part of the rule is a linear combination of the input
variables and a constant. The final output of the system is the weighted average of
each rule's output (Sugeno and Kang, 1988). The form of the type 3 rule simulated
in the system is as follows:
IF x_{1} is A_{1} AND x_{2} is A_{2} AND … AND x_{p} is A_{p}
THEN y = c_{0} + c_{1}x_{1} + c_{2}x_{2} + … + c_{p}x_{p}
where x_{1} and x_{2} are the input variables, A_{1} and A_{2} are the membership functions, y is the output variable and c_{0}, c_{1} and c_{2} are the consequent parameters. The neural network structure contains 6 layers.
•  Layer 0 is the input layer. It has n nodes, where n is the number of inputs to the system
•  The fuzzy part of ANFIS is mathematically incorporated in the form of Membership Functions (MFs)
A membership function can be any continuous and piecewise differentiable function that transforms the input value x into a membership degree, that is, a value between 0 and 1. The most widely applied membership function is the generalized bell (gbell MF), which is described by three parameters, a, b and c (Eq. 9). Layer 1 is therefore the fuzzification layer, in which each node represents a membership value to a linguistic term as a Gaussian function with the mean:
where a_{i}, b_{i} and c_{i} are the parameters of the function. These are adaptive parameters whose values are adapted by means of the backpropagation algorithm during the learning stage.
As the values of the parameters change, the membership function of the linguistic term A_{i} changes. These parameters are called premise parameters. This layer contains n × p nodes, where n is the number of input variables and p is the number of membership functions. For example, if size is an input variable with two linguistic values, SMALL and LARGE, then two nodes are kept in the first layer, and they denote the membership values of the input variable size in the linguistic values SMALL and LARGE.
•  Each node in layer 2 provides the strength of a rule by means of the multiplication operator. It performs the AND operation
Every node in this layer computes the product of its input values and gives it as
the output, as in the above equation. The membership values from layer 1 are
multiplied in order to find the firing strength of a rule in which the variable
x_{0} has linguistic value A_{i} and x_{1} has linguistic value B_{i} in the antecedent part of Rule l.
There are p^{n} nodes denoting the number of rules in layer 2. Each node represents the antecedent part of the rule. If there are two variables in the system namely x_{1} and x_{2} that can take two fuzzy linguistic values, SMALL and LARGE, there exist four rules in the system whose antecedent parts are as follows:
IF x_{1} is SMALL AND x_{2} is SMALL
IF x_{1} is SMALL AND x_{2} is LARGE
IF x_{1} is LARGE AND x_{2} is SMALL
IF x_{1} is LARGE AND x_{2} is LARGE
•  Layer 3 is the normalization layer, which normalizes the strength of all rules according to the equation:
where, w_{i} is the firing strength of the ith rule which is computed in layer 2. Node i computes the ratio of the ith rule’s firing strength to the sum of all rules’ firing strengths. There are p^{n} nodes in this layer.
•  Layer 4 is a layer of adaptive nodes. Every node in this layer computes a linear function whose coefficients are adapted by using the error function of the multilayer feedforward neural network.
where the p_{i} are the parameters, of which there are n + 1, and n is the number
of inputs to the system (i.e., the number of nodes in layer 0). In this example,
since there are two variables (x_{1} and x_{2}), there are three parameters
(p_{0}, p_{1} and p_{2}) in layer 4, and the normalized firing strength is the
output of layer 3. The parameters are updated by a learning step, which uses Kalman
filtering based on least-squares approximation together with the backpropagation
algorithm.
•  Layer 5 is the output layer whose function is the summation of the net outputs of the nodes in layer 4. The output is computed as: 
where each summand is the output of node i in layer 4. It denotes the consequent
part of rule i. The overall output of the neuro-fuzzy system is the summation of
the rule consequences.
The ANFIS uses a hybrid learning algorithm to train the network. The parameters in layer 1 are trained with the backpropagation algorithm. The parameters in layer 4 are trained with a variation of least-squares approximation or with the backpropagation algorithm.
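The layer-by-layer computation described above can be sketched as a single forward pass in Python. The sketch uses the standard generalized bell function, 1 / (1 + |(x - c) / a|^(2b)), for the layer 1 memberships and a first-order Sugeno consequent for each rule. The function names, the data layout and every parameter value are assumptions for illustration only; no training step is included.

```python
import itertools


def gbell(x, a, b, c):
    """Generalized bell membership function with premise parameters a, b, c."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def anfis_forward(inputs, premise, consequent):
    """Forward pass through layers 1 to 5 of a Sugeno-type ANFIS.

    premise[k]:    list of (a, b, c) tuples, one per linguistic term of input k
    consequent[r]: (c0, c1, ..., cn) coefficients of rule r
    """
    # Layer 1: membership degree of each input in each of its linguistic terms.
    memberships = [
        [gbell(x, *params) for params in terms]
        for x, terms in zip(inputs, premise)
    ]
    # Layer 2: one rule per combination of terms; AND is taken as the product.
    strengths = []
    for combo in itertools.product(*memberships):
        strength = 1.0
        for degree in combo:
            strength *= degree
        strengths.append(strength)
    # Layer 3: normalize the firing strengths so that they sum to 1.
    total = sum(strengths)
    normalized = [w / total for w in strengths]
    # Layer 4: weight each rule's linear consequent by its normalized strength.
    rule_outputs = [
        w * (coef[0] + sum(ci * xi for ci, xi in zip(coef[1:], inputs)))
        for w, coef in zip(normalized, consequent)
    ]
    # Layer 5: the overall output is the sum of the rule outputs.
    return sum(rule_outputs)


# Two inputs, each with terms SMALL and LARGE, giving 2**2 = 4 rules.
premise = [
    [(2.0, 2.0, 0.0), (2.0, 2.0, 10.0)],
    [(2.0, 2.0, 0.0), (2.0, 2.0, 10.0)],
]
consequent = [(5.0, 0.0, 0.0)] * 4  # every rule outputs the constant 5
output = anfis_forward([3.0, 7.0], premise, consequent)
```

Because the normalized strengths sum to 1, identical constant consequents return that constant whatever the inputs are, which is a convenient sanity check on the normalization layer.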
DATA DESCRIPTION AND FORECAST EVALUATION CRITERIA
The exchange rate data used in this study are the daily Iran Rial/US$ and Rial/€
rates, covering the period from 20 Mar. 2002 to 21 Nov. 2008, with a total of 2436
observations, as shown in Fig. 4. Although there is no consensus on how to split
the data for neural network applications, the general practice is to allocate more
data to model building and selection. Most studies use a convenient in-sample to
out-of-sample ratio such as 70:30, 80:20 or 90:10%. This investigation uses the
70:30% split. We take the daily data from 20 Mar. 2002 to 20 Oct. 2006 as the
in-sample data set, with 1706 observations for training, and the remainder as the
out-of-sample data set, with 730 observations for testing. The study was carried
out in Iran during 2009:1-2009:4. We obtained the daily exchange rate time series
from the website of the Central Bank of Iran (www.CBI.ir). For space reasons, the
original data are not listed here; detailed data can be obtained from the website
www.CBI.ir.
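The split described above is chronological, so the earliest observations form the training set and the most recent ones form the test set. A minimal Python sketch follows, using the observation counts reported above and a placeholder series in place of the actual rates; the function name and the placeholder values are assumptions.

```python
def chronological_split(observations, n_train):
    """Split a time-ordered list without shuffling.

    The first n_train observations form the in-sample (training) set and
    the remainder form the out-of-sample (testing) set.
    """
    if not 0 < n_train < len(observations):
        raise ValueError("n_train must be between 1 and len(observations) - 1")
    return observations[:n_train], observations[n_train:]


# Placeholder values standing in for the 2436 daily rates (not real data).
rates = [float(day) for day in range(2436)]
train, test = chronological_split(rates, n_train=1706)
train_share = len(train) / len(rates)  # close to the 70:30 ratio
```

Shuffling before splitting would leak future observations into the training set, which is why the order is preserved.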

Fig. 4: Daily Iran Rial/US$ and Rial/€ changes from 20 Mar. 2002 to 21 Nov. 2008
Table 1: Forecasting evaluation criteria
The criteria are defined in terms of the actual value y_{t}, the model output value and the number of observations n.
To evaluate and compare forecasting performance, it is necessary to introduce forecast evaluation criteria. Three criteria are used in this research: R-squared (R^{2}), Mean Absolute Deviation (MAD) and Root Mean Square Error (RMSE). Table 1 shows the formulation of R^{2}, MAD and RMSE.
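The three criteria can be computed as in the following Python sketch. MAD and RMSE follow their usual definitions. For R^{2} the sketch assumes the form 1 - SSE/SST, which is one common definition and may differ from the formulation used in Table 1. The function name and the sample numbers are illustrative.

```python
import math


def evaluation_criteria(actual, predicted):
    """Return R2, MAD and RMSE for paired actual and predicted values."""
    n = len(actual)
    errors = [y - f for y, f in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    sse = sum(e ** 2 for e in errors)                 # sum of squared errors
    sst = sum((y - mean_actual) ** 2 for y in actual)  # total sum of squares
    return {
        "R2": 1.0 - sse / sst,  # assumed definition: 1 - SSE/SST
        "MAD": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sse / n),
    }


# Made-up values: every forecast is too high by exactly one unit.
scores = evaluation_criteria([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

Lower MAD and RMSE and a higher R^{2} indicate a better forecast.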
RESULTS AND DISCUSSION
Performance of linear models in exchange rate forecasting: In the ARIMA model, we identified the degree of integration (d) using the augmented Dickey-Fuller test and the Schwarz criterion, and the autoregressive (p) and moving average (q) orders using the log-likelihood function and the Akaike Information Criterion. In the GARCH model, the Lagrange Multiplier (LM) test was used to identify ARCH effects. For model specification, the autoregressive (p) and moving average (q) orders were identified assuming a normal distribution for the conditional error terms.
The forecasting performance of the ARIMA and GARCH models for the Rial/USD and Rial/EUR exchange rates is shown in Table 2.
The left side of Table 2 shows the out-of-sample fit of the ARIMA and GARCH models for forecasting the Rial/USD and Rial/EUR exchange rates 2, 4 and 8 days ahead, in comparison with the actual observations. The right side reports the values of the evaluation criteria for the considered horizons of the Rial/USD and Rial/EUR exchange rates.
The results in Table 2 show that the ARIMA and GARCH models give better forecasts for Rial/USD than for Rial/EUR by all three performance measures. Table 2 also shows that the forecasting accuracy of the ARIMA and GARCH models declines as the forecast horizon increases.
Performance of nonlinear models in exchange rate forecasting: For the ANN, a
single-hidden-layer feedforward network is trained, with a sigmoid transfer
function in the hidden layer and a linear transfer function in the output layer.
The weights are initialized to small values based on the technique of Nguyen and
Widrow (1990), and the mean square error is taken as the cost function. We train
the network using Levenberg-Marquardt backpropagation. The number of input nodes
in this work corresponds to the number of lagged past observations. We experiment
with only four levels of hidden nodes (2, 4, 6 and 8) across each level of input
nodes, following the previous finding (Hu et al., 1999) that the forecasting
performance of neural networks is not as sensitive to the number of hidden nodes as
to the number of input nodes.
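Since the number of input nodes equals the number of lagged observations, each training pattern pairs a window of past rates with a later rate. The Python sketch below builds such patterns for a given number of lags and forecast horizon. It assumes a direct multi-step setup, in which the target lies h days after the last input; the study does not state whether direct or iterated forecasts were used, and the function name is illustrative.

```python
def make_lagged_samples(series, n_lags, horizon):
    """Build (inputs, targets) for h-step-ahead forecasting.

    Each input is a window of n_lags consecutive observations and its
    target is the observation `horizon` steps after the end of the window.
    """
    inputs, targets = [], []
    for end in range(n_lags, len(series) - horizon + 1):
        inputs.append(series[end - n_lags:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Made-up series 1..10 with 3 lags and a 2-step horizon.
series = list(range(1, 11))
inputs, targets = make_lagged_samples(series, n_lags=3, horizon=2)
```

With 3 lags and a 2-step horizon, the first pattern maps the window [1, 2, 3] to the target 5.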
Table 2: ARIMA and GARCH performance in exchange rate forecasting

For ANFIS, the hybrid learning algorithm is used to identify the membership function parameters of single-output, Sugeno-type Fuzzy Inference Systems (FIS). A combination of least-squares and backpropagation gradient descent methods is used to train the FIS membership function parameters to model a given set of input/output data. Genfis1, which generates an initial Sugeno-type FIS for ANFIS training using a grid partition, is applied with the gauss and gauss2 types of membership function for each input and a linear membership function for the output. Three and four membership functions are used for each input.
Table 3: ANN performance in exchange rate forecasting
Table 4: ANFIS performance in exchange rate forecasting

The forecasting performance of the ANN and ANFIS models for the Rial/USD and Rial/EUR exchange rates is shown in Tables 3 and 4.
Similarly, the left sides of Tables 3 and 4 show the fit of the best ANN and ANFIS structures for forecasting the Rial/USD and Rial/EUR exchange rates 2, 4 and 8 days ahead, in comparison with the actual observations. The right sides report the values of the evaluation criteria for the considered horizons of the Rial/USD and Rial/EUR exchange rates.
In an ANN structure, e.g., structure (56543211) for forecasting 2 days ahead of
Rial/USD or Rial/EUR, the terms 5, 654321 and 1 represent the number of input
nodes, the number of neuron(s) in each hidden layer and the number of output nodes,
respectively.
Table 5: Comparison of ARIMA, GARCH, ANN and ANFIS performance

In an ANFIS structure, e.g., structure (gauss4100) for forecasting 2 days ahead of
Rial/USD or Rial/EUR, the terms gauss, 4 and 100 represent the type of membership
function, the number of membership functions and the number of training epochs,
respectively.
As can be seen in Tables 3 and 4, the ANN and ANFIS nonlinear models perform very well in Rial/USD and Rial/EUR forecasting. As with the linear models, the results in Tables 3 and 4 show that the ANN and ANFIS models give better forecasts for Rial/USD than for Rial/EUR by all three performance measures, and that their forecasting accuracy declines as the forecast horizon increases.
Comparison of linear and nonlinear model performance in exchange rate forecasting: To compare the performance of the linear and nonlinear models in forecasting the Rial/USD and Rial/EUR exchange rates, we divided the values of the forecast evaluation criteria of GARCH by those of ARIMA, those of the various ANN structures by those of GARCH, and those of the various ANFIS structures by those of the best ANN structure, for each horizon. Table 5 shows the results of these comparisons.
As Table 5 shows, the forecasting performance of the ANN and ANFIS nonlinear models is better than that of the ARIMA and GARCH linear models because (1) the ratios for RMSE, MSE and MAD are less than 1 and (2) the ratio for R^{2} is more than 1.
Judging by the ratios of the forecast evaluation criteria, Table 5 also indicates that the ANFIS model provides the best forecasts by all four performance measures.
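The ratio comparison can be sketched in Python as follows. Each criterion of the candidate model is divided by the same criterion of the benchmark model, and the candidate is judged better when every error ratio is below 1 and the R^{2} ratio is above 1. The function names and the numbers are assumptions for illustration and are not results from Table 5.

```python
def criterion_ratios(candidate, benchmark):
    """Divide each criterion of the candidate model by the benchmark's value."""
    return {name: candidate[name] / benchmark[name] for name in candidate}


def candidate_is_better(ratios, error_criteria=("RMSE", "MAD"),
                        fit_criteria=("R2",)):
    """Better means lower errors (ratio < 1) and higher fit (ratio > 1)."""
    lower_errors = all(ratios[name] < 1.0 for name in error_criteria)
    higher_fit = all(ratios[name] > 1.0 for name in fit_criteria)
    return lower_errors and higher_fit


# Made-up criterion values for a candidate and a benchmark model.
candidate = {"RMSE": 2.0, "MAD": 1.0, "R2": 0.75}
benchmark = {"RMSE": 4.0, "MAD": 2.0, "R2": 0.5}
ratios = criterion_ratios(candidate, benchmark)
better = candidate_is_better(ratios)
```

Working with ratios places the criteria on a common scale, so different model pairs and horizons can be compared side by side.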
CONCLUSION
Previous research in this field has usually relied on traditional econometric or
static neural network models, whose results compare poorly with those of the ANFIS
model. In this study, the accuracy of ANFIS and ANN as nonlinear models and of
GARCH and ARIMA as linear models has been compared for forecasting the daily Iran
Rial/€ and Rial/US$ exchange rates 2, 4 and 8 days ahead. The results indicate that
the forecasts of the nonlinear models, especially ANFIS, are considerably more
accurate than those of the traditional linear GARCH and, especially, ARIMA models
used as benchmarks, in terms of error measures such as RMSE, MSE and MAD. As far as
the R^{2} criterion is concerned, the nonlinear models, especially ANFIS, are also
clearly better than the linear models, especially ARIMA.
In brief, using the forecast evaluation criteria, we found that nonlinear models outperform linear models, that GARCH outperforms ARIMA and that ANFIS outperforms ANN. The ANFIS model is therefore an effective way to improve the accuracy of Iran Rial/€ and Rial/US$ exchange rate forecasts.