The dynamics of a physical system is usually determined by solving a differential equation. One normally proceeds by reducing the Differential Equation (DE) to a familiar form with known (compact) solutions using suitable coordinate transformations (Joes and Freeman, 1986; Arfken and Weber, 1995). In nontrivial cases, however, finding the appropriate transformations is difficult and numerical methods are used. A number of algorithms (e.g., Runge-Kutta, finite difference, etc.) are available for calculating the solution accurately and efficiently (Press et al., 1986; Kunz and Lubbers, 1993). However, these methods are sequential and require much memory space and time and as a result have great computational cost. Hennessy and Patterson (Saif et al., 2006) have estimated that most computationally complex applications spend 90% of their execution time in only 10% of their code; a parallel approach to running an algorithm can therefore yield a nontrivial speedup.
Here, we demonstrate a new method of finding solutions of differential equations. Specifically, we use an unsupervised feedforward Neural Network to solve Burgers equation, the one-dimensional quasilinear parabolic partial differential equation. Neural Network methods have been applied successfully to linear (Monterda and Saloma, 1998) and coupled DEs (Cruito et al., 2001) but have not yet been tested on this class of differential equations. An interesting theorem that sheds some light on the capabilities of multilayer networks was proven by Kolmogorov and is described in Lorentz (1976). This theorem states that any continuous function of several variables can be represented exactly by superpositions of continuous functions of only one variable. Solving a DE implies the training of a Neural Network that calculates the solution values at any point in the solution space, including those not considered during the Neural Network training. In this sense, the unsupervised Neural Network learns to solve the differential equation analytically.
Because the correct form of the differential equation solution is unknown beforehand, the conventional supervised algorithms for training a neural network cannot be applied here, so we propose a new, modified training algorithm. The Neural Network is trained in an unsupervised manner using an energy (error) function that is derived from the differential equation itself and the governing boundary conditions. The proper choice of the energy function is crucial in training the neural network to learn the analytical properties of the solution. The solution of a differential equation in terms of a neural network possesses several interesting features: the neural network learns to solve the differential equation analytically; its computational complexity does not increase quickly with the number of sampling points; it calculates solution values rapidly; it has superior interpolation properties compared to well-established methods such as finite elements (Joes and Freeman, 1986; Arfken and Weber, 1995); and, finally, it can be implemented on existing specialized hardware (neuroprocessors), a fact that will result in a significant computational speedup.
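To make the idea concrete, the following sketch (illustrative, not the implementation used in this study) trains a small one-hidden-layer network on the toy ODE u'(x) = -u(x) with u(0) = 1, using only an energy built from the residual and the initial condition. A finite-difference gradient and a crude adaptive step stand in for the analytic backpropagation and adaptive learning rate described later; all sizes, rates and iteration counts are illustrative choices.

```python
import numpy as np

# Toy unsupervised DE solver: no target outputs, only an energy
# E = mean((u' + u)^2) + (u(0) - 1)^2 built from the ODE itself.
rng = np.random.default_rng(0)
H = 5                                      # hidden units (illustrative)
p = 0.5 * rng.standard_normal(3 * H + 1)   # packed params: w, b, v, c
xs = np.linspace(0.0, 1.0, 11)             # sampled training points

def sig(z):
    return 1.0 / (1.0 + np.exp(-z))

def u_and_du(x, p):
    # Network u(x) = sum_j v_j * sig(w_j x + b_j) + c and its analytic du/dx.
    w, b, v, c = p[:H], p[H:2*H], p[2*H:3*H], p[-1]
    s = sig(np.outer(x, w) + b)
    return s @ v + c, (s * (1.0 - s) * w) @ v

def energy(p):
    u, du = u_and_du(xs, p)
    u0, _ = u_and_du(np.array([0.0]), p)
    return np.mean((du + u) ** 2) + (u0[0] - 1.0) ** 2

def grad(p, eps=1e-6):
    # Finite-difference gradient of E (the paper backpropagates analytically).
    g = np.zeros_like(p)
    e = energy(p)
    for i in range(p.size):
        q = p.copy(); q[i] += eps
        g[i] = (energy(q) - e) / eps
    return g

e0 = energy(p)
lr, e = 0.1, e0
for _ in range(1500):                      # crude adaptive step, cf. the eta rule
    q = p - lr * grad(p)
    eq = energy(q)
    if eq < e:
        p, e, lr = q, eq, lr * 1.1         # accepted: grow the step
    else:
        lr *= 0.5                          # rejected: shrink the step
print(f"energy: {e0:.4f} -> {e:.6f}")
```

After training, evaluating the network at any x in (or near) [0, 1] approximates exp(-x), including at points never sampled during training.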
MATERIALS AND METHODS
Mathematical definition: As a prelude to our method, some mathematical definitions and results are given. For a detailed exposition, the reader can refer to the book of Cichocki and Unbehauen (1993).
1-Let f (x) = f (x1, x2, ..., xn) be a real scalar function of a real vector x ∈ R^n. The derivative of the function f (x) with respect to the vector x is the 1 x n row vector
∂f/∂x = [∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn]
and its gradient is the n x 1 vector ∇f (x) = (∂f/∂x)^T.
2-For a real vector function f (x) = [f1 (x), f2 (x), ..., fm (x)]^T, where x ∈ R^n, its gradient is defined as the n x m matrix
∇f (x) = [∇f1 (x), ∇f2 (x), ..., ∇fm (x)]
The Jacobian matrix is the m x n matrix Jf (x) = [∂fi/∂xj] = (∇f (x))^T.
3-Let f (x) = h (g (x)), where x ∈ R^n, g: R^n → R^m and h: R^m → R^p. The chain rule of differentiation is
Jf (x) = Jh (g (x)) Jg (x)
4-Let F = h (f (W)) and f (W) = [f1 (W) f2 (W) ... fm (W)]^T, where W ∈ R^{p x q} and h: R^m → R. The chain rule for the differentiation of the scalar function F with respect to the matrix W yields
∂F/∂W = Σ_{i=1}^{m} (∂h/∂fi) ∂fi (W)/∂W
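The chain rule of definition 3 can be verified numerically. In the check below, g and h are arbitrary smooth maps chosen for illustration (they are not taken from the text); the product of their Jacobians is compared against a finite-difference Jacobian of the composition.

```python
import numpy as np

# Verify J_f(x) = J_h(g(x)) J_g(x) for f = h o g, with g, h: R^2 -> R^2.
def g(x):
    return np.array([x[0] * x[1], x[0] + x[1] ** 2])

def Jg(x):
    return np.array([[x[1], x[0]], [1.0, 2.0 * x[1]]])

def h(y):
    return np.array([np.sin(y[0]), y[0] * y[1]])

def Jh(y):
    return np.array([[np.cos(y[0]), 0.0], [y[1], y[0]]])

def fd_jacobian(f, x, eps=1e-6):
    # Central-difference Jacobian, one column per input component.
    cols = []
    for i in range(x.size):
        d = np.zeros(x.size); d[i] = eps
        cols.append((f(x + d) - f(x - d)) / (2 * eps))
    return np.stack(cols, axis=1)

x = np.array([0.3, -0.7])
chain = Jh(g(x)) @ Jg(x)                       # chain rule
numeric = fd_jacobian(lambda t: h(g(t)), x)    # direct Jacobian of h(g(x))
print(np.max(np.abs(chain - numeric)))
```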
Illustration of the method: A general structure of a feedforward neural network is shown in Fig. 1. Feedforward neural networks are the most popular architectures due to their structural flexibility, good representational capabilities and the availability of a large number of training algorithms (Haykin, 1999). This network consists of neurons arranged in layers, in which every neuron in a layer is connected to all neurons of the next layer (a fully connected network). The neurons in the input layer simply store the input values. The hidden-layer and output-layer neurons each carry out two calculations. First, they multiply all inputs and a bias (equal to a constant value of 1) by a weight and sum the results, giving the net input O of the neuron; second, the output a of the neuron is calculated as a function (named the activation function) of O. The choice of activation function depends on the application of the network, but the sigmoid function is normally used.
The mathematical formulas for a neuron are:

O^k = w^k a^(k-1) + b^k,  a^k = f^k (O^k)

Where, a^0 = x is an n0-dimensional input vector, a^k and f^k are the nk-dimensional output and the activation function vector in the kth layer, respectively. The w^k and b^k for k = 1, 2, ..., N are the nk x n(k-1) weight matrix and the nk x 1 bias vector of the kth layer.
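A relax-state forward pass implementing these formulas might look as follows; the layer sizes and random initialisation are illustrative, with sigmoid hidden layers and a linear output layer as in the text.

```python
import numpy as np

# Forward pass of a fully connected network: O^k = w^k a^(k-1) + b^k,
# a^k = f^k(O^k), with a^0 = x.
rng = np.random.default_rng(1)
sizes = [2, 5, 1]                      # n0 inputs, one hidden layer, one output
weights = [rng.standard_normal((sizes[k + 1], sizes[k]))
           for k in range(len(sizes) - 1)]
biases = [rng.standard_normal(sizes[k + 1]) for k in range(len(sizes) - 1)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    a = x                              # a^0 = x: input layer stores the inputs
    for k, (w, b) in enumerate(zip(weights, biases)):
        O = w @ a + b                  # net input of layer k+1
        # sigmoid activation in hidden layers, linear in the output layer
        a = O if k == len(weights) - 1 else sigmoid(O)
    return a

print(forward(np.array([0.5, 0.1])))
```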
We feed the network with input data that propagates through the layers from the input layer to the output layer. This state of the network is called the relax state. In the training state of the network, the adjustable parameters of the network (weights and biases) are tuned by a proper learning algorithm to minimize the energy function of the network.
Fig. 1: An N-layer feedforward neural network
We train the network via the gradient-descent backpropagation method (Haykin, 1999) with a nonnegative energy function. Gradient descent has a number of advantages over other optimization techniques such as the Conjugate Gradient Method (CGM) and the Limited-Memory quasi-Newton Algorithm (LMNA), which consider the Jacobian (JM) and the Hessian (HM) matrix, respectively. HM contains the second-order partial derivatives of the error surface with respect to the weights, while JM (or the gradient vector) contains the first-order partial derivatives. Empirical studies have shown that both HM and JM are almost rank-deficient when used in multilayer perceptron Neural Networks, due to the intrinsically ill-conditioned nature of the Neural Network training problem (Saarinen et al., 1991). Higher-order methods may not converge faster, and the computations of HM and JM and of their corresponding inverses are also computationally expensive. Both LMNA and CGM involve calculation of the inverse of HM and JM, respectively; HM and JM are not always guaranteed to be non-singular and therefore their inverses may be uncomputable. According to the gradient-descent algorithm, the changes of the weights and biases, Δw^k and Δb^k, obtained through the minimization of the energy function with respect to the weights w^k and biases b^k, are:

Δw^k = -η ∂E/∂w^k,  Δb^k = -η ∂E/∂b^k

where, η is the learning rate. The value of the learning rate is adaptively varied from 0 to 1 according to the following rule (Haykin, 1999):
||Increases when the gradient of E with respect to a particular weight does not change algebraic sign over several (in our case, four) consecutive iterations, which enhances the search for the minimum error
||Decreases when the gradient of E changes algebraic sign over four consecutive iterations in the search space, which prevents unnecessary oscillations
||Otherwise, the η value remains the same. The rate is adaptively varied to account for the change of the error surface along different regions of a single-weight dimension (Yam and Chow, 2000; Lou, 1991). The gradient of the error (energy) E with respect to the linear output O^k in the kth layer is defined as (Hagan and Menhaj, 1994):

δ^k = ∂E/∂O^k

The recurrence relation of the gradient can be obtained by:

δ^k = F'^k (O^k) (w^(k+1))^T δ^(k+1)

where F'^k (O^k) is the diagonal matrix of the activation-function derivatives in the kth layer. This recurrence relation is initialized at the final layer as

δ^N = F'^N (O^N) ∂E/∂a^N
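The recurrence can be checked numerically. In the sketch below the 2-3-1 network, the random weights and the stand-in scalar energy E = ½||a²||² are illustrative only; since O^k = w^k a^(k-1) + b^k implies ∂E/∂b^k = δ^k, each δ can be compared against finite differences of E with respect to the biases.

```python
import numpy as np

# Backpropagation recurrence: delta^k = F'^k(O^k) (w^(k+1))^T delta^(k+1),
# initialized at the output layer.
rng = np.random.default_rng(2)
w1, b1 = rng.standard_normal((3, 2)), rng.standard_normal(3)
w2, b2 = rng.standard_normal((1, 3)), rng.standard_normal(1)
x = np.array([0.4, -0.2])

def sig(z):
    return 1.0 / (1.0 + np.exp(-z))

def E(b1_, b2_):
    a1 = sig(w1 @ x + b1_)             # sigmoid hidden layer
    a2 = w2 @ a1 + b2_                 # linear output layer
    return 0.5 * float(a2 @ a2)        # stand-in scalar energy

O1 = w1 @ x + b1
a1 = sig(O1)
a2 = w2 @ a1 + b2
delta2 = a2                            # output layer: f' = 1, dE/da^2 = a^2
delta1 = a1 * (1 - a1) * (w2.T @ delta2)   # the recurrence for the hidden layer

# Finite-difference dE/db^1, which should equal delta1.
eps = 1e-6
fd = np.array([(E(b1 + eps * np.eye(3)[i], b2) - E(b1 - eps * np.eye(3)[i], b2))
               / (2 * eps) for i in range(3)])
print(np.max(np.abs(fd - delta1)))
```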
The error back-propagated through the network must be reformulated in terms of the parameters of the neural network. The exact form of the equations depends on the differential equation that is considered. In brief, we follow these steps to train the network:
||Discretizing the domain and generating the training data set
||Initializing the weights and biases
||Presenting randomly sampled data
||Calculating the energy function with respect to the present values of the parameters and updating the weights and biases accordingly
||Repeating from step 3 until the desired minimum energy has been reached
||Calculating the outputs of the network for a given point of the input domain as an approximation of the solution
The flowchart for the main process and its training subroutine is shown in Fig. 2. Once the network is trained for a specific equation with its boundary and initial conditions, the solution of the equation for a desired input is obtained in just one cycle of the relax state of the network.
A case study: To show the procedure, we examine Burgers equation of the form

∂u/∂t + u ∂u/∂x = v ∂²u/∂x²   (7)

which is the one-dimensional quasilinear parabolic partial differential equation; v > 0 is the coefficient of kinematic viscosity of the fluid. This equation was intended as an approach to the study of turbulence, shock waves and gas dynamics. The equation is considered with initial and boundary conditions as specified for each test problem below. We rewrite Eq. 7 in the form

v ∂²u/∂x² − u ∂u/∂x − ∂u/∂t = 0
For this PDE, a three-layer feedforward neural network is chosen, consisting of 2 inputs xm, where m = 1, 2, H hidden units and one output (u). The 2 inputs consist of the 2 independent variables x1 = x and x2 = t. We use sigmoid functions for the hidden-layer neurons and a linear function for the output neuron. The accuracy of the neural network solution depends on how close the energy function is reduced to zero during training. We choose an energy function of the form

E = E1 + E2 + E3 + E4

where E1 is the sum of the squared residuals of Eq. 7, v ∂²u/∂x² − u ∂u/∂x − ∂u/∂t, over the sampled interior points, E2 is the sum of the squared deviations of the network output from the initial condition and E3 and E4 are the sums of the squared deviations from the two boundary conditions.
The energy function is chosen such that E is zero if and only if the terms
in the right-hand side of the equation are all identically equal to zero. To
achieve an accurate solution for u (x, t), training must reduce E to a value that is as close as possible to zero.
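As a quick illustration of this property, the residual term can be evaluated for a function that solves Burgers equation exactly and for one that does not. Here u(x, t) = x/(1+t), which satisfies the equation for every v, serves as a convenient closed-form check; it is not one of the test problems of this study, and the grid and v below are illustrative.

```python
import numpy as np

# For u = x/(1+t): u_t = -x/(1+t)^2, u u_x = x/(1+t)^2, u_xx = 0,
# so the residual u_t + u u_x - v u_xx vanishes identically.
v = 0.1
xs, ts = np.meshgrid(np.linspace(0, 1, 21), np.linspace(0.1, 1, 21))

def residual(u, u_t, u_x, u_xx):
    return u_t + u * u_x - v * u_xx

# Residual energy term for the exact solution u = x/(1+t)...
u = xs / (1 + ts)
E1_exact = np.mean(residual(u, -xs / (1 + ts) ** 2, 1 / (1 + ts), 0.0) ** 2)

# ...and for a deliberately wrong trial function u = x*t
# (u_t = x, u_x = t, u_xx = 0).
E1_wrong = np.mean(residual(xs * ts, xs, ts, 0.0) ** 2)
print(E1_exact, E1_wrong)
```

The residual term is (numerically) zero only for the true solution, which is exactly the property the training exploits.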
The vanishing of E1 ensures that u (x, t) exactly satisfies the equation, while the reduction of each of the E2, E3 and E4 terms to zero guarantees a unique solution for u (x, t). To reformulate this PDE in terms of the neural network, we derive the derivatives of the network output with respect to its inputs. With zj = w (j, 1) x + w (j, 2) t + bj the net input of the jth hidden neuron, s the sigmoid activation function, v (1, j) the output-layer weights and b the output bias:

u = Σj v (1, j) s (zj) + b
∂u/∂x = Σj v (1, j) w (j, 1) s' (zj)
∂u/∂t = Σj v (1, j) w (j, 2) s' (zj)
∂²u/∂x² = Σj v (1, j) w (j, 1)² s'' (zj)

where the w (i, j) is the weight between the jth neuron of a layer and the ith neuron of the next layer. Our training set consists of 30×10^5 unique combinations of x and t sampled from the problem domain.
The trained network generalizes the solution values, including points not considered during the training state, to the solution curves with an accuracy of 10^-12 for the energy function. The accuracy can be controlled efficiently by varying the number of hidden-layer neurons. By trial and error we found that the most trainable three-layer architecture is one with 40 hidden nodes (H = 40).
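The closed-form derivatives of this 2-H-1 architecture can be validated against finite differences; in the sketch below the random weights and H = 7 are illustrative only, and s'' = s'(1 − 2s) is used for the sigmoid.

```python
import numpy as np

# Closed-form u_x, u_t, u_xx for a 2-H-1 sigmoid network, checked against
# finite differences at one point.
rng = np.random.default_rng(3)
H = 7
W, b = rng.standard_normal((H, 2)), rng.standard_normal(H)
v_out, c = rng.standard_normal(H), rng.standard_normal()

def sig(z):
    return 1.0 / (1.0 + np.exp(-z))

def u(x, t):
    return v_out @ sig(W @ np.array([x, t]) + b) + c

def u_derivs(x, t):
    z = W @ np.array([x, t]) + b
    s = sig(z)
    ds, dds = s * (1 - s), s * (1 - s) * (1 - 2 * s)   # s', s''
    return (v_out @ (W[:, 0] * ds),         # u_x
            v_out @ (W[:, 1] * ds),         # u_t
            v_out @ (W[:, 0] ** 2 * dds))   # u_xx

x0, t0 = 0.3, 0.6
ux, ut, uxx = u_derivs(x0, t0)
e1, e2 = 1e-5, 1e-4
fd_ux = (u(x0 + e1, t0) - u(x0 - e1, t0)) / (2 * e1)
fd_ut = (u(x0, t0 + e1) - u(x0, t0 - e1)) / (2 * e1)
fd_uxx = (u(x0 + e2, t0) - 2 * u(x0, t0) + u(x0 - e2, t0)) / e2 ** 2
print(abs(ux - fd_ux), abs(ut - fd_ut), abs(uxx - fd_uxx))
```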
RESULTS AND DISCUSSION
Let us consider the Burgers equation for the following problems:
Problem 1: The first problem is the Burgers equation with the initial condition, Eq. 13 and homogeneous boundary conditions, Eq. 14.
The exact solution of the Burgers equation with the initial condition, Eq. 13 and boundary conditions, Eq. 14, is obtained as Eq. 15, where a0 and an for n = 1, 2, ... are the Fourier coefficients, defined for n = 0 and n ≥ 1 by the following equations, respectively.
Problem 2: The second problem is the Burgers equation with its initial condition and the homogeneous boundary conditions, Eq. 14; the exact solution is Eq. 15 with the Fourier coefficients a0 and an redefined accordingly.
Numerical results: Table 1 and Table 2 show the ANN solutions of problem 1 and problem 2, respectively, for different values of the viscosity coefficient.
Table 1: Comparison of results at different times for v = 1, v = 0.1 and v = 0.01
Table 2: Comparison of results at different times for v = 1, v = 0.1 and v = 0.01
The ANN solutions are observed to be in satisfactory agreement with the exact solutions. The ANN solutions are also more accurate than the solutions obtained with the numerical method by Kutluay et al. (2004).
In this study a new method based on an ANN has been applied to find solutions of Burgers equation; the method is straightforwardly applicable to nth-order partial differential equations. The ANN solutions are in satisfactory agreement with the exact solutions and are more accurate than those obtained with the numerical method. The neural network allows us to obtain fast solutions of differential equations starting from randomly sampled data sets. Owing to the parallel structure of the network, the algorithm achieves good approximation results without wasting memory space and computational time, thereby reducing the complexity of the problem. A comparison with the exact solutions shows that the proposed method is able to achieve good results in both accuracy and computational complexity.