Artificial Neural Networks (ANN) in general identify patterns according to
their relationship, responding to related patterns with a similar output. They
are trained to classify certain patterns into groups and then are used to identify
the new ones, which were never presented before. If ANN is trained for instance
to identify a shape, it can correctly classify only incomplete or similar patterns
as compared to the trained ones were (Marcek and Marcek, 2006).
But in case a shape is moved or its size is changed in the input matrix of variables
the neural network identification will fail. The principal lack of the ANN identification
in general is the disability of input pattern generalization. ANN is in principle
a simplified form of polynomial neural network (PNN), which combinations of
variables are missing. Polynomial functions apply those to preserve partial
dependencies of variables. Thats why the ANNs identification cant
utilize data relations (including time-series prediction), which are described
lots of complex systems (Zjavka, 2007).
Lets try to look at the vector of input variables as a no pattern but
bound dependent point set of N-dimensional space. Likewise the ANN pattern classification
works, we can reason an identification of any unknown relations of the input
data variables. The neural network response would be the same to all patterns
(dependent sets), which variables behave the trained dependence. It doesnt
matter what values they become. A multi-parametric non-linear function can describe
this relation to each other. So if we want to create such a function with neural
network, its neurons must apply some n-parametric polynomial functions
to catch the partial dependence of its n-inputs. Biological neural cell
seems apply a similar principle. Its dendrites collect signals coming from other
neurons. But unlike the ANNs the signals interact already in the single branches
(dendrites) likewise the variables of a multi-parametric polynomial. So, this
could be modelled with multiplications of some inputs in polynomials of PNN.
Then these weighted combinations are summed in the cell body and transformed
using time-delayed dynamic periodic activation function (activated neural cell
generates in response to its input signals series of time-delayed output pulses)
(Benuskova, 2002). The period of this function depends
on some input variables and seems to represent the derivative part of a partial
derivation of an entire polynomial (as a term of differential equation).
Polynomial neural network for dependence of variables identification (or Differential polynomial neural network - D-PNN, because it constructs a differential equation) describes a functional dependence of input variables (not entire patterns as ANN does). This could be regarded as a pattern abstraction, similar the brain utilizes, which identification is not based on values of variables but only relations of these. D-PNN forms its functional output as a generalization of input patterns.
GMDH POLYNOMIAL NEURAL NETWORK
General connection between input and output variables is expressed by the Volterra
functional series, a discrete analogue of which is Kolmogorov-Gabor polynomial
||No. of variables
|X(x1, x2,. .., xm)
||Vector of input variables
|A(a1, a2,. .., am),.
||Vectors of parameters
This polynomial can approximate any stationary random sequence of observations
and can be computed by either adaptive methods or system of Gaussian normal
equations (Ivakhnenko, 1971).
The starting point of the new neural network type D-PNN development was the
GMDH polynomial neural network, created by a Ukrainian scientist Aleksey Ivakhnenko
in 1968. When the back-propagation technique was not known yet a technique called
Group Method of Data Handling (GMDH) was developed for neural network structure
design and parameters of polynomials adjustment. He attempted to resemble the
Kolmogorov- Gabor polynomial (1) by using low order polynomials Eq.
2 for every pair of the input values (Galkin, 2000):
The GMDH neuron (Fig. 1) has two inputs and its output is
a quadratic combination of 2 inputs, total 6 weights. Thus GMDH network builds
up a polynomial (actually a multinomial) combination of the input components.
Typical GMDH network (Fig. 2) maps a vector input x to a scalar
output y', which is an estimate of the true function f(x) = y. Each neuron of
the polynomial network fits its output to the desired value y for each input
vector x from the training set.
|| GMDH polynomial neuron
The manner in which this approximation is accomplished is through the use of
linear regression (Galkin, 2000).
In the hope to capture the complexity of a process, this neural network attempts
to decompose it into many simpler relationships each described by a processing
function of a single neuron (2). It defines an optimal structure of complex
system model with identifying non-linear relations between input and output
variables. Polynomial Neural Network (PNN) is a flexible architecture, whose
structure is developed through learning. The number of layers of the PNN is
not fixed in advance but becomes dynamically meaning that this self-organising
network grows over the trained period (Oh et al.,
GENERALIZATION OF PATTERNS
Lets consider an ANN trained to identify a shape. It can correctly identify
only incomplete or similar patterns as compared to the trained one, activating
the same areas of neurons. But in a case the shape is moved or its size is changed
in the input matrix, it will seem to the ANN to be an entirely new pattern.
|| Characteristic points of a changeable pattern
A way to be solved this problem is to define several characteristic points
of the pattern (decomposition of a shape forming the input vector) and try D-PNN
to learn their dependences. These relations define non-linear multi-parametric
functions, creating by multiplications (combinations) of input variables of
polynomials, applied to general pattern identification (Zjavka,
You can see, that the condition of the significant point pattern dependence
of a rectangle (Fig. 3) is defined by equal x-coordinates
of the points Ax = Bx and Cx = Dx
and corresponding y-coordinates Ay = Cy and By
= Dy. Another more complicated diagonal dependence applies the x,y-coordinates
of the points A,D and C,B to oblique square determination. There will have to
equal their x and y-position difference Ax-Ay = Dx-Dy
and sum Cx+Cy = Bx+By. D-PNN recognises
the square shape in this 2-way mannered dependence. This corresponds with recent
human brain researches, which proved recognizing shapes are decomposed into
several elementary elements, activating some biological neural cells as characteristic
marks of a pattern. Human brain does not utilize absolute input values but relative
reciprocal ones, creating by periodic dynamic functions of biological neurons
DIFFERENTIAL POLYNOMIAL NEURAL NETWORK
The basic idea of the authors D-PNN is to approximate a differential
Eq. 3, which can define relations of variables (Hronec,
1958), with a special type of root fractional polynomials - for instance
|u = f(x1, x2,,
||Function of input variables
|a, B(b1, b2,,. .., bn),
C(c11, c12,,. ..)
Elementary methods of the common Differential Equation (DE) solution express
the solution in special elementary functions - polynomials (such as Bessels
functions or power series). Numerical integration of differential equations
is based on an approximation of these using:
||Rational integral functions
I have selected the 1st more simple way using method of integral analogues,
by replacing mathematical operators in equations with ratio of pertinent values
(Kunes et al.,1989).
||Combination degree of n- input variables of numerator
||Combination degree of denominator (m<n)
The fractional polynomials Eq. 4, which describe a partial
dependence of n-input variables of each neuron, are applied as terms of DE Eq.
3 construction. They partly create an unknown multi-parametric non-linear
function, which codes relations of variables. The numerator of equation Eq.
4 is a polynomial of complete n-input combinations of a single neuron and
realizes a new function z of formula Eq. 6. The denominator
of Eq. 4 is a derivative part, which gives a partial mutual
change of certain neuron input variables and its polynomial combination degree
m is less then n. It is arisen from the partial derivation of the complete n-variable
polynomial by competent variable(s). In general it is possible this approximation
express in formula Eq. 6 (Hronec, 1958).
||Function of n-input variables
||Weights of terms
Each layer of the D-PNN consists of blocks of neurons (Fig. 4).
Block contains derivative neurons, one for each fractional polynomial (4) of
a derivative combination of variables. Inputs of constant combination degree
(n = 2,3,
) forming certain combination of variables, enter each block,
where they are substituted into polynomials of neurons. The final function Eq.
6 is formed in each block of the last hidden layer the D-PNN, which polynomials
look like the formula Eq. 5. The root functions of denominators
Eq. 5 are lower then n, according to their combination degree
and take the polynomials of neurons into competent power degree. Blocks of other
hidden layers consist of partial derivations (forming by neurons), which describe
the input variable partial dependence Eq. 4.
Their neurons dont affect the block output but are applied only for total
output calculation by the DE composition (using root functions in blocks of
the last hidden layer). Each block also contains a single polynomial (without
derivative part), which forms its output entered the next hidden layer of the
network Each neuron has 2 vectors of adjustable parameters a, b and each block
consists 1 vector of adjustable parameters of single polynomial. There it is
necessary to adjust not only polynomial parameters, but the D-PNNs structure
too. This means some neurons in role of terms of the DE have to be left out.
IDENTIFICATION OF DEPENDENCIES OF VARIABLES
Consider a very simple dependence of 2-input variables, which difference is constant (for example = 5). D-PNN will learn this relation easily according to training data set by means of Genetic Algorithm (GA).
D-PNN will contain only 1 block of 1 polynomial neuron Eq. 7 as a term of DE (Fig. 5). As the input variables are changing constantly, there is not necessary to add the 2nd term (fractional polynomial of derivate variable x2) in the DE (block), which causes occasional output mistakes by identification. Another example can solve 2 inputs, where the multiplicity of the inputs is constant (Fig. 5).
The input variables are not changing constantly, so there will be necessary both terms of the DE - fractional polynomial of derivative variable x2 too Eq. 8:
|| Identification of a constant quotient of 2 variables (x1
|| Relations of chess pieces
Simple D-PNN composed of these 4 blocks (one block for each pair of input variables) can learn the characteristic point dependence of a rectangle (Fig. 3). If x and y-coordinates of lined points equal, their difference = 0 and D-PNN can identify this. The 2-variable blocks also can solve another example, which shows the dependence of chess pieces (Fig. 6). Input of the neural network is formed by their x and y-positions. If the white rook checks the black bishop (their x or y positions equal) the 2-variable dependence comes true.
Dependence of the oblique square is defined through the diagonal coordinates of its characteristic points (Fig. 3). If difference and sum of the x and y-positions of diagonal points equal, the oblique square dependence condition comes true. This can solve D-PNN consisting of 2 blocks of 4 input variables (Fig. 7). Likewise the chess example shows a diagonal dependence of pieces, if the black bishop checks the white rook (Fig. 6). This can learn again one 4-variable block.
There are totally 14 combinations (neurons) of 4 input variables, for all derivative
terms (1,2,3-combinations) of DE in the block. Some of them have to be taken
off, as cause the D-PNN will work amiss. Each DE term also has an adjustable
weight wi. Neurons can form for example the fractions Eq.
|| Identification of dependencies of 4 variables
D-PNN as well is charged by possible 2-sided dependence change of input variables. For example 1+9 = 10 is the same sum as 9+1 = 10. Some mistakes can occur by identification, if GA adjustment finishes in a local error minimum. However it can be clearly seen that the D-PNN operates. If the sum of the 1st pair of variables is less then the 2nd one, the output of the D-PNN is less than desired and the other hand round. So there is shown a separating plane detaching relative classes, which are the same characteristic likewise it is current by ANNs pattern identification.
Multi-layered D-PNN with 2 or 3-combination blocks can also solve previous examples. For simplicity we construct first the D-PNN for the 3-variable sum dependence identification (Fig. 8). The problem of the multi-layered D-PNN construction reside in that we try to create every partial combination term of a complete DE utilizing some fixed lower combination degree (2, 3) of blocks, while the amount of input variables is higher.
Each executable block of the last hidden layer takes part in the total network output calculation (creates its own DE) utilising its own neurons and back neurons of connected blocks the previous layers (Fig. 8). Blocks of other hidden layers create its output using single adjustable polynomial without derivative part (p = polynomial on Fig. 8), but their neurons are applied only for the total DE composition (in blocks of the last hidden layer). First the blocks of the last hidden layer take its own neurons as 2 basic terms (11) of the DE (6). Subsequently they create 4 terms of the 2nd (previous) hidden layer, using neurons and polynomials of bound blocks. They join these 2 blocks and create 4 fractional terms of the DE utilising 4 derivate variables (of 2 previous blocks) for instance Eq. 12.
|| Identification of the 3-variable sum dependence with 2-variable
The backward connection of the previous layer(s) is realized through polynomials
of the linked 2nd (or 1st) layer blocks. These are directly creating derivative
part in numerator of formulas Eq. 12-13.
Likewise we can create terms of the 1st hidden layer Eq. 13.
We attach all its 3 linked blocks, forming 6 terms of the DE (1 duplicated connection
is not used). The multiplication 2* in denominators of formulas Eq.
11-13 is used to decrease the D-PNN total output value.
There is not used every term of all 12 terms of the complete DE, some of them
have to be eliminated. This indicates 0 or 1 for each term in the executable
blocks of the last hidden layer (creating DE) and are ease to use as genes of
GA adjustment (Obitko, 1998). Parameters of polynomials
are represented by real numbers. A chromosome is a sequence of their values,
which can be easy mutated. The searching space probably contains a great amount
of local error solutions, which GA can finish easily. The advantage of D-PNN
adjustment is using only small training data set (likewise the GMDH PNN does),
to learn any dependence.
|| 3-variable combination block D-PNN
The 3-variable combination block D-PNN of 4 input variables will have 3 hidden layers, each consisting of 4 blocks (for each 3-combination), to reach back all the 1st layer input variables from the last layer executable blocks (Fig. 9). Each block consists of 6 neurons (partial derivations) of all its 1 and 2-combinations. Likewise the previous D-PNN type (Fig. 8) we can construct the partial fractional terms of the DE (for each block of the last hidden layer) from back-connected neurons of previous layers. There is possible to apply only some of the connection parts of fractions. We can also construct the 4 dependent input variable D-PNN using 2-combination blocks. There will be 6 blocks of all input combination couples in the 1st hidden layer and consequently the number of them is increasing each next layer. D-PNN totally will consist of 6 hidden layers of blocks. Some problems can cause the composition of derivative terms from fractions because a lot of input combinations may rise and have to be tested. There might be suitable apply some methods of genetic programming to more accurate DE construction especially if D-PNN contains plenty of input variables.
D-PNN is a new neural network type designed by author, which can learn to identify
any unknown dependencies of data set variables (not entire patterns as the ANNs
do). It doesnt utilize absolute values of variables but relative ones,
likewise the brain does. This identification could be regarded as a pattern
abstraction (or generalization), similar human brain utilizes according to data
relations. But it applies the approximation with time-delayed periodic activation
functions of biological neurons in high dynamic system of behaviour (Benuskova,
2002). D-PNN constructs a differential equation, which describes a system
of dependent variables, with rational integral polynomial functions. Instead
of these some periodic functions could be used (sin, cos) for this operation
(Fig. 10) (Kunes et al.,1989).
Changeable periods ω will replace the derivative parts (denominator) of
equations e.g., Eq. 7-8 in this case Eq.
14-15. Activation functions (e.g., sigmoid) of artificial
neuron seem to be periodic functions too, but their period ω = ∞.
So, this might be a special case (applying by ANN) of common periodic function
(Zjavka, 2008). The problem of the D-PNN construction
resides in the method of the partial DE term composition of all possible combinations
and how the partial derivative (dependence) of some input variables is realized
(through fractional or periodic functions).
|| Transformation of absolute values of variables using periodic
Relations of some data variables describe a lot of complex systems. D-PNN could model their behaviour, for example the weather prediction could be based on many unknown generalized relations of the data (such as pressure, temperature, etc.) instead of time-series prediction utilising pattern identification.