INTRODUCTION
Artificial Neural Networks (ANN) in general identify patterns according to
their mutual similarity, responding to related patterns with a similar output. They
are trained to classify certain patterns into groups and are then used to identify
new ones, which were never presented before. If an ANN is trained, for instance,
to identify a shape, it can correctly classify even incomplete or similar patterns
compared with the trained ones (Marcek and Marcek, 2006).
But if the shape is moved or its size is changed in the input matrix of variables,
the neural network identification will fail. The principal shortcoming of ANN identification
in general is its inability to generalize input patterns. An ANN is in principle
a simplified form of a polynomial neural network (PNN), in which the combinations of
variables are missing. Polynomial functions apply these combinations to preserve partial
dependencies of variables. That is why ANN identification cannot
utilize data relations (including time-series prediction), which describe
many complex systems (Zjavka, 2007).
Let us regard the vector of input variables not as a pattern but as a
set of mutually dependent points of N-dimensional space. In analogy with the way ANN pattern classification
works, we can then reason about the identification of unknown relations among the input
data variables. The neural network response would be the same for all patterns
(dependent sets) whose variables obey the trained dependence, regardless
of the values they take. A multi-parametric non-linear function can describe
this mutual relation. So if we want to create such a function with a neural
network, its neurons must apply n-parametric polynomial functions
to capture the partial dependence of their n inputs. A biological neural cell
seems to apply a similar principle. Its dendrites collect signals coming from other
neurons, but unlike in ANNs the signals already interact in the single branches
(dendrites), like the variables of a multi-parametric polynomial. This
could be modelled with multiplications of some inputs in the polynomials of a PNN.
These weighted combinations are then summed in the cell body and transformed
using a time-delayed dynamic periodic activation function (an activated neural cell
generates a series of time-delayed output pulses in response to its input signals)
(Benuskova, 2002). The period of this function depends
on some input variables and seems to represent the derivative part of a partial
derivation of an entire polynomial (as a term of a differential equation).
A polynomial neural network for the identification of dependencies of variables (or Differential Polynomial Neural Network, DPNN, because it constructs a differential equation) describes a functional dependence of input variables (not entire patterns, as an ANN does). This could be regarded as a pattern abstraction, similar to the one the brain utilizes, whose identification is based not on the values of the variables but only on their relations. The DPNN forms its functional output as a generalization of the input patterns.
GMDH POLYNOMIAL NEURAL NETWORK
General connection between input and output variables is expressed by the Volterra
functional series, a discrete analogue of which is the Kolmogorov-Gabor polynomial
(Ivakhnenko, 1971):

y = a_{0} + Σ a_{i}x_{i} + Σ Σ a_{ij}x_{i}x_{j} + Σ Σ Σ a_{ijk}x_{i}x_{j}x_{k} + ...   (1)

with the sums running over i, j, k = 1, ..., m.

Where:
m : number of variables
X(x_{1}, x_{2}, ..., x_{m}) : vector of input variables
A(a_{1}, a_{2}, ..., a_{m}), ... : vectors of parameters
This polynomial can approximate any stationary random sequence of observations
and can be computed by either adaptive methods or a system of Gaussian normal
equations (Ivakhnenko, 1971).
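As a sketch of the second route, the parameters of a (second-order) truncation of the Kolmogorov-Gabor polynomial can be fitted by least squares, which solves the Gaussian normal equations. The column layout and the toy target function below are illustrative choices of mine, not taken from the paper.

```python
import numpy as np

def kg_design_matrix(X):
    """Second-order truncation of the Kolmogorov-Gabor polynomial:
    columns for 1, x_i, and all products x_i * x_j (i <= j)."""
    n, m = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(m)]
    cols += [X[:, i] * X[:, j] for i in range(m) for j in range(i, m)]
    return np.column_stack(cols)

def fit_kg(X, y):
    """Least-squares solution of the normal equations A^T A w = A^T y."""
    A = kg_design_matrix(X)
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

# Toy example: recover y = 1 + 2*x1 + 3*x1*x2 from noiseless samples.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))
y = 1 + 2 * X[:, 0] + 3 * X[:, 0] * X[:, 1]
w = fit_kg(X, y)
y_hat = kg_design_matrix(X) @ w
```

Since the target lies in the span of the polynomial basis, the fit is exact up to numerical precision.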
The starting point for the development of the new neural network type DPNN was the
GMDH polynomial neural network, created by the Ukrainian scientist Aleksey Ivakhnenko
in 1968. When the back-propagation technique was not yet known, a technique called
Group Method of Data Handling (GMDH) was developed for designing the neural network
structure and adjusting the parameters of the polynomials. Ivakhnenko attempted to approximate the
Kolmogorov-Gabor polynomial Eq. 1 by using low-order polynomials Eq.
2 for every pair of the input values (Galkin, 2000):

y = a_{0} + a_{1}x_{i} + a_{2}x_{j} + a_{3}x_{i}x_{j} + a_{4}x_{i}^{2} + a_{5}x_{j}^{2}   (2)
The GMDH neuron (Fig. 1) has two inputs and its output is
a quadratic combination of them, with 6 weights in total. The GMDH network thus builds
up a polynomial (actually a multinomial) combination of the input components.
A typical GMDH network (Fig. 2) maps a vector input x to a scalar
output y', which is an estimate of the true function f(x) = y. Each neuron of
the polynomial network fits its output to the desired value y for each input
vector x from the training set.

Fig. 1: 
GMDH polynomial neuron 
The manner in which this approximation is accomplished is through the use of
linear regression (Galkin, 2000).
In the hope of capturing the complexity of a process, this neural network attempts
to decompose it into many simpler relationships, each described by the processing
function of a single neuron Eq. 2. It defines an optimal structure of a complex
system model by identifying non-linear relations between input and output
variables. The Polynomial Neural Network (PNN) is a flexible architecture, whose
structure is developed through learning. The number of layers of the PNN is
not fixed in advance but is determined dynamically, meaning that this self-organising
network grows over the training period (Oh et al.,
2003).
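The layer-growing scheme described above can be sketched as follows: one candidate neuron per input pair is fitted by linear regression, and the survivors are ranked by their error on a separate validation set (the usual GMDH external criterion). The `keep` count and the toy data are illustrative assumptions, not values from the paper.

```python
import numpy as np
from itertools import combinations

def pair_features(u, v):
    # Eq. 2 basis: 1, u, v, u*v, u^2, v^2 (six weights per GMDH neuron)
    return np.column_stack([np.ones_like(u), u, v, u * v, u * u, v * v])

def fit_neuron(u, v, y):
    A = pair_features(u, v)
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def gmdh_layer(X, y, Xv, yv, keep=4):
    """Train one candidate neuron per input pair; keep the best few,
    ranked by validation error (the GMDH selection criterion)."""
    cands = []
    for i, j in combinations(range(X.shape[1]), 2):
        w = fit_neuron(X[:, i], X[:, j], y)
        out_v = pair_features(Xv[:, i], Xv[:, j]) @ w
        out_t = pair_features(X[:, i], X[:, j]) @ w
        cands.append((np.mean((out_v - yv) ** 2), out_t, out_v))
    cands.sort(key=lambda c: c[0])
    best = cands[:keep]
    # Outputs of the surviving neurons become the next layer's inputs.
    Xn = np.column_stack([c[1] for c in best])
    Xvn = np.column_stack([c[2] for c in best])
    return Xn, Xvn, best[0][0]

# Toy run: y = x1 * x2 is captured exactly by the (x1, x2) pair neuron.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (100, 3)); Xv = rng.uniform(-1, 1, (40, 3))
y = X[:, 0] * X[:, 1]; yv = Xv[:, 0] * Xv[:, 1]
X2, Xv2, err = gmdh_layer(X, y, Xv, yv)
```

Stacking such layers until the validation error stops improving reproduces the self-organising growth described above.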
GENERALIZATION OF PATTERNS
Let us consider an ANN trained to identify a shape. It can correctly identify
even incomplete or similar patterns compared with the trained one, activating
the same areas of neurons. But if the shape is moved or its size is changed
in the input matrix, it will appear to the ANN to be an entirely new pattern.

Fig. 3: 
Characteristic points of a changeable pattern 
A way to solve this problem is to define several characteristic points
of the pattern (a decomposition of the shape, forming the input vector) and let the DPNN
learn their dependencies. These relations define non-linear multi-parametric
functions, created by multiplications (combinations) of the input variables of
polynomials, applied to general pattern identification (Zjavka,
2010).
The condition of the significant-point dependence
of a rectangle (Fig. 3) is defined by equal x-coordinates
of the points, A_{x} = B_{x} and C_{x} = D_{x},
and corresponding y-coordinates, A_{y} = C_{y} and B_{y}
= D_{y}. Another, more complicated diagonal dependence applies the x,y-coordinates
of the points A,D and C,B to determine an oblique square. Their x- and y-position
differences must be equal, A_{x}-A_{y} = D_{x}-D_{y},
as must the sums, C_{x}+C_{y} = B_{x}+B_{y}. The DPNN recognises
the square shape through this two-fold dependence. This corresponds with recent
human brain research, which has shown that recognized shapes are decomposed into
several elementary elements, activating certain biological neural cells as characteristic
marks of a pattern. The human brain does not utilize absolute input values but relative
reciprocal ones, created by the periodic dynamic functions of biological neurons
(Benuskova, 2002).
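The characteristic-point conditions above translate directly into code. The sketch below merely checks the stated equalities (the function names are mine); a trained DPNN would of course learn these dependencies rather than have them hard-coded, but the checks make clear why a translated copy of the shape still satisfies the same relation.

```python
def is_axis_aligned_rect(A, B, C, D):
    """Fig. 3 dependence: equal x-coordinates A_x = B_x, C_x = D_x
    and equal y-coordinates A_y = C_y, B_y = D_y."""
    return A[0] == B[0] and C[0] == D[0] and A[1] == C[1] and B[1] == D[1]

def is_oblique_square(A, B, C, D):
    """Diagonal dependence: A_x - A_y = D_x - D_y and C_x + C_y = B_x + B_y."""
    return (A[0] - A[1] == D[0] - D[1]) and (C[0] + C[1] == B[0] + B[1])
```

Note that both conditions depend only on relations of coordinates, never on their absolute values: shifting every point by the same offset leaves the rectangle condition true.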
DIFFERENTIAL POLYNOMIAL NEURAL NETWORK
The basic idea of the author's DPNN is to approximate a differential
equation Eq. 3, which can define relations of variables (Hronec,
1958), with a special type of root fractional polynomials, for instance
Eq. 4-5.

a + Σ b_{i} ∂u/∂x_{i} + Σ Σ c_{ij} ∂²u/∂x_{i}∂x_{j} + ... = 0   (3)

with the sums running over i, j = 1, ..., n.

Where:
u = f(x_{1}, x_{2}, ..., x_{n}) : function of input variables
a, B(b_{1}, b_{2}, ..., b_{n}), C(c_{11}, c_{12}, ...) : parameters
Elementary methods of common Differential Equation (DE) solution express
the solution in special elementary functions, i.e., polynomials (such as Bessel
functions or power series). Numerical integration of differential equations
is based on an approximation of these using:

• 
Rational integral functions 

I have selected this first, simpler way, using the method of integral analogues,
which replaces the mathematical operators in equations with ratios of the pertinent values
(Kunes et al., 1989).
Where:
n : combination degree of the n input variables of the numerator
m : combination degree of the denominator (m < n)
The fractional polynomials Eq. 4, which describe a partial
dependence of the n input variables of each neuron, are applied as terms in the construction
of the DE Eq. 3. They partly create an unknown multi-parametric non-linear
function, which codes the relations of the variables. The numerator of Eq.
4 is a polynomial of the complete n-input combinations of a single neuron and
realizes a new function z of formula Eq. 6. The denominator
of Eq. 4 is a derivative part, which gives a partial mutual
change of certain neuron input variables; its polynomial combination degree
m is less than n. It arises from the partial derivation of the complete n-variable
polynomial with respect to the competent variable(s). In general this approximation
can be expressed by formula Eq. 6 (Hronec, 1958).
Where:
z : function of the n input variables
w_{i} : weights of the terms
Each layer of the DPNN consists of blocks of neurons (Fig. 4).
A block contains derivative neurons, one for each fractional polynomial Eq. 4 of
a derivative combination of variables. Inputs of a constant combination degree
(n = 2, 3, ...), forming a certain combination of variables, enter each block,
where they are substituted into the polynomials of the neurons. The final function Eq.
6 is formed in each block of the last hidden layer of the DPNN, whose polynomials
have the form of formula Eq. 5. The root functions of the denominators of
Eq. 5 are lower than n, according to their combination degree,
and raise the polynomials of the neurons to the competent power degree. Blocks of the other
hidden layers consist of partial derivations (formed by neurons), which describe
the partial dependence of the input variables, Eq. 4.
Their neurons do not affect the block output but are applied only to the total
output calculation by the DE composition (using root functions in blocks of
the last hidden layer). Each block also contains a single polynomial (without a
derivative part), which forms its output entering the next hidden layer of the
network. Each neuron has 2 vectors of adjustable parameters, a and b, and each block
contains 1 vector of adjustable parameters of its single polynomial. It is
necessary to adjust not only the polynomial parameters but the DPNN's structure
too: some neurons, in their role as terms of the DE, have to be left out.
IDENTIFICATION OF DEPENDENCIES OF VARIABLES
Consider a very simple dependence of 2 input variables whose difference is constant (for example, 5). The DPNN will learn this relation easily from the training data set by means of a Genetic Algorithm (GA).
The DPNN will contain only 1 block with 1 polynomial neuron, Eq. 7, as a term of the DE (Fig. 5). As the input variables change at a constant rate, it is not necessary to add the 2nd term (the fractional polynomial of the derivative variable x_{2}) to the DE (block); it would cause occasional output mistakes during identification. Another example solves 2 inputs whose quotient is constant (Fig. 5).
These input variables do not change at a constant rate, so both terms of the DE will be necessary, including the fractional polynomial of the derivative variable x_{2}, Eq. 8:

Fig. 5: 
Identification of a constant quotient of 2 variables (x_{1} = 2x_{2}) 
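Since Eq. 7-8 are not reproduced in this extract, the following sketch uses an assumed stand-in form for a single fractional-polynomial neuron (a linear numerator over a linear denominator, with a small guard against division by zero) and a minimal (1+1) evolutionary loop in place of the paper's GA; all parameter choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def neuron(params, x1, x2):
    # Assumed stand-in for a single fractional-polynomial DE term
    # (Eq. 7-8 are not reproduced here): linear numerator over a
    # linear "derivative" denominator, guarded against division by zero.
    a0, a1, a2, b0, b1 = params
    return (a0 + a1 * x1 + a2 * x2) / (np.abs(b0 + b1 * x2) + 1e-6)

# Training pairs obeying the constant quotient x1 = 2*x2 (Fig. 5);
# the network should respond with the same constant output (here 1) for all.
x2 = rng.uniform(1.0, 10.0, 30)
x1 = 2.0 * x2
target = np.ones_like(x1)

def error(params):
    return np.mean((neuron(params, x1, x2) - target) ** 2)

# Minimal (1+1) evolutionary loop standing in for the paper's GA:
# mutate the real-valued parameter vector, keep the child only on improvement.
best = rng.normal(size=5)
init_err = error(best)
best_err = init_err
for _ in range(5000):
    child = best + rng.normal(scale=0.1, size=5)
    child_err = error(child)
    if child_err < best_err:
        best, best_err = child, child_err
```

Elitist selection guarantees the training error never increases; any parameter vector with numerator proportional to the denominator on the line x1 = 2x2 solves the problem exactly.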

Fig. 6: 
Relations of chess pieces 
A simple DPNN composed of these 4 blocks (one block for each pair of input variables) can learn the characteristic-point dependence of a rectangle (Fig. 3). If the x- and y-coordinates of the lined points are equal, their difference is 0 and the DPNN can identify this. The 2-variable blocks can also solve another example, which shows the dependence of chess pieces (Fig. 6). The input of the neural network is formed by their x- and y-positions. If the white rook checks the black bishop (their x- or y-positions are equal), the 2-variable dependence comes true.
The dependence of the oblique square is defined through the diagonal coordinates of its characteristic points (Fig. 3). If the differences and sums of the x- and y-positions of the diagonal points are equal, the oblique-square dependence condition comes true. This can be solved by a DPNN consisting of 2 blocks of 4 input variables (Fig. 7). Likewise, the chess example shows a diagonal dependence of the pieces: if the black bishop checks the white rook (Fig. 6), one 4-variable block can again learn this.
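The two chess dependencies can be written out as explicit conditions. The sketch below hard-codes what the 2- and 4-variable blocks are supposed to learn, purely to make the two dependence types concrete; the function names are mine.

```python
def rook_checks(rook, piece):
    """2-variable dependence (Fig. 6): the rook attacks along ranks and
    files, i.e., when either the x- or the y-positions are equal."""
    return rook[0] == piece[0] or rook[1] == piece[1]

def bishop_checks(bishop, piece):
    """Diagonal (4-variable) dependence: the bishop attacks when the
    coordinate differences or the coordinate sums are equal."""
    return (bishop[0] - bishop[1] == piece[0] - piece[1]
            or bishop[0] + bishop[1] == piece[0] + piece[1])
```

Both conditions again involve only relations between positions, so any translation of the whole configuration along a rank, file, or diagonal preserves them.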
There are in total 14 combinations (neurons) of the 4 input variables, for all derivative
terms (1-, 2- and 3-combinations) of the DE in the block. Some of them have to be taken
out, otherwise the DPNN will work amiss. Each DE term also has an adjustable
weight w_{i}. The neurons can form, for example, the fractions Eq.
9-10:

Fig. 7: 
Identification of dependencies of 4 variables 
The DPNN is likewise charged with a possible two-sided dependence change of the input variables; for example, 1+9 = 10 is the same sum as 9+1 = 10. Some mistakes can occur during identification if the GA adjustment finishes in a local error minimum. However, it can be clearly seen that the DPNN operates: if the sum of the 1st pair of variables is less than that of the 2nd one, the output of the DPNN is less than desired, and vice versa. This reveals a separating plane detaching the relative classes, with the same characteristics as are usual in ANN pattern identification.
MULTILAYERED DPNN
A multi-layered DPNN with 2- or 3-combination blocks can also solve the previous examples. For simplicity we first construct the DPNN for the identification of the 3-variable sum dependence (Fig. 8). The problem of multi-layered DPNN construction resides in the attempt to create every partial combination term of a complete DE utilizing some fixed lower combination degree (2, 3) of the blocks, while the number of input variables is higher.
Each executable block of the last hidden layer takes part in the total network output calculation (it creates its own DE), utilising its own neurons and the back-connected neurons of the linked blocks of the previous layers (Fig. 8). Blocks of the other hidden layers create their output using a single adjustable polynomial without a derivative part (p = polynomial in Fig. 8), while their neurons are applied only to the total DE composition (in blocks of the last hidden layer). First, the blocks of the last hidden layer take their own neurons as 2 basic terms Eq. 11 of the DE Eq. 6. Subsequently they create 4 terms of the 2nd (previous) hidden layer, using the neurons and polynomials of the bound blocks. They join these 2 blocks and create 4 fractional terms of the DE utilising the 4 derivative variables (of the 2 previous blocks), for instance Eq. 12.

Fig. 8: 
Identification of the 3variable sum dependence with 2variable
combination blocks 
The backward connection of the previous layer(s) is realized through the polynomials
of the linked 2nd (or 1st) layer blocks. These directly create the derivative
part in the numerator of formulas Eq. 12-13.
Likewise we can create the terms of the 1st hidden layer, Eq. 13.
We attach all its 3 linked blocks, forming 6 terms of the DE (1 duplicated connection
is not used). The multiplication by 2 in the denominators of formulas Eq.
11-13 is used to decrease the DPNN's total output value.
Not every one of all 12 terms of the complete DE is used; some of them
have to be eliminated. This is indicated by 0 or 1 for each term in the executable
blocks of the last hidden layer (creating the DE), and these indicators are easy to use as genes of
the GA adjustment (Obitko, 1998). Parameters of the polynomials
are represented by real numbers. A chromosome is a sequence of their values,
which can easily be mutated. The search space probably contains a great number
of local error minima, in which the GA can easily finish. An advantage of DPNN
adjustment is that it uses only a small training data set (as the GMDH PNN does)
to learn any dependence.
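The chromosome layout suggested above (0/1 genes switching DE terms on or off, plus real-valued polynomial parameters) might be sketched as follows; the gene counts and mutation rates are illustrative assumptions only.

```python
import random

random.seed(0)

N_TERMS = 12        # candidate DE terms in the executable blocks (see text)
N_PARAMS = 20       # illustrative count of real-valued polynomial parameters

def random_chromosome():
    """A chromosome pairs 0/1 genes, which switch DE terms on or off,
    with real numbers representing the polynomial parameters."""
    terms = [random.randint(0, 1) for _ in range(N_TERMS)]
    params = [random.gauss(0.0, 1.0) for _ in range(N_PARAMS)]
    return terms, params

def mutate(chrom, p=0.05, sigma=0.1):
    """Flip term genes and perturb parameters, each with probability p."""
    terms, params = chrom
    new_terms = [1 - t if random.random() < p else t for t in terms]
    new_params = [w + random.gauss(0.0, sigma) if random.random() < p else w
                  for w in params]
    return new_terms, new_params
```

Selection and crossover would operate on such chromosomes in the usual GA manner; the fitness of each chromosome is the identification error of the DPNN it encodes.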

Fig. 9: 
3variable combination block DPNN 
The 3-variable combination block DPNN of 4 input variables will have 3 hidden layers, each consisting of 4 blocks (one for each 3-combination), to reach back to all the 1st-layer input variables from the last-layer executable blocks (Fig. 9). Each block consists of 6 neurons (partial derivations) for all its 1- and 2-combinations. Like the previous DPNN type (Fig. 8), we can construct the partial fractional terms of the DE (for each block of the last hidden layer) from the back-connected neurons of the previous layers. It is possible to apply only some of the connection parts of the fractions. We can also construct the 4-dependent-input-variable DPNN using 2-combination blocks. There will be 6 blocks for all input combination couples in the 1st hidden layer, and their number increases in each following layer; in total the DPNN will consist of 6 hidden layers of blocks. The composition of the derivative terms from fractions can cause some problems, because a lot of input combinations may arise and have to be tested. It might be suitable to apply some methods of genetic programming for a more accurate DE construction, especially if the DPNN contains plenty of input variables.
CONCLUSION
The DPNN is a new neural network type designed by the author, which can learn to identify
any unknown dependencies of the variables of a data set (not entire patterns, as ANNs
do). It does not utilize absolute values of the variables but relative ones,
as the brain does. This identification could be regarded as a pattern
abstraction (or generalization), similar to the one the human brain utilizes according to data
relations; the brain, however, applies the approximation with time-delayed periodic activation
functions of biological neurons in a highly dynamic system of behaviour (Benuskova,
2002). The DPNN constructs a differential equation, which describes a system
of dependent variables, with rational integral polynomial functions. Instead
of these, some periodic functions (sin, cos) could be used for this operation
(Fig. 10) (Kunes et al., 1989).
Changeable periods ω would in this case replace the derivative parts (denominators) of
equations, e.g., Eq. 7-8 would become Eq.
14-15. Activation functions (e.g., the sigmoid) of an artificial
neuron seem to be periodic functions too, but with period ω = ∞;
so the ANN case might be a special case of a common periodic function
(Zjavka, 2008). The problem of the DPNN construction
resides in the method of composing the partial DE terms from all possible combinations
and in how the partial derivative (dependence) of some input variables is realized
(through fractional or periodic functions).

Fig. 10: 
Transformation of absolute values of variables using periodic
function 
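A minimal sketch of this periodic transformation, under the assumption (mine, since Eq. 14-15 are not reproduced here) that the changeable period ω is simply the reciprocal of the "derivative" variable: doubling both variables then leaves the response unchanged, so the term reacts to relative rather than absolute values.

```python
import math

def periodic_term(x_num, x_den, omega_scale=1.0):
    """Hypothetical periodic replacement for a fractional DE term (Fig. 10):
    instead of dividing by a derivative polynomial in x_den, the period of a
    sine is modulated by it, so only the ratio of the two variables matters.
    Assumes x_den != 0."""
    omega = omega_scale / x_den          # changeable period omega
    return math.sin(omega * x_num)

# Doubling both variables leaves the response unchanged: the transformation
# reacts to relative, not absolute, values.
a = periodic_term(1.0, 2.0)
b = periodic_term(2.0, 4.0)
```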
Relations among data variables describe many complex systems. A DPNN could model their behaviour; for example, weather prediction could be based on many unknown generalized relations of the data (such as pressure, temperature, etc.) instead of time-series prediction utilising pattern identification.