INTRODUCTION
Binocular stereo is the process of obtaining depth information from a pair of images of a scene taken from two different viewpoints. The larger the angle between the two views, the larger the base-to-height ratio B/H and the better the altimetric precision; however, stereo matching becomes correspondingly more difficult.
A B/H of about 0.6 is generally chosen as a good compromise between the quality of the three-dimensional reconstruction and the simplicity of stereo matching (Paparoditis, 1998). In the present case, the image pair has a B/H ratio of about 0.53.
An accurate recovery of the relief is necessary in various fields, such as satellite photogrammetry for town planning.
The process of stereovision is often divided into three stages:
• Preprocessing: extraction of the points of interest that satisfy some defined characteristics
• Stereo matching: correspondence of the points of interest
• Three-dimensional reconstruction: construction of the three-dimensional model of the observed scene
The result of each stage influences the next one. In this study, we are interested in the first two stages, preprocessing and stereo matching, applied to urban high-resolution panchromatic satellite images (about 1 m).
CONTEXT AND METHODOLOGY
Several methods address the two phases: preprocessing (segmentation) and stereo matching. The choice of a method depends mainly on the application concerned as well as on the type of images to be processed.
There is much research in the field of image segmentation, in particular on region segmentation, as reported by Pal and Pal (1993), Pan (1994) and Coquerez and Phillips (1995). We choose a mixed region segmentation approach called Split-Merge, which achieves in our case a good compromise between segmentation quality and algorithmic cost. We believe that this method is applied here for the first time in the field of high-resolution satellite imagery. The quality of the results of this preprocessing stage directly influences the following one: the stereo matching.
Stereo matching, the central and particularly complex phase of stereovision, is also the subject of much research using various approaches (Umesh and Aggarwal, 1989; Koschan, 1993). These are generally divided into two types: intensity-based and feature-based matching approaches. Intensity-based approaches compute a resemblance measure between the matched points. They allow an overall dense restitution but handle discontinuities and homogeneous zones with difficulty, and the absence of semantic information hampers a good management of the coherence and structure of the scene (Baillard, 1997). Feature-based approaches extract more or less structured primitives before putting them in correspondence, such as edge elements (Jordan, 1992), junctions, used by Herman and Kamade (1986) and Chehata (2005), or homogeneous regions (Randiamasy and Gagalowicz, 1991; Wrobel, 1987; Wang et al., 2004; Todorovic and Narendra, 2007).
In the case of urban satellite images, the presence of superstructures (chimneys, for example) or repetitive shapes makes intensity-based matching approaches sensitive and relatively heavy, in spite of the use of several constraints (epipolar, uniqueness, order, etc.). To remedy these problems, we choose a feature-based matching technique that uses the region as a primitive and includes new constraints. We also show in this study the advantages gained from this choice: on the one hand, the high dimensionality of the region primitive makes it a rich descriptor of the constituent elements of the images (surfaces, grayscales, gravity centers, etc.); on the other hand, it reduces the volume of data to be processed. We thus arrive at a simple, fast and relatively effective stereo matching. The use of a neural network, here a Hopfield network, makes it possible to improve the classical matching results.
The principal contribution of neural networks in general, and of Hopfield networks in particular, lies in the error correction obtained by minimizing an energy function and in the coherence of the matching enforced through the network connections. Several researchers have built Hopfield network stereo matching systems, such as Lee et al. (1994), who match pixels, Nasrabadi and Choo (1992), who use characteristic points as primitives, and Nichari (1994), who matches edge points. All these studies based on a Hopfield neural network use some of the following constraints: the similarity constraint, the continuity constraint, the order constraint, the epipolar constraint, etc. These constraints are not always easy to implement, in particular the epipolar one, which requires knowledge of the intrinsic and extrinsic parameters of the acquisition system (Horaud and Monga, 1995). To get around these problems, we draw on the principle of the similarity constraint and use, for the first time to our knowledge, a neural system with a set of specific constraints for region stereo matching applied to a pair of high-resolution satellite images. These new constraints cover a set of region characteristics: the surface similarity constraint, the elongation constraint, the grayscale constraint and the gravity center constraint. From these, we use the root mean square deviation (RMSD) criterion to measure the compatibility of region pairs: this is a classical matching whose results are fed into the initialization phase of the Hopfield neural system.
We show in this study that the approach used, a combination of classical and neural matching, is simple and fast and improves on the results of classical matching used alone.
SEGMENTATION
Problem position: It is always difficult, even impossible, to segment two images of the same scene in a one-to-one (biunivocal) way in order to match them. The segmentation is therefore not an end in itself but a preliminary stage whose quality conditions the success of the subsequent processing (stereo matching). As there is no universal segmentation method that applies successfully to every type of image, we choose a method that suits our type of images and that we consider satisfactory if it achieves a good compromise between algorithmic cost and segmentation quality: the Split-Merge method.
Split-Merge segmentation method: Region segmentation techniques are as diverse as they are numerous. They consist in breaking up the image into a set of regions, each of which is homogeneous with respect to a previously defined attribute. The Split-Merge segmentation method begins with an over-segmentation of the image into disjoint regions, then merges these regions if they satisfy a homogeneity criterion. Most of these methods require a particular data structure. The quadtree is the best known one; it is simple and has a low processing time. In the quadtree used in the splitting phase, the root corresponds to the full image and each internal node has exactly four child nodes (Fig. 1). An example of a Split-Merge segmented image is shown in Fig. 2a-c.

Fig. 1: Pyramidal structure of the quadtree

Fig. 2: Example of application of the Split-Merge method. (a) Original image, (b) illustration of the over-segmentation effect obtained after the quadtree splitting phase and (c) elimination of the over-segmentation after the merging phase
The Split-Merge algorithm is given as follows:
Split-Merge segmentation parameters: The existence of adjustable parameters makes Split-Merge segmentation a semi-automatic method. Choosing one of these parameters, such as the difference of average grayscale, the similar values, the variance, etc., makes it possible to define a homogeneity criterion, or predicate. We choose the variance for splitting, since it is the strongest homogeneity criterion, and we use the difference of average grayscale as the merge criterion.
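The two criteria just described can be illustrated with a minimal sketch, assuming a square grayscale image whose side is a power of two, stored as a list of rows. The function names (`split`, `adjacent`, `merge`) and the parameters `var_max` and `mean_diff_max` are hypothetical, and the merge step compares the means of individual quadtree blocks rather than of growing regions, which is a simplification and not necessarily the procedure used in this work.

```python
from statistics import mean, pvariance

def split(img, x, y, size, var_max, min_size=1):
    """Quadtree split: divide a square block into four quadrants
    until its grayscale variance is at most var_max."""
    pixels = [img[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if size <= min_size or pvariance(pixels) <= var_max:
        return [(x, y, size, mean(pixels))]  # leaf block: position, side, mean gray
    half = size // 2
    blocks = []
    for dy in (0, half):
        for dx in (0, half):
            blocks += split(img, x + dx, y + dy, half, var_max, min_size)
    return blocks

def adjacent(a, b):
    """True if two square blocks share a border segment (4-connectivity)."""
    ax, ay, asz, _ = a
    bx, by, bsz, _ = b
    overlap_x = min(ax + asz, bx + bsz) - max(ax, bx)
    overlap_y = min(ay + asz, by + bsz) - max(ay, by)
    touch_x = ax + asz == bx or bx + bsz == ax
    touch_y = ay + asz == by or by + bsz == ay
    return (touch_x and overlap_y > 0) or (touch_y and overlap_x > 0)

def merge(blocks, mean_diff_max):
    """Merge adjacent blocks whose mean grayscales differ by at most
    mean_diff_max; returns one region label per block (union-find)."""
    parent = list(range(len(blocks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            close = abs(blocks[i][3] - blocks[j][3]) <= mean_diff_max
            if close and adjacent(blocks[i], blocks[j]):
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(blocks))]

# Toy 4x4 image: dark left half, bright right half.
img = [[10, 10, 200, 200] for _ in range(4)]
blocks = split(img, 0, 0, 4, var_max=25)
labels = merge(blocks, mean_diff_max=5)
```

On this toy image the split phase produces four homogeneous quadrants (the over-segmentation) and the merge phase groups them into two regions, one per half.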
Post-segmentation: Just after the segmentation phase and before starting stereo matching, we apply a global thresholding to the image pair in order to eliminate as many as possible of the details (cars, roads, etc.) that are useless in our application, and to label the regions of interest.
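As an illustration of this post-segmentation step, the following sketch applies a global threshold and labels the remaining 4-connected regions. It assumes that the regions of interest are the pixels at or above the threshold; the direction of the threshold, the function name `threshold_and_label` and the parameter `t` are assumptions made for the example, not details taken from this work.

```python
from collections import deque

def threshold_and_label(img, t):
    """Keep pixels with grayscale >= t and give each 4-connected
    component a label 1, 2, ... (0 marks eliminated pixels)."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] < t or labels[r][c]:
                continue  # eliminated pixel, or already labeled
            current += 1
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of the component
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and img[ny][nx] >= t and not labels[ny][nx]):
                        labels[ny][nx] = current
                        queue.append((ny, nx))
    return labels, current

img = [[200, 200, 0, 0],
       [0, 0, 0, 180],
       [0, 0, 0, 180]]
labels, count = threshold_and_label(img, t=100)
```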
REGION STEREO MATCHING USING THE HOPFIELD NETWORK
Criteria and constraints used in the stereo matching: The purpose of the stereo matching method is to put in correspondence the regions of the image pair resulting from the segmentation phase. We propose a technique based on new constraints that include the geometric and photometric properties of the regions. These constraints, namely the surface, elongation, average grayscale and gravity center constraints, make it possible to reduce the number of candidate homologous regions to be processed.
• Surface similarity constraint: The difference between the surfaces of the two regions of a pair must not exceed a fixed surface threshold
• Region elongation constraint: The elongation is the ratio of the length to the width of the circumscribed rectangle that encloses a region (with sides parallel to the eigenvectors of the inertia matrix of the region)
• Average grayscale constraint: It is the difference of the average grayscale between the two regions of a pair
• Gravity center constraint: The coordinates of the gravity centers of the two regions must be close
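The four attributes behind these constraints can be computed from the pixels of a region, as in the following minimal sketch. It assumes a region given as a list of (x, y, gray) tuples; the function name `region_attributes` and the dictionary keys are hypothetical. The rectangle sides are aligned with the principal axes of the second-order central moments, and one pixel is added to each extent so that a single row of pixels has a width of 1; this convention is an assumption.

```python
import math

def region_attributes(pixels):
    """pixels: list of (x, y, gray) tuples belonging to one region."""
    n = len(pixels)
    cx = sum(p[0] for p in pixels) / n          # gravity center, x
    cy = sum(p[1] for p in pixels) / n          # gravity center, y
    mean_gray = sum(p[2] for p in pixels) / n
    # Second-order central moments (entries of the 2x2 inertia matrix).
    sxx = sum((p[0] - cx) ** 2 for p in pixels) / n
    syy = sum((p[1] - cy) ** 2 for p in pixels) / n
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in pixels) / n
    # Orientation of the major principal axis (eigenvector direction).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Project the pixels on the two principal axes.
    u = [(p[0] - cx) * cos_t + (p[1] - cy) * sin_t for p in pixels]
    v = [-(p[0] - cx) * sin_t + (p[1] - cy) * cos_t for p in pixels]
    length = max(u) - min(u) + 1
    width = max(v) - min(v) + 1
    return {
        "surface": n,                  # number of pixels
        "elongation": length / width,  # length over width of the rectangle
        "mean_gray": mean_gray,
        "cg": (cx, cy),
    }

# A 6 x 2 axis-aligned rectangle of uniform gray level 100.
rect = [(x, y, 100) for x in range(6) for y in range(2)]
attrs = region_attributes(rect)
```

For this rectangle the sketch gives a surface of 12 pixels, an elongation of 3 and a gravity center at (2.5, 0.5).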
The Hopfield network used is characterized by hypotheses and correspondence constraints in the following way:
In the set of hypotheses H_{i} = {h_{i}}, each h_{i} corresponds to a potential correspondence between two regions, one from each image. For each hypothesis h_{i}, a neuron numbered i is created to represent this potential correspondence.
For each hypothesis h_{i}, we assign a value function, or confidence degree, T(h_{i}) = T_{i} with –1≤T_{i}≤1. It represents the neuron state.
The constraints are the connections W_{ij} between neurons i and j. The neurons are in agreement if W_{ij} > 0 and in disagreement if W_{ij} < 0.
An initial state, the input I_{ij}, is allocated to each neuron.
With:
T_{ij} = Value function (neuron state): it indicates whether the potential matching of the pair of regions (i, j) is valid or not
i, j = Region numbers in the right and the left image, respectively
W_{ij,kl} = Weight, or connection value, between the neurons (i, j) and (k, l): if the correspondences (i, j) and (k, l) are compatible, the connection value is strong
I_{ij} = Initial value, or network input. It contains the potential matching value
Choice of the pairs of regions that are candidates for neural matching: The choice of the candidate pairs of regions makes it possible to initialize the Hopfield network, i.e., to determine the initial state I_{ij} and the network value function T_{ij}.
To choose these pairs, we perform a classical correspondence by computing the similarities between regions according to the surface, elongation, average grayscale and gravity center position criteria. The attribute differences are compared for each pair of regions using the root mean square difference criterion:
(for the surface of a pair of regions, for example) such that:
Where:
i = 1,…, M, M : Total number of regions in the right image
j = 1,…, N, N : Total number of regions in the left image
S(i)–S(j) : Difference between surfaces
el(i)–el(j) : Difference between the elongations of the circumscribed rectangles
Mg(i)–Mg(j) : Difference between average grayscales
cg_{x}(i)–cg_{x}(j) : Difference between the x coordinates of the regions' gravity centers
cg_{y}(i)–cg_{y}(j) : Difference between the y coordinates of the regions' gravity centers
seuil_s, seuil_el, seuil_mg, seuil_cgx, seuil_cgy : Initial thresholds
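The selection of candidate pairs can be sketched as follows, assuming that a pair (i, j) is kept when the root mean square difference of every attribute is at most the corresponding initial threshold. The exact form of the criterion and the use of "at most" rather than "strictly below" are assumptions; the function names, the dictionary keys and the region values are hypothetical, and only the threshold values are those reported in the results of this study.

```python
import math

def rmsd(a, b):
    """Root mean square difference of two scalar attributes
    (for a single pair of values this reduces to |a - b|)."""
    return math.sqrt((a - b) ** 2)

def candidate_pairs(right, left, thr):
    """right, left: lists of attribute dicts (one per region);
    thr: thresholds with the same keys as the attributes.
    Returns the (i, j) pairs that satisfy every constraint."""
    pairs = []
    for i, r in enumerate(right):
        for j, l in enumerate(left):
            if all(rmsd(r[k], l[k]) <= thr[k] for k in thr):
                pairs.append((i, j))
    return pairs

# Initial thresholds reported in the results section.
thr = {"s": 354, "el": 0.15, "mg": 28, "cgx": 4.5, "cgy": 6.6}

# Hypothetical regions: surface, elongation, mean gray, gravity center.
right = [
    {"s": 1000, "el": 1.50, "mg": 120, "cgx": 40.0, "cgy": 60.0},
    {"s": 400, "el": 2.00, "mg": 90, "cgx": 10.0, "cgy": 15.0},
]
left = [
    {"s": 1100, "el": 1.55, "mg": 125, "cgx": 42.0, "cgy": 63.0},
    {"s": 2000, "el": 1.10, "mg": 200, "cgx": 80.0, "cgy": 20.0},
]
pairs = candidate_pairs(right, left, thr)
```

Each retained pair then becomes one neuron of the network; in this toy example only the pair (0, 0) passes all five tests.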
The Hopfield network consists of the set of selected pairs; each candidate pair represents a network neuron.
Initialization of the Hopfield network: The matrix containing the pairs of regions that are candidates for neural matching represents the network input I_{ij}.
The state of the neurons T_{ij} is fixed according to the following thresholding:
Where:
i = 1,…, m, m : Total number of regions in the right image resulting from the classical matching
j = 1,…, n, n : Total number of regions in the left image resulting from the classical matching
S(i)–S(j) : Difference between surfaces
El(i)–El(j) : Difference between the elongations of the circumscribed rectangles
Mg(i)–Mg(j) : Difference between average grayscales
Cg_{x}(i)–Cg_{x}(j) : Difference between the x coordinates of the regions' gravity centers
Cg_{y}(i)–Cg_{y}(j) : Difference between the y coordinates of the regions' gravity centers
Seuil_SN, seuil_ELN, seuil_MGN, seuil_CGXN, seuil_CGYN: New thresholds used to assign the neuron state values (if the second thresholds are equal to the first ones, all the neurons of the network are set to 1); the first thresholds limit the number of neurons to be built.

Fig. 3: Two-dimensional Hopfield network
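The state initialization can be sketched as follows, assuming binary states: a neuron is set to 1 when every attribute difference of its pair is within the second thresholds, and to 0 otherwise. The value 0 for rejected pairs, the function name `initial_states`, the dictionary layout and the attribute differences are assumptions; the threshold values are those reported in the results of this study, taken here as the first and second thresholds.

```python
def initial_states(diffs, new_thr):
    """diffs: dict mapping a candidate pair (i, j) to its absolute
    attribute differences; new_thr: second thresholds, same keys.
    Returns the initial state (1 or 0) of each neuron."""
    return {
        pair: int(all(d[k] <= new_thr[k] for k in new_thr))
        for pair, d in diffs.items()
    }

first_thr = {"s": 354, "el": 0.15, "mg": 28, "cgx": 4.5, "cgy": 6.6}
second_thr = {"s": 352, "el": 0.11, "mg": 27.12, "cgx": 4.15, "cgy": 6.59}

# Hypothetical differences for two pairs retained by the first thresholds.
diffs = {
    (0, 0): {"s": 100, "el": 0.05, "mg": 5, "cgx": 2.0, "cgy": 3.0},
    (1, 2): {"s": 353, "el": 0.12, "mg": 27.5, "cgx": 4.3, "cgy": 6.595},
}
states = initial_states(diffs, second_thr)       # {(0, 0): 1, (1, 2): 0}
all_on = initial_states(diffs, first_thr)        # every neuron set to 1
```

The second call reproduces the remark above: when the second thresholds equal the first ones, every candidate pair already satisfies them and all the neurons start at 1.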
Hopfield network stereo matching process: In our application, the Hopfield network is organized as a two-dimensional N×M matrix (Fig. 3), where N and M are the total numbers of regions of interest in the right and left image, respectively. After the network initialization, we compute the energy function over all the neurons of the network, the energy of each neuron being calculated. The Lyapunov function for a two-dimensional binary (two-state) Hopfield network is given by the following relation (Hopfield, 1982; Nasrabadi and Choo, 1992):
Where:
T_{ij}, T_{kl} : States of the neurons (ij) and (kl)
W_{ij, kl} : Synaptic weight connecting the two neurons (ij) and (kl), with W_{ij, kl} = W_{kl, ij}; the self-feedback of each neuron is W_{ij, ij} = W_{kl, kl} = 0
I_{ij} : Initial input to each neuron (ij)
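For reference, the symbols defined above correspond to the usual form of the energy of a two-dimensional Hopfield network, as commonly written following Hopfield (1982). The expression below is this standard form, given as a sketch; the index ranges are left implicit and the exact expression used in this work may differ in detail.

```latex
E = -\frac{1}{2} \sum_{i} \sum_{j} \sum_{k} \sum_{l} W_{ij,kl}\, T_{ij}\, T_{kl}
    \;-\; \sum_{i} \sum_{j} I_{ij}\, T_{ij}
```

The first term rewards pairs of active neurons joined by a positive (compatible) connection, and the second term rewards active neurons with a large input.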
The energy formula is decomposed for each neuron T_{ij} in the following way:
A change in the state of a neuron, ΔT_{ij}, causes an energy change ΔE_{ij}:
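As a sketch of the usual derivation, assuming the standard quadratic energy of a Hopfield network together with the symmetry W_{ij,kl} = W_{kl,ij} and the zero self-feedback stated above, the share of the energy attributed to neuron (ij) and the energy change caused by ΔT_{ij} can be written as follows. These are the standard expressions and not necessarily the exact ones used in this work.

```latex
E_{ij} = -\,T_{ij} \left( \frac{1}{2} \sum_{k} \sum_{l} W_{ij,kl}\, T_{kl} + I_{ij} \right),
\qquad E = \sum_{i} \sum_{j} E_{ij}

\Delta E_{ij} = -\left( \sum_{k} \sum_{l} W_{ij,kl}\, T_{kl} + I_{ij} \right) \Delta T_{ij}
```

The factor 1/2 disappears in ΔE_{ij} because T_{ij} appears both in its own share and, by symmetry of the weights, in the shares of the neurons connected to it. The energy therefore never increases when a neuron is switched on only if its bracketed net input is positive, and switched off only if it is negative.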
This process is repeated until the network reaches stability; at this stage, the neuron values do not change between two successive iterations (which corresponds to an energy minimum). The iteration count is limited. The stable state of the network represents the solution of the problem.
Thus, the Hopfield network matching process can be summarized as follows:
(1) Neuron initialization: computation of the initial input I_{ij}, the neuron states T_{ij} and the connection weights W_{ij, kl}
(2) Calculation of the energy E_{ij} according to Eq. 2 for each network neuron
(3) Modification of each neuron state by the following thresholding formula:
(4) If a neuron state T_{ij} changes, then go to step 2; else, stop
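The four steps above can be illustrated with a minimal sketch, assuming binary states and a threshold at zero on the net input of each neuron, which is the usual update rule for a two-state Hopfield network; the thresholding formula actually used in this work is not reproduced here. The function names, the weight values (+1 for compatible pairs, -1 for pairs that share a region) and the input values are hypothetical.

```python
def energy(T, W, I):
    """Standard Hopfield energy for states T, symmetric weights W
    (missing entries count as 0) and inputs I."""
    pair_term = sum(W.get((a, b), 0.0) * T[a] * T[b]
                    for a in T for b in T if a != b)
    return -0.5 * pair_term - sum(I[a] * T[a] for a in T)

def hopfield_match(T, W, I, max_iter=100):
    """Update the neurons one by one until no state changes,
    or until the iteration limit is reached."""
    T = dict(T)
    neurons = list(T)
    for _ in range(max_iter):
        changed = False
        for a in neurons:
            net = I[a] + sum(W.get((a, b), 0.0) * T[b]
                             for b in neurons if b != a)
            new_state = 1 if net > 0 else 0   # assumed threshold at zero
            if new_state != T[a]:
                T[a] = new_state
                changed = True
        if not changed:                       # stable state reached
            break
    return T

def symmetric(weights):
    """Complete a weight dict so that W[a, b] == W[b, a]."""
    full = dict(weights)
    full.update({(b, a): w for (a, b), w in weights.items()})
    return full

# Three candidate pairs (right region, left region), all initially active.
T0 = {(0, 0): 1, (0, 1): 1, (1, 1): 1}
I = {(0, 0): 0.9, (0, 1): 0.4, (1, 1): 0.8}
W = symmetric({
    ((0, 0), (0, 1)): -1.0,  # same right region: incompatible
    ((0, 1), (1, 1)): -1.0,  # same left region: incompatible
    ((0, 0), (1, 1)): 1.0,   # distinct regions: compatible
})
final = hopfield_match(T0, W, I)
```

In this toy case the ambiguous pair (0, 1) is switched off while (0, 0) and (1, 1) stay active, and the energy of the final state is lower than that of the initial state.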
RESULTS AND DISCUSSION
The characteristics of the images:
• Spatial resolution: 1 m
As the satellite images we have are very large and varied (Fig. 4, 5) (urban, rural, occluded areas, etc.), we use only portions of them (Fig. 6a, b), of urban type, to test our application. The application can be divided into two main parts: the first consists of the image segmentation and the second comprises the matching process.
The circles in Fig. 4 and 5 show the portions used in our application.
From the first part of the application, we find that the number of regions resulting from the right image is 149 and the number of regions resulting from the left image is 160 (Fig. 7a, b).
In the second part of the application, the process carries out the classical matching method based on the new constraints detailed previously. It leads to 93 pairs of regions, the thresholds used in this step being:
seuil_s = 354; seuil_el = 0.15; seuil_mg = 28; seuil_cgx = 4.5; seuil_cgy = 6.6

Fig. 4: Full right satellite image

Fig. 5: Full left satellite image
As shown in Fig. 7, regions 7 and 4, for example, are detected by the classical matching technique as a pair of homologous regions, so the same label 2 is assigned to them (Fig. 8a, b). Proceeding in the same way for the other regions, we obtain 93 pairs of homologous regions, which corresponds to a matching rate of 60.27%. Unfortunately, we also obtain matching ambiguities: 23 ambiguous regions in the right image and 13 ambiguous regions in the left image (Case 1 of Table 1).
These 93 pairs constitute the set of neurons to be built.
In the third part of the application, the Hopfield network makes it possible to minimize these ambiguities: it reduces them to 9 ambiguous regions in the right image and 8 ambiguous regions in the left image and increases the matching rate to 79.57% (Fig. 9a, b).

Fig. 6: Portions of the original images used. (a) Right original image and (b) left original image
The thresholds used in this case are:
seuil_s = 352; seuil_el = 0.11; seuil_mg = 27.12; seuil_cgx = 4.15; seuil_cgy = 6.59
To show the contribution of the neural system once again, in the fourth part of the application we use the classical matching technique alone, giving it the thresholds used by the neural system (in the third part of the application). We notice that the number of ambiguities does decrease, to 9 ambiguous regions in the right image and 8 ambiguous regions in the left image, but the matching rate also decreases, to 47.95% (Fig. 10a, b). A classical matching technique used alone thus recognizes fewer homologous regions with fewer ambiguities (Case 2 of Table 1).

Fig. 7: Segmentation and thresholding of the image pair. (a) Right image segmented and thresholded and (b) left image segmented and thresholded

Fig. 8: Classical matching of the image pair. (a) Classical matching of the right image and (b) classical matching of the left image

Fig. 9: Hopfield neural matching of the image pair. (a) Hopfield neural matching of the right image and (b) Hopfield neural matching of the left image

Fig. 10: Classical matching applied alone to the image pair. (a) Classical matching applied alone to the right image and (b) classical matching applied alone to the left image
Table 1: Summary of the results

Thus, using a Hopfield neural system initialized with the classical matching method makes it possible to recognize a maximum of homologous regions: we obtain a high matching rate with fewer ambiguities.
CONCLUSIONS
In this article, we address the stereo matching of a pair of urban high-resolution satellite images using a Hopfield neural system and propose new constraints: surface, elongation, average grayscale and gravity center. These constraints characterize well the region primitive, which has several advantages over the pixel (such as a high level of interpretation, speed of processing, etc.). They are also easier to use than the other constraints used in previous works (Lee et al., 1994; Nasrabadi and Choo, 1992; Nichari, 1994), such as the epipolar one, which requires knowledge of the intrinsic and extrinsic parameters of the acquisition system (Horaud and Monga, 1995).
The present study includes two principal phases: the first is the region segmentation and the second is the stereo matching. As the quality of the matching strongly depends on that of the segmentation, we choose a region segmentation of the Split-Merge (division-fusion) type, which we find well suited to our type of images (built-up areas). This mixed segmentation combines the two methods, Split and Merge, and mitigates their disadvantages. A thresholding as post-processing after the segmentation phase is necessary to remove, as far as possible, the regions of no interest (roads, cars, etc.) and to label those that remain.
For matching, we carry out two methods. The first is a Hopfield neural method that is initialized by a classical matching: the latter determines a set of potential pairs and thus restricts the search space on the basis of the new constraints mentioned above. This initializes the Hopfield network, which then carries out a neural matching whose principle consists in minimizing a cost function using Hebb rule training. As soon as the imposed constraints are satisfied, the network converges towards a stable state. With this first method, we obtain a reduced number of ambiguities and a high matching rate. The second method consists of a classical matching used alone, to which we give the thresholds of the neural system (which favors it); in spite of that, the results in this case are poorer than those obtained by the first method (a reduced number of ambiguities but a low matching rate).
Thus, the neural approach studied in this paper provides both a high matching rate and reduced ambiguities; it is simple, fast and gives good results for the type of images processed.
It can be used in several stereovision applications, such as object localization, pattern recognition and three-dimensional reconstruction.
If calibration parameters are available, the proposed method can be used in stereovision to construct three-dimensional urban models, for example. These models are of wide and varied interest, as in the calculation of electromagnetic wave propagation, navigation in virtual worlds and the detection of urban change for updating three-dimensional databases.