Research Article

A Neural Method based on New Constraints for Stereo Matching of Urban High-resolution Satellite Imagery

E. Zigh and M.F. Belbachir

This study presents a simple and fast method for the stereo matching of pairs of urban high-resolution satellite images. We are interested in built-up areas, and the primitive used to match the images is the region. The approach therefore includes two phases: the first is a region segmentation of the images; the second is a neural region matching based on new constraints that include the geometric and photometric properties of the regions. In this second phase, the Hopfield neural network used for the matching has the particularity of being initialized by a classical matching method. The benefit of this combination is twofold: the classical region matching gives the Hopfield network a better initialization, and the network in turn improves the stereo matching rate and reduces the ambiguities left by the classical matching. The network solves the optimization problem by minimizing a cost function whose minimum represents the best solution; the nodes are the hypotheses (the possible correspondences) and the connections between them are the constraints. We compared this method with a classical region matching method applied alone, to which we gave the thresholds of the neural system; its results are worse than those of the first method (fewer ambiguities but a low matching rate). The proposed method is thus effective in simultaneously reducing ambiguities and raising the matching rate. It is simple and has a low computational cost.


  How to cite this article:

E. Zigh and M.F. Belbachir, 2010. A Neural Method based on New Constraints for Stereo Matching of Urban High-resolution Satellite Imagery. Journal of Applied Sciences, 10: 2010-2018.

DOI: 10.3923/jas.2010.2010.2018

Received: April 15, 2010; Accepted: June 10, 2010; Published: July 05, 2010


Binocular stereo is the process of obtaining depth information from a pair of images of a scene taken from two different viewpoints. The wider the angle between the two views, the larger the base-to-height ratio B/H and the better the altimetric precision; stereo matching, however, becomes correspondingly more difficult.

A B/H of about 0.6 is generally chosen as the best compromise between the quality of the three-dimensional reconstruction and the simplicity of the stereo matching (Paparoditis, 1998). In the present case, the image pair has a B/H ratio of about 0.53.

An accurate recovery of the relief is necessary in various fields, for example satellite photogrammetry for town planning.

The process of stereovision is often divided into three stages:

Preprocessing: extraction of the points of interest which satisfy some defined characteristics
Stereo matching: Establishing the correspondence between the points of interest
Three dimensional reconstruction: Construction of the three dimensional model of the observed scene

The result of each stage influences the next one. In this study, we are interested in the first two stages, preprocessing and stereo matching, applied to urban high-resolution panchromatic satellite images (about 1 m).


Several methods address the two phases of preprocessing (segmentation) and stereo matching. The choice of a method depends mainly on the application concerned and on the type of images to be processed.

There is a large body of research on image segmentation, in particular on region segmentation, as reported by Pal and Pal (1993), Pan (1994) and Coquerez et al. (1995). We choose a mixed region segmentation approach called Split-Merge which, in our case, achieves a good compromise between segmentation quality and algorithmic cost. To our knowledge, this is the first application of this method to high-resolution satellite imagery. The quality of the results of this preprocessing stage directly influences the following one: the stereo matching.

Stereo matching, the central and particularly complex phase of stereovision, is also the subject of much research using various approaches (Umesh and Aggarwal, 1989; Koschan, 1993). These are generally divided into two types: intensity-based and feature-based matching approaches. Intensity-based approaches compute a resemblance measure between the matched points. They allow an overall dense restitution but handle discontinuities and homogeneous zones with difficulty, and the absence of semantic information hampers the management of the coherence and structure of the scene (Baillard, 1997). Feature-based approaches extract more or less structured primitives before putting them in correspondence, such as edge elements (Jordan, 1992), junctions (Herman and Kanade, 1986; Chehata, 2005) or homogeneous regions (Randiamasy and Gagalowicz, 1991; Wrobel, 1987; Wang et al., 2004; Todorovic and Narendra, 2007).

In the case of urban satellite images, the presence of superstructures (chimneys, for example) or repetitive shapes makes intensity-based matching approaches error-prone and relatively costly, despite the use of several constraints (epipolar, uniqueness, order, etc.). To overcome these problems, we choose a feature-based matching technique that uses the region as a primitive and includes new constraints. We also show in this study the advantages gained from this choice: on the one hand, the high dimensionality of the region primitive makes it a rich descriptor of the constituent elements of the images (surfaces, grayscales, gravity centers, etc.); on the other hand, it reduces the volume of data to be processed. This leads to a simple, fast and relatively effective stereo matching. The use of a neural network, here a Hopfield network, makes it possible to improve the results of the classical matching.

The principal contribution of neural networks in general, and of Hopfield networks in particular, lies in their error correction by minimization of an energy function and in the coherence of the matching enforced through the network connections. Several researchers have built stereo matching systems based on Hopfield networks, such as Lee et al. (1994), who match pixels, Nasrabadi and Choo (1992), who use characteristic points as primitives, and Nichari (1994), who matches edge points. All these studies use some of the following constraints: similarity, continuity, order, epipolar, etc. Implementing these constraints is not always easy, in particular the epipolar one, which requires knowledge of the intrinsic and extrinsic parameters of the acquisition system (Horaud and Monga, 1995). To avoid these problems, we draw on the principle of the similarity constraint to use, for the first time to our knowledge, a neural system with a set of specific constraints for region stereo matching applied to a pair of high-resolution satellite images. These new constraints cover a set of region characteristics: the surface similarity constraint, the elongation constraint, the grayscale constraint and the gravity center constraint. We then use the root mean square deviation (RMSD) criterion to measure the compatibility of region pairs: this is a classical matching whose results are introduced in the initialization phase of the Hopfield neural system.

We show in this study that this approach, a combination of classical and neural matching, is simple and fast and improves on the results of a classical matching used alone.


Problem position: It is always difficult, even impossible, to segment two images of the same scene in a one-to-one way in order to match them. Segmentation is therefore not an end in itself but a preliminary stage whose quality conditions the success of the subsequent processing (stereo matching). As there is no universal segmentation method that applies successfully to every type of image, we choose a method that suits our type of images and consider it satisfactory if it achieves a good compromise between algorithmic cost and segmentation quality. This method is Split-Merge.

Split-Merge segmentation method: Region segmentation techniques are numerous and diverse. They break the image up into a set of regions, each of which is homogeneous with respect to a previously defined attribute. The Split-Merge method begins with an over-segmentation of the image into disjoint regions, then merges these regions if they satisfy a homogeneity criterion. Most of these methods require a particular data structure. The quadtree is the best known: it is simple and has a low processing time. In the quadtree used in the splitting phase, the root corresponds to the full image and each internal node has exactly four child nodes (Fig. 1). An example of a split and merged image is shown in Fig. 2a-c.

Fig. 1: Pyramidal structure of the quadtree

Fig. 2: Example of application of the Split-Merge method, (a) Original image, (b) illustration of the over-segmentation effect obtained after the quadtree splitting phase and (c) elimination of the over-segmentation after the merging phase

The Split-Merge algorithm is given as follows:

Split-Merge segmentation parameters: The existence of adjustable parameters in the Split-Merge segmentation makes it a semi-automatic method. Choosing one of these parameters (the difference of average grayscale, similar values, the variance, etc.) defines a homogeneity criterion, or predicate. We choose the variance for splitting, since it is the strongest homogeneity criterion, and the difference of average grayscale as the merge criterion.
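The split and merge criteria described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the thresholds `var_max` and `mean_diff_max`, and the requirement that the image be square with a power-of-two side are all hypothetical choices made for the example.

```python
from statistics import mean, pvariance

def block_pixels(img, r, c, size):
    """Gray levels of the size x size block whose top-left corner is (r, c)."""
    return [img[y][x] for y in range(r, r + size) for x in range(c, c + size)]

def split(img, r, c, size, var_max, min_size=1):
    """Quadtree split: recursively divide a block into four children until
    its gray-level variance is at most var_max (assumed split predicate)."""
    pixels = block_pixels(img, r, c, size)
    if size <= min_size or pvariance(pixels) <= var_max:
        return [(r, c, size, mean(pixels))]  # homogeneous leaf
    half = size // 2
    leaves = []
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        leaves += split(img, r + dr, c + dc, half, var_max, min_size)
    return leaves

def can_merge(leaf_a, leaf_b, mean_diff_max):
    """Assumed merge predicate: mean gray levels differ by at most mean_diff_max."""
    return abs(leaf_a[3] - leaf_b[3]) <= mean_diff_max

# 4x4 toy image: dark background with a bright 2x2 block in the top-right corner
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
leaves = split(image, 0, 0, 4, var_max=25.0)
for leaf in leaves:
    print(leaf)
```

A full merge phase would additionally track which leaves are adjacent and fuse them iteratively; only the predicate is shown here to keep the sketch short.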

Post segmentation: After the segmentation phase and before starting the stereo matching, we apply a global thresholding to the image pair in order to eliminate as many as possible of the details (cars, roads, etc.) that are useless in our application, and to label the regions of interest.


Criteria and constraints used in the stereo matching: The purpose of the stereo matching method is to put in correspondence the regions of the image pair resulting from the segmentation phase. We propose a technique based on new constraints that include the geometric and photometric properties of the regions. These constraints reduce the number of candidate homologous regions to be processed: surface, elongation, average grayscale and gravity center constraints.

Surface similarity constraint: The difference between the surfaces of the two regions of a pair should not exceed a fixed surface threshold
Region elongation constraint: The elongation is the length-to-width ratio of the circumscribed rectangle that encloses a region (with sides parallel to the eigenvectors of the inertia matrix of the region)
Average grayscale constraint: It concerns the difference of average grayscale between the two regions of a pair
Gravity center constraint: The coordinates of the gravity centers of the two regions must be close
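The four region attributes used by these constraints can be computed as in the sketch below. It is an illustration only: the function name, the `(x, y, gray)` pixel representation and the way the circumscribed rectangle is measured (extent of the pixel centers along the principal axes, plus one pixel) are assumptions, not details given in the article.

```python
import math

def region_attributes(pixels):
    """pixels: list of (x, y, gray) tuples belonging to one region.
    Returns surface, elongation, mean gray level and gravity center."""
    n = len(pixels)
    cx = sum(p[0] for p in pixels) / n
    cy = sum(p[1] for p in pixels) / n
    mean_gray = sum(p[2] for p in pixels) / n

    # Second-order central moments (entries of the 2x2 inertia matrix)
    sxx = sum((p[0] - cx) ** 2 for p in pixels) / n
    syy = sum((p[1] - cy) ** 2 for p in pixels) / n
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in pixels) / n

    # Direction of the principal eigenvector of that matrix
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)

    # Extent of the region along and across the principal axis
    along = [(p[0] - cx) * ux + (p[1] - cy) * uy for p in pixels]
    across = [-(p[0] - cx) * uy + (p[1] - cy) * ux for p in pixels]
    length = max(along) - min(along) + 1
    width = max(across) - min(across) + 1

    return {
        "s": n,                                   # surface in pixels
        "el": max(length, width) / min(length, width),
        "mg": mean_gray,
        "cgx": cx,
        "cgy": cy,
    }

# Toy region: an axis-aligned rectangle 4 pixels wide and 2 pixels high
rect = [(x, y, 100) for x in range(4) for y in range(2)]
attrs = region_attributes(rect)
print(attrs)
```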

The Hopfield network used is characterized by hypotheses and correspondence constraints in the following way:

The set of hypotheses Hi = {hi}: each hi is a potential correspondence between two regions, one from each image. For each hypothesis hi, a neuron numbered i is created to represent that potential correspondence.

To each hypothesis hi, we assign a value function or confidence degree T(hi) = Ti, with –1 ≤ Ti ≤ 1. It represents the neuron state.

The constraints are the connections Wij between neurons i and j. The neurons are in agreement if Wij > 0 and in disagreement if Wij < 0.

An initial state, the input Iij, is allocated to each neuron.


Tij = Value function (neuron state): indicates whether the potential matching of the region pair (i, j) is valid or not
i, j = Region numbers in the right and left image, respectively
Wij,kl = Weight or connection value between the neurons (i, j) and (k, l): if the correspondences (i, j) and (k, l) are compatible, the connection value is strong
Iij = Initial value or network input: contains the potential matching value

Choice of the region pairs that are candidates for neural matching: Choosing the candidate region pairs makes it possible to initialize the Hopfield network, i.e., to determine the initial state Iij and the network value function Tij.

To choose these pairs, we carry out a classical matching by computing the similarities between regions according to the surface, elongation, average grayscale and gravity center position criteria. The differences in attributes are compared for each pair of regions using the root mean square difference criterion:

(for the surface of a pair of regions, for example), where:


i = 1,…, M, M: Total number of regions in the right image
j = 1,…, N, N: Total number of regions in the left image
S(i)–S(j): Difference between surfaces
el(i)–el(j): Difference between the elongations of the circumscribed rectangles
Mg(i)–Mg(j): Difference between average grayscales
cgx(i)–cgx(j): Difference between the x coordinates of the regions' gravity centers
cgy(i)–cgy(j): Difference between the y coordinates of the regions' gravity centers
seuil_s, seuil_el, seuil_mg, seuil_cgx, seuil_cgy: Initial thresholds
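The candidate selection described above can be sketched as follows. The exact formula is not reproduced in the text, so this example assumes that each attribute is tested separately with the root mean square difference of the two values (which, for a single pair, reduces to the absolute difference) against its own threshold. The function names, the dictionary keys and the toy region values are hypothetical; the threshold values are the ones reported later in the article for the classical matching.

```python
import math

ATTRIBUTES = ("s", "el", "mg", "cgx", "cgy")

def rmsd(a, b):
    """Root mean square difference of two scalar attribute values
    (equal to the absolute difference for a single pair)."""
    return math.sqrt((a - b) ** 2)

def candidate_pairs(right_regions, left_regions, thresholds):
    """Return the (i, j) pairs for which none of the five attribute
    differences exceeds its threshold (assumed selection rule)."""
    pairs = []
    for i, right in enumerate(right_regions):
        for j, left in enumerate(left_regions):
            if all(rmsd(right[a], left[a]) <= thresholds[a] for a in ATTRIBUTES):
                pairs.append((i, j))
    return pairs

# Thresholds seuil_s, seuil_el, seuil_mg, seuil_cgx, seuil_cgy
thresholds = {"s": 354, "el": 0.15, "mg": 28, "cgx": 4.5, "cgy": 6.6}

# Hypothetical regions, for illustration only
right_regions = [
    {"s": 1200, "el": 1.50, "mg": 120, "cgx": 40.0, "cgy": 55.0},
    {"s": 300, "el": 2.10, "mg": 90, "cgx": 100.0, "cgy": 20.0},
]
left_regions = [
    {"s": 1100, "el": 1.55, "mg": 110, "cgx": 42.0, "cgy": 58.0},
    {"s": 2500, "el": 1.00, "mg": 200, "cgx": 10.0, "cgy": 10.0},
]

pairs = candidate_pairs(right_regions, left_regions, thresholds)
print(pairs)
```

Each pair returned by such a function would become one neuron of the network.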

The Hopfield network is made up of the set of selected pairs: each candidate pair represents one neuron of the network.

Initialization of the Hopfield network: The matrix containing the region pairs that are candidates for neural matching represents the network input Iij.

The state of the neurons Tij is fixed according to the following thresholding:


i = 1,…, m, m: Total number of regions in the right image resulting from the classical matching
j = 1,…, n, n: Total number of regions in the left image resulting from the classical matching
S(i)–S(j): Difference between surfaces
El(i)–El(j): Difference between the elongations of the circumscribed rectangles
Mg(i)–Mg(j): Difference between average grayscales
Cgx(i)–Cgx(j): Difference between the x coordinates of the regions' gravity centers
Cgy(i)–Cgy(j): Difference between the y coordinates of the regions' gravity centers
Seuil_SN, seuil_ELN, seuil_MGN, seuil_CGXN, seuil_CGYN: New thresholds used to assign the neuron state values. If these second thresholds are equal to the first ones, all the neurons of the network are set to 1. The first thresholds serve to limit the number of neurons to be built.

Fig. 3: Bidimensional Hopfield network

Hopfield network stereo matching process: In our application, the Hopfield network is organized as a two-dimensional N×M matrix (Fig. 3), where M and N are the total numbers of regions of interest in the right and left images, respectively. After the network initialization, we compute the energy function over all the neurons of the network, the energy of each neuron being calculated in turn. The Lyapunov function of a two-dimensional binary (two-state) Hopfield network is given by the following relation (Hopfield, 1982; Nasrabadi and Choo, 1992):



Tij, Tkl: States of the neurons (ij) and (kl)
Wij,kl: Synaptic weight connecting the two neurons (ij) and (kl), with Wij,kl = Wkl,ij; the self-feedback of each neuron is Wij,ij = Wkl,kl = 0
Iij: Initial input to each neuron (ij)

For each neuron Tij, the energy formula decomposes in the following way:


A change in the state of neuron ΔTij causes an energy change ΔEij:


This process is repeated until the network becomes stable: at this stage, the neuron values do not change between two successive iterations (which corresponds to an energy minimum). The iteration count is limited. The stable state of the network represents the solution of the problem.
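For reference, the energy of a two-dimensional Hopfield network and the energy change caused by a single neuron are commonly written in the following form (Hopfield, 1982; Nasrabadi and Choo, 1992), transcribed here with the symbols defined above. This is a sketch of the standard formulation and may differ in detail from the expressions used by the authors:

```latex
E = -\frac{1}{2}\sum_{i}\sum_{j}\sum_{k}\sum_{l} W_{ij,kl}\,T_{ij}\,T_{kl}
    \;-\; \sum_{i}\sum_{j} I_{ij}\,T_{ij}

\Delta E_{ij} = -\Big(\sum_{k}\sum_{l} W_{ij,kl}\,T_{kl} + I_{ij}\Big)\,\Delta T_{ij}
```

With symmetric weights and zero self-feedback, the bracketed term is the net input of neuron (ij). Setting the neuron to its upper state when this net input is positive, and to its lower state otherwise, can never increase E, which is why the iteration settles into a stable state.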

The Hopfield network matching process can thus be summarized as follows:

Neuron initialization: computation of the initial input Iij, the neuron states Tij and the connection weights Wij,kl
Calculation of the energy Eij according to Eq. 2 for each network neuron
Modification of each neuron state by the following thresholding formula:



If any neuron state Tij has changed, go to step 2; otherwise, stop
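The iteration summarized in these steps can be sketched as below. Several points are assumptions made for the example and are not taken from the article: neurons are addressed by a flat index standing for a pair (i, j), states are binary (0 or 1), a neuron is set to 1 when its net input is positive, and the weights come from a hypothetical uniqueness penalty that inhibits pairs sharing a region. The authors' weight and thresholding formulas are not reproduced in the text.

```python
def energy(T, W, I):
    """Hopfield energy: -1/2 * sum_ab W[a][b] T[a] T[b] - sum_a I[a] T[a]."""
    n = len(T)
    quad = sum(W[a][b] * T[a] * T[b] for a in range(n) for b in range(n))
    return -0.5 * quad - sum(I[a] * T[a] for a in range(n))

def uniqueness_weights(pairs, penalty=-2.0):
    """Hypothetical weights: two candidate pairs inhibit each other when
    they share a right region or a left region; no self-feedback."""
    n = len(pairs)
    W = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            shares = pairs[a][0] == pairs[b][0] or pairs[a][1] == pairs[b][1]
            if a != b and shares:
                W[a][b] = penalty
    return W

def run_hopfield(T, W, I, max_iter=100):
    """Update neurons one at a time until no state changes in a full sweep,
    or until the (limited) number of sweeps is reached."""
    T = list(T)
    n = len(T)
    for _ in range(max_iter):
        changed = False
        for a in range(n):
            net = sum(W[a][b] * T[b] for b in range(n)) + I[a]
            new_state = 1 if net > 0 else 0  # assumed thresholding rule
            if new_state != T[a]:
                T[a] = new_state
                changed = True
        if not changed:
            break
    return T

# Three candidate pairs; (0, 1) conflicts with both of the others
pairs = [(0, 0), (0, 1), (1, 1)]
I = [1.0, 0.4, 1.0]       # hypothetical confidence from the classical matching
T0 = [1, 1, 1]            # all candidates initially accepted
W = uniqueness_weights(pairs)

T_final = run_hopfield(T0, W, I)
print(T_final, energy(T0, W, I), energy(T_final, W, I))
```

In this toy case the weakly supported pair (0, 1) is switched off and the energy decreases, which mirrors how the network is meant to remove ambiguous correspondences.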


The characteristics of images:

Panchromatic images
Spatial resolution: 1 m
Size: 2364×2364 pixels
Coding: 8 bits/pixel

As the satellite images available to us are very large and varied (urban, rural, occluded areas, etc.) (Fig. 4, 5), we use only urban portions of them (Fig. 6a, b) to test our application. The application can be divided into two main parts: the first is the image segmentation and the second is the matching process.

The circles in Fig. 4 and 5 mark the portions used in our application.

In the first part of the application, we find 149 regions in the right image and 160 regions in the left image (Fig. 7a, b).

In the second part of the application, the process carries out the classical matching method based on the new constraints detailed previously. It yields 93 region pairs, with the following thresholds:

seuil_s = 354; seuil_el = 0.15; seuil_mg = 28; seuil_cgx = 4.5; seuil_cgy = 6.6

Fig. 4: Full right satellite image

Fig. 5: Full left satellite image

As shown in Fig. 7, regions 7 and 4, for example, are detected by the classical matching technique as a pair of homologous regions, so the same label 2 is assigned to them (Fig. 8a, b). Proceeding in the same way for the other regions, we obtain 93 pairs of homologous regions, which corresponds to a matching rate of 60.27%. Unfortunately, we also obtain matching ambiguities: 23 ambiguous regions in the right image and 13 ambiguous regions in the left image (Case 1 of Table 1).

These 93 pairs constitute the set of neurons to be built.
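Counting ambiguities in a list of matched pairs can be sketched as follows. The article does not define an ambiguous region precisely, so this example assumes that a region is ambiguous when it appears in more than one matched pair; the function name and the toy pairs are hypothetical.

```python
from collections import Counter

def ambiguous_regions(pairs):
    """Return the right-image and left-image regions that appear in more
    than one (right, left) pair (assumed definition of an ambiguity)."""
    right_counts = Counter(i for i, _ in pairs)
    left_counts = Counter(j for _, j in pairs)
    right_amb = sorted(i for i, c in right_counts.items() if c > 1)
    left_amb = sorted(j for j, c in left_counts.items() if c > 1)
    return right_amb, left_amb

# Toy pairs: right region 7 and left region 9 are each matched twice
matched = [(7, 4), (7, 9), (3, 9), (5, 1)]
right_amb, left_amb = ambiguous_regions(matched)
print(right_amb, left_amb)
```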

In the third part of the application, the Hopfield network reduces these ambiguities to 9 ambiguous regions in the right image and 8 ambiguous regions in the left image, and increases the matching rate to 79.57% (Fig. 9a, b). The thresholds used in this case are:

seuil_s = 352; seuil_el = 0.11; seuil_mg = 27.12; seuil_cgx = 4.15; seuil_cgy = 6.59

Fig. 6: Portions of original images used. (a) Right original image and (b) left original image

To show the contribution of the neural system once again, in the fourth part of the application we use the classical matching technique alone and give it the thresholds used by the neural system (in the third part of the application). The number of ambiguities does fall, to 9 ambiguous regions in the right image and 8 ambiguous regions in the left image, but the matching rate also decreases, to 47.95% (Fig. 10a, b). A classical matching technique used alone therefore recognizes fewer homologous regions, with fewer ambiguities (Case 2 of Table 1).

Fig. 7: Segmentation and thresholding of the image pair. (a) Right image segmented and thresholded and (b) left image segmented and thresholded

Fig. 8: Classical matching of the image pair. (a) Classical matching of right image and (b) classical matching of left image

Fig. 9: Hopfield neural matching of the image pair. (a) Hopfield neural matching of right image and (b) Hopfield neural matching of left image

Fig. 10: Classical matching applied alone to the image pair. (a) Classical matching applied alone to right image and (b) classical matching applied alone to left image

Table 1: Recapitulation of the results

Thus, a Hopfield neural system initialized with the classical matching method recognizes as many homologous regions as possible: we obtain a high matching rate with fewer ambiguities.


This article addresses the stereo matching of pairs of urban high-resolution satellite images by using a Hopfield neural system and proposing new constraints: surface, elongation, average grayscale and gravity center. These constraints characterize well the region primitive, which has several advantages over the pixel (a higher level of interpretation, faster processing, etc.). They are also easier to use than the constraints of previous works (Lee et al., 1994; Nasrabadi and Choo, 1992; Nichari, 1994), such as the epipolar constraint, which requires knowledge of the intrinsic and extrinsic parameters of the acquisition system (Horaud and Monga, 1995).

The present study includes two principal phases: the first is the region segmentation and the second is the stereo matching. As the quality of the matching strongly depends on that of the segmentation, we chose a region segmentation method called Split-Merge, which we find well suited to our type of images (built-up areas). This mixed segmentation combines the Split and Merge methods and mitigates their respective disadvantages. A thresholding step after the segmentation phase is necessary to remove, as far as possible, the regions of no interest (roads, cars, etc.) and to label those that remain.

For the matching, we carry out two methods. The first is a Hopfield neural method which has the particularity of being initialized by a classical matching. The classical matching determines a set of potential pairs and so restricts the search space on the basis of the new constraints listed above; this constitutes the initialization of the Hopfield network. The network then carries out a neural matching whose principle is to minimize a cost function by using Hebb rule training. As soon as the imposed constraints are satisfied, the network converges towards a stable state. With this first method, we obtain a reduced number of ambiguities and a high matching rate. The second method is a classical matching used alone, to which we give the thresholds of the neural system (a way of favoring it). In spite of that, its results are worse than those obtained by the first method (fewer ambiguities but a low matching rate).

Thus, the neural approach studied in this paper provides both a high matching rate and reduced ambiguities. It is simple and fast and gives good results for the type of images treated.

It can be used in several applications of stereovision, such as localization of objects, pattern recognition and three-dimensional reconstruction.

If calibration parameters are available, using the proposed method in stereovision makes it possible, for example, to construct three-dimensional urban models. These models are of wide interest, for instance in the calculation of electromagnetic wave propagation, in the navigation of virtual worlds and in the detection of urban change for updating three-dimensional databases.

Baillard, C., 1997. Analyse d’images aériennes stéréoscopiques pour la restitution 3D des milieux urbains, détection et caractérisation du sursol (Analysis of stereoscopic aerial images for the 3D reconstruction of urban environments: detection and characterization of above-ground structures). Doctorat Thesis, High National School of Telecommunications, France.

Chehata, N., 2005. Modelisation 3D de scenes urbaines a partir d'images satellitaires a tres haute resolution (3D Modeling of urban scene from high resolution satellite images). Doctorat Thesis, University of Rene Descartes, Paris 5, France.

Coquerez, J.P., S. Phillips and P. Bolon, 1995. Image Analysis: Filtering and Segmentation. Masson, Paris, ISBN-13: 9782225849237, pp: 457.

Herman, M. and T. Kanade, 1986. Incremental reconstruction of 3D scenes from multiple, complex images. Artificial Intell., 30: 289-341.

Hopfield, J., 1982. Neural networks and physical systems with emergent collective computational abilities. Proc. Nat. Acad. Sci., 79: 2554-2558.

Horaud, R. and O. Monga, 1995. Vision Par Ordinateur: Outils fondamentaux (Computer Vision: Fundamental Tools). Hermes Edition Publisher, Paris.

Jordan, M., 1992. Analyse stereoscopique de vues aeriennes, elaboration d'une description volumique des scenes (Stereoscopic analysis of aerial views: building a volumetric description of scenes). Ph.D. Thesis, Paris-Sud University, Orsay Center.

Koschan, A., 1993. What is New in Computational Stereo since 1989 : A Survey On Current Stereo Papers. Technical University of Berlin, Berlin.

Lee, J.J., J.C. Shim and Y.H. Ha, 1994. Stereocorrespondence using the Hopfield neural network of a new energy function. Pattern Recognition, 27: 1513-1522.

Nasrabadi, N.M. and Y.C. Choo, 1992. Hopfield network for stereo vision correspondence. IEEE Trans. Neural Network., 3: 5-13.

Nichari, S., 1994. Solving the correspondence problem using a Hopfield network. IEEE. Int. Conf. Comput. Intell., 6: 4107-4112.

Pal, N.R. and S.K. Pal, 1993. A review on image segmentation techniques. Pattern Recog., 26: 1277-1294.

Pan, H.P., 1994. Two level global optimization of image segmentation. J. Photogrammetry Remote Sens., 49: 21-32.

Paparoditis, N., 1998. Reconstruction tridimensionnelle de paysages peri-urbains en imagerie stereoscopique satellitale haute resolution (Three-dimensional reconstruction of peri-urban landscapes from high-resolution satellite stereo imagery). Doctorat Thesis, University of Nice-Sophia Antipolis, France.

Randiamasy, S. and A. Gagalowicz, 1991. Region based stereo matching oriented image processing. Proc. IEEE Conf. Computer Vision Pattern Rec., pp: 736–737.

Todorovic, S. and A. Narendra, 2007. Region based hierarchical image matching. Int. J. Comput. Vision, 78: 47-66.

Umesh, R.D. and J.K. Aggarwal, 1989. Structure from stereo: A review. IEEE Trans. Syst. Man Cybernetics, 19: 1489-1510.

Wang, T., Y. Rui and J.G. Sun, 2004. Constraint based region matching for image retrieval. Int. J. Comput. Vision, 56: 37-45.

Wrobel, B., 1987. Stereovision: Cooperation entre l'extraction et la mise en correspondance symbolique des. In Actes MARI’87. Paris, pp: 181-188.
