Novel RBPF for Mobile Robot SLAM Using Stereo Vision

Journal of Artificial Intelligence

Year: 2008 | Volume: 1 | Issue: 1 | Page No.: 1-11
DOI: 10.3923/jai.2008.1.11

Novel RBPF for Mobile Robot SLAM Using Stereo Vision

Zesu Cai, Jian Cao, Lining Sun and Maohai Li

Abstract: This study presents the novel RBPF for mobile robot SLAM using stereovision to extract landmark information. The particle filter is combined with Gaussian Mixture Unscented Particle Filters (GMUPF) to extending the path posterior by sampling new poses that integrate the current observation that drastically reduces the uncertainty about the robot pose. The landmark position estimation and update is also implemented through GMUPF which a single update step from moving and sensing can be done and the change to the map certainty can be done in constant time. Furthermore, the number of resampling steps is determined adaptively, which seriously reduces the particle depletion problem. The 3D natural point landmarks are structured with matching Scale Invariant Feature Transform (SIFT) feature pairs. The matching for multi-dimension SIFT features is implemented with a KD-tree which introduce the Mahalanobis distance instead of the Euclidean distance for matching features in the time cost of O (log2^N). Experiment results on real robot in our indoor environment show the advantages of our methods over previous approaches.

Fulltext PDF Fulltext HTML

How to cite this article

Zesu Cai, Jian Cao, Lining Sun and Maohai Li, 2008. Novel RBPF for Mobile Robot SLAM Using Stereo Vision. Journal of Artificial Intelligence, 1: 1-11.

Keywords: Mobile robot, SLAM, RBPF, SIFT and stereovision

INTRODUCTION

A key prerequisite for a truly autonomous robot is that it can simultaneous localizes itself and accurately map its surroundings. The problem of achieving this is one of the most active areas in mobile robotics research, which is known as Simultaneous Localization and Mapping (SLAM). Most of the present research on SLAM is based on Extended Kalman filter (EKF) (Leonard et al., 2002), whose primary advantage is that it estimates the fully posterior over landmark maps and robot poses in an online fashion. However, the use of the EKF in practice has two well-known drawbacks. The first one is that linearization can produce highly unstable filters if the assumptions of local linearity is violated. The second one is that the derivation of the Jacobian matrices is nontrivial in most applications and often leads to significant implementation difficulties.

Recently, particle filter have been at the core of solutions to higher dimensional robot problems such as SLAM, which, when phrased as a state estimation problem. RBPF (Murphy and Russell, 2001) is proposed as an effective way of representing alternative hypotheses on robot paths and associated maps. More recently, RBPF is used widely to building map (Stachniss et al., 2005). Daily and Parnichkun (2005) describe the application of Fast SLAM using a trinocular stereo camera. Se et al. (2002) demonstrate the use of SIFT point features as landmarks for the SLAM problem using a trinocular Stereo vision. Gil et al. (2006) describes an approach that builds three dimensional maps using visual landmarks extracted from stereo images of an unmodified environment and solves the SLAM problem.

In this study, we present an investigation into the use of stereo vision (Elinas et al., 2006) for SLAM in indoor environment with 3D feature landmarks, which are structured from the SIFT feature matching pairs. These 2D SIFT features are used to structure 3D landmarks because they are invariant to image scale, rotation and translation as well as partially invariant to illumination changes and affine or 3D projection and their description is implemented with multi-dimensional vector (Lowe, 2004). This combination can result in many highly distinctive landmarks from environment, which simplifies the data association problem to only distinguishing unique landmarks. We present a fast and efficient algorithm which introduce the Mahalanobis distance instead of the Euclidean distance for matching features in a KD-Tree in the time cost of O(log2^N). Following, our approach applies RBPF to estimate a posterior of the path of the robot, where each particle has associated with an entire map, in which each landmark is estimated and updated by a novel algorithm to recursively update the posterior density of SLAM called the Gaussian Mixture Unscented Particle Filters (GMUPF) (Wendel et al., 2005). The filter has equal or batter estimation performance when compared to standard particle filters and the Unscented Particle Filter (UPF) (Julier et al., 2000), at a largely reduced computational cost. The GMUPF combines an Importance Sampling (IS) based measurement update step with an Unscented Kalman Filter (UKF) based Gaussian sum filter for time update and proposal density generation.

NOVEL RBPF FOR SLAM

Consider the case of a mobile robot moving through an unknown environment consisted of a set of landmarks. The landmark n is denoted θ_n. The robot moves according to a known probabilistic motion model , where, s_t denotes the robot state at time t and the control input u_t carried out in the time interval [t-1,t]. As the robot moves around, it takes measurements z_t of its environment through observation model , where, θ is the set of all landmarks and n_t is the index of the particular landmark observed at time t.

The SLAM problem is to recover the posterior distribution p(s^t, θ₁,...,θ_M, z^t, u^t, n^t), where, M is the number of landmarks observed so far and the notation s^t denotes s_l..., s_t (and similarly for other variables). Murphy and Russel (2001) provide an implementation of RBPF for SLAM:

(1)

This can be done efficiently, since the factorization decouples the SLAM problem into a path estimation problem and individual conditional landmark location problems and the p(θ_n,s^t,z^t,n^t) can be computed analytically once s^t and z^t are known and the amount of computation needed for each incremental update stays constant, regardless of the path length. Each map is constructed given z^t and the trajectory s^t represented by the corresponding particle. Each particle is of the form , Where (i) indicates the index of the particle; s^t,(i) is its path estimate, μ⁽ⁱ⁾_m,t and '⁽ⁱ⁾_m,t are the mean and variance of the Gaussian representing the m-th landmark location.

In our methods, the i-th new pose s_t⁽ⁱ⁾ is drawn from the posterior , which takes the measurement z_t into consideration, along with the landmark n_t and s^t-1,(i) is the path up to time t-1 of the i-th particle. It can be shown that can be approximated as closely as desired by a Gaussian Mixture Model (GMM) of the following form:

(2)

Where:

G	=	The number of mixing components
α^(g)	=	The mixing weights

is a normal distribution with mean vector and positive define covariance matrix p_t^(g),(i).

GMUPF Algorithm

By using the UT transformation (Miaohai et al., 2006), we follow GMUPF algorithm to extend the path s^t,(i) by sampling the new poses s^t(i) from the posterior .

Calculate the Sigma Points

(3)

Using Motion Model to Predict
At time t-1, assume the posterior state density is approximated by the G-component GMM:

(4)

and the process and observation noise densities are approximated by the following I and J component GMMs, respectively

(5)

(6)

Where:

V_t-1	=	The process noise
O_t-1	=	The observation or measurement noise

For j = 1,…,J, set . For j = 1,…,J, set and for g = 1,…,G, set .

For = 1,...,", use the motion model to calculate a g″ = 1,...,G″ Gaussian approximate and update the mixing weights. For g″ = 1,...,G″, complete the measurement update step to calculate a Gaussian approximate , update the mixing weights.

Generate the Proposal Distribution
The predictive state density is now approximated:

(7)

The proposal distribution is approximated by the GMM:

(8)

Measurement Update
Draw N samples from the proposal distribution and calculate their corresponding importance weights:

(9)

Normalize the weights:

Use a weighted EM (WEM) algorithm to fit a G-component GMM to the set of weighted particles :

(10)

Inference

The conditional mean state estimate and the corresponding error covariance can be calculated in this ways:

(11)

IMPLEMENTATION DETAILS OF USING STEREO VISION

SIFT Feature Extraction Based Stereo Vision
Let I^L_t and I^R_t denote the right and left gray scale images captured with a stereo camera at a time t. we compute image points of interest from both images by selecting maximal points in the scale space pyramid of a Difference of Gaussians. We extract natural landmarks that correspond to points in the 3-dimensional space. Each landmark is a vector l = {P^G, C^G, α.,s,f}, such that P^G = {X^G,Y^G,Z^G} is a 3-dimensional position vector in the map’s global coordinate frame, C^G is the 3x3 covariance matrix for P^G and α,s,f describe an invariant feature based on the Scale Invariant Feature Transform (SIFT). Parameter α is the orientation of the feature, s is its scale and f is the 128-dimensional key vector which represents the histogram of local edge orientations. Each point is accompanied by its SIFT descriptor and then matched across images. The matching procedure is constrained by the well-known epipolar geometry of the stereo rig and the Euclidean distance of their SIFT keys. In addition, a comparison between SIFT descriptors is used to avoid false correspondence. As a result, at a time t we obtain a set of B observations denoted:

(12)

Where, is the image coordinates of the point and .

KD-Tree Based Feature Matching
Here we describes KD-tree algorithm for determining the matched SIFT feature pairs of left and right images captured along the robot’s path by a stereo vision system. We use each associated SIFT descriptor to improve the data associated process. Each SIFT descriptor is a 128-long vector computed from the image gradient at a local neighbourhood of the interest point. In the approaches (Miaohai et al., 2006), data association is based on the squared Euclidean distance between descriptors. However, when the same point is viewed from significantly different viewpoints, the difference in the descriptor is remarkable. In the presence of similar looking landmarks, this fact causes a significant number of false correspondences. In order to reduce the false correspondence problem, we propose a different method to present the distance function. We consider different views of the same visual landmark as different patterns belonging to C_i. Whenever a landmark is found, it is tracked along p frames and its descriptors d₁,d₂,...,d_p are stored. Then, for each landmark C_i, we compute a mean value and estimate a covariance matrix P_i, assuming the elements in the SIFT descriptor independent. Based on this data we computer the Mahalanobis distance:

(13)

Where, p_i is a diagonal covariance matrix associated with the class C_i with mean value . We computer the Mahalanobis distance for all the landmarks in the map of each particle and assign the correspondence to the landmark that minimizes Mahalanobis distance. Then we can use the following equation to judge the matching for two key-points:

(14)

Where, λ is constant and 0 < λ < 1, if this inequality is satisfied, then the matching is successful and simultaneously eliminates the false matching. Figure 1 shows an example of SIFT feature matching for a pair image from the labor with stereo vision and we obtain 76 matched pairs which the matching accurate rate is higher than 90%.


Fig. 1:	The SIFT feature matches based on KD-tree and the matching pairs are represented by red rectangle. The mismatching pairs are represented by blue +. The matched key-points are 76. The corrected are 68

Observation Model Based Stereo Vision
In the stereovision case, we perform a linear search of the keys in the left image for the best match to each key in the right, subject to epipolar constraints and determine its 3D position and covariance according to the well-known stereo equations:

(15)

Where, f is the focal length of the camera, B is the based-line of the stereo vision, d is the disparity between SIFT keys in the left and right images and [u,v] is the pixel position of the key in the right camera.

An observation is defined as a set of k correspondences between landmarks in the map and the current view. Let F_t = {f₁,...,f_k} be the k SIFT feature key-points observed at time t, in which there are n key-points matching with the 3D landmarks in the map database: and there are m key-points matching the 2D SIFT feature key-points which observed at time t-1 and are not reconstructed and added to the map database: . Then the likelihood of the observation Z_t being obtained as:

Where, z^l_t represented the observation F_l = {f_l,...,f_n} and z_t^v represents the observation F_v = {f_n+1,...,f_n+m}, represents the likelihood of the observation z^l_t given the matching relation n^l_t and represents the likelihood of the observation z^v_t given the matching relation n^v_t, these two likelihood can be calculated separately as follows:

Where, represents the likelihood of the observation being f_j when robot at pose s_t⁽¹⁾ observing the landmark L_fj, represents the likelihood of the observation being f_j when robot at pose s_t⁽ⁱ⁾ observing the landmark V_fj. A pose of the camera, s_t, defines a transformation [R,T]_st from the camera to the global coordinate frame. Let the 3D coordinates of the landmark L_fj be (X^(j), Y^(j), Z^(j)), using the Mahalanobis distance metric, we can obtain as follows:

(16)

Where:

J	=	The Jacobian matrix of the observation equation
G_fj	=	The covariance of L_fj

The maximum observation innovation T₁ is constant (in our case, 3.0), which is selected so as to prevent outlier observations from significantly affecting the observation likelihood.

While the feature V_fj has no 3D spatial information, is only calculated according to epipolar constrain:

Where, I_fj is the image coordinate of the feature V_fj, H_fj is the epipolar line on the image plane corresponding to V_fj at time t and H_j is the epipolar line on the image plane corresponding to the feature f_j at time t-1, dist(•) is the function of the distance between point and line.

After calculating the observation model , which can be used to evaluate the i-th particle weight w_t⁽ⁱ⁾ and w_t⁽ⁱ⁾ is taken as follows:

(17)

Motion Model
An essential component to the implementation of RBPF is the specification of the robot’s motion model. Let I_t and T_t-1 represent the pairs of stereo images taken with the robot’s camera at two consecutive intervals with the robot moving between the two. For each pair of images we detect points of interest, compute SIFT descriptors for them and perform stereo matching, resulting in 2 sets of landmarks L_t-1 and L_t. We compute the camera motion using the Levenberg-Marquardt (LM) algorithm minimizing the re-projection error of the 3D coordinates of the landmarks.

Let be the 3-dimensional vector corresponding to a given [R, T]. Our goal is to iteratively compute a correction term χ:

such as to minimize the vector of error measurement ε for a known stereo vision calibration matrix K, ε is defined as:

(18)

Given an initial estimate for the parameters, we use LM algorithm to solve for χ that minimizes ε in three steps:

•	According to the vector of residuals ε(χ), compute its matrix of derivatives J with respect to the components of χ.

(19)

•	Compute the update rule:

(20)

•

Set

Where, the value of λ controls the distance traveled along the gradient direction and controls the convergence of the solution by switching between pure gradient descent and Newton’s method.

RESULTS AND DISCUSSION

The experiments are performed on a Pioneer 3-DX Mobile Robot incorporating an 800 MHz intel Pentium processor as shown in Fig. 2A. Motor control is performed on the on-board computer, while a 2.4GHz PC connected to the robot by a wireless link provides the main processing power for stereo vision processing and the SLAM software. A Point Grey Bumblebee Stereo-vision mounted at the front of the robot is used for detecting the landmarks. The test environment is a robot laboratory with limited space in Fig. 2B.

One of the most difficult issues in a SLAM implementation is the data association problem, where the hypothesis of assigning an observation to an existing map location is tested. In our implementation the KD-tree Based SIFT and Mahalanobis distance solves this problem. For illustrating the advantages of our methods over previous approaches, we implement SLAM with our novel RBPF and previous method. This experiment is described as follows.

Firstly, the robot is set at the distance of 4 m from the lab door and the robot orientation is parallel with the door, at the same time, there are some obstacles in the front of the robot. While the robot is moving ahead, the stereo pair image frames are captured and processed, building the map of the indoor environment.

In Fig. 3, the experiment results. In the map, S represents the start point of robot path and ‘E’ represents the end point of the robot path, the red points represent the robot path particle, the 2D view of 3D landmarks in the map is represented with blue points. As shown in Fig. 3A, if we increase the number of particles, the performance of conventional RBPF will be improved largely, however, the storage requirement and calculation burden is severely aggravated, owning to each particle associated with a view of the map. Figure 3B shows the built map with the novel RBPF, which adopts 50 particles and 100 particles. The performance of the novel RBPF changes a little with increasing the number of particle, so we can build precise map only with few particles.

Another experiment was carried out in our single lab room, where the compact map is built with our novel RBPF method and 186 pair stereo image frames of size 320x240 are captured.


Fig. 2:	(A) Pioneer mobile robot and (B) experimental environment


Fig. 3:	Experiment results of map building based (A) conventional RBPF: (i) 100 particles, (ii) 500 particles; (B) novel RBPF: (iii) 50 particles, (iv) 100 particles


Fig. 4:	Part of an image from the stereo camera (A) and the corresponding disparity image (B)


Fig. 5:	The Comparison of the three paths, red line present the measured path of robot, green line present the odometer’s path, the blue show the modifying path by Novel RBPF

The real environment of our room is shown in Fig. 4A, this is a typical image of our test environment where the pathways are very narrow with little texture throughout the camera’s field of view. Fig. 4B is the disparity image generated by the stereo correspondence algorithm with lighter shades indicating objects closer to camera.

In order to measure the accuracy of our novel RBPF, we have measured ground truth pose data for different locations. The starting location is node A and the last location is node I, the others are labeled nodes B, C, D, E, F, G, H. We have labeled these locations on the learned map shown in the Fig. 5. The comparison of the three paths that comprise odometer path (green dashed line) and modified path by the novel RBPF (blue solid line) and the real measures path (red solid line) showed in the Fig. 5. From the Fig. 5, we can see that the odometer data is less deviation from the real measure data in the short term, however, with the robot moving, the deviation become larger. But for the modified data by novel RBPF, the deviation changes a little with the robot moving.

CONCLUSION

This article described a novel algorithm for SLAM problem using stereo vision. Like many previously published SLAM algorithms, our method calculates posterior probability distributions over 3D SIFT featured maps and robot locations. It does so recursively based on a key property of the SLAM problem: The conditional independence of feature estimates given the robot path. This conditional independence gives rise to a factored representation of the posterior using a combination of particle filters for estimating the robot path and the GMUPF for estimating the map. Furthermore, the number of resampling steps is determined adaptively, which seriously reduces the particle depletion problem. In order to reduce the false correspondence in the data association the Mahalanobis distance is introduced in the KD-tree based feature matching instead of the Euclidean distance. We have shown that we can construct accurate metric maps of 3D point landmarks from dense correlation-based stereo without loop closure. Experiment results on real robot in our indoor environment show that the advantages of our methods over previous approaches.

REFERENCES

Dailey, M.N. and M. Parnichkun, 2005. Landmark-based simultaneous localization and mapping with stereo vision. Proceedings of the 4th Asian Conference on Industrial Automation and Robotics, Maty 11-13, 2005, Thailand, pp: 108-113.

Elinas, P., R. Sim and J.J. Little, 2006. σOslam: Stereo vision slam using the rao-blackwellised particle filter and a novel mixture proposal distribution. Proceedings of the International Conference on Robotics and Automation, May 15-19, 2006, Orlando, FL., pp: 1564-1570.

Gil, A., O. Reinoso, C. Fernandez, M. Vicente, A. Rottmann and O. Martinez-Mozos, 2006. Simultaneous localization and mapping in unmodified environments using stereo vision. International Conference on Informatics in Control, Automation and Robotics (ICINCO), pp: 302-309.

Julier, S., J. Uhlmann and H.F. Durrant-Whyte, 2000. A new method for the nonlinear transformation of means and covariances in filters and estimators. IEEE Trans. Auto. Control, 45: 477-482.
Direct Link

Leonard, J., J.D. Tard'os, S. Thrun and H. Choset, 2002. Workshop notes of the icra workshop on concurrent mapping and localization for autonomous mobile robots. Proceedings IEEE International Conference Robotics and Autonomous, Washington, DC.

Lowe, D.G., 2004. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60: 91-110.
CrossRef Direct Link

Li, M., B. Hong, Z. Cai and R. Luo, 2006. Novel rao-blackwellized particle filter for mobile robot SLAM using monocular vision. Int. J. Int. Technol., 1: 63-69.

Murphy, K. and S. Russell, 2001. Rao-Blackwellized Particle Filtering for Dynamic Bayesian Networks, in Sequential Monte Carlo Methods in Practice. Springer Verlag.

Se, S., D. Lowe and J. Little, 2002. Mobile Robot localization and mapping with uncertainty using scale-invariant visual landmarks. Int. J. Robotics Res., 21: 735-758.
CrossRef Direct Link

Stachniss, C., G. Grisetti and W. Burgard, 2005. Recovering particle diversity in a rao-blackwellized particle filter for SLAM after actively closing loops. Proceedings of the International Conference Robotics and Automation, April 18-22, 2005, Barcelona, Spain, pp: 655-660.

Wendel, J., J. Metzger, R. Moenikes, A. Maier and G.F. Trommer, 2005. A performance comparison of tightly coupled GPS/INS navigation systems based on extended and sigma point kalman filters. Proceedings of the ION GNSS, September 13-16, 2005, Long Beach, California, pp: 456-466.

HOME JOURNALS CONTACT

Journal of Artificial Intelligence

Year: 2008 | Volume: 1 | Issue: 1 | Page No.: 1-11 DOI: 10.3923/jai.2008.1.11

Novel RBPF for Mobile Robot SLAM Using Stereo Vision

Zesu Cai, Jian Cao, Lining Sun and Maohai Li

How to cite this article

Zesu Cai, Jian Cao, Lining Sun and Maohai Li, 2008. Novel RBPF for Mobile Robot SLAM Using Stereo Vision. Journal of Artificial Intelligence, 1: 1-11.

Keywords: Mobile robot, SLAM, RBPF, SIFT and stereovision

REFERENCES

Year: 2008 | Volume: 1 | Issue: 1 | Page No.: 1-11
DOI: 10.3923/jai.2008.1.11