Imperial College, London
Information Systems Engineering, Year 2: Surprise 1997
Real-world applications envisaged in most current research projects, however, demand more detailed sensor information to provide the robot with better environment-interaction capabilities. Visual sensing can provide the robot with a vast amount of information about its environment, and visual sensors are potentially the most powerful source of information among all the sensors used on robots to date. Hence, at present, high-resolution optical sensors seem to hold the greatest promise for mobile robot positioning and navigation.
The most common optical sensors include laser-based range finders and photometric cameras using CCD arrays.
However, due to the volume of information they provide, extraction of visual features for positioning is far from straightforward. Many techniques have been suggested for localisation using vision information, the main components of which are listed below:
The environment is perceived in the form of geometric information such as landmarks, object models and maps in two or three dimensions. Localisation then depends on the following two inter-related considerations:
Photometric cameras using an optical lens can be modelled as a pin-hole camera. The co-ordinate system (X, Y, Z) is a three-dimensional camera co-ordinate system, and (x, y) is a sensor (image) co-ordinate system. A three-dimensional feature in an object is projected onto the image plane (x, y). The relationship for this projection is given by (where f is the focal length of the lens):
$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}$$
The range information is lost in this projection, but the angle or orientation of the object point can be obtained if the focal length is known and the lens does not cause distortion.
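As a minimal sketch of the projection relationship above, assuming an ideal distortion-free lens and points already expressed in camera co-ordinates (the function names `project` and `horizontal_angle` and the 8 mm focal length are illustrative, not taken from any particular system), the following shows that two points on the same ray give the same image point, while the angle to the point can still be recovered from x and f:

```python
import math

def project(point, f):
    """Pin-hole projection of a camera-frame point (X, Y, Z) onto the image plane (x, y)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)

def horizontal_angle(x, f):
    """Angle (radians) between the optical axis and the ray through image co-ordinate x."""
    return math.atan2(x, f)

f = 0.008                 # hypothetical 8 mm lens, in metres
near = (0.5, 0.1, 2.0)    # camera-frame point
far = (1.0, 0.2, 4.0)     # same direction, twice the range
print(project(near, f), project(far, f))     # same image point: the range is lost
x, _ = project(near, f)
print(math.degrees(horizontal_angle(x, f)))  # but the angle to the point is recoverable
```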
The intrinsic camera parameters include the effective focal length and the image scanning parameters; they relate positions on the physical image plane to pixel positions in the digitised image.
The six extrinsic camera parameters describe the orientation and position (three for each) of the camera co-ordinate system (X, Y, Z). They represent the relationship between the camera co-ordinates (X, Y, Z) and the real-world or object co-ordinates (X_W, Y_W, Z_W). Landmarks and maps are usually represented in the real-world co-ordinate system.
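The role of the six extrinsic parameters can be sketched as a rigid transformation. This assumes one particular convention, chosen here only for illustration: the orientation is given as yaw, pitch and roll angles composed as Rz·Ry·Rx, the camera axes are obtained by rotating the world axes with no further relabelling, and `world_to_camera` is a hypothetical helper:

```python
import math

def rotation_zyx(yaw, pitch, roll):
    """Camera-to-world rotation Rz(yaw) @ Ry(pitch) @ Rx(roll), as nested lists."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp,     cp * sr,                cp * cr],
    ]

def world_to_camera(p_world, extrinsics):
    """Map a world point (X_W, Y_W, Z_W) to camera co-ordinates (X, Y, Z).

    extrinsics = (tx, ty, tz, yaw, pitch, roll): three position and three
    orientation parameters of the camera in the world frame (assumed convention).
    """
    tx, ty, tz, yaw, pitch, roll = extrinsics
    R = rotation_zyx(yaw, pitch, roll)
    d = (p_world[0] - tx, p_world[1] - ty, p_world[2] - tz)
    # World-to-camera uses the transpose (inverse) of the camera-to-world rotation.
    return tuple(sum(R[i][j] * d[i] for i in range(3)) for j in range(3))

p = world_to_camera((1.0, 0.0, 5.0), (1.0, 0.0, 0.0, 0.0, 0.0, 0.0))
print(p)  # (0.0, 0.0, 5.0): the point lies 5 units along the camera's Z axis
q = world_to_camera((0.0, 1.0, 0.0), (0.0, 0.0, 0.0, math.pi / 2, 0.0, 0.0))
print(q)  # approximately (1, 0, 0): after a 90 degree yaw, world Y is the camera's X
```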
The problem of localisation is to determine the position and orientation of a mobile robot by matching the sensed visual features in one or more image(s) to the object features provided by landmarks or maps. To obtain accurate estimates for position and orientation, multiple features are required. Depending on the type of sensors, sensing schemes, and representations of the environment, localisation techniques vary significantly.
If it is possible to identify unique features and their positions, then the position and orientation of the pin-hole camera can be determined, as illustrated in the figure below.
However, it is not always easy (or possible) to uniquely identify simple features such as points and lines in an image.
If there are only two landmark points, the measured angle between the corresponding rays restricts the possible camera position to an arc of a circle passing through both landmarks, as shown in Figure 3.a.
Three landmark points generally determine the camera position uniquely: it is the intersection, other than the shared middle landmark, of the two circles in Figure 3.b.
The point location algorithm first establishes a correspondence between the three landmark points in the environment and three observed features in the image. Then, the algorithm measures the angles between rays.
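The two- and three-landmark construction above can be sketched for the planar case. This is an illustration under stated assumptions, not the specific point location algorithm referred to: correspondences are assumed to be established already, the bearings (counter-clockwise angles of each landmark relative to the robot heading) are assumed to have been measured, for example from x and f as in the projection relationship, and all helper names are hypothetical. Each measured angle fixes a circle through a pair of landmarks (inscribed angle theorem); since both circles pass through the middle landmark, the camera is found by reflecting that landmark across the line joining the two centres:

```python
import math

def wrap(a):
    """Wrap an angle to the interval (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

def circle_centre(p, q, alpha):
    """Centre of the circle of positions from which segment p->q subtends the signed angle alpha.

    alpha is the counter-clockwise angle from the ray towards p to the ray towards q.
    """
    if abs(math.sin(alpha)) < 1e-12:
        raise ValueError("degenerate: camera is collinear with the two landmarks")
    d = math.dist(p, q)
    ux, uy = (q[0] - p[0]) / d, (q[1] - p[1]) / d
    nx, ny = -uy, ux                    # left-hand normal of p->q
    h = d / (2.0 * math.tan(alpha))     # signed offset of the centre from the chord midpoint
    return ((p[0] + q[0]) / 2.0 + h * nx, (p[1] + q[1]) / 2.0 + h * ny)

def three_point_fix(landmarks, bearings):
    """Planar camera position and heading from bearings to three known landmarks."""
    l1, l2, l3 = landmarks
    b1, b2, b3 = bearings
    c1 = circle_centre(l1, l2, wrap(b2 - b1))   # circle through l1, l2 and the camera
    c2 = circle_centre(l2, l3, wrap(b3 - b2))   # circle through l2, l3 and the camera
    ex, ey = c2[0] - c1[0], c2[1] - c1[1]
    norm = math.hypot(ex, ey)
    if norm < 1e-9:
        raise ValueError("degenerate: camera lies on the circle through all three landmarks")
    ex, ey = ex / norm, ey / norm
    # Both circles pass through l2; their other intersection (the camera) is the
    # reflection of l2 across the line joining the two centres.
    t = (l2[0] - c1[0]) * ex + (l2[1] - c1[1]) * ey
    fx, fy = c1[0] + t * ex, c1[1] + t * ey
    x, y = 2.0 * fx - l2[0], 2.0 * fy - l2[1]
    heading = wrap(math.atan2(l1[1] - y, l1[0] - x) - b1)
    return (x, y), heading

# Synthetic check: simulate the bearings a robot at a known pose would measure.
landmarks = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)]
true_xy, true_heading = (1.0, -2.0), 0.3
bearings = [math.atan2(ly - true_xy[1], lx - true_xy[0]) - true_heading for lx, ly in landmarks]
(x, y), heading = three_point_fix(landmarks, bearings)
print(round(x, 6), round(y, 6), round(heading, 6))  # 1.0 -2.0 0.3
```

The reflection step avoids solving a general circle-circle intersection, because one of the two intersection points (the middle landmark) is already known.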
The main problem is that the two-dimensional observations and the three-dimensional world models are in different forms. This is basically because of the following problems in object recognition in computer vision:
Another technique is to estimate the robot's position and heading using its other (conventional) sensors. The approximate position is then used to generate a two-dimensional scene from the stored three-dimensional real-world model. The features of this generated scene are then matched against those extracted from the observed image. Because the predicted scene restricts where each feature can appear, this form of image matching speeds up the process of obtaining a position estimate.
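A minimal sketch of this predict-then-match idea, assuming a robot moving in the plane, a camera looking along the robot's heading, and a one-dimensional "image" consisting only of horizontal image co-ordinates. The model, feature names, focal length and gate width are hypothetical, and greedy nearest-neighbour matching stands in for whatever matching scheme a real system would use:

```python
import math

def predict_image(model, pose, f):
    """Predicted horizontal image co-ordinates of 2-D model points seen from `pose`."""
    x, y, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    predicted = {}
    for name, (mx, my) in model.items():
        dx, dy = mx - x, my - y
        forward = c * dx + s * dy          # depth along the optical axis
        left = -s * dx + c * dy            # lateral offset, positive to the left
        if forward > 0:                    # only points in front of the camera are visible
            predicted[name] = f * left / forward
    return predicted

def match(predicted, observed, gate):
    """Greedy one-to-one nearest-neighbour matching; pairs further apart than `gate` are ignored."""
    pairs = sorted(
        (abs(u_pred - u_obs), name, i)
        for name, u_pred in predicted.items()
        for i, u_obs in enumerate(observed)
        if abs(u_pred - u_obs) <= gate
    )
    matches, used = {}, set()
    for _, name, i in pairs:
        if name not in matches and i not in used:
            matches[name] = i
            used.add(i)
    return matches

model = {"door": (5.0, 1.0), "pillar": (5.0, -1.0), "corner": (-3.0, 0.0)}  # hypothetical world model
observed = [-0.2, 0.2]            # horizontal positions of features found in the real image
approx_pose = (0.1, 0.05, 0.02)   # rough pose from the conventional sensors
predicted = predict_image(model, approx_pose, f=1.0)
print(match(predicted, observed, gate=0.1))  # {'door': 1, 'pillar': 0}
```

The gate is what saves work: each predicted feature is compared only with observations near where it is expected to appear, and the corner behind the robot is never considered at all.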
To make the matching process simpler, configurations of distinctive and easily identifiable features are matched first. Matching a group of features together dramatically cuts down the number of possible comparisons, and using rare, easily spotted features is clearly advantageous for an efficient match.
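To illustrate why distinctive features help, the sketch below assumes that every extracted feature carries a type label (the labels and counts are hypothetical) and that a pairing is only considered between features of the same type. Counting the candidate pairings per type shows both the reduction in comparisons and an order in which the rarest types would be matched first:

```python
from collections import Counter

def matching_order(image_types, model_types):
    """Candidate pairings per feature type, and the types ordered rarest-first."""
    img, mod = Counter(image_types), Counter(model_types)
    # A pairing is only considered between features of the same type.
    candidates = {t: img[t] * mod[t] for t in img if mod[t] > 0}
    order = sorted(candidates, key=lambda t: (candidates[t], t))
    return order, candidates

image = ["corner"] * 6 + ["edge"] * 8 + ["door"]
model = ["corner"] * 10 + ["edge"] * 12 + ["door"] * 2
order, candidates = matching_order(image, model)
print(order)                                               # ['door', 'corner', 'edge']
print(len(image) * len(model), sum(candidates.values()))   # 360 158
```

Matching the single door first leaves only two hypotheses to test, and a confirmed door match can then constrain the far more numerous corners and edges.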
For constructing the environment model, vision systems usually use image features detected at one or more robot positions. The object features detected at one sensor location become the relative reference for subsequent sensor locations (i.e. a form of dead-reckoning). When correspondences are correctly established, vision methods can provide higher accuracy in position estimation than odometry or inertial navigation systems. However, an important point to note is that odometry and inertial sensors can provide position information that is reliable to a certain degree, and this can be used to assist in establishing correspondences by narrowing down the search space for feature matching. On its own, a map based on visual object features gives only a sparse description of the environment structure.
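The dead-reckoning character of this scheme can be sketched by chaining relative poses. Here each step (dx, dy, dtheta), expressed in the robot frame of the previous sensor location, is assumed to come from matching features between consecutive locations; the function names and the example path are hypothetical. Because every pose is built on the previous one, a small error in each step accumulates:

```python
import math

def compose(pose, step):
    """Apply a relative motion (dx, dy, dtheta), expressed in the frame of `pose`."""
    x, y, th = pose
    dx, dy, dth = step
    new_th = th + dth
    return (x + dx * math.cos(th) - dy * math.sin(th),
            y + dx * math.sin(th) + dy * math.cos(th),
            math.atan2(math.sin(new_th), math.cos(new_th)))

def chain(start, steps):
    """Global poses obtained by referencing each sensor location to the previous one."""
    poses = [start]
    for step in steps:
        poses.append(compose(poses[-1], step))
    return poses

start = (0.0, 0.0, 0.0)
exact = chain(start, [(1.0, 0.0, math.pi / 2)] * 4)[-1]          # a 1 m square
biased = chain(start, [(1.0, 0.0, math.pi / 2 + 0.02)] * 4)[-1]  # 0.02 rad error per step
print(exact)                              # back at the start, up to rounding
print(math.hypot(biased[0], biased[1]))   # accumulated position error
```

The biased run does not return to its starting point even though each individual step is only slightly wrong, which is the usual weakness of any dead-reckoning chain.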
For mobile robots to be of more use in the future, they will need the ability to work in environments which are unknown at the design time of the robot and hence cannot be modelled in advance.
One promising project has attempted to implement a method for self-organising the visual perception of a mobile robot, so that it adapts to its surroundings without the relevant aspects of the environment having to be defined and modelled in advance.
The system is able to transform a continuous flow of images, by means of a self-organisation process, into a limited number of discrete perceptions which can be used for navigation purposes. Every image or scene is analysed to determine a set of features which characterise it. Defining in advance a standard set of features that characterises a scene precisely enough, yet can be extracted from the image with reasonable effort in a natural (not specially prepared) environment, seems impossible. Instead, the images are grouped into discrete perceptions by quantising the features of each scene. This is achieved by means of a Growing Neural Gas (GNG) network: essentially, the network selects certain features and groups them into classes which characterise the environment the robot is in.
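The source does not describe the network's implementation, so the following is only a generic sketch of the Growing Neural Gas algorithm as described by Fritzke, written in plain Python. The parameter values are illustrative, the input vectors stand in for whatever scene features are extracted, and `perceive` (returning the id of the nearest node) is a hypothetical name for the quantisation of a scene into a discrete perception:

```python
import random

class GrowingNeuralGas:
    """Minimal Growing Neural Gas: nodes with reference vectors linked by ageing edges."""

    def __init__(self, dim, eps_b=0.2, eps_n=0.006, max_age=50, insert_every=100,
                 alpha=0.5, decay=0.995, max_nodes=20, seed=0):
        rng = random.Random(seed)
        # Start with two nodes at random positions in the unit cube.
        self.w = {i: [rng.random() for _ in range(dim)] for i in (0, 1)}
        self.err = {0: 0.0, 1: 0.0}          # accumulated squared error per node
        self.edges = {}                      # frozenset({i, j}) -> edge age
        self.next_id, self.steps = 2, 0
        self.eps_b, self.eps_n, self.max_age = eps_b, eps_n, max_age
        self.insert_every, self.alpha, self.decay = insert_every, alpha, decay
        self.max_nodes = max_nodes

    @staticmethod
    def _d2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    def _neighbours(self, i):
        return [j for e in self.edges if i in e for j in e if j != i]

    def perceive(self, x):
        """Quantise a feature vector into a discrete perception: the id of the nearest node."""
        return min(self.w, key=lambda i: self._d2(x, self.w[i]))

    def train_step(self, x):
        s1, s2 = sorted(self.w, key=lambda i: self._d2(x, self.w[i]))[:2]
        self.err[s1] += self._d2(x, self.w[s1])
        # Move the winner strongly, and its topological neighbours weakly, towards x.
        self.w[s1] = [w + self.eps_b * (xi - w) for w, xi in zip(self.w[s1], x)]
        for n in self._neighbours(s1):
            self.w[n] = [w + self.eps_n * (xi - w) for w, xi in zip(self.w[n], x)]
        # Age the winner's edges, then create or refresh the winner/runner-up edge.
        for e in self.edges:
            if s1 in e:
                self.edges[e] += 1
        self.edges[frozenset((s1, s2))] = 0
        # Drop edges that are too old, then any node left without edges.
        self.edges = {e: a for e, a in self.edges.items() if a <= self.max_age}
        linked = {i for e in self.edges for i in e}
        for i in [i for i in self.w if i not in linked]:
            del self.w[i], self.err[i]
        # Every `insert_every` steps, add a node where the accumulated error is largest.
        self.steps += 1
        if self.steps % self.insert_every == 0 and len(self.w) < self.max_nodes:
            q = max(self.err, key=self.err.get)
            f = max(self._neighbours(q), key=self.err.get)
            r, self.next_id = self.next_id, self.next_id + 1
            self.w[r] = [(a + b) / 2 for a, b in zip(self.w[q], self.w[f])]
            del self.edges[frozenset((q, f))]
            self.edges[frozenset((q, r))] = 0
            self.edges[frozenset((r, f))] = 0
            self.err[q] *= self.alpha
            self.err[f] *= self.alpha
            self.err[r] = self.err[q]
        # Let all accumulated errors decay.
        for i in self.err:
            self.err[i] *= self.decay

# Usage: two hypothetical "places" in a 2-D feature space.
rng = random.Random(1)
centres = [(0.2, 0.2), (0.8, 0.8)]
gng = GrowingNeuralGas(dim=2)
for _ in range(3000):
    cx, cy = rng.choice(centres)
    gng.train_step([cx + rng.gauss(0, 0.03), cy + rng.gauss(0, 0.03)])
print(len(gng.w), gng.perceive([0.2, 0.2]), gng.perceive([0.8, 0.8]))
```

The number of nodes is not fixed in advance: the network grows where the quantisation error is high, which fits the aim of not defining the relevant structure of the environment beforehand.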
This system provides a relatively new approach to vision-based navigation for mobile robots. It does not use models or other a priori information about the environment, and hence makes no restrictive assumptions about its structure. This type of approach seems to hold great promise for truly autonomous mobile robots with unusual flexibility.
Clearly, vision-based positioning is directly related to most computer vision methods, especially object recognition, so as research in these areas progresses, its results can be applied to vision-based positioning. Other relevant areas include structure from stereo, motion and contour.
Another approach not mentioned in the above discussion is the idea of global vision. This uses several cameras placed at fixed locations in the environment to extend the local sensing capabilities of a mobile robot. The cameras 'track' the robot (and objects moving in its environment) and relay this data to it, where it is analysed to provide appropriate position and navigation information. For a more detailed study of Global Multi-Perspective Perception for Autonomous Mobile Robots, please refer to Arun Katkere's report.
| Method | Description |
|---|---|
| Odometry | This method uses encoders to measure wheel rotation and/or steering orientation. It is always capable of providing the vehicle with an estimate of its position. |
| Inertial Navigation | This method uses gyroscopes and sometimes accelerometers to measure the rate of rotation and acceleration. |
| Active Beacons | This method computes the absolute position of the robot by measuring the direction of incidence of three or more actively transmitted beacons. The beacons must be located at known sites in the environment. |
| Artificial Landmark Recognition | In this method, distinctive artificial landmarks are placed at known locations in the environment. They are usually designed for optimal detectability even under adverse environmental conditions. |
| Natural Landmark Recognition | Here the landmarks are distinctive features already present in the environment. There is no need to prepare the environment, but it has to be known in advance. |
| Model Matching | In this method, information acquired from the robot's onboard sensors is compared to a map or world model of the environment. If the features from the sensor-based map and the world model map match, then the robot's absolute location can be estimated. |
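Odometry, the first method in the table, can be sketched for a differential-drive robot. This assumes two driven wheels fitted with encoders, a known wheel radius and wheel separation in metres, and straight-line motion along the mean heading over each short sampling interval; all names and values are illustrative:

```python
import math

def odometry_update(pose, left_ticks, right_ticks, ticks_per_rev, wheel_radius, wheel_base):
    """Dead-reckoning pose update for a differential-drive robot from encoder tick increments."""
    x, y, theta = pose
    metres_per_tick = 2.0 * math.pi * wheel_radius / ticks_per_rev
    d_left = left_ticks * metres_per_tick
    d_right = right_ticks * metres_per_tick
    d_centre = (d_left + d_right) / 2.0        # distance moved by the midpoint of the axle
    d_theta = (d_right - d_left) / wheel_base  # change of heading
    # Approximate the motion as a straight segment along the mean heading of the interval.
    x += d_centre * math.cos(theta + d_theta / 2.0)
    y += d_centre * math.sin(theta + d_theta / 2.0)
    new_theta = theta + d_theta
    return (x, y, math.atan2(math.sin(new_theta), math.cos(new_theta)))

straight = odometry_update((0.0, 0.0, 0.0), 1000, 1000, ticks_per_rev=1000,
                           wheel_radius=0.05, wheel_base=0.5)
spin = odometry_update((0.0, 0.0, 0.0), -500, 500, ticks_per_rev=1000,
                       wheel_radius=0.05, wheel_base=0.5)
print(straight)  # about 0.314 m straight ahead (one wheel circumference)
print(spin)      # turned on the spot by about 0.63 rad
```

Because each update builds on the previous pose, errors from wheel slippage or an inaccurate wheel radius accumulate over time, which is why odometry is usually combined with one of the absolute methods listed in the table.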