Real-Time Camera Tracking: When Is High Frame-Rate Best?

Ankur Handa, Richard A. Newcombe, Adrien Angeli and Andrew J. Davison




We'd like to announce the release of the dataset accompanying our paper on analysing the performance of direct, iterative, whole-image-alignment-based camera tracking under varying frame-rates and image resolutions with stringent processing budgets. We rendered our sequences with POVRay, a popular open-source ray tracer that has been used in computer vision before; it can be downloaded from POVRay's homepage. Our scene model is obtained from Jaime Vives Piqueres's website, and the office scene we used in our paper is available here.
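For reference, here is a minimal sketch of rendering a single frame from Python; the scene file name office.pov and the output name are placeholders for the downloaded scene files, while the command-line flags are standard POVRay options:

    import subprocess

    # Render one frame of the scene with POVRay (a sketch; "office.pov"
    # and "frame.png" are placeholder names, not our exact file layout).
    def render_frame(scene="office.pov", out="frame.png", width=640, height=480):
        subprocess.run([
            "povray",
            f"+I{scene}",   # input scene description
            f"+O{out}",     # output image file
            f"+W{width}",   # image width in pixels
            f"+H{height}",  # image height in pixels
            "+A",           # enable antialiasing
            "-D",           # suppress the preview display window
        ], check=True)

    if __name__ == "__main__":
        render_frame()

Rendering a whole sequence then amounts to calling this once per pose along the trajectory, at whatever resolution is under test.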





A new set of trajectories will be uploaded very soon; some of them are shown below.
A 30-second-long trajectory
Download [Images and Poses only]
Our work began with analysing the performance of 6DoF camera tracking in a known 3D rigid scene under varying camera frame-rate (or, strictly, exposure time in computational-photography parlance). This was mostly driven by our intuition that high frame-rate should be better, because image motion between consecutive frames reduces considerably when the frame-rate setting of the camera is cranked up. Any tracking algorithm aimed at real-time performance would obviously prefer images on which it has to do less work. Additionally, since many direct tracking algorithms work by linearising the cost function to obtain a convex approximation, the linearisations become increasingly valid at high frame-rates thanks to the small-motion assumption.

At first we only wanted to answer some very simple questions along these lines. In doing so, however, we quickly realised that there are a few more parameters we can change that affect the performance of the tracker, and that these parameters are intertwined with frame-rate when it comes to performance evaluation. The first parameter that springs to mind is image resolution, so we decided to make our questions more specific. Keeping that in mind, we then needed a framework in which we could vary all the parameters continuously and compare the performance against perfect ground truth, to judge which frame-rate is optimal. After debating whether to use a real or a synthetic framework for our experiments, we realised that collecting real image data is not very easy: perfect ground-truth poses are hard to obtain, and repeating exactly the same trajectory while independently varying frame-rate and resolution is practically impossible.
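To make the linearisation argument concrete, here is a toy sketch of our own (not the paper's 6DoF implementation): direct alignment of two images under a 2D translation only, solved by repeatedly linearising the photometric cost with Gauss-Newton. The smaller the inter-frame motion, the better the first-order approximation holds, which is exactly why high frame-rates help.

    import numpy as np
    from scipy.ndimage import shift as warp_shift

    def align_translation(I_ref, I_cur, iters=20):
        # Estimate the 2D shift (dx, dy) that warps I_cur onto I_ref by
        # iteratively linearising the photometric residual (Gauss-Newton).
        p = np.zeros(2)
        for _ in range(iters):
            warped = warp_shift(I_cur, (p[1], p[0]), order=1)  # apply current estimate
            r = (warped - I_ref).ravel()                       # photometric residual
            gy, gx = np.gradient(warped)                       # spatial image gradients
            J = np.stack([gx.ravel(), gy.ravel()], axis=1)
            # Linearised model: r(p + dp) ~= r(p) - J.dp, so solve J.dp = r.
            dp, *_ = np.linalg.lstsq(J, r, rcond=None)
            p += dp
            if np.linalg.norm(dp) < 1e-4:                      # converged
                break
        return p

With small shifts this converges in a couple of iterations; with large inter-frame motion the first-order model no longer holds and the iteration can diverge, mirroring the behaviour we study for full 6DoF tracking.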
With all these limitations, a synthetic framework was the obvious first choice, as it allows us to exercise full control over all the parameters we are interested in varying and to figure out the answers to our questions (which we can verify in real experiments later on). Our main concern then was to make sure that the synthetic images look as realistic as possible; this mainly means adding realistic motion blur and camera noise to the images.
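As a rough sketch of that synthesis step (our own simplified model, not the exact pipeline; render(t) is a hypothetical function returning a float image of the scene at time t, e.g. driven by POVRay's clock variable):

    import numpy as np

    def synthesize_frame(render, t0, exposure, n_samples=16, noise_sigma=2.0):
        # Motion blur: average several renders spread across the exposure
        # window, approximating the integration a real sensor performs.
        times = np.linspace(t0, t0 + exposure, n_samples)
        frame = np.mean([render(t) for t in times], axis=0)
        # Camera noise: a simple additive Gaussian model on 8-bit intensities.
        frame += np.random.normal(0.0, noise_sigma, size=frame.shape)
        return np.clip(frame, 0, 255).astype(np.uint8)

This also makes the coupling explicit: at a fixed duty cycle, lowering the frame-rate lengthens the exposure window and hence increases blur, while raising it shortens the exposure and, in a real camera, leaves less light per frame.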

Remarks: Camera tracking forms the front end of many systems in vision, notably real-time SLAM. This applicability across many different domains is probably the raison d'être of camera tracking as a still-active research problem, and explains its widespread popularity. It is therefore imperative that, if we are using camera tracking, we know where it works best and where it breaks. The main goal of our research is to develop systems that can track super-fast motion, so that models of the scene can be built even as the camera is thrown around.

You may also be interested in the RGB-D dataset from Jürgen Sturm, available here.

We are grateful to the ERC for funding, and to our colleagues Hauke Strasdat, Steven Lovegrove, Margarita Chli and Thomas Pock for fruitful discussions and exchanges of ideas. Our acknowledgements must also go to POVRay artist Jaime Vives Piqueres, whose open-source scene model we have been using; his advice and suggestions on tweaking his code, and his prompt replies to our emails demanding various explanations, have been very helpful.