About

Our work began by analysing only the performance of 6DoF camera tracking in a known, rigid 3D scene under varying camera frame-rate (or, strictly, exposure time in computational photography parlance). This was driven mostly by our intuition that a high frame-rate should be better, because image motion between consecutive frames reduces considerably when the frame-rate of the camera is turned up. Any tracking algorithm aimed at real-time performance would prefer images on which it has to do less work. Additionally, since many direct tracking algorithms linearise the cost function to obtain a convex approximation, the linearisation becomes increasingly valid at high frame-rates, where the small-motion assumption holds better. At that stage, we only wanted answers to the following very simple questions.

If we have a limited computational budget available on our processor, what is the optimal frame-rate for tracking? Optimal here is meant mostly in a statistical sense: the frame-rate that performs best over the long term.
If we want to work at a given frame-rate, what kind of processor should we use?

However, while doing so, we quickly realised that there are a few more parameters we can change that affect the performance of the tracker. These parameters are intertwined with frame-rate when it comes to performance evaluation. The first parameter that springs to mind is image resolution. We then decided to be more specific with our questions and changed them to

If we have a limited computational budget available on our processor, what is the optimal frame-rate and image resolution for tracking?
If we want to work at a given frame-rate and image resolution, what kind of processor should we use?
If the processing budget allows, we can run more iterations of whatever tracking algorithm X we are using to obtain more accurate results. So how many more iterations can we run?
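
As a rough illustration of how frame-rate, resolution and iteration count compete for the same budget, the sketch below assumes that one tracking iteration costs a fixed number of operations per pixel. That linear cost model, the function name `iterations_per_frame` and every number used are assumptions made purely for illustration; they are not measurements from our experiments.

```python
def iterations_per_frame(budget_ops_per_s, frame_rate_hz, width, height,
                         ops_per_pixel_per_iter):
    """Whole tracking iterations affordable per frame under a fixed budget.

    Hypothetical cost model: one iteration costs
    width * height * ops_per_pixel_per_iter operations.
    """
    if frame_rate_hz <= 0 or ops_per_pixel_per_iter <= 0:
        raise ValueError("frame rate and per-pixel cost must be positive")
    # Operations available before the next frame arrives.
    ops_per_frame = budget_ops_per_s / frame_rate_hz
    # Operations consumed by a single iteration over the whole image.
    ops_per_iter = width * height * ops_per_pixel_per_iter
    return int(ops_per_frame // ops_per_iter)


# Illustrative numbers only: 1e9 operations/s, 50 operations per pixel.
for rate, (w, h) in [(20, (640, 480)), (60, (640, 480)),
                     (200, (640, 480)), (200, (320, 240))]:
    n = iterations_per_frame(1e9, rate, w, h, 50)
    print(f"{rate:>3} Hz at {w}x{h}: {n} iteration(s) per frame")
```

Under these made-up numbers, raising the frame-rate at a fixed resolution shrinks the per-frame iteration budget until no full iteration fits, while dropping the resolution buys some of it back. This is the kind of trade-off the questions above are meant to quantify.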

Keeping that in mind, we needed a framework in which we could vary all the parameters continuously and compare the performance against perfect ground truth to judge which frame-rate is optimal. After debating whether to use a real or a synthetic framework for the experiments, we realised that collecting real image data is not easy because

We cannot obtain a perfect ground truth depth map, or ground truth camera poses, for our analysis using inexpensive equipment.
We would like to vary frame-rate and image resolution continuously, but most cameras offer only a few standard image resolutions. Also, since we would like to obtain images at frame-rates as high as 200-400Hz, we would need a sensor that can sample at up to 400Hz to give us ground truth. This is quite tricky to arrange.
In real-world scenes the lighting does not stay the same all the time, so we needed a way to control scene lighting.
We need a repeatable camera motion so that all frame-rates are captured on the same camera trajectory. This demands a device capable of repeatable motion on which we can mount our camera, which is again very hard.

Given all these limitations, a synthetic framework was the obvious first choice, as it gives us full control over all the parameters we are interested in varying and lets us work out the answers to our questions (which we can verify in real experiments later on). Our main concern then was to make sure that the synthetic images look as realistic as possible. This mainly means adding realistic motion blur and camera noise to the images.
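
To make the idea of adding motion blur and camera noise to synthetic images concrete, here is a minimal sketch. It assumes that blur can be approximated by averaging several sharp renders taken at evenly spaced instants while the shutter is open, and that noise is additive, zero-mean Gaussian. The function name `blur_and_noise`, the Gaussian noise model and the clamping to [0, 1] are assumptions for illustration and are not necessarily what our framework does.

```python
import random


def blur_and_noise(sharp_frames, noise_sigma, rng=None):
    """Average sharp sub-frame renders, then add Gaussian noise per pixel.

    sharp_frames: non-empty list of equally sized images, each a list of
    rows of floats in [0, 1], rendered while the shutter is open.
    noise_sigma: standard deviation of the (assumed) additive noise.
    """
    if not sharp_frames:
        raise ValueError("need at least one sharp frame")
    rng = rng or random.Random(0)  # fixed seed keeps the sketch repeatable
    n = len(sharp_frames)
    height, width = len(sharp_frames[0]), len(sharp_frames[0][0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Motion blur: temporal average over the exposure interval.
            mean = sum(frame[y][x] for frame in sharp_frames) / n
            # Camera noise: hypothetical zero-mean Gaussian perturbation.
            noisy = mean + rng.gauss(0.0, noise_sigma)
            row.append(min(1.0, max(0.0, noisy)))  # keep intensity valid
        out.append(row)
    return out


# A bright pixel moving one step to the right during the exposure.
frames = [[[1.0, 0.0, 0.0, 0.0]], [[0.0, 1.0, 0.0, 0.0]]]
blurred = blur_and_noise(frames, noise_sigma=0.0)
```

With the noise switched off, the moving pixel is smeared evenly over the two positions it occupied, which is the streaking effect that a longer exposure (lower frame-rate) produces.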

Remarks: Camera tracking forms the front end of many systems in vision, notably real-time SLAM. Its applicability in many different domains is probably the reason camera tracking remains both widely popular and an active research problem. It is therefore imperative that, if we are using camera tracking, we know where it works best and where it breaks. The main goal of our research is to develop systems that can track super-fast motion and can later be used to build models of the scene as we throw the camera in.