Numerous applications for mobile devices require 3D vision capabilities, which in turn require depth detection, since depth enables the evaluation of an object's distance, position and shape. Despite the increasing popularity of depth detection algorithms, existing solutions require expensive hardware and/or additional ASICs, which makes them unsuitable for low-cost commodity hardware devices. In this paper, we propose a low-cost and low-power embedded solution that provides high-speed depth detection. We extend an existing off-the-shelf VLIW image processor and perform algorithmic and architectural optimizations in order to achieve the required real-time performance. Experimental results show that by adding different functional units and adjusting the algorithm to take full advantage of them, a 640x480 image pair with 64 disparities¹ can be processed at 36.75 fps on a single processor instance, which is an improvement of 23% over the best state-of-the-art image processor.
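To give a sense of scale for the reported result, the short sketch below computes the workload it implies. It rests on two assumptions that the abstract does not state: (1) the matching cost is evaluated once per pixel per candidate disparity, as in a brute-force stereo search, and (2) the 23% improvement is a ratio of frame rates, so the prior design ran at fps / 1.23. The helper names are hypothetical and illustrative only.

```python
# Back-of-the-envelope workload implied by the reported result.
# ASSUMPTION: one matching-cost evaluation per pixel per disparity candidate
# (brute-force search); the actual algorithm in the paper may differ.
# ASSUMPTION: "23% improvement" means fps_new = (1 + 0.23) * fps_baseline.

WIDTH, HEIGHT = 640, 480   # image resolution from the abstract
DISPARITIES = 64           # disparity search range from the abstract
FPS = 36.75                # reported frame rate on one processor instance
IMPROVEMENT = 0.23         # reported gain over the best prior image processor


def cost_evaluations_per_frame(width: int, height: int, disparities: int) -> int:
    """Number of (pixel, disparity) candidates examined per image pair."""
    return width * height * disparities


def evaluations_per_second(width: int, height: int, disparities: int, fps: float) -> float:
    """Candidate evaluations per second needed to sustain the given frame rate."""
    return cost_evaluations_per_frame(width, height, disparities) * fps


def implied_baseline_fps(fps: float, improvement: float) -> float:
    """Prior frame rate, assuming the gain is defined as fps / baseline - 1."""
    return fps / (1.0 + improvement)


if __name__ == "__main__":
    per_frame = cost_evaluations_per_frame(WIDTH, HEIGHT, DISPARITIES)
    per_second = evaluations_per_second(WIDTH, HEIGHT, DISPARITIES, FPS)
    print(f"candidates per frame : {per_frame:,}")         # 19,660,800
    print(f"candidates per second: {per_second:,.0f}")     # 722,534,400
    print(f"frame time budget    : {1000.0 / FPS:.2f} ms")  # 27.21 ms
    print(f"implied baseline fps : {implied_baseline_fps(FPS, IMPROVEMENT):.2f}")  # 29.88
```

Under these assumptions, each frame involves roughly 19.7 million cost evaluations. That amounts to about 722 million per second within a budget of about 27 ms per frame, which suggests why the abstract stresses added functional units rather than purely algorithmic changes.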