Researchers from the Indian Institute of Technology (IIT), Madras and US-based Northwestern University have developed deep learning algorithms that can significantly improve depth perception and 3D effects in videos shot with smartphone cameras.
According to officials, such algorithms will prevent mobile phone images from looking “flat” and give them a realistic 3D feel. A crucial advantage of the developed algorithm is that it does not require expensive equipment or an array of lenses to capture videos with depth.
“It is a common complaint, especially among amateur and professional photographers, that photos and videos taken with smartphone cameras have a flat, two-dimensional appearance. Apart from the flat appearance, some 3D features such as the Bokeh effect, the aesthetic background blur that is easy to achieve with a DSLR camera, remain a challenge in smartphone cameras,” said Kaushik Mitra, assistant professor, Department of Electrical Engineering, IIT Madras.
“While some mid- and high-end smartphone cameras are now programmed to incorporate such effects in photos, especially in portrait mode, it is not yet possible to display them in videos shot with smartphones,” he added.
Mitra explained that advanced professional cameras capture information about both the intensity and direction of light in a scene, known as the Light Field (LF), to give the perception of depth.
“The LF shot is achieved through the use of an array of micro-lenses placed between the main camera lens and the camera sensor. Multiple micro-lenses cannot be placed in mobile phones due to lack of space. Instead, algorithms are being developed that can post-process the images captured by existing mobile cameras.”
“Artificial intelligence and machine learning techniques are used for such image manipulation. Our team has explored this issue and built a deep learning algorithm that converts the stereo images captured with a smartphone into LF images,” he said.
The research is published in the ‘Proceedings of the International Conference on Computer Vision (ICCV), 2021’. “The algorithm first captures two videos (called a stereo pair) simultaneously using the two adjacent cameras present in many smartphones today. These stereo pairs go through a series of steps with deep learning models. The stereo pairs are converted into a 7×7 grid of images, which mimics a 7×7 array of cameras, producing the LF image,” Mitra explained.
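The idea of turning a stereo pair into a 7×7 grid of viewpoints can be sketched in a few lines. The toy function below is not the team’s deep learning model; it is a naive, assumption-laden illustration that blends the two input frames and shifts pixels by a constant disparity to fake the horizontal and vertical parallax a real 7×7 camera array would capture (the function name, the constant-disparity assumption, and the blending scheme are all hypothetical):

```python
import numpy as np

def synthesize_lf_grid(left, right, disparity=2, grid=7):
    """Toy stand-in for learned view synthesis: build a grid x grid
    light-field stack from a stereo pair by blending the two frames
    and shifting pixels by a constant per-view disparity."""
    center = grid // 2
    views = np.empty((grid, grid) + left.shape, dtype=float)
    for v in range(grid):        # vertical view index
        for u in range(grid):    # horizontal view index
            # blend left/right according to the view's horizontal position
            w = (u + 0.5) / grid
            base = (1 - w) * left + w * right
            # shift to mimic parallax relative to the central camera
            views[v, u] = np.roll(
                np.roll(base, (u - center) * disparity, axis=1),
                (v - center) * disparity, axis=0)
    return views

lf = synthesize_lf_grid(np.zeros((16, 16)), np.ones((16, 16)))
print(lf.shape)  # (7, 7, 16, 16): a 7x7 grid of 16x16 views
```

A learned model replaces the constant-disparity shift with per-pixel, occlusion-aware warping, which is why deep networks are needed for realistic results.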
“A crucial advantage of the algorithm developed by our team is that it doesn’t require expensive equipment or an array of lenses to capture videos with depth. The Bokeh and other such 3D aesthetic effects can be achieved with a smartphone equipped with a dual-camera system. In addition to providing depth, our algorithm allows us to view the same video not just from one point of view, but from any of the 7×7 grid of viewpoints,” he said.
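Once a 7×7 light field exists, the Bokeh-style background blur mentioned above can be produced with classical shift-and-add refocusing: each view is shifted in proportion to its offset from the central camera and the views are averaged, so objects at the chosen depth align sharply while everything else smears into blur. The sketch below demonstrates this standard LF technique on a synthetic point light (it is illustrative only, not the paper’s rendering pipeline):

```python
import numpy as np

def refocus(lf, slope):
    """Shift-and-add refocusing over a (V, U, H, W) light-field stack.
    'slope' is the per-view shift in pixels; scene points whose parallax
    matches the slope come into focus, others blur (Bokeh)."""
    V, U, H, W = lf.shape
    cv, cu = V // 2, U // 2
    acc = np.zeros((H, W))
    for v in range(V):
        for u in range(U):
            dy = int(round(slope * (v - cv)))
            dx = int(round(slope * (u - cu)))
            acc += np.roll(np.roll(lf[v, u], dy, axis=0), dx, axis=1)
    return acc / (V * U)

# synthetic light field: a single bright point with 1 px/view parallax
base = np.zeros((8, 8)); base[4, 4] = 1.0
lf = np.stack([np.stack([np.roll(np.roll(base, v - 3, 0), u - 3, 1)
                         for u in range(7)]) for v in range(7)])
sharp = refocus(lf, slope=-1.0)   # slope cancels the parallax
print(sharp.max())  # 1.0 -> the point is back in focus
```

Refocusing with any other slope spreads the point’s energy across 49 shifted copies, which is exactly the aesthetic blur a wide-aperture DSLR produces optically.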