Recent advances in object recognition and scene understanding have been enabled by the availability of a large number of labeled images. Similar advances in RGB‐D image understanding are hampered by the current lack of large labeled datasets in this domain. We have developed a new technique, “cross‐modal distillation”, which enables us to transfer supervision from RGB to RGB‐D datasets. We use representations learned from labeled RGB images as a supervisory signal to train representations for depth images, and observe a 6% relative gain in performance for object detection with RGB‐D images, and a 20% relative improvement when using only the depth image.
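To make the idea of using one modality's representation as a supervisory signal for another concrete, the following is a minimal sketch and not the method as implemented in this work. It assumes paired RGB and depth inputs, a frozen "teacher" that maps RGB to features, and a "student" trained on depth to match those features under a squared-error loss. The linear models, the toy data, the loss, and all names (`linear`, `distill`, `teacher_w`, `pairs`) are illustrative assumptions; the abstract does not specify any of them.

```python
import random


def linear(weights, x):
    """Apply a linear map: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def distill(pairs, teacher_w, dim_in, dim_out, lr=0.05, epochs=200, seed=0):
    """Train a student on depth inputs to match frozen teacher features.

    pairs: list of (rgb_vector, depth_vector) tuples for the same scene.
    No class labels are used; the teacher's features are the only target.
    """
    rng = random.Random(seed)
    student_w = [[rng.uniform(-0.1, 0.1) for _ in range(dim_in)]
                 for _ in range(dim_out)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for rgb, depth in pairs:
            target = linear(teacher_w, rgb)    # teacher is frozen
            pred = linear(student_w, depth)    # student sees depth only
            err = [p - t for p, t in zip(pred, target)]
            total += sum(e * e for e in err)
            # Gradient step on the squared error w.r.t. the student weights.
            for i, e in enumerate(err):
                for j, d in enumerate(depth):
                    student_w[i][j] -= lr * 2.0 * e * d
        losses.append(total / len(pairs))
    return student_w, losses


# Toy paired data (assumption): "depth" is a fixed linear mix of "rgb",
# so a linear student can in principle reproduce the teacher's features.
data_rng = random.Random(1)
pairs = []
for _ in range(20):
    rgb = [data_rng.random() for _ in range(3)]
    depth = [rgb[0] + rgb[1], rgb[1] - rgb[2], rgb[2]]
    pairs.append((rgb, depth))

# Stand-in for a representation learned from labeled RGB images.
teacher_w = [[1.0, -0.5, 0.25],
             [0.0, 0.75, -1.0]]

student_w, losses = distill(pairs, teacher_w, dim_in=3, dim_out=2)
print("first-epoch loss:", losses[0])
print("last-epoch loss:", losses[-1])
```

In this sketch the student never sees a label: the only training signal is the teacher's output on the paired RGB input, which is the sense in which supervision is transferred across modalities.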