In computer vision, a homography is a transformation matrix H that, when applied to points on a projective plane, maps them to another plane (or image). In the case of Inverse Perspective Mapping (IPM), we want to produce a bird's-eye-view image of the scene from the front-facing image plane.
In the field of autonomous driving, IPM aids several downstream tasks such as lane-marking detection, path planning and intersection prediction using only a monocular camera, since the orthographic view is scale-invariant. This emphasises the importance of the technique.
IPM first assumes the world to be flat, lying on a single plane. Then it maps…
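The planar mapping above can be sketched with a homography applied in homogeneous coordinates. A minimal NumPy sketch follows; the matrix values are purely illustrative and not from a calibrated camera, where in practice H would be derived from the camera's intrinsics and its pose relative to the ground plane:

```python
import numpy as np

# Hypothetical homography H mapping front-view pixels to bird's-eye-view
# coordinates (values are illustrative, not from a calibrated camera).
H = np.array([
    [1.0, 0.2,  -30.0],
    [0.0, 1.5, -100.0],
    [0.0, 0.002,  1.0],
])

def warp_points(H, pts):
    """Apply homography H to an (N, 2) array of pixel coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coords
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # back to Cartesian

front_view_pts = np.array([[320.0, 400.0], [100.0, 450.0]])
print(warp_points(H, front_view_pts))
```

The divide-by-last-coordinate step is what makes this a projective (rather than affine) warp: points lower in the image, i.e. closer to the camera, get stretched differently from points near the horizon.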
Since the development of convolutional neural networks, object detection has been dominated by anchor-based methods such as Faster R-CNN, RetinaNet and SSD. These methods rely on a large number of preset anchors tiled onto the image, with each anchor predicting whether it contains an object and refining the box coordinates.
Recently, more attention has been geared towards eliminating the requirement for preset anchors, which demand manual tuning of scale, aspect ratio and anchor count. To that end, an effective method, FCOS, was proposed, which directly finds objects based on points tiled on the image.
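As a rough sketch of the anchor-free idea: instead of refining anchor boxes, an FCOS-style head regresses four distances (left, top, right, bottom) from each tiled point to the edges of the object's box. Decoding those predictions back to boxes looks roughly like this (shapes and values are illustrative):

```python
import numpy as np

def decode_fcos_boxes(points, ltrb):
    """Convert per-point distance predictions (left, top, right, bottom)
    into (x1, y1, x2, y2) boxes, as in anchor-free detectors like FCOS."""
    x, y = points[:, 0], points[:, 1]
    l, t, r, b = ltrb.T
    return np.stack([x - l, y - t, x + r, y + b], axis=1)

points = np.array([[100.0, 120.0]])          # a location tiled on the image
ltrb = np.array([[20.0, 30.0, 40.0, 10.0]])  # predicted distances to box edges
print(decode_fcos_boxes(points, ltrb))       # [[ 80.  90. 140. 130.]]
```

Because the box is parameterised per point, there is nothing to hand-tune about scales or aspect ratios; any point inside an object can, in principle, recover its box.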
The main characteristics of…
This post was last updated on 10/08/2020
Measuring distance relative to a camera remains difficult, but it is absolutely key to unlocking exciting applications such as autonomous driving, 3D scene reconstruction and AR. In robotics, depth is a key prerequisite for multiple tasks such as perception, navigation and planning.
Creating a 3D map would be another interesting application: computing depth allows us to back-project images captured from multiple views into 3D. Registration and matching of all the resulting points can then reconstruct the scene.
Lidars and cameras are two essential sensors for perception and scene understanding. Together they build a representation of the environment and provide a means for detecting and localising other objects, giving robots the rich semantic information required for safe navigation. Many researchers have started exploring multi-modal approaches for precise 3D object detection. An interesting example would be an algorithm developed by Aptiv, PointPainting.
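The core idea behind PointPainting is to "paint" each lidar point with the semantic scores of the image pixel it projects onto, fusing the two modalities before 3D detection. A minimal sketch of that idea follows; the intrinsic matrix K and the segmentation map are hypothetical stand-ins, and the real method operates on full segmentation network outputs with proper lidar-to-camera extrinsics:

```python
import numpy as np

# Hypothetical camera intrinsics; real values come from calibration.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def paint_points(points, seg_scores, K):
    """Append per-pixel class scores to (N, 3) camera-frame lidar points."""
    uvw = points @ K.T
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)   # pixel coordinates
    h, w, _ = seg_scores.shape
    uv[:, 0] = uv[:, 0].clip(0, w - 1)            # clamp to image bounds
    uv[:, 1] = uv[:, 1].clip(0, h - 1)
    scores = seg_scores[uv[:, 1], uv[:, 0]]       # (N, C) class scores
    return np.hstack([points, scores])            # "painted" points

points = np.array([[1.0, 0.5, 4.0]])              # one lidar point, camera frame
seg = np.zeros((480, 640, 3))                     # dummy (H, W, C) semantic map
seg[:, :, 2] = 1.0                                # every pixel is class 2
print(paint_points(points, seg, K).shape)         # (1, 6): xyz + 3 class scores
```

The painted points then feed into any lidar-based 3D detector, which now sees semantic context alongside raw geometry.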
Cameras outperform lidar when it comes to capturing a denser and richer representation. From Fig. 2, looking at the sparse point cloud alone, it is relatively difficult to correctly identify the black box as a pedestrian. However, paying attention to…
When an image of a scene is captured by a camera, we lose depth information as objects and points in 3D space are mapped onto a 2D image plane. This is also known as a projective transformation, in which points in the world are converted to pixels on a 2D plane.
However, what if we want to do the inverse? That is, we want to recover and reconstruct the scene given only a 2D image. To do that, we would need to know the depth, or Z-component, of each corresponding pixel. Depth can be represented as an image as shown in…
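The projection and its inverse can be sketched with the pinhole camera model: projecting divides by Z (which is where depth is lost), and back-projection multiplies the inverse-intrinsics ray by the known depth to recover it. A minimal sketch, assuming hypothetical intrinsics K:

```python
import numpy as np

# Hypothetical pinhole intrinsics (focal lengths fx, fy and principal
# point cx, cy); real values come from camera calibration.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(K, xyz):
    """Project 3D camera-frame points (N, 3) onto the image plane."""
    uvw = xyz @ K.T
    return uvw[:, :2] / uvw[:, 2:3]   # divide by Z: depth is lost here

def back_project(K, uv, depth):
    """Recover 3D points from pixels (N, 2) given their depth (Z)."""
    uv_h = np.hstack([uv, np.ones((len(uv), 1))])
    return (uv_h @ np.linalg.inv(K).T) * depth[:, None]

pts3d = np.array([[1.0, 0.5, 4.0]])
uv = project(K, pts3d)
print(back_project(K, uv, depth=np.array([4.0])))  # recovers [[1.  0.5 4. ]]
```

Without the depth value, back-projection only yields a ray through the pixel; the per-pixel depth image is what pins each point to a unique position along that ray.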
Geometric transformations are essential image-processing techniques with wide applications. For example, a simple use case in computer graphics would be rescaling content when displaying it on a desktop versus a mobile screen.
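The rescaling case reduces to multiplying coordinates by a diagonal scaling matrix. A tiny sketch, with an assumed desktop-to-mobile scale factor chosen purely for illustration:

```python
import numpy as np

def scale_points(pts, sx, sy):
    """Rescale 2D coordinates with a diagonal affine matrix, e.g. to
    adapt graphics from a desktop to a mobile resolution."""
    S = np.array([[sx, 0.0],
                  [0.0, sy]])
    return pts @ S.T

desktop_pts = np.array([[1920.0, 1080.0], [960.0, 540.0]])
# Hypothetical target: a 1280x720 display, i.e. a scale factor of 2/3
print(scale_points(desktop_pts, 2 / 3, 2 / 3))  # [[1280. 720.] [640. 360.]]
```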
They can also be applied to projectively warp an image to another image plane. For instance, instead of looking at a scene straight ahead, we may wish to view it from another viewpoint; a perspective transformation is applied in this scenario to achieve that.
One other exciting application is in training deep neural networks. Training deep models requires vast amounts of data, and in…
AV Machine Learning Engineer