Correcting Perspectivity

Figure 1. Left: Frontal view RGB. Right: BEV from IPM

Homography based IPM

In computer vision, a homography is a 3×3 transformation matrix H that, when applied to points on one projective plane, maps them onto another plane (or image). In the case of Inverse Perspective Mapping (IPM), we want to produce a bird's-eye-view image of the scene from the front-facing image plane.
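As a minimal sketch of what "applying H" means (the identity matrix and the sample points below are purely for illustration), the mapping is a matrix multiplication in homogeneous coordinates followed by a perspective divide:

```python
import numpy as np

def apply_homography(H, pts):
    """Map N x 2 pixel coordinates through a 3 x 3 homography H."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # lift to homogeneous coords
    mapped = pts_h @ H.T                              # apply the homography
    return mapped[:, :2] / mapped[:, 2:3]             # perspective divide

H = np.eye(3)  # identity homography leaves points unchanged
pts = np.array([[100.0, 200.0], [50.0, 80.0]])
print(apply_homography(H, pts))
```

For IPM, H would instead encode the mapping from the front-facing image plane to the ground plane.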

In the field of autonomous driving, IPM aids several downstream tasks such as lane-marking detection, path planning and intersection prediction using only a monocular camera, since the resulting orthographic view is scale-invariant. This underlines the importance of the technique.

How does IPM work?

IPM first assumes the world to be flat on a plane. Then it maps…
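Under that flat-world assumption, the homography between the image and the ground plane can be estimated from four point correspondences. A sketch using the direct linear transform (the correspondences below are made up for illustration):

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate a 3 x 3 homography from four 2D point correspondences (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H's entries.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)   # null-space vector = homography entries
    return H / H[2, 2]         # normalise so the bottom-right entry is 1

# Toy example: a unit square mapped to a square of side 2
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0), (2, 2), (0, 2)]
H = homography_from_points(src, dst)
```

With real data, `src` would be four pixel locations on the road and `dst` their known metric positions on the ground plane.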

Going anchor-free for object detection

Source: Detection on Cityscapes


Since the development of convolutional neural networks, object detection has been dominated by anchor-based methods such as Faster R-CNN, RetinaNet and SSD. These methods rely on a large number of preset anchors tiled onto the image; each anchor predicts whether it contains an object and refines the box coordinates.

Recently, more attention has been geared towards eliminating the requirement for preset anchors, which demand manual tuning of the scale, aspect ratio and number of anchors. To that end, an effective method, FCOS [1], was proposed, which directly finds objects based on points tiled on the image.
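As a rough sketch of the anchor-free idea (the function name and values are mine, not FCOS's actual code): each point on the feature map regresses distances to the four box sides, which decode directly into a box without any anchor.

```python
import numpy as np

def decode_ltrb(points, ltrb):
    """Decode per-point (left, top, right, bottom) distances into boxes."""
    x, y = points[:, 0], points[:, 1]
    l, t, r, b = ltrb[:, 0], ltrb[:, 1], ltrb[:, 2], ltrb[:, 3]
    # Box corners are the point offset by the predicted side distances.
    return np.stack([x - l, y - t, x + r, y + b], axis=1)

points = np.array([[32.0, 32.0]])           # one feature-map location (pixels)
ltrb = np.array([[10.0, 8.0, 12.0, 20.0]])  # predicted distances to the sides
print(decode_ltrb(points, ltrb))            # [[22. 24. 44. 52.]]
```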

The main characteristics of…

Photo by Shea Rouda on Unsplash

Depth Is Essential For 3D Vision

Measuring distance relative to a camera remains difficult, but it is absolutely key to unlocking exciting applications such as autonomous driving, 3D scene reconstruction and AR. In robotics, depth is a key prerequisite for tasks such as perception, navigation and planning.

Creating a 3D map would be another interesting application: computing depth allows us to back-project images captured from multiple views into 3D. Registration and matching of all the points can then reconstruct the scene.

Figure 1. Lidar points on image (source)

Lidars and cameras are two essential sensors for perception and scene understanding. In tandem, they build a representation of the environment and provide the means to detect and localise other objects, giving robots the rich semantic information required for safe navigation. Many researchers have started exploring multi-modal approaches for precise 3D object detection. An interesting example is PointPainting [1], an algorithm developed by Aptiv.
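The core of the point-painting idea can be sketched in a few lines (the intrinsics, point cloud and segmentation scores below are placeholders, not Aptiv's code): project each lidar point into the image and append the class scores of the pixel it lands on.

```python
import numpy as np

def paint_points(points, seg_scores, K):
    """Append per-pixel class scores to lidar points (camera frame, Z > 0)."""
    uvw = points @ K.T                      # pinhole projection of each point
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    h, w = seg_scores.shape[:2]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)  # keep points in the image
    # Each painted point is (x, y, z, score_class_0, ..., score_class_C-1).
    return np.hstack([points[inside], seg_scores[v[inside], u[inside]]])

K = np.eye(3)                               # toy intrinsics for illustration
points = np.array([[0.0, 0.0, 1.0],        # projects to pixel (0, 0)
                   [5.0, 5.0, 1.0]])       # projects outside the 2 x 2 image
seg_scores = np.arange(12.0).reshape(2, 2, 3)  # fake per-pixel class scores
painted = paint_points(points, seg_scores, K)
```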

So why are these two sensors complementary?

Cameras outperform lidar when it comes to capturing a denser and richer representation. From Fig 2, looking at the sparse point cloud alone, it is relatively difficult to correctly identify the black box as a pedestrian. However, paying attention to…

Fig 1: 3D points back-projected from an RGB and depth image

Depth and Inverse Projection

When an image of a scene is captured by a camera, we lose depth information, as objects and points in 3D space are mapped onto a 2D image plane. This is also known as a projective transformation, in which points in the world are converted to pixels on a 2D plane.
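For example, with an assumed pinhole intrinsic matrix K (the values are made up for illustration), projecting a 3D point yields only a 2D pixel; the depth Z is divided away:

```python
import numpy as np

# Toy pinhole intrinsics: focal length 500 px, principal point (320, 240)
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

P = np.array([1.0, 0.5, 2.0])   # 3D point in camera coordinates (metres)
uvw = K @ P                     # projective transformation
u, v = uvw[:2] / uvw[2]         # divide by depth: Z = 2.0 is lost here
print(u, v)                     # 570.0 365.0
```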

However, what if we want to do the inverse? That is, we want to recover and reconstruct the scene given only a 2D image. To do that, we would need to know the depth, or Z-component, of each corresponding pixel. Depth can be represented as an image as shown in…
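Given a depth map and the camera intrinsics, the inverse projection can be sketched as follows (the intrinsics here are a toy identity matrix, purely for illustration):

```python
import numpy as np

def backproject(depth, K):
    """Back-project an H x W depth map into an N x 3 point cloud (camera frame)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinate grids
    z = depth.ravel()
    # Invert the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

K = np.eye(3)                       # toy intrinsics (fx = fy = 1, cx = cy = 0)
depth = np.full((2, 2), 2.0)        # every pixel 2 m away
pts = backproject(depth, K)
```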

The math and code behind Image warping

Source: Geometric Transformations

Geometric transformation is pervasive in Computer Vision

Geometric transformation is an essential image processing technique with wide applications. For example, a simple use case in computer graphics is rescaling content when displaying it on a desktop versus a mobile screen.

It can also be applied to projectively warp an image onto another image plane. For instance, instead of looking at a scene straight ahead, we may wish to view it from another viewpoint; a perspective transformation achieves exactly that.
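A minimal inverse-warping sketch (nearest-neighbour sampling; a real implementation would interpolate): for each output pixel, map back through the inverse of H and sample the source image.

```python
import numpy as np

def warp_image(img, H, out_shape):
    """Inverse-warp: for each output pixel, sample the source through inv(H)."""
    Hinv = np.linalg.inv(H)
    h_out, w_out = out_shape
    u, v = np.meshgrid(np.arange(w_out), np.arange(h_out))
    dst = np.stack([u.ravel(), v.ravel(), np.ones(u.size)])  # homogeneous pixels
    src = Hinv @ dst                                         # back to the source
    xs = np.rint(src[0] / src[2]).astype(int)  # nearest-neighbour sample coords
    ys = np.rint(src[1] / src[2]).astype(int)
    ok = (xs >= 0) & (xs < img.shape[1]) & (ys >= 0) & (ys < img.shape[0])
    out = np.zeros(out_shape, dtype=img.dtype)
    out.reshape(-1)[ok] = img[ys[ok], xs[ok]]  # fill only in-bounds pixels
    return out

img = np.arange(12.0).reshape(3, 4)
warped = warp_image(img, np.eye(3), (3, 4))  # identity H returns the image
```

Mapping output pixels backwards (rather than scattering source pixels forward) guarantees every output pixel gets a value, which is why warping is usually implemented this way.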

One other exciting application is in training deep neural networks. Training deep models requires vast amounts of data. And in…

Daryl Tan

AV Machine Learning Engineer
