Design of a collision avoidance system by fusing data from lidar and camera sensors. The camera and lidar sensor data are taken from the KITTI dataset.
The objective of the project is to estimate the time to collision (TTC) based on the inputs from the camera and lidar sensors.
The following four major tasks are involved:
Object detection and classification is out of the scope of this project; for the sake of simplicity, the quick off-the-shelf deep-learning-based object detection and classification framework YOLO is used.
All images from the camera sensor are fed into the YOLO framework to obtain bounding boxes, as shown in the image.
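As a rough illustration, the sketch below shows how such bounding boxes could be obtained with OpenCV's DNN module. The file names (yolov3.cfg, yolov3.weights), the input size, and the confidence threshold are assumptions rather than the project's exact settings, and non-maximum suppression is omitted for brevity.

#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Run YOLO on a single camera image and collect bounding boxes above a
// confidence threshold (file names and threshold are illustrative).
std::vector<cv::Rect> detectObjects(const cv::Mat &img)
{
    cv::dnn::Net net = cv::dnn::readNetFromDarknet("yolov3.cfg", "yolov3.weights");

    // convert the image into a 416x416 blob as expected by YOLOv3
    cv::Mat blob = cv::dnn::blobFromImage(img, 1 / 255.0, cv::Size(416, 416),
                                          cv::Scalar(0, 0, 0), true, false);
    net.setInput(blob);

    std::vector<cv::Mat> outs;
    net.forward(outs, net.getUnconnectedOutLayersNames());

    std::vector<cv::Rect> boxes;
    const float confThreshold = 0.2f; // assumed value
    for (const cv::Mat &out : outs)
    {
        for (int r = 0; r < out.rows; ++r)
        {
            // columns 5.. hold the class scores, columns 0..3 the box centre/size
            cv::Mat scores = out.row(r).colRange(5, out.cols);
            cv::Point classId;
            double confidence;
            cv::minMaxLoc(scores, nullptr, &confidence, nullptr, &classId);
            if (confidence > confThreshold)
            {
                int cx = static_cast<int>(out.at<float>(r, 0) * img.cols);
                int cy = static_cast<int>(out.at<float>(r, 1) * img.rows);
                int w  = static_cast<int>(out.at<float>(r, 2) * img.cols);
                int h  = static_cast<int>(out.at<float>(r, 3) * img.rows);
                boxes.emplace_back(cx - w / 2, cy - h / 2, w, h);
            }
        }
    }
    return boxes;
}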
The point cloud received from the lidar sensor (10 Hz) is cropped to focus on the ego lane (see the crop sketch after the projection code below).
The points are converted to homogeneous coordinates and projected onto the camera image plane using the calibration matrices provided by KITTI:
// assemble vector for matrix-vector multiplication (homogeneous coordinates)
cv::Mat X(4, 1, cv::DataType<double>::type);
cv::Mat Y(3, 1, cv::DataType<double>::type);
X.at<double>(0, 0) = it1->x;
X.at<double>(1, 0) = it1->y;
X.at<double>(2, 0) = it1->z;
X.at<double>(3, 0) = 1;
// project Lidar point into the camera image plane
Y = P_rect_xx * R_rect_xx * RT * X;
cv::Point pt;
pt.x = Y.at<double>(0, 0) / Y.at<double>(2, 0); // pixel coordinates
pt.y = Y.at<double>(1, 0) / Y.at<double>(2, 0);
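Below is a minimal sketch of the ego-lane crop mentioned above. The LidarPoint struct follows the usual KITTI convention (x forward, y left, z up), but the numeric limits are illustrative assumptions rather than the project's exact values.

#include <cmath>
#include <vector>

struct LidarPoint { double x, y, z, r; }; // x: forward, y: left, z: up, r: reflectivity

// Keep only points inside an assumed ego-lane corridor in front of the vehicle;
// the numeric limits below are illustrative, not the project's exact values.
void cropLidarPoints(std::vector<LidarPoint> &points)
{
    const double minX = 2.0, maxX = 20.0;  // driving direction
    const double maxY = 2.0;               // half lane width
    const double minZ = -1.5, maxZ = -0.9; // remove the road surface
    const double minR = 0.1;               // minimum reflectivity

    std::vector<LidarPoint> cropped;
    for (const auto &p : points)
    {
        if (p.x >= minX && p.x <= maxX && std::abs(p.y) <= maxY &&
            p.z >= minZ && p.z <= maxZ && p.r >= minR)
        {
            cropped.push_back(p);
        }
    }
    points = cropped;
}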
By analyzing and comparing classic computer vision detectors such as HARRIS and SHI-TOMASI against modern algorithms like FAST, SIFT, and BRISK, a robust mechanism is developed that can detect keypoints and generate descriptors.
Descriptors from two consecutive images (current and previous) are then matched using a suitable matcher type (Brute Force or FLANN) and selector type (Nearest Neighbour or K-Nearest Neighbour), as sketched below.
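A minimal sketch of this matching step, assuming a brute-force matcher with k-nearest-neighbour selection and Lowe's ratio test; the Hamming norm and the 0.8 ratio threshold are assumptions that would change for floating-point descriptors such as SIFT.

#include <opencv2/features2d.hpp>
#include <vector>

// Match descriptors of the previous and current frame with brute force + kNN,
// then keep only matches that pass the descriptor distance ratio test.
void matchDescriptors(const cv::Mat &descPrev, const cv::Mat &descCurr,
                      std::vector<cv::DMatch> &matches)
{
    // BRISK/BRIEF/ORB/FREAK/AKAZE are binary descriptors -> Hamming norm;
    // SIFT would use cv::NORM_L2 instead.
    cv::Ptr<cv::DescriptorMatcher> matcher =
        cv::BFMatcher::create(cv::NORM_HAMMING, false);

    std::vector<std::vector<cv::DMatch>> knnMatches;
    matcher->knnMatch(descPrev, descCurr, knnMatches, 2);

    const float ratio = 0.8f; // assumed ratio-test threshold
    for (const auto &m : knnMatches)
    {
        if (m.size() == 2 && m[0].distance < ratio * m[1].distance)
        {
            matches.push_back(m[0]);
        }
    }
}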
The following detector and descriptor algorithms are supported and compared in the project for speed and accuracy.
enum class DetectorTypeIndex
{
FAST = 0,
BRISK = 1,
ORB = 2,
AKAZE = 3,
SIFT = 4,
SHITOMASI = 5,
HARRIS = 6
};
enum class DescriptorTypeIndex
{
BRISK = 0,
BRIEF = 1,
ORB = 2,
FREAK = 3,
AKAZE = 4,
SIFT = 5,
};
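As an illustration of how the detector enum above could be wired up, the following sketch maps the modern detector types to OpenCV factory functions; the project may organize this differently. cv::SIFT requires a recent OpenCV (4.4 or later), and SHITOMASI/HARRIS are typically implemented separately via cv::goodFeaturesToTrack, so they are omitted here.

#include <opencv2/features2d.hpp>

// Map the modern detector types to OpenCV factory functions; SHITOMASI and
// HARRIS are usually handled separately via cv::goodFeaturesToTrack.
cv::Ptr<cv::Feature2D> createDetector(DetectorTypeIndex type)
{
    switch (type)
    {
    case DetectorTypeIndex::FAST:  return cv::FastFeatureDetector::create();
    case DetectorTypeIndex::BRISK: return cv::BRISK::create();
    case DetectorTypeIndex::ORB:   return cv::ORB::create();
    case DetectorTypeIndex::AKAZE: return cv::AKAZE::create();
    case DetectorTypeIndex::SIFT:  return cv::SIFT::create(); // OpenCV >= 4.4
    default:                       return cv::Ptr<cv::Feature2D>();
    }
}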
Detailed information about the comparisons between different detectors and descriptors can be found here.
Based on the matched keypoints, bounding boxes between two consecutive images are compared and their indices are matched based on the keypoint information.
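A minimal sketch of this box-matching idea, assuming a simplified BoundingBox struct with a boxID and roi (the actual data structures come from the Udacity starter code): for each box in the previous frame, the current-frame box sharing the most matched keypoints is selected.

#include <map>
#include <vector>
#include <opencv2/features2d.hpp>

struct BoundingBox { int boxID; cv::Rect roi; }; // simplified for illustration

// For every bounding box in the previous frame, find the box in the current
// frame that shares the largest number of matched keypoints.
void matchBoundingBoxes(const std::vector<cv::DMatch> &matches,
                        const std::vector<cv::KeyPoint> &kptsPrev,
                        const std::vector<cv::KeyPoint> &kptsCurr,
                        const std::vector<BoundingBox> &boxesPrev,
                        const std::vector<BoundingBox> &boxesCurr,
                        std::map<int, int> &bbBestMatches)
{
    // count[i][j] = number of matches falling in previous box i and current box j
    std::map<int, std::map<int, int>> count;
    for (const auto &m : matches)
    {
        const cv::Point2f prevPt = kptsPrev[m.queryIdx].pt;
        const cv::Point2f currPt = kptsCurr[m.trainIdx].pt;
        for (const auto &bp : boxesPrev)
        {
            if (!bp.roi.contains(prevPt)) continue;
            for (const auto &bc : boxesCurr)
            {
                if (bc.roi.contains(currPt))
                    count[bp.boxID][bc.boxID]++;
            }
        }
    }

    // keep, for each previous box, the current box with the highest count
    for (const auto &entry : count)
    {
        int bestID = -1, bestCount = 0;
        for (const auto &c : entry.second)
        {
            if (c.second > bestCount) { bestCount = c.second; bestID = c.first; }
        }
        bbBestMatches[entry.first] = bestID;
    }
}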
Based on the matched bounding box information, the time to collision is calculated separately using lidar and camera data.
The estimation is implemented so that it is robust against outliers, which might appear far too close to the sensor and thus lead to faulty TTC estimates. For the lidar points processed earlier, the time to collision is calculated from the median of the lidar points along the x axis (driving direction); points farther than maxDistanceFromMedian from that median are filtered out to remove outliers.
In the final visualization step, the calculated TTC information from lidar and camera is rendered using cv::imshow, and the bounding box is rendered in red if the TTC is less than 7 seconds.
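A minimal sketch of the lidar-based TTC under a constant-velocity model, using the median x distance and the maxDistanceFromMedian filter described above; the default threshold value and the helper names are illustrative assumptions.

#include <algorithm>
#include <cmath>
#include <vector>

struct LidarPoint { double x, y, z, r; }; // same layout as in the crop sketch above

// Median of a set of values (here: x coordinates along the driving direction).
static double medianOf(std::vector<double> xs)
{
    std::sort(xs.begin(), xs.end());
    size_t n = xs.size();
    return (n % 2 == 1) ? xs[n / 2] : 0.5 * (xs[n / 2 - 1] + xs[n / 2]);
}

// Constant-velocity TTC from two consecutive lidar frames:
// TTC = d_curr * dT / (d_prev - d_curr), with d taken as the median x after
// discarding points farther than maxDistanceFromMedian from the median.
double computeTTCLidar(const std::vector<LidarPoint> &prev,
                       const std::vector<LidarPoint> &curr,
                       double frameRate, double maxDistanceFromMedian = 0.1)
{
    auto robustDistance = [&](const std::vector<LidarPoint> &pts) {
        std::vector<double> xs;
        for (const auto &p : pts) xs.push_back(p.x);
        double med = medianOf(xs);

        std::vector<double> inliers;
        for (double x : xs)
            if (std::abs(x - med) <= maxDistanceFromMedian) inliers.push_back(x);
        return medianOf(inliers);
    };

    double dPrev = robustDistance(prev);
    double dCurr = robustDistance(curr);
    double dT = 1.0 / frameRate; // time between frames (the lidar runs at 10 Hz)
    return dCurr * dT / (dPrev - dCurr);
}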
OpenCV must be compiled with the -D OPENCV_ENABLE_NONFREE=ON cmake flag for testing the SIFT and SURF detectors.
$> mkdir build && cd build
$> cmake -G Ninja ..
$> ninja -j400
$> ./3D_object_tracking
Based on Udacity’s SFND_3D_Object_Tracking