On-road obstacle detection and classification are key tasks in the perception system of self-driving vehicles. Vehicle tracking involves localizing vehicles and associating them between frames, so the vehicles must first be detected and classified. Vision-based approaches are popular for this task because they are cost-effective and because image data carries useful appearance information. I proposed a deep learning system that uses a region-based convolutional neural network trained on the PASCAL VOC image dataset to detect and classify on-road obstacles such as vehicles, pedestrians, and animals. Implemented on a Titan X GPU, the system processes VGA-resolution frames at a rate of at least 10 fps. This frame rate, achieved on a powerful GPU, demonstrates the suitability of the system for highway driving of autonomous cars. The detection and classification results on images from KITTI, iRoads, and Indian roads show that the system's performance is invariant to object shape and view and to different lighting and climatic conditions.
Objective:
To model the network using ZFNet
To train the network on on-road objects such as cars, buses, trucks, humans, motorbikes and bicycles from the ImageNet and PASCAL VOC 2012 datasets
To implement the detection algorithm on a GPU using CUDA libraries through Caffe
To test the detection algorithm in the network on raw input images with and without objects
To find the accuracy and detection error of the network and compare them with existing benchmarks and footprints
Implementation:
Implemented on a Titan X GPU
Caffe in Python
PASCAL VOC 2007 - 20 object classes
Objects - aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, TV-monitor
On-road object mask
C++ wrapper for module integration
Cars detected on Bangalore highways in 90 ms
mean Average Precision (mAP) of the system
Animals detected along with pedestrians and cars in 60 ms
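The on-road object mask listed above can be sketched as a filter over the detector output. This is a minimal illustration, not the project's actual code: the detection format (label, score, box), the score threshold, the function name apply_on_road_mask, and the choice of which of the 20 classes count as on-road are all assumptions made for this example.

```python
# The 20 PASCAL VOC 2007 object classes, as listed in the text.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "dining table", "dog", "horse", "motorbike", "person",
    "potted plant", "sheep", "sofa", "train", "TV-monitor",
]

# Assumption: the subset treated as on-road obstacles (vehicles,
# pedestrians, and animals). The project's real mask may differ.
ON_ROAD_CLASSES = {
    "bicycle", "bus", "car", "motorbike", "person",
    "cat", "cow", "dog", "horse", "sheep",
}


def apply_on_road_mask(detections, min_score=0.5):
    """Keep detections that belong to an on-road class.

    detections: list of (label, score, box) tuples, where box is
    (x_min, y_min, x_max, y_max). This layout is hypothetical.
    min_score: hypothetical confidence threshold.
    """
    return [
        det for det in detections
        if det[0] in ON_ROAD_CLASSES and det[1] >= min_score
    ]


if __name__ == "__main__":
    sample = [
        ("car", 0.92, (40, 60, 200, 180)),
        ("sofa", 0.95, (10, 10, 50, 50)),       # not an on-road class
        ("person", 0.30, (220, 80, 260, 200)),  # below the threshold
        ("cow", 0.71, (300, 120, 420, 220)),
    ]
    for label, score, box in apply_on_road_mask(sample):
        print(label, score, box)
```

Filtering after detection keeps the trained 20-class network unchanged; only the reported classes are restricted.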
Max speed of the car for the system:
Min 10 FPS for different images -> Max 100 ms per image
10 FPS -> 100 ms per frame
If the car covers 1 m toward the obstacle in 100 ms (one frame)
Then, 10 m in 1 s -> 10 m/s -> 36 km/h
If the car covers 2 m toward the obstacle in 100 ms (one frame)
Then, 20 m in 1 s -> 20 m/s -> 72 km/h
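The speed estimate above is simple arithmetic: the distance the car may cover during one frame, multiplied by the frame rate, gives the speed in m/s, and a factor of 3.6 converts it to km/h. A short sketch follows; the function name max_speed_kmh and its parameter names are chosen here for illustration only.

```python
def max_speed_kmh(fps, distance_per_frame_m):
    """Speed at which the car covers distance_per_frame_m in one frame.

    fps: processing frame rate of the detector (frames per second).
    distance_per_frame_m: distance in metres travelled toward the
        obstacle during one processed frame.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    speed_m_per_s = distance_per_frame_m * fps  # metres per second
    return speed_m_per_s * 3.6                  # 1 m/s = 3.6 km/h


if __name__ == "__main__":
    # The two cases from the text: 1 m and 2 m per 100 ms frame at 10 FPS.
    for distance in (1.0, 2.0):
        print(distance, "m per frame ->", max_speed_kmh(10, distance), "km/h")
```

At 10 FPS this reproduces the two figures in the text, roughly 36 km/h for 1 m per frame and 72 km/h for 2 m per frame.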
Demo Video:
Conclusion and Future Work:
Deep learning is robust to variation in object view, lighting and climatic conditions
Fails to detect some vehicles, such as auto rickshaws, on Indian roads
Retrain the network on an Indian road dataset
Use deeper networks like VGG16, GoogLeNet, ResNet
Test the project in real time in a car on an embedded GPU platform like Jetson TX1