A survey of datasets for object detection in autonomous driving

No matter how spectacular your AI model architecture is, it will not perform well without well-prepared real-world data. Assembling a complete and adequate dataset for training, validation and testing is an expensive and time-consuming activity. Below you can therefore find a summary of some publicly available datasets for object detection (together with the sensor setups they were acquired with) on which you can verify and improve your model's performance.

  1. Waymo Open Dataset — consists of 1950 segments of 20 s each, recorded at 10 Hz (about 200 frames per segment). Segments were acquired with 5 cameras (front and sides), one mid-range lidar and four short-range lidars, under multiple different Operational Design Domains (day/night, sun/rain, urban/suburban). Labels are provided as 3D bounding boxes on lidar data and 2D bounding boxes on camera data (see the loading sketch after this list). Published under the Waymo Dataset License Agreement for Non-Commercial Use.

  2. AMUSE — provides data from an omnidirectional multi-camera rig, an IMU, velocity and height sensors and GPS. Sequences were recorded in various environmental conditions, e.g. urban and suburban areas, heavy and light traffic, and a low sun combined with snowfall. Published under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported license.

  3. Oxford RobotCar Dataset — contains over 100 drives of a consistent route, so different combinations of environmental and traffic conditions are available, along with longer-term changes such as roadworks. However, this dataset does not contain ground-truth object annotations. Published under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

  4. nuScenes — provides 1000 scenes consisting of camera images, radar and lidar sweeps, with over 1.4M annotated object bounding boxes (see the devkit sketch after this list). Published under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license.

  5. KITTI Vision Benchmark — provides data from two high-resolution color and two grayscale video cameras, a laser scanner and a GPS localization system, covering scenarios from both rural areas and highways. It consists of 7481 training images and 7518 test images (see the label-parsing sketch after this list). Published under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license.

  6. Ford Campus Vision and Lidar Data Set — data recorded from a vehicle outfitted with two IMUs, three lidar scanners and an omnidirectional camera system. Provided scenarios contain several small- and large-scale loop closures. Licensing was not specified.

  7. Udacity Dataset — available data comes from monocular color cameras, a 3D lidar, and GPS and IMU sensors. Scenes were recorded in both overcast and sunny conditions. Published under an MIT license.

  8. Cityscapes Dataset — provides data from two cameras together with ego-vehicle velocity, GPS localization and outside temperature. Published under its own license; it can be used free of charge for non-commercial purposes.

  9. PandaSet — contains 100 short (8 s) video sequences acquired by a mechanical spinning lidar, a forward-facing lidar, 6 cameras and an on-board GPS/IMU (overall approximately 16000 lidar sweeps and 48000 camera images). Objects are annotated across 28 classes, complemented by 37 semantic segmentation labels. Published under a Creative Commons Attribution 4.0 International Public license.

  10. ApolloScape Dataset — acquired with two laser scanners, two front camera systems and on-board IMU/GNSS sensors. It consists of nearly 100000 image frames and 80000 lidar point clouds, mostly captured in urban environments. Frames are annotated at pixel level, with additional depth maps provided for the static background. Published under its own user license.

  11. CamVid — data from a single monocular camera mounted on the dashboard of the vehicle. Recordings took place mostly in urban environments. Licensing was not specified.

  12. Berkeley DeepDrive — contains video sequences with GPS locations, IMU data and timestamps, acquired under multiple different Operational Design Domains (i.e. weather conditions, time of day, driving scenario). The database provides multiple label types, i.e. 2D bounding boxes, pixel-level object and road-lane segmentation, and lane-marking annotations (see the JSON-parsing sketch after this list). Licensing was not specified.

  13. Daimler Pedestrian Benchmark Data Sets — provides multiple subsets with different sensor configurations (e.g. either monocular or stereo cameras). The sets focus on pedestrian and cyclist detection and even contain corner cases (e.g. occluded objects). Licensing was not specified.

  14. Caltech Pedestrian Detection Benchmark — provides over 250,000 frames with approximately 350,000 bounding boxes acquired with a two-camera setup. The dataset also contains occluded objects, and the temporal correspondence between bounding boxes is reflected in the labeling. Licensing was not specified.
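
To make some of the entries above more concrete, here are a few short loading sketches. First, a minimal sketch of reading one Waymo Open Dataset segment, assuming the official `waymo-open-dataset` pip package and TensorFlow are installed; the segment file name is a placeholder.

```python
import tensorflow as tf
from waymo_open_dataset import dataset_pb2 as open_dataset

# Placeholder path to one downloaded segment (a TFRecord file).
segment_path = "segment-XXXX_with_camera_labels.tfrecord"
dataset = tf.data.TFRecordDataset(segment_path, compression_type="")

for record in dataset:
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(record.numpy()))
    # Each frame holds the 5 camera images plus the lidar returns,
    # with 3D boxes in frame.laser_labels and 2D boxes per camera.
    print(frame.context.name, len(frame.images), len(frame.laser_labels))
    break  # inspect only the first frame
```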
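For nuScenes, the official `nuscenes-devkit` (installable via pip) exposes the annotations directly; a minimal sketch, assuming the v1.0-mini split has been downloaded to the placeholder dataroot:

```python
from nuscenes.nuscenes import NuScenes

# Placeholder dataroot; assumes the v1.0-mini split lives there.
nusc = NuScenes(version="v1.0-mini", dataroot="/data/sets/nuscenes", verbose=True)

# Walk from the first scene to its first sample and list its 3D boxes.
first_sample_token = nusc.scene[0]["first_sample_token"]
sample = nusc.get("sample", first_sample_token)
for ann_token in sample["anns"]:
    ann = nusc.get("sample_annotation", ann_token)
    print(ann["category_name"], ann["translation"], ann["size"])
```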
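KITTI's object labels are plain text files, one object per line with 15 whitespace-separated fields, so they can be parsed without any devkit; a minimal sketch (the file path is a placeholder):

```python
def parse_kitti_labels(path):
    """Parse one KITTI object-detection label file into a list of dicts."""
    objects = []
    with open(path) as f:
        for line in f:
            v = line.split()
            objects.append({
                "type": v[0],                               # e.g. 'Car', 'Pedestrian'
                "truncated": float(v[1]),
                "occluded": int(v[2]),
                "alpha": float(v[3]),                       # observation angle
                "bbox_2d": [float(x) for x in v[4:8]],      # left, top, right, bottom [px]
                "dimensions": [float(x) for x in v[8:11]],  # height, width, length [m]
                "location": [float(x) for x in v[11:14]],   # x, y, z in camera coords [m]
                "rotation_y": float(v[14]),
            })
    return objects

# Placeholder path into the training split.
for obj in parse_kitti_labels("training/label_2/000000.txt"):
    print(obj["type"], obj["bbox_2d"])
```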
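Finally, Berkeley DeepDrive (BDD100K) ships its detection labels as one large JSON file; a minimal sketch, assuming the published layout with `name`, `attributes` and per-object `box2d` keys (treat the exact keys and the file name as assumptions to verify against your download):

```python
import json

# Placeholder label file name; it may differ between releases.
with open("bdd100k_labels_images_train.json") as f:
    images = json.load(f)

for image in images[:3]:
    # Assumed keys: 'name' (image file), 'attributes' (weather, time of day),
    # and 'labels' entries with a 'category' and a 'box2d' corner dict.
    print(image["name"], image["attributes"])
    for label in image.get("labels", []):
        if "box2d" in label:
            b = label["box2d"]
            print(" ", label["category"], b["x1"], b["y1"], b["x2"], b["y2"])
```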

Hopefully you can find a dataset that matches your setup and use it to improve the model you are currently working on!

Piotr Serwa