ICCV 2019 Round: Completed

ICCV 2019: Learning-to-Drive Challenge

Imitation Learning for Autonomous Driving

Authorship/Co-Authorship

IMPORTANT: How to win the challenge!

  • Challenge participants are required to submit a document of at least four pages, using the ICCV 2019 paper template, describing their data processing steps, network architectures, and other implementation details such as the hyper-parameters used for network training, as well as their results.

  • Please submit the document through the CMT system at  https://cmt3.research.microsoft.com/ADW2019 by 20-10-2019 [11:59 p.m. Pacific Standard Time] in order to win the challenge.

  • You may include the names of the authors and their affiliations.

  • Please be aware that the novelty of the method is also evaluated. The winners will be invited to present their work in the workshop.


  • The top-performing teams will be required to submit a write-up of at least four pages describing their method, together with code to reproduce their results, in order to claim victory. The detailed procedure for releasing the code is to be determined.

  • Username and password to download the csv and image data are located in the Resources->Drive360 Credentials text file.
  • Added zipped starter-kit to resources tab.

Why Learn-to-Drive?

Autonomous driving has seen a surge in popularity not only in the academic community but also in industry, with millions of industry dollars poured into the development of level 5 autonomous vehicles.

While the traditional autonomous driving stack, based on explicit perception, path planning and control systems, has been continuously developed and is widely deployed in today's level 3 autonomous cars, new angles of approach for solving the level 5 problem have nonetheless emerged.

One such approach is end-to-end driving via imitation learning, in which the model learns to imitate and drive like a human driver. This approach aims to complement the traditional autonomous driving stack with a fully learned driving model that implicitly handles perception, path planning and control.

Such a two-system architecture (traditional stack plus learned driving model) provides better redundancy: because the two systems are inherently different, one being model-driven and the other data-driven, systematic errors present in one do not propagate to the other.

In essence, an end-to-end driving model is a neural network that takes some subset of the available sensor data from the vehicle and predicts the future control output.
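To make this concrete, the toy sketch below maps one flattened camera frame to the two control outputs through a single random linear layer. The class name, input shape and output keys are illustrative assumptions, not the challenge baseline model.

```python
import numpy as np

class ToyDrivingModel:
    """Hypothetical stand-in for an end-to-end driving model:
    one camera frame in, (steering, speed) out."""

    def __init__(self, image_shape=(90, 160, 3), seed=0):
        rng = np.random.default_rng(seed)
        n_features = int(np.prod(image_shape))
        # Single linear layer standing in for a full network.
        self.w = rng.normal(scale=1e-3, size=(n_features, 2))

    def predict(self, image):
        x = np.asarray(image, dtype=np.float64).reshape(-1)  # flatten sensor input
        steering, speed = x @ self.w
        return {"canSteering": float(steering), "canSpeed": float(speed)}

model = ToyDrivingModel()
frame = np.zeros((90, 160, 3), dtype=np.float32)  # placeholder camera frame
out = model.predict(frame)
```

A real submission would replace the linear layer with a CNN (plus whatever map and history inputs the team chooses), but the input/output contract stays the same.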

Learning-to-Drive Challenge

Welcome to the ICCV 2019 Learning-to-Drive Challenge, part of our Autonomous Driving workshop hosted at ICCV 2019.

The goal of this challenge is to develop state-of-the-art driving models that can predict the future steering wheel angle and vehicle speed given a large set of sensor inputs.

Sensor Data

We supply the Drive360 dataset consisting of the following sensors:

  • 4xGoPro Hero5 cameras (front, rear, right, left)
  • Visual map from HERE Technologies
  • Visual map from TomTom (may not be perfectly synced)
  • Semantic map from HERE Technologies

The Drive360 dataset is split into a train, validation and test partition.


Challenge participants are tasked to design, develop and train a driving model that is capable of predicting, 1 second into the future, the steering wheel angle and vehicle speed obtained from the vehicle's CAN bus.

Challenge participants can use any combination of camera images, visual map images and semantic map information as input to their models. Using past sensor information in addition to the present observations is also allowed; however, using future sensor information is NOT allowed.

A detailed summary of the dataset is given in the Drive360 section.


Driving models will be evaluated on the test partition using the mean squared error (MSE) between the predictions and the human ground truth, for both the steering wheel angle (‘canSteering’) and the vehicle speed (‘canSpeed’). The best-performing driving models are therefore those that drive most like the human driver in these situations.
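As a concrete reading of the metric, the sketch below computes the per-target MSE. The column names match the csv, but the sample predictions and ground truth are invented for illustration.

```python
import numpy as np

def challenge_mse(pred, truth):
    """Per-target mean squared error over the two challenge targets."""
    return {
        k: float(np.mean((np.asarray(pred[k]) - np.asarray(truth[k])) ** 2))
        for k in ("canSteering", "canSpeed")
    }

# Made-up two-sample example.
pred  = {"canSteering": [1.0, -2.0], "canSpeed": [10.0, 12.0]}
truth = {"canSteering": [0.0, -1.0], "canSpeed": [11.0, 12.0]}
scores = challenge_mse(pred, truth)
```

Note that the two MSE scores are reported separately, so a model can trade off accuracy between steering and speed.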

Drive360 Dataset

Our Drive360 dataset contains around 55 hours of recorded driving around Switzerland. We recorded camera images, routing/navigation information and the human driving manoeuvres (steering wheel angle and vehicle speed) via the CAN bus.


The dataset is structured into runs of typically 1–4 hours of continuous driving; these usually carry place names such as Appenzell, Bern, Zurich, etc.

We have further split each run into 5-minute chapters. Thus the run Aargau, which lasted around 50 minutes, has 10 chapters, while the run Appenzell, which lasted around 3 hours, has 37 chapters.

In total we have 682 chapters for our 27 routes. Out of this total we then randomly sample 548 chapters for training, 36 for validation and 98 for testing, giving us our dataset splits.

Data Types

We supply a specific csv file for each of the three phases (training, validation and testing) along with a zip file of our camera and map images, adhering to our run and chapter structure.

The columns of the csv file specify the data that is available, while each row is a time synchronized collection of the data from the available sensors.

IMPORTANT: We have already projected the targets (canSteering and canSpeed) 1 s into the future, so you can simply read a single row and predict the targets specified on that row. Everything is sampled at 10 Hz.
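The sketch below illustrates this row-wise layout with a made-up two-row csv: each row pairs a sensor observation with targets already shifted 1 s ahead (10 samples at 10 Hz). Only canSteering and canSpeed are named above; the cameraFront column and all values are assumptions for illustration.

```python
import csv
import io

# Hypothetical two-row excerpt in the same shape as the challenge csv.
sample_csv = """cameraFront,canSteering,canSpeed
run1/cam_front/img_000.jpg,-3.2,52.1
run1/cam_front/img_001.jpg,-3.0,52.4
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Each row is self-contained: input reference and future targets together.
inputs  = [r["cameraFront"] for r in rows]
targets = [(float(r["canSteering"]), float(r["canSpeed"])) for r in rows]
```

Because the shift is already applied, no extra index arithmetic is needed at training time: row i's targets are what the car did 1 s after row i's observation.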


Check out our GettingStarted.ipynb Jupyter notebook in the starter kit for an example of how to use the dataset to train a model.

This starter-kit also contains a dataset.py file with a Drive360 python class that handles all dataset filtering and scaling (optional but strongly recommended).

The Drive360 class also appropriately handles chapter boundaries when using temporal sequences of historic data. This is particularly important when generating a submission file, as we truncate the beginning of each chapter in the test partition by 10 seconds to allow more flexibility to the challenge participants when choosing the length of historic data they would like to use.
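A minimal sketch (not the starter kit's dataset.py, whose Drive360 class already handles this) of how history windows can be restricted so they never cross a chapter boundary:

```python
def history_windows(samples, history):
    """samples: list of (chapter_id, row_index) tuples in temporal order.
    Returns index windows of length history+1 that lie within one chapter."""
    windows = []
    for i in range(history, len(samples)):
        window = samples[i - history:i + 1]
        chapter = samples[i][0]
        # Keep the window only if every sample comes from the same chapter.
        if all(c == chapter for c, _ in window):
            windows.append([idx for _, idx in window])
    return windows

# Toy timeline spanning a chapter boundary between ch1 and ch2.
samples = [("ch1", 0), ("ch1", 1), ("ch1", 2), ("ch2", 3), ("ch2", 4)]
windows = history_windows(samples, history=1)
```

With `history=1`, the window ending at index 3 is dropped because it would mix ch1 and ch2 data. The 10-second truncation mentioned above gives each test chapter enough leading samples that valid windows exist from the first evaluated row onward.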

HERE Semantic Maps

We give a short overview of the available semantic map information generously provided by HERE Technologies.

To obtain this data, we use a Hidden-Markov-Model path matcher to snap the noisy GPS trace recorded during our drives onto the underlying HERE road network. We then use the map-matched positions to query the HERE database.


Group 1:

  • hereSignal: road-distance to next traffic signal. (m)
  • hereYield [1d]: road-distance to next yield instance. (m)
  • herePedestrian[1c]: road-distance to next pedestrian crossing. (m)
  • hereIntersection: road-distance to next intersection. (m)
  • hereMmIntersection: road-distance to next intersection but using map matched localization instead of recorded GPS coordinates. (m)

Group 2:

  • hereSpeedLimit[2a]: speed limit from ADAS Map. (km/h)
  • hereSpeedLimit_2: speed limit from Navigation Map. (km/h)
  • hereFreeFlowSpeed[2b]: average driving speed based on underlying road geometry. Measured by HERE. (km/h)

Group 3:

  • hereCurvature: inverse radius of the approximated road geometry. (1/m)

Group 4:

  • hereTurnNumber: index of the road to take at the next intersection (counted counter-clockwise).

Group 5:

  • hereSegmentExitHeading: heading of the current road our car is on at next intersection. (degrees)
  • hereSegmentEntryHeading[5a]: heading of the road that our car will take at next intersection. (degrees)
  • hereSegmentOthersHeading[5b]: heading of all other roads at next intersection. (degrees)

Group 6:

  • hereCurrentHeading: current heading. (degrees)
  • here1mHeading: relative heading of map matched GPS coordinate in 1 meter. (degrees)
  • here5mHeading: … in 5 meters. (degrees)
  • here10mHeading[6c]: … in 10 meters. (degrees)
  • here20mHeading[6d]: … in 20 meters. (degrees)
  • here50mHeading: … in 50 meters. (degrees)
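All heading columns above are angles in degrees, which wrap at 360°. One common preprocessing choice (an assumption on our part, not part of the challenge pipeline) is to encode each angle as a (sin, cos) pair so that 359° and 1° end up close in feature space:

```python
import math

def encode_heading(deg):
    """Encode a heading in degrees as a (sin, cos) pair,
    removing the discontinuity at the 0/360 boundary."""
    rad = math.radians(deg)
    return (math.sin(rad), math.cos(rad))

a = encode_heading(359.0)
b = encode_heading(1.0)
dist = math.dist(a, b)  # small, unlike the naive gap |359 - 1| = 358
```

The same encoding can be applied to hereCurrentHeading and the relative here*mHeading look-ahead features before feeding them to a network.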

Getting Started

Please sign up for the challenge, download the dataset and then check out our starter kit which should get you going quite quickly.

If you have any questions please don’t hesitate to contact us, we are happy to help.


  • Round 1: July 29th ~ October 20th, 2019
    • Top placing participants will be announced at our workshop.
    • The top-performing teams will be required to submit a write-up of at least four pages describing their method, together with code to reproduce their results, in order to claim victory. The detailed procedure for releasing the code is to be determined.
  • Round 2: October 20th, 2019 ~ Open End


In addition to the Challenge rules, outlined when you participate, don’t forget to cite our work if you use the Drive360 dataset for your work, thanks!

[1] Hecker, Simon, Dengxin Dai, and Luc Van Gool. “End-to-end learning of driving models with surround-view cameras and route planners.” Proceedings of the European Conference on Computer Vision (ECCV). 2018.

[2] Hecker, Simon, Dengxin Dai, and Luc Van Gool. “Learning Accurate, Comfortable and Human-like Driving.” arXiv preprint arXiv:1903.10995 (2019).


  • Simon Hecker, heckers@vision.ee.ethz.ch
  • Dengxin Dai, daid@vision.ee.ethz.ch


