TrajNet++ (A Trajectory Forecasting Challenge)
Trajectory forecasting in crowded scenes has become an important topic because of the increasing demands of emerging applications of artificial intelligence such as autonomous cars and service bots. One important challenge in trajectory forecasting is to effectively model the social interactions between agents. In the past few years, several novel designs have been proposed to model agent-agent interactions. However, these methods have been evaluated on different subsets of the available data without proper sampling of trajectories, making it difficult to compare the forecasting techniques objectively.
We introduce TrajNet++, a large scale interaction-centric trajectory-based benchmark. Researchers have to study how their method performs in explicit agent-agent scenarios. Our challenge provides not only proper sampling of trajectories but also a unified extensive evaluation system to test the gathered methods for a fair comparison.
What do we provide?
We present a framework for the fair evaluation of trajectory forecasting algorithms, explicitly in agent-agent scenarios. We provide:
- A large collection of agent-agent centric datasets
- A defined categorisation of trajectories
- A common evaluation tool providing several performance measures
- An easy way to compare the performance of state-of-the-art methods.
Data Description
The dataset files contain two different data representations:
1. Scene
{"scene": {"id": 266, "p": 254, "s": 10238, "e": 10358, "fps": 2.5, "tag": 2}}
- id: scene id
- p: pedestrian ID
- s, e: starting and ending frame IDs of pedestrian "p"
- fps: frame rate.
- tag: trajectory type. Discussed in detail below.
Note: Corresponding to each scene, there exists a primary pedestrian denoted by the pedestrian ID of the scene. The scene is categorised (tag) with respect to this primary pedestrian.
2. Track
{"track": {"f": 10238, "p": 248, "x": 13.2, "y": 5.85, "pred_number": 0, "scene_id": 123}}
- f: frame id
- p: pedestrian ID
- x, y: x and y coordinates in meters of pedestrian “p” in frame “f”.
- pred_number: prediction number. This is useful when you are providing multiple predictions as opposed to a single prediction. A maximum of 3 predictions is allowed.
- scene_id: This is useful when you are providing predictions of other agents in the scene as opposed to only primary pedestrian prediction.
For a more detailed description, we provide the following helper code: Tools for Trajnet++
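As an illustration of the two row types described above, the following minimal sketch separates the lines of a dataset file into scene rows and track rows using only the standard library. The function name `parse_ndjson` and the plain-dictionary representation are assumptions made for this example; they are not the API of the helper tools.

```python
import json


def parse_ndjson(lines):
    """Split ndjson lines into scene rows and track rows (plain dicts)."""
    scenes, tracks = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        if "scene" in row:
            scenes.append(row["scene"])
        elif "track" in row:
            tracks.append(row["track"])
    return scenes, tracks


# Two sample rows in the format shown above. The observed track row carries
# no pred_number or scene_id attribute.
sample = [
    '{"scene": {"id": 266, "p": 254, "s": 10238, "e": 10358, "fps": 2.5, "tag": 2}}',
    '{"track": {"f": 10238, "p": 248, "x": 13.2, "y": 5.85}}',
]
scenes, tracks = parse_ndjson(sample)
print(scenes[0]["p"], tracks[0]["x"])
```

Reading a real file would amount to passing the open file object as `lines`, since iterating over a file yields one line at a time.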
Trajectory Categorization
We explicitly categorise the primary pedestrian trajectory of the scene into different types. The definition of each type is provided below:
Static (Type I): If the Euclidean displacement of the primary pedestrian in the scene is less than 1 meter.
Linear (Type II): If the trajectory of the primary pedestrian can be correctly predicted with the help of an Extended Kalman Filter (EKF). A trajectory is said to be correctly predicted by the EKF if the FDE between the ground truth trajectory and the predicted trajectory is less than 0.5 meters.
Non-Linear: The rest of the scenes are classified as ‘Non-Linear’. We further divide non-linear scenes into Interacting (Type III) and Non-Interacting (Type IV).
We further sub-categorize the Interacting (Type III) trajectories as follows:
Leader Follower: Leader follower phenomenon refers to the tendency to follow pedestrians going in relatively the same direction. The follower tends to regulate his/her speed and direction according to the leader. If the primary pedestrian is a follower, we categorize the scene as Leader Follower.
Collision Avoidance: Collision avoidance phenomenon refers to the tendency to avoid pedestrians coming from the opposite direction. We categorize the scene as Collision Avoidance if the primary pedestrian is involved in collision avoidance.
Group: The primary pedestrian is said to be a part of a group if he/she maintains a close and roughly constant distance with at least one neighbour on his/her side during prediction.
Others: Trajectories where the primary pedestrian undergoes social interactions other than Leader Follower, Collision Avoidance and Group. We define social interaction as follows: We look at an angular region in front of the primary pedestrian. If any neighbouring pedestrian is present in the defined region at any time-instant during prediction, the scene is classified as having a presence of social interactions.
If the trajectory of the primary pedestrian is non-linear and undergoes no social interactions during prediction, the trajectory is classified as Non-Interacting (Type IV).
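The decision between the four types can be sketched as a short function. This is only an illustration under stated assumptions: displacement is taken between the first and last positions of the primary pedestrian, the checks are applied in the order Static, Linear, Non-Linear, and both the EKF error (`ekf_fde`) and the interaction flag (`has_interaction`) are assumed to be computed elsewhere. The names used here are hypothetical and do not come from the benchmark code.

```python
import math

STATIC_MAX_DISPLACEMENT = 1.0  # meters, from the Static (Type I) definition
LINEAR_MAX_FDE = 0.5           # meters, from the Linear (Type II) definition


def displacement(path):
    """Euclidean distance between the first and last (x, y) points.

    Assumption: 'displacement' means start-to-end distance.
    """
    return math.dist(path[0], path[-1])


def categorise(path, ekf_fde, has_interaction):
    """Return the type (1 to 4) of a primary pedestrian trajectory.

    ekf_fde: FDE in meters of an EKF prediction (computed elsewhere).
    has_interaction: True if a neighbour enters the angular region in front
    of the primary pedestrian during prediction (computed elsewhere).
    """
    if displacement(path) < STATIC_MAX_DISPLACEMENT:
        return 1  # Static
    if ekf_fde < LINEAR_MAX_FDE:
        return 2  # Linear
    return 3 if has_interaction else 4  # Interacting / Non-Interacting


print(categorise([(0.0, 0.0), (5.0, 0.0)], ekf_fde=0.9, has_interaction=True))
```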
During evaluation, we report the performance of the submitted model with respect to each of the above categories to provide insight into the model performance in different scenarios.
We rely on the spirit of crowdsourcing, and encourage researchers to submit their sequences to our benchmark, so the quality of trajectory forecasting models can keep increasing in tackling more challenging scenarios.
Metrics
A good benchmark requires not only a standard dataset but also meaningful evaluation metrics that provide insights into the model performance from different perspectives. We describe the evaluation metrics for this challenge:
Unimodal Metrics: Single Prediction
Average Displacement Error (ADE): Average L2 distance between the ground truth and prediction of the primary pedestrian over all predicted time steps. Lower is better.
Final Displacement Error (FDE): The L2 distance between the final ground truth coordinates and the final prediction coordinates of the primary pedestrian. Lower is better.
Prediction Collision (Col-I): Calculates the percentage of collisions of primary pedestrian with neighbouring pedestrians in the scene. The model prediction of neighbouring pedestrians is used to check the occurrence of collisions. Lower is better.
Ground Truth Collision (Col-II): Calculates the percentage of collisions of primary pedestrian with neighbouring pedestrians in the scene. The ground truth of neighbouring pedestrians is used to check the occurrence of collisions. Lower is better.
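The unimodal metrics above can be sketched in a few lines of Python. Trajectories are assumed to be equal-length lists of (x, y) pairs in meters. The collision check is a simplified stand-in: the distance threshold of 0.2 meters is a hypothetical value chosen for this example, and positions are only compared at the given time steps. Passing predicted neighbour paths would correspond to Col-I, and ground truth neighbour paths to Col-II.

```python
import math


def ade(pred, gt):
    """Average L2 distance over all predicted time steps."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def fde(pred, gt):
    """L2 distance between the final predicted and ground truth points."""
    return math.dist(pred[-1], gt[-1])


def collides(primary, neighbour, threshold=0.2):
    """True if the two paths come closer than `threshold` meters at any
    shared time step. The threshold is a hypothetical choice."""
    return any(math.dist(p, n) < threshold for p, n in zip(primary, neighbour))


gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(ade(pred, gt), fde(pred, gt))
```

Under these assumptions, a collision percentage would be the share of scenes for which `collides` returns True for at least one neighbour.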
Multimodal Metrics: Multiple Prediction
Topk Average Displacement Error (Topk_ADE): Given k output predictions for an observed scene, the metric calculates the ADE of the prediction which is closest to the ground truth trajectory in terms of ADE. Lower is better. In this challenge, k=3.
Topk Final Displacement Error (Topk_FDE): Given k output predictions for an observed scene, the metric calculates the FDE of the prediction which is closest to the ground truth trajectory in terms of ADE. Lower is better. In this challenge, k=3.
Average NLL (NLL): Given n output predictions for an observed scene, the metric calculates the average negative log-likelihood of the ground truth trajectory in the model prediction distribution over the prediction horizon. Higher is better. In this challenge, n=50.
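As a sketch of the Topk metrics as described above, the function below selects, among the first k predictions, the one with the lowest ADE and reports the ADE and FDE of that single prediction. The function name `topk_ade_fde` and the list-of-points representation are assumptions for this example, not the evaluation tool's interface.

```python
import math


def ade(pred, gt):
    """Average L2 distance over all predicted time steps."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def fde(pred, gt):
    """L2 distance between the final predicted and ground truth points."""
    return math.dist(pred[-1], gt[-1])


def topk_ade_fde(predictions, gt, k=3):
    """Pick the prediction with the lowest ADE among the first k and
    return its (ADE, FDE)."""
    best = min(predictions[:k], key=lambda pred: ade(pred, gt))
    return ade(best, gt), fde(best, gt)


gt = [(0.0, 0.0), (1.0, 0.0)]
predictions = [
    [(0.0, 1.0), (1.0, 1.0)],  # ADE 1.0
    [(0.0, 0.0), (1.0, 0.5)],  # ADE 0.25, the closest prediction
    [(0.0, 2.0), (1.0, 2.0)],  # ADE 2.0
]
print(topk_ade_fde(predictions, gt))
```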
Data
The training and test datasets can be found here.
Submission
We strongly encourage all participants to use only the sequences from the training set for finding parameters and report results on the provided test scenarios to enable a meaningful comparison of forecasting methods.
File Format
To have your predictions evaluated, you need to submit a single .zip file containing exactly the same directory structure and file names as the test file. Specifically, you will be given a single .zip with folders ‘real_data’ and ‘synth_data’ within the parent folder ‘test’. Each of these folders will contain one or more .ndjson files. In every file, for each “scene” (length = 21 frames), you must predict the coordinates of the primary pedestrian and the corresponding neighbours in the last 12 frames (Tpred = 12), given only the observations for the first 9 frames (Tobs = 9).
Please note: Your submission is supposed to have your predicted tracks (TrackRows) along with the test scenes (SceneRows). The observed test tracks do not have the pred_number and scene_id attributes (set to None). The predicted tracks (last 12 frames) MUST have the pred_number (numbering starts from 0) and scene_id (corresponding to the id of scene being predicted) attributes, even when outputting a single prediction corresponding to each scene.
Your submitted file may contain multiple predictions corresponding to each scene. For unimodal metrics evaluation, the first prediction (pred_number=0) will be considered. For the top3_ADE and top3_FDE metrics, the first 3 predictions (pred_number=0, 1 and 2) will be considered. Likewise, for NLL, the first 50 predictions will be considered.
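To make the required attributes concrete, here is a minimal sketch that builds the predicted track rows for one pedestrian in one scene. The helper name `prediction_rows` is hypothetical, and the predicted coordinates are placeholder values rather than model output. The frame ids assume a step of 6 frames, which matches a 21-frame scene running from frame 10238 to frame 10358.

```python
import json

T_OBS, T_PRED = 9, 12  # observed and predicted frames per scene


def prediction_rows(scene_id, ped_id, frames, predicted_xy, pred_number=0):
    """Return ndjson lines for one pedestrian's predicted positions.

    frames: all frame ids of the scene; only the last T_PRED are used.
    predicted_xy: T_PRED (x, y) pairs in meters.
    """
    pred_frames = frames[-T_PRED:]
    if len(predicted_xy) != len(pred_frames):
        raise ValueError("expected one (x, y) pair per predicted frame")
    rows = []
    for frame, (x, y) in zip(pred_frames, predicted_xy):
        track = {"f": frame, "p": ped_id, "x": x, "y": y,
                 "pred_number": pred_number, "scene_id": scene_id}
        rows.append(json.dumps({"track": track}))
    return rows


frames = list(range(10238, 10359, 6))  # 21 frame ids
placeholder = [(13.0 + 0.25 * i, 5.5) for i in range(T_PRED)]  # not a real model
rows = prediction_rows(scene_id=266, ped_id=254, frames=frames,
                       predicted_xy=placeholder)
print(rows[0])
```

Calling the helper again with `pred_number=1` and `pred_number=2` would produce the additional rows needed for the multimodal metrics.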
An example of input test file and output prediction file is provided here.
As mentioned above, please submit a single .zip file that matches exactly the format given to you for testing.
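One possible way to package the prediction files is with Python's standard `zipfile` module, as sketched below. The function name `zip_submission` and the file names `a.ndjson` and `b.ndjson` are placeholders for this example; the sketch assumes that the archive should contain the parent folder ‘test’ at its top level, as in the description of the test file above.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_submission(root, zip_path):
    """Pack every .ndjson file under `root` into `zip_path`, keeping paths
    relative to the parent of `root` (so entries start with root's name)."""
    root = Path(root)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*.ndjson")):
            archive.write(path, path.relative_to(root.parent).as_posix())
    return zip_path


# Demonstration on a temporary directory with placeholder file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "test"
    for folder, name in [("real_data", "a.ndjson"), ("synth_data", "b.ndjson")]:
        (root / folder).mkdir(parents=True)
        (root / folder / name).write_text("")
    archive_path = zip_submission(root, Path(tmp) / "submission.zip")
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
print(names)
```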
Evaluation
Once your files are correctly submitted, they will be graded on multiple criteria. The primary and secondary grades correspond to the final displacement error on the real test dataset and the synthetic test dataset respectively. A figure comparing the submitted model to the baseline Vanilla LSTM model and a table containing a detailed model evaluation will also be provided.
The result of the baseline: Vanilla LSTM Baseline Score
Resources
In this section, participants can find useful resources for the TrajNet++ challenge.
Kick-Starter Guide (NEW)
A starter guide to using the TrajNet++ framework can be found here.
Visualisations
We provide visualisations of the datasets to help participants better understand the data. The visualisations capture attributes of human motion as well as the nature of interactions in the different datasets.
Baselines
We provide baseline code for important papers in trajectory prediction.
Baseline algorithms for TrajNet++
Workshops
Round 3 remains the active round for submissions.
Round 3 of this challenge was a part of the Workshop on Benchmarking Trajectory Forecasting Models, ECCV 2020.
Round 2 of this challenge was a part of the 2nd Workshop on Long-term Human Motion Prediction, ICRA 2020.
Round 1 of this challenge was a part of Applied Machine Learning Days, EPFL 2020.
Organizers
Luan Po-Chien