🏆 Winner's Solutions
🔍 Discover released models and source code in our MDX track and CDX track papers' "Notes" section.
🗣️ Explore teams' model announcements on the discussion forum for additional insights.
🕵️ Introduction
Cinematic sound separation is the task of separating movie audio into the three tracks “dialogue”, “sound effects” and “music”. It has many applications ranging from language dubbing to upmixing of old movies to spatial audio and user interfaces for flexible listening.
📜 The Task
There are two leaderboards in the Cinematic Sound Demixing Track (CDX):
- Leaderboard A: systems that are trained only on the training (tr) and validation (cv) parts of DnR.
- Leaderboard B: systems that are trained on any other data (e.g., also using the test part tt of the DnR dataset).
📁 Datasets
To train their systems, participants can use either the training data of the “Divide-and-Remaster” (DnR) dataset (Leaderboard A) or any data that they have at their disposal (Leaderboard B). The DnR dataset consists of 3,406 mixtures (∼57 h) for the training set, 487 mixtures (∼8 h) for the validation set, and 973 mixtures (∼16 h) for the test set, along with their isolated ground-truth stems.
For the evaluation and ranking of the submissions, we use a newly created hidden dataset of real audio from 11 Sony Pictures Entertainment movies. The data is stereo and sampled at 44.1 kHz. You can find the dataset files over here.
💰 Prizes
🥁 Cinematic Sound Demixing Track (CDX): 10,000 USD
Leaderboard A - Divide and Remaster (DnR) dataset: 5,000 USD
- 1st prize: 2,500 USD
- 2nd prize: 1,500 USD
- 3rd prize: 1,000 USD
Participants need to open-source their training and inference code.
Leaderboard B - Standard Cinematic Sound Separation (Open Track): 5,000 USD
- 1st prize: 2,500 USD
- 2nd prize: 1,500 USD
- 3rd prize: 1,000 USD
This is an Open Track where you can use any data you want.
Please refer to the Challenge Rules for more details about the Open Sourcing criteria for each of the leaderboards to be eligible for the associated prizes.
🖊 Evaluation Metric
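As the evaluation metric, we use the signal-to-distortion ratio (SDR), which is defined as
$$\mathrm{SDR}_{\mathrm{instr}} = 10 \log_{10} \frac{\sum_n \lVert s_{\mathrm{instr}}(n) \rVert^2}{\sum_n \lVert s_{\mathrm{instr}}(n) - \hat{s}_{\mathrm{instr}}(n) \rVert^2},$$
where $s_{\mathrm{instr}}(n)$ is the waveform of the ground truth and $\hat{s}_{\mathrm{instr}}(n)$ denotes the waveform of the estimate. The higher the SDR score, the better the output of the system.
To rank systems, we use the average SDR computed by
$$\mathrm{SDR}_{\mathrm{song}} = \frac{1}{3}\left(\mathrm{SDR}_{\mathrm{dialogue}} + \mathrm{SDR}_{\mathrm{effects}} + \mathrm{SDR}_{\mathrm{music}}\right)$$
for each song. Finally, the overall score $\mathrm{SDR}_{\mathrm{total}}$ is given by the average over all songs in the hidden test set. There will be a separate leaderboard for each round.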
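The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the official evaluation script. It assumes the usual SDR definition (10 log10 of the ground-truth energy divided by the energy of the estimation error, summed over all samples and both stereo channels) and assumes that the per-song score is the plain mean over the three stems. The function names `sdr`, `song_sdr`, and `total_sdr`, the stem keys, and the small stabilising constant `eps` are all hypothetical choices made for this example.

```python
import numpy as np

# Stem names are an assumption for this sketch; the task defines three tracks.
STEMS = ("dialogue", "effects", "music")


def sdr(reference, estimate, eps=1e-7):
    """SDR in dB between a ground-truth stem and its estimate.

    Both inputs are arrays of identical shape, e.g. (num_samples, 2) for
    stereo audio. `eps` is an assumed constant that avoids division by zero
    and log(0) for silent or perfectly separated stems.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("reference and estimate must have the same shape")
    # Energies are summed over all samples and channels.
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    return float(10.0 * np.log10((signal_energy + eps) / (error_energy + eps)))


def song_sdr(references, estimates):
    """Mean SDR over the three stems of one song (dicts keyed by stem name)."""
    return float(np.mean([sdr(references[s], estimates[s]) for s in STEMS]))


def total_sdr(per_song_scores):
    """Overall score: mean of the per-song SDR values."""
    return float(np.mean(per_song_scores))
```

To score a submission under these assumptions, call `song_sdr` once per item in the test set with dictionaries mapping each stem name to its waveform, then pass the resulting list of scores to `total_sdr`.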
For an academic report about the challenge, the organizers will get access to the separations of the top-10 submitted entries (i.e., their output) for each leaderboard in order to compute more source separation metrics (e.g., signal-to-interference ratio).
📅 Timeline
The SDX23 Cinematic Sound Demixing Track will take place in 2 rounds, which differ in the evaluation datasets used for ranking the submitted systems.
- Warmup Round: 8th December 2022
- Phase I: 23rd January 2023
- Phase II: 6th March 2023
- Challenge End: 1st May 2023
📱 Challenge Organising Committee
Cinematic Sound Demixing Track (CDX)
- Yuki Mitsufuji, Stefan Uhlich, Hirano Masato, Shusuke Takahashi (Sony)
- Jonathan Le Roux, Gordon Wichern (Mitsubishi Electric Research Labs)