Challenges Entered
Audio Source Separation using AI
Latest submissions
| Status | Submission ID |
|---|---|
| graded | 220381 |
| graded | 220368 |
| graded | 220196 |
| Participant | Rating |
|---|---|
| chy_chen | 0 |
Sound Demixing Challenge 2023
Post-challenge discussion
Over 1 year ago

Hello @ZFTurbo, unfortunately due to the use of internal data we are not able to release the model weights. We did have a smaller version of the model trained only with MUSDB18HQ, mentioned in the original post above, and @faroit has already coordinated with the organizers to mark it as a baseline on the Leaderboard now (thanks for the help!). Maybe that could serve as a starting point for others to use their own internal data, either labeled or unlabeled, to work on the model.
Submission Times
Over 1 year ago

Hi @dipam,
Thanks for the reply. There are still a few points I would like to receive more comments on, and it would be helpful if you could elaborate further:
> Therefore, if only one member of your team submits, they can utilize all 5 submissions for that day. If multiple people submit, it counts towards the entire team's quota.
I don't know much about other teams, but we did try to make more than 4 submissions a day (counting the total number of submissions per team) and failed because the system only allowed 1 submission per team member. I think we can find some previous error messages from when we tried to make the 5th submission, if needed.
> Upon reviewing all the submissions made in the challenge, we can confirm that no team has exceeded the limit of 5 submissions in a day within the fixed window of quota reset (UTC 00:00:00 to UTC 23:59:59). Your screenshots will also confirm the same.
We actually posted another comment in this thread above: Submission Times - #5 by JusperLee, where 6 successful submissions appeared on the submission page between UTC 00:00:00 and UTC 23:59:59 (we'd like to emphasize that we are in UTC+8 and the webpage shows local time, so the window starts at 08:00:00 in our local time zone), so the screenshots do not confirm the same. There was another comment saying that the submission quota changed to 5 during the last week: Submission Times - #7 by yoyololicon. I think a more in-depth investigation is necessary to confirm whether the submission system was functioning correctly.
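For reference, here is a minimal sketch of how submission times shown in our local (UTC+8) time can be mapped back to the fixed UTC quota window; the timestamps and the `count_per_utc_day` helper are illustrative placeholders, not part of any official AIcrowd tooling.

```python
from datetime import datetime, timedelta, timezone

LOCAL_TZ = timezone(timedelta(hours=8))  # our local time zone (UTC+8), as shown on the webpage

def count_per_utc_day(local_times):
    """Group submissions by UTC calendar day, i.e. the fixed quota window
    UTC 00:00:00 - 23:59:59 that the organizers refer to."""
    counts = {}
    for t in local_times:
        local = datetime.strptime(t, "%Y-%m-%d %H:%M:%S").replace(tzinfo=LOCAL_TZ)
        utc_day = local.astimezone(timezone.utc).date()
        counts[utc_day] = counts.get(utc_day, 0) + 1
    return counts

# Made-up timestamps in local (UTC+8) time; 07:59 local the next morning is still
# 23:59 UTC of the previous day, so all three fall in the same quota window.
print(count_per_utc_day(["2023-04-28 09:15:00", "2023-04-28 23:40:00", "2023-04-29 07:59:00"]))
```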
Post-challenge discussion
Over 1 year ago

Hey guys,
Since the challenge has ended, I am just opening this thread to see if anyone is interested in some post-challenge discussion: any thoughts, findings or things to share from throughout the challenge.
As a starter, let me share some info and thoughts on behalf of our team "JusperLee". We are from Tencent AI Lab and this is Yi Luo posting this thread. We participated in both tracks, and here is some info about the systems we submitted.
Model arch: we use, and only use, BSRNN in both MDX and CDX, as this is the model we proposed and we would like to see how it performs compared with all the other possible systems in the wild. We did make some modifications compared with our original version, and we will describe them in future papers.
Data for MDX: we follow the pipeline we described in our BSRNN paper, which only uses the 100 training songs in MUSDB18HQ plus 1750 additional unlabeled songs for semi-supervised finetuning.
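As a rough illustration only (the actual semi-supervised pipeline is described in the BSRNN paper and may differ), the general idea is a teacher-student self-training loop in which the MUSDB18HQ-trained model pseudo-labels the unlabeled songs; every model and tensor below is a toy placeholder, not our training code.

```python
import torch
import torch.nn as nn

# Toy stand-ins: in practice the teacher is the supervised BSRNN and the
# "songs" are real waveforms; here everything is a tiny placeholder.
teacher = nn.Conv1d(1, 4, kernel_size=1)   # mixture -> 4 pseudo-stems
student = nn.Conv1d(1, 4, kernel_size=1)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
unlabeled_songs = [torch.randn(1, 1, 44100) for _ in range(8)]  # fake unlabeled audio

for mixture in unlabeled_songs:
    with torch.no_grad():
        pseudo_stems = teacher(mixture)              # pseudo-labels from the supervised model
    remix = pseudo_stems.sum(dim=1, keepdim=True)    # rebuild a mixture from the pseudo-stems
    loss = nn.functional.l1_loss(student(remix), pseudo_stems)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```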
Data for CDX: things are a little bit tricky here. For Leaderboard A (DnR-only track), we found that the music and effect data might sometimes contain speech, which can greatly harm the training of the model, so we used the MDX model above to preprocess the DnR data and remove the "vocal" part from music and effects. We actually did not know whether this was permitted, as the rules said that "only DnR data can be used to train the systems", but we did not find specific rules clarifying this case. So we simply went through the DnR data, trained the CDX model with the preprocessed DnR-only data, and found a great performance improvement. For Leaderboard B, we added ~10 hrs of cinematic sound effects and ~100 hrs of cinematic BGM (both internal data) to the preprocessed DnR data.
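A minimal sketch of that preprocessing step, assuming a hypothetical `separate_vocals` function standing in for the MDX vocal model (the dummy body below does nothing useful; the real system would run BSRNN there):

```python
import numpy as np
import soundfile as sf

def separate_vocals(audio):
    """Placeholder for the MDX vocal model; returns (vocals, accompaniment)."""
    vocals = np.zeros_like(audio)      # dummy: pretend no vocals were found
    return vocals, audio - vocals

def clean_stem(in_path, out_path):
    """Remove residual speech from a DnR 'music' or 'effect' stem by keeping
    only the non-vocal output of the vocal separator."""
    audio, sr = sf.read(in_path)
    _, accompaniment = separate_vocals(audio)
    sf.write(out_path, accompaniment, sr)

# Example with hypothetical paths:
# clean_stem("dnr/train/0001/music.wav", "dnr_clean/train/0001/music.wav")
```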
Some observations and guesses: the interesting thing in the CDX challenge is that, when we used a pretty strong speech enhancement model on our side, the SDR score for the "dialog" track was always below 13 dB. We listened to the model outputs on real dramas and movies and thought that the quality was actually pretty good, so we struggled for quite a while over what we should do. One day we randomly tried using our MDX model to extract the "vocal" part to serve as "dialog", and suddenly the SDR score went to ~15 dB. We know that our MDX model may fail to remove some sound effects or noises when directly applied to speech, but the much better SDR scores made us assume that the "dialog" tracks in the hidden test data, which presumably were collected directly from real movies, contain some noise, as they might have been recorded on the film set instead of in a recording studio (I guess the "dialog" tracks in the demo audio clips were recorded in a studio?). That might explain why our enhancement model is worse than the MDX model here but far better on almost all our other internal test sets. I personally do hope that the organizers can share some information about the evaluation dataset, particularly whether the stems are clean (in terms of environmental sounds) or not.
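To make that guess concrete, here is a toy numerical sketch using the simple whole-track SDR commonly used in these demixing challenges (please check the official evaluator for the exact definition): if the hidden "dialog" reference itself contains on-set noise, an estimate that removes the noise is penalized relative to one that keeps it. The signals below are random placeholders.

```python
import numpy as np

def global_sdr(reference, estimate, eps=1e-8):
    """Whole-track SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2)."""
    num = np.sum(reference ** 2)
    den = np.sum((reference - estimate) ** 2)
    return 10 * np.log10((num + eps) / (den + eps))

rng = np.random.default_rng(0)
clean_speech = rng.standard_normal(44100)
set_noise = 0.2 * rng.standard_normal(44100)           # on-set noise baked into the reference
reference_dialog = clean_speech + set_noise             # what a noisy hidden "dialog" stem might look like

print(global_sdr(reference_dialog, clean_speech))       # enhancement-style output (noise removed): ~14 dB
print(global_sdr(reference_dialog, reference_dialog))   # output that keeps the noise: essentially unbounded
```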
The things we enjoyed:
- It was the first time for us participating in such a source separation challenge, and it was a really action-packed competition, especially in the last week, when we and many other participants were trying our best to improve our scores on the leaderboard.
- It is good to know the performance of our models on real-world evaluation recordings (real stems or cinematic tracks), which could shed light on future directions for improving our systems.
The things we were confused about:
- I received an email from the organizers asking whether we could provide an implementation of BSRNN to serve as a baseline for the challenge. We did submit one system and made the entry publicly available (https://gitlab.aicrowd.com/Tomasyu/sdx-2023-music-demixing-track-starter-kit, submission #209291), but it seems that the organizers have not marked it as a baseline, unlike several other baseline models, even until now. We also did not get any follow-up about whether the info about this system has been shared with any participants since we submitted it.
- We have four members in our team and each of us had one account. Although it was mentioned in the system that each team was able to make 5 submissions per day, we found that each of us could only make 1 submission per day, so only 4 in total. We thought that this might be some misinformation in the system, but in the last (extended) week of the challenge we found that many other teams might have had up to 10 successful submissions per day (Submission Times - Cinematic Sound Demixing Track - CDX'23 - AIcrowd Forum). We also found one thread where the AIcrowd team mentioned that "The submission quotas are checked against the number of submissions made by your (or any of your team members) in the last 24 hour window" (No Submission Slots Remaining - Cinematic Sound Demixing Track - CDX'23 - AIcrowd Forum), and another saying that "Hence we'll be increasing the number of submissions per day from 5 to 10, starting from Monday - April 3rd onwards. This increase will only be valid for a week, and the submission slots will be reduced back to 5 per day from April 10th onwards" (Phase 1 scores for new submissions - Cinematic Sound Demixing Track - CDX'23 - AIcrowd Forum). We were pretty confused about how many submissions each team actually had throughout the challenge, as the quota for our team was always 4 in the final month, but it seems that different teams did have different quotas, at least given the information on the submission page (AIcrowd | Cinematic Sound Demixing Track - CDX'23 | Submissions, where one can easily count how many successful submissions a team made in the last 24 hour window; see the sketch after this list). The response we got from the organizers was "it is possible that the higher number of submissions occurred during the one-week period when the submission quota was temporarily increased", but that actually contradicts the announcement above, which says the quota went back to 5 after that one-week period, while there could still be more than 5 successful submissions within a 24 hour slot in the final week of the challenge. I don't know if this is an AIcrowd issue or something else.
- Our result on the CDX final Leaderboard A has been removed (the others are still there). According to the response from the organizers, it is because "the use of pretrained models is strictly prohibited in the challenge as they may have been trained on datasets not specified in the competition guidelines", so I think maybe this was indeed not allowed in such limited-data tracks. We would like to apologize if this is common sense in challenges, as we indeed do not have much experience with them, but we also hope that it could be clearly stated in the challenge rules. As our result for CDX Leaderboard A is still there, maybe the organizers can also remove that one if necessary.
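Regarding the quota confusion above, part of the discrepancy may simply come from how a "day" is counted: a rolling 24-hour window and a fixed UTC calendar day give different counts for the same timestamps. A minimal sketch (the timestamps below are made-up placeholders):

```python
from datetime import datetime, timedelta

def max_in_rolling_window(utc_times, window=timedelta(hours=24)):
    """Maximum number of submissions falling inside any rolling 24-hour window."""
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in utc_times)
    best = 0
    for i, start in enumerate(times):
        count = sum(1 for t in times[i:] if t - start < window)
        best = max(best, count)
    return best

# Placeholder UTC timestamps straddling midnight: only 3 per UTC calendar day,
# but 6 inside one rolling 24-hour window.
subs = ["2023-04-27 20:00:00", "2023-04-27 21:00:00", "2023-04-27 22:00:00",
        "2023-04-28 01:00:00", "2023-04-28 02:00:00", "2023-04-28 03:00:00"]
print(max_in_rolling_window(subs))  # -> 6
```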
Submission Times
Over 1 year ago

Hi there,
Many thanks for the reply here, and congratulations on the good job in the contest; we know that if you had combined the 3 best systems for the 3 tracks you would be #1 there. We are definitely not attacking you for getting more submissions in some secret way; our purpose is more to question the entire submission system, as it now seems that different teams indeed had different submission quotas. We hope the organizers and the AIcrowd team can provide an explanation, or at least some information about how the submission quotas were decided and, if they were changed during the contest, how they were changed and whether the changes applied identically to all participants.
Submission Times
Over 1 year ago

Hello @dipam, thanks for the updated info; now we have a better understanding of the issue. Yes, we used the provided doc directly from the very beginning and did not change it throughout the challenge.
Just to confirm: given that our submissions with "debug=True" were successfully scored and updated on the Leaderboards, does it mean that all teams actually had 6 submissions per day as you mentioned (1 with "debug=True" and 5 with "debug=False")?
Anyway, hopefully this can be fixed ASAP to minimize its impact on other ongoing and upcoming challenges.