## The Second DIHARD Speech Diarization Challenge

DIHARD II is the second in a series of diarization challenges focusing on "hard" diarization; that is, speaker diarization for challenging recordings where there is an expectation that the current state-of-the-art will fare poorly. As with other evaluations in this series, DIHARD II is intended to both:

• support speaker diarization research through the creation and distribution of novel data sets
• measure and calibrate the performance of systems on these data sets.

Following the success of the First DIHARD Challenge, we are pleased to announce the Second DIHARD Challenge (DIHARD II).

The task evaluated in the challenge is speaker diarization; that is, the task of determining "who spoke when" in a multispeaker environment based only on audio recordings. As with DIHARD I, development and evaluation sets will be provided by the organizers, but there is no fixed training set; participants are free to train their systems on any proprietary and/or public data. Once again, these development and evaluation sets will be drawn from a diverse sampling of sources including monologues, map task dialogues, broadcast interviews, sociolinguistic interviews, meeting speech, speech in restaurants, clinical recordings, extended child language acquisition recordings from LENA vests, and YouTube videos. However, there are several key differences from DIHARD I:

• two tracks evaluating diarization of multi-channel recordings have been added; these tracks will use recordings of dinner parties provided by the organizers of CHiME-5
• the evaluation period has been lengthened (from 4 weeks to 16 weeks)
• Jaccard Error Rate replaces mutual information as the secondary metric
• baseline systems and results will be provided to participants
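Among these changes, Jaccard error rate (JER) may be unfamiliar. As a simplified sketch (not the official implementation, which lives in the dscore tool and finds the optimal reference-to-system speaker mapping via the Hungarian algorithm; the mapping is assumed given here), JER is the mean, over reference speakers, of the Jaccard distance between each reference speaker's speech and that of the mapped system speaker:

```python
# Toy illustration of Jaccard error rate (JER). Speech is discretized into
# frames; each speaker's speech is represented as a set of frame indices.
# The reference-to-system speaker mapping is assumed to be given.

def speaker_jer(ref_frames: set, sys_frames: set) -> float:
    """Jaccard distance between one reference speaker and the mapped system speaker."""
    union = ref_frames | sys_frames
    if not union:
        return 0.0
    return 1.0 - len(ref_frames & sys_frames) / len(union)

def jer(mapping) -> float:
    """Mean per-speaker Jaccard distance over reference speakers, as a percentage."""
    scores = [speaker_jer(ref, sys) for ref, sys in mapping]
    return 100.0 * sum(scores) / len(scores)

# Two reference speakers; the second is missed for half of its frames.
ref_a, sys_a = set(range(0, 100)), set(range(0, 100))      # perfect
ref_b, sys_b = set(range(100, 200)), set(range(100, 150))  # 50% missed
print(jer([(ref_a, sys_a), (ref_b, sys_b)]))  # 25.0
```

Because every reference speaker contributes equally regardless of how much they speak, JER penalizes missing a quiet speaker as heavily as missing a talkative one.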

The challenge will run from February 14, 2019 through July 1, 2019, and results will be presented at a special session at Interspeech 2019 in Graz, Austria. Participation in the evaluation is open to all who are interested and willing to comply with the rules laid out in the evaluation plan. There is no cost to participate, though participants are encouraged to submit a paper to the corresponding Interspeech 2019 special session.

## Evaluation plan

For all details concerning the overall challenge design, tasks, scoring metrics, datasets, rules, and data formats, please consult the latest version of the official evaluation plan:

## Important dates

| Event | Date |
| --- | --- |
| Registration period | January 30 through March 15, 2019 |
| Launch: release of DIHARD II development and evaluation sets + scoring code | February 28, 2019 |
| Scoring server opens | March 12, 2019 |
| Baselines released | Week of March 11, 2019 |
| Interspeech abstract submission deadline | March 29, 2019 |
| Interspeech paper submission deadline | April 5, 2019 |
| End of challenge/final Interspeech deadline | July 1, 2019 |
| Interspeech 2019 special session | September 15-19, 2019 |

The deadline for submission of final system outputs is midnight on July 1, 2019.

## Scoring

The official scoring tool is maintained as a github repo. To score a set of system output RTTMs sys1.rttm, sys2.rttm, ... against corresponding reference RTTMs ref1.rttm, ref2.rttm, ... using the un-partitioned evaluation map (UEM) all.uem, the command line would be:

```
$ python score.py -u all.uem -r ref1.rttm ref2.rttm ... -s sys1.rttm sys2.rttm ...
```

The overall and per-file results for DER and JER (and many other metrics) will be printed to STDOUT as a table. For additional details about scoring tool usage, please consult the documentation for the github repo.
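Both reference and system files use the RTTM format described in the evaluation plan: space-delimited SPEAKER lines carrying a file ID, channel, turn onset and duration in seconds, and a speaker label. A minimal reader might look like the following sketch (the Turn type and function name are illustrative, not part of the official tooling):

```python
# Minimal reader for the diarization RTTM files used in this challenge.
# Each SPEAKER line is space-delimited: type, file ID, channel, turn onset
# (seconds), duration (seconds), two <NA> fields, speaker label, two <NA>s.
from collections import namedtuple

Turn = namedtuple("Turn", ["file_id", "onset", "duration", "speaker"])

def parse_rttm(lines):
    """Return a list of speaker turns from an iterable of RTTM lines."""
    turns = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank or non-speaker lines
        turns.append(Turn(fields[1], float(fields[3]), float(fields[4]), fields[7]))
    return turns

# Usage: turns = parse_rttm(open("ref1.rttm"))
```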

## Baseline systems

We provide three software baselines for speech enhancement, speech activity detection, and diarization:

• Speech enhancement
The speech enhancement baseline was prepared by Lei Sun and is based on the system used by USTC and iFLYTEK in their submission to DIHARD I:
Sun, Lei, et al. (2018). "Speaker Diarization with Enhancing Speech for the First DIHARD Challenge." Proceedings of INTERSPEECH 2018. 2793-2797. (paper)
It is available on github.
• Speech activity detection
The speech activity detection baseline applies WebRTC to audio processed by the speech enhancement baseline and is maintained as part of the same github repo.
• Diarization
The diarization baseline was prepared by Sriram Ganapathy and is based on the system used by JHU in their submission to DIHARD I with the exception that it omits the Variational-Bayes refinement step:
Sell, Gregory, et al. (2018). "Diarization is Hard: Some experiences and lessons learned for the JHU team in the Inaugural DIHARD Challenge." Proceedings of INTERSPEECH 2018. 2808-2812. (paper)
The x-vector extractor and PLDA parameters were trained on VoxCeleb I and II using data augmentation (additive noise), while the whitening transformation was learned from the DIHARD I development set.

The trained system, as well as recipes to produce the baseline results for each track, is available on github.

## Registration

To register for the evaluation, participants should email dihardchallenge@gmail.com with the subject line "REGISTRATION" and the following details:

• Organization – the organization competing (e.g., NIST, BBN, SRI)
• Team name – the name to be displayed on the leaderboard; use the same team name when you register for the competition on CodaLab (see under Results submission)
• Tracks – the tracks the team will be competing in

One participant from each site must sign the data license agreement and return it to LDC: (1) by email to ldc@ldc.upenn.edu or (2) by facsimile, Attention: Membership Office, fax number (+1) 215-573-2175. They will also need to create an LDC Online user account, which will be used to download the dev and eval releases.

Once this process is complete, you will have access to all annotations plus the non-CHiME audio.

Participants in tracks 3 and 4 must apply separately to Sheffield for the CHiME-5 data, regardless of whether they participated in CHiME-5. To apply for the multi-channel data, visit

Non-profit organizations should sign the non-commercial license. Everyone else should apply for the commercial license, regardless of use case (even if they are only using the data for non-commercial research).

## Account creation

• For system submission and scoring, this year we are using an instance of CodaLab hosted at:
• Each team should create one (and only one) account, which will then be used for submitting ALL of that team’s results for scoring. In CodaLab, the daily and lifetime submission limits are tied to user accounts, so it is imperative that each team use a SINGLE account to make ALL submissions.
• To create an account, navigate to: and fill out the following fields:
• email -- the contact email address you provided when registering for DIHARD; if you use a different address, your later requests to register for tracks will not be approved
• Accept the terms and conditions, and click Sign Up. A confirmation email will then be sent to the email address that you entered. To activate your account, click on the confirmation link in this email.

## Troubleshooting

• If you do not see a confirmation email, check that it has not been caught by your email provider’s spam filter. You may find it by searching for the subject line “[CodaLab] Confirm email address for your CodaLab account”
• If you still do not see a confirmation email, try prompting CodaLab to resend it:
• If you still are unable to get a confirmation email, try using a different email address. Please then let us know at dihardchallenge@gmail.com which address you are using so that we may make a note of this on your registration. This will ensure that when you later register for the tracks, your requests are not denied.
• Finally, if none of the above work, contact us by email and we will attempt to resolve your issue.

## Setting up your team name

• In order for your team name to appear next to each submission on the leaderboard, you will need to add it to your CodaLab user profile. Please use the same name you used when registering for the challenge.
• Access the User Settings page by selecting Settings from your user menu (always found in the top right of the page with your username).
• Scroll down to the Competition settings section and look for the box titled Team name. Enter your team name into this box.
• Click Save Changes.

## Registering for tracks

• Due to limitations of CodaLab, each track has been created as a separate competition. The pages for the four competitions are:
• Before submitting to a track, you will have to register for it via our CodaLab instance. To register, navigate to the competition page of the track, click on the Participate tab, accept the terms and conditions, and click Register. A member of the DIHARD team will then review your registration request and approve it. Upon acceptance you will receive an email titled “Accepted into DIHARD Challenge...”.
• IMPORTANT: Your CodaLab account MUST use the same email address that you provided during DIHARD registration. If the addresses differ, your request will be denied.

## Results zip archive format

• System output for each track should be submitted as a zip file that expands into a single directory of RTTM files, one for each session in that track's evaluation set. For instance, for tracks 1 and 2 this directory should contain one RTTM file for each FLAC file:

```
DH_0001.rttm
DH_0002.rttm
...
DH_0194.rttm
```

and for tracks 3 and 4 this directory should contain one RTTM file for each Kinect array:

```
S01_U01.rttm
S01_U02.rttm
...
S21_U06.rttm
```

RTTMs should be present for all sessions. If any RTTMs are missing, your submission will NOT be scored.
• Examples of valid zip files for each track:
• To validate the RTTMs in your submission before creating the zip file, use the validate_rttm.py script from the dscore repo with the command:

```
python validate_rttm.py rec1.rttm rec2.rttm ...
```

• To validate your zip file's structure, use the validate_submission.py script with the command:

```
python validate_submission.py track submission.zip
```

where track is the name of the track you are submitting to (one of “track1”, “track2”, “track3”, “track4”) and submission.zip is the name of your zip file.
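Given these constraints (a zip that expands into a single directory of RTTM files), packaging system output could be scripted as in the following sketch; the directory and file names are illustrative, and this is a convenience helper, not an official tool:

```python
# Package a directory of system-output RTTMs as a submission zip that
# expands into a single top-level directory, as the instructions require.
import os
import zipfile

def make_submission(rttm_dir: str, zip_path: str) -> None:
    """Zip every .rttm file in rttm_dir under a single top-level directory."""
    top = os.path.basename(os.path.normpath(rttm_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(rttm_dir)):
            if name.endswith(".rttm"):
                # arcname places each RTTM inside one directory in the archive
                zf.write(os.path.join(rttm_dir, name), arcname=f"{top}/{name}")

# Usage: make_submission("track1_output", "submission.zip")
```

Running validate_submission.py on the resulting zip (as described above) remains the authoritative check before uploading.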

## Submitting results via CodaLab

• Navigate to the competition page for the track you are submitting to and click on the Participate tab. This will bring up a page that allows you to make new submissions and see previous submissions.
• In the Method name field, enter the name of the system that you are submitting results for.
• Click Submit and select the zip file you wish to submit. This will upload the zip file for processing.

• Below the Submit button you will see a table listing all submissions you have made up to the current date with the following information for each:
• # -- ordinal number of submission in system; your first submission will be listed as 1
• SCORE -- DER for the submission; if the scoring is in progress or failed, this will read "---"
• METHOD NAME -- the name of the system that produced the submission
• FILENAME -- name of the zip file you submitted
• SUBMISSION DATE -- date and time of submission in MM/DD/YYYY HH:MM:SS format (all times are UTC)
• STATUS -- the current status of your submission, which may be one of
• Submitting -- zip file is being uploaded
• Running -- upload is successful and scoring script is running
• Finished -- scoring script finished successfully and results posted to leaderboard
• Failed -- scoring script failed
• checkmark -- indicates whether or not submission is on the leaderboard
• If scoring failed for your submission, click the + symbol to the right of its entry in the table. This will display the following, which may be used for debugging purposes:
• Method name -- the method name you entered into the form
• View scoring output log -- the scoring program’s output to STDOUT
• View scoring error -- the scoring program’s output to STDERR

• After your submission finishes scoring (status “Finished”) it will post to the leaderboard, which is viewable from the Results tab.
• The leaderboard lists the most recent submission for each system by each team, ranked in ascending order by DER.
• For each submission on the leaderboard, the following fields are displayed:
• # -- ranking of system
• User -- the username of the account that submitted the result
• Entries -- total number of entries by that account
• Date of Last Entry -- date of the account's most recent entry, in MM/DD/YY format
• Team Name -- name of the team associated with the account, taken from the Team name listed in the user's profile
• Method Name -- the method name entered at submission time
• DER -- diarization error rate (in percent) of submission; ranking of this result is indicated in parentheses
• JER -- Jaccard error rate (in percent) of submission; ranking of this result is indicated in parentheses

## Rules

• Each team MUST use a SINGLE account to submit all results.
• The team name listed in that account's profile must be identical to the one you registered with.
• Each team is limited to 6 submissions per day.
• Submissions that are not scored (status shows as "Failed") do not count against this limit.

## Paper submission

For challenge participants contributing papers to the Interspeech special session, the deadlines for abstract submission and final paper submission are:
• Abstract submission -- March 29, 2019, midnight Anywhere on Earth
Time remaining:
• Paper submission -- April 5, 2019, midnight Anywhere on Earth
Time remaining:
Please follow the submission instructions provided by Interspeech. As topic, you should choose ONLY the special session:
13.13 The Second DIHARD Speech Diarization Challenge (DIHARD II)
• IMPORTANT: Papers must be registered in the Interspeech submission system by March 29 (midnight Anywhere on Earth). While the title, abstract, authors list, and pdf may all be changed after this date, a version MUST be submitted to the system with the correct topic by midnight on March 29.
• Papers should not repeat the descriptions of the tasks, metrics, datasets, or baseline systems, but should cite the challenge paper (available soon) using the following citation:
Ryant et al. (2019). The Second DIHARD Diarization Challenge: Dataset, task, and baselines. Proceedings of INTERSPEECH 2019. ISCA. Graz, Austria.
• All papers MUST cite the DIHARD II and SEEDLingS corpora using the following citations:
• Bergelson, E. (2016). Bergelson Seedlings HomeBank Corpus. doi: 10.21415/T5PK6D.
• Ryant et al. (2019). DIHARD Corpus. Linguistic Data Consortium.
• Accepted papers may update their results on the development and evaluation sets during the paper revision period (up to July 1).

## System descriptions

At the end of the evaluation, all participating teams must submit a full description of their system with sufficient detail for a fellow researcher to understand the approach and data/computational requirements. For more details, please consult the evaluation plan.

## Final results

At the end of the evaluation, all final system outputs will be archived on Zenodo. Additional details will be provided at the completion of the evaluation.

## Results

During the evaluation, all results will be displayed on the CodaLab competition leaderboards. For each track we maintain two leaderboards:
• one consisting of results submitted prior to the Interspeech paper deadline on April 5th
• one consisting of all results

## FAQ

1. Must I participate in all tracks of the challenge?
No, researchers may choose to participate in a subset of the tracks. All participants MUST register for at least one of track 1 or track 3 (diarization from reference SAD). Participation in tracks 2 and 4 is optional. For example, you may participate only in track 1; only in track 3; or in tracks 3 and 4. (Other combinations are possible.)

2. Must I submit a paper to the Interspeech special session?
No, you are not required to submit to the special session in order to participate. Submission to the session is strongly encouraged, but not mandatory.

3. My team wishes to submit a paper to the Interspeech special session. What should we include?
Papers submitted to the special session should include preliminary results on the development and evaluation sets; these results may be updated during the paper revision period. Papers may also report results on other corpora. Papers should not repeat descriptions of the tasks, metrics, datasets, or baselines, but should instead cite the challenge paper. For more details, please consult the paper submission instructions.

4. Are there any limitations on the training data?
Participants have the freedom to choose their own training data, whether it is publicly available or not. The only exception is that you should not use data that overlaps with the evaluation set. See the rules section of the evaluation plan for a listing of these sources. Please also note that clear descriptions of the data used are required in the final system descriptions document.

5. My team previously has acquired access to the full SEEDLingS corpus. Can we use this data for training or development?
No, the SEEDLingS data, whether acquired via HomeBank or some other route, is off limits for all purposes. This includes training and tuning, but also acoustic adaptation.

6. My team participated in DIHARD I. Can we use the DIHARD I development and evaluation sets for training or development?
The DIHARD I evaluation set is off limits for ALL PURPOSES. The DIHARD I development set may be used however you wish, though given that it is a subset of the DIHARD II development set, we expect it to have limited utility.

7. Can I use the DIHARD II development set to do data simulation and augmentation?
Yes, development data is free to be used in any way you see fit, including for tuning your current diarization system or augmenting training data.

8. How can I upload the results?
Please see the results submission instructions.

9. Which files should I submit?
All submissions should consist exclusively of RTTMs output by your system. For tracks 1 and 2 there should be one RTTM per FLAC file in the single channel evaluation set. For tracks 3 and 4, there should be one RTTM per Kinect array in each CHiME-5 evaluation set session. For full details about what to submit and formatting of your submission, please consult the results submission instructions.

10. For the multichannel tracks (tracks 3 and 4), should we produce one RTTM per Kinect array or one for the entire session?
Please refer to the previous question.

11. For the multichannel tracks (tracks 3 and 4), can we use multiple Kinect arrays to produce each RTTM? That is, could we opt to use audio from arrays U01, U02, and U03 to produce the RTTM for array U01?
Participants should produce ONE RTTM per Kinect array, each the output of the system when considering ONLY the channels from that array. For instance, for session S21 they should produce the following RTTMs:
• S21_U01.rttm -- produced using only the channels from array U01
• S21_U02.rttm -- produced using only the channels from array U02
• S21_U03.rttm -- produced using only the channels from array U03
• S21_U04.rttm -- produced using only the channels from array U04
• S21_U05.rttm -- produced using only the channels from array U05
• S21_U06.rttm -- produced using only the channels from array U06

12. What should I report in the system descriptions document?
Clear documentation of each system on the final leaderboard is required, providing sufficient detail for a fellow researcher to understand the approach and data/computational requirements. This includes, as mentioned above, explanation of any training data used. For further details, consult the system descriptions instructions.

13. Are teams with members from multiple organizations allowed?
Yes, teams spanning multiple organizations are allowed, though one person from each organization within the team must sign and return the LDC Data License Agreement. One individual should serve as the team's point of contact for DIHARD, but every organization with access to the data must sign the evaluation agreement.

14. I attempted to register an account with CodaLab, but am unable to get a confirmation email. What should I do?
Please consult our registration troubleshooting tips.