Open ASR Leaderboard incorporates private data
Since September 2023 the community has been using the Open ASR Leaderboard to compare speech-recognition models. Ever wondered what happens when someone optimizes for the test instead of improving real-world performance? Hugging Face's answer was to add a set of private datasets to reduce benchmaxxing and better measure robustness in conversational settings and diverse accents.
What changed and why
The new addition: Appen Inc. and DataoceanAI contributed several high-quality English datasets (scripted and conversational) that are kept unpublished so they can't leak into training data and contaminate the evaluation. Why keep them private? Because when a test set is fully public, some teams can tune their models specifically for those examples and get high scores without real production improvements.
Important: by default, the Average WER on the leaderboard is still calculated only with public datasets. You can enable an option to include the private data and see how the metrics change.
This dual approach balances two goals that often clash in a benchmark: standardization and openness. Hugging Face standardizes transcriptions using a normalizer (based on Whisper's) that removes punctuation, lowercases, and maps to American spelling. And they keep the evaluation tools and the UI open so the community can audit and contribute. But that transparency also makes benchmaxxing easier, hence the decision to add a private track.
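To make scores comparable, both references and predictions go through that normalization before WER is computed. Below is a minimal sketch of the step, assuming the EnglishTextNormalizer from the openai-whisper package, which the leaderboard's normalizer is based on; the exact configuration used on the leaderboard may differ.

```python
# Minimal sketch of pre-scoring text normalization, assuming the
# EnglishTextNormalizer shipped with the openai-whisper package
# (pip install openai-whisper). The leaderboard's normalizer is based on it,
# but its exact configuration may differ.
from whisper.normalizers import EnglishTextNormalizer

normalizer = EnglishTextNormalizer()

reference = "Well, the colour of the sign is different."
hypothesis = "well the color of the sign is different"

# Both strings should end up lowercased, unpunctuated, and with British
# spellings mapped to American where the normalizer's spelling map covers them.
print(normalizer(reference))
print(normalizer(hypothesis))
```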
The datasets: technical details
Hugging Face worked with Appen and DataoceanAI to create splits with a variety of accents, styles and durations. Here are the summarized metrics:
| Dataset | Accent | Duration [h] | Male (%) / Female (%) | Style | Transcription |
|---|---|---|---|---|---|
| Appen Scripted AU | Australian | 1.42 | 49 / 51 | Read | Punctuated, cased |
| Appen Scripted CA | Canadian | 1.53 | 52 / 48 | Read | Punctuated, cased |
| Appen Scripted IN | Indian | 1.02 | 49 / 51 | Read | Punctuated, cased |
| Appen Scripted US | American | 1.45 | 49 / 51 | Read | Punctuated, cased |
| Appen Conversational IN | Indian | 1.37 | 51 / 49 | Conversational, spontaneous | Punctuated, disfluencies |
| Appen Conversational US003 | American | 1.64 | 49 / 51 | Conversational, spontaneous | Punctuated, cased, disfluencies |
| Appen Conversational US004 | American | 1.65 | 49 / 51 | Conversational, spontaneous | Punctuated, disfluencies |
| DataoceanAI Scripted US | American | 2.43 | 54 / 46 | Read | Punctuated, cased (proper nouns), disfluencies |
| DataoceanAI Scripted GB | British | 2.43 | 47 / 53 | Read | Punctuated, disfluencies |
| DataoceanAI Conversational US | American | 8.82 | NA | Conversational, spontaneous | Punctuated, disfluencies |
| DataoceanAI Conversational GB | British | 5.96 | NA | Conversational, spontaneous | Punctuated, disfluencies |
They also include audio examples to show variety: scripted, conversational, acronyms, disfluencies and proper names.
How this affects metrics (WER and averages)
- Average WER is computed as a macro-average of the per-provider means, so each data provider counts equally.
- There are dedicated metrics: Avg Scripted, Avg Conversational, Avg US, Avg non-US.
- Individual split scores are not shown, to prevent anyone from optimizing for a single provider or accent.
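A sketch of that aggregation rule, with made-up WER numbers: per-split scores are averaged within each provider first, then the provider means are averaged, so a provider contributing many splits does not dominate.

```python
# Macro-averaging as described above; the per-split WER values are invented
# purely for illustration.
from statistics import mean

wer_by_provider = {
    "appen": {"scripted_au": 7.1, "scripted_us": 6.4, "conversational_us003": 14.2},
    "dataoceanai": {"scripted_us": 6.9, "conversational_gb": 16.5},
}

# 1) average the splits of each provider, 2) average across providers
provider_means = {p: mean(s.values()) for p, s in wer_by_provider.items()}
macro_avg_wer = mean(provider_means.values())

print(provider_means)
print(f"Macro-averaged WER: {macro_avg_wer:.2f}")
```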
By default, the private sets do not influence the global ranking. If you want to see their effect, enable the "Private data" tab; then the Average WER will include those splits and you'll see the Rank Δ, which shows how the order changes.
Why this approach? Because a model that shines on a controlled script or in American English can fail in conversational audio or non-American accents. The goal is to capture those differences and give a fuller picture of performance.
Process to upload and verify your model
1. Open a pull request in the Open ASR Leaderboard repository. A checklist for models will appear.
2. Report your results on the public datasets in your model card (YAML); a sketch of that metadata follows below. This lets your model appear on an unverified leaderboard on the dataset page.
3. The team will verify the results published on the public sets and compute metrics on the private sets.
4. Confirm the verified results with the leaderboard maintainers.
This keeps evaluation decentralized for speed, but adds central verification for credibility.
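For step 2, results are reported through the model-index metadata in the model card's YAML front matter. Here is a hypothetical sketch; the model name, dataset identifiers and the WER value are placeholders, and the pull-request checklist is the authoritative reference for the exact fields expected.

```yaml
# Hypothetical model-index entry in a model card's YAML front matter;
# model name, dataset identifiers and the WER value are placeholders.
model-index:
- name: my-asr-model
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: LibriSpeech (test-clean)
      type: librispeech_asr
      config: clean
      split: test
    metrics:
    - type: wer
      value: 3.4
      name: Test WER
```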
Risks, mitigations and limitations
Benchmaxxing can still happen if someone has access to very similarly distributed data. That's why Appen and DataoceanAI were asked not to deliver these exact sets to their customers, although this can't be guaranteed 100%.
Having multiple providers reduces the edge someone could gain from using data from just one source.
There's also work on tooling to detect quality issues: low signal-to-noise ratio (SNR), misaligned transcripts, extreme cases that skew WER. That helps keep consistency across splits.
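A hedged sketch of the kind of checks such tooling might run, assuming jiwer for WER; the SNR heuristic and the thresholds below are illustrative, not the leaderboard's actual implementation.

```python
# Hypothetical per-utterance quality checks: a crude energy-based SNR estimate
# and a flag for extreme WER (often a sign of a misaligned transcript).
# The heuristic and thresholds are illustrative only.
import numpy as np
import jiwer  # pip install jiwer

def estimate_snr_db(waveform: np.ndarray, frame_len: int = 2048) -> float:
    """Rough SNR: treat the quietest frames as noise and the loudest as signal."""
    frames = [waveform[i:i + frame_len] for i in range(0, len(waveform) - frame_len, frame_len)]
    if not frames:
        return float("inf")
    energies = np.sort(np.array([float(np.mean(f ** 2)) for f in frames]))
    k = max(1, len(energies) // 10)
    noise, signal = energies[:k].mean(), energies[-k:].mean()
    return 10 * np.log10(signal / max(noise, 1e-12))

def quality_flags(reference: str, hypothesis: str, waveform: np.ndarray,
                  max_wer: float = 1.0, min_snr_db: float = 5.0) -> list[str]:
    """Return the quality issues detected for a single utterance."""
    flags = []
    if jiwer.wer(reference, hypothesis) > max_wer:
        flags.append("extreme WER (possible misaligned transcript)")
    if estimate_snr_db(waveform) < min_snr_db:
        flags.append("low SNR")
    return flags
```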
What this means for you as a developer or user
If you're a developer: don't settle for optimizing for a public benchmark. If you want models for production, look at the averages by data type and test in conversational and diverse-accent conditions (a minimal sketch of such a check follows below).
If you're a user: you now have a more robust way to compare models for your use case. Need something for casual conversations or audio with noise and varied accents? Turn on the private-data tab and watch the rankings change.
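One way to run that kind of spot check, sketched under assumptions: the model ID, audio paths and reference transcripts below are placeholders, and the same Whisper-style normalization is applied before scoring so the numbers stay comparable with the leaderboard's.

```python
# Sketch of spot-checking a candidate model on your own conversational clips;
# the model ID, file paths and reference transcripts are placeholders.
import jiwer
from transformers import pipeline
from whisper.normalizers import EnglishTextNormalizer

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
normalize = EnglishTextNormalizer()

samples = [  # (audio file, reference transcript)
    ("call_center_clip.wav", "yeah so I was saying we could um move the meeting to Friday"),
    ("accented_clip.wav", "the parcel should arrive at the Bangalore office by Tuesday"),
]

references, hypotheses = [], []
for path, reference in samples:
    hypotheses.append(normalize(asr(path)["text"]))
    references.append(normalize(reference))

print(f"WER on your own conversational clips: {jiwer.wer(references, hypotheses):.2%}")
```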
The lesson is simple: a good benchmark evolves with real-world applications. Adding private data isn't closing the box; it's raising the bar so that models that score high are useful outside the lab.