Hebrew Speech Recognition Leaderboard

Welcome to the Hebrew Speech Recognition Leaderboard! This is a community-driven effort to track and compare the performance of various speech recognition models on Hebrew language tasks.

This leaderboard is maintained by ivrit.ai, a project dedicated to advancing Hebrew language AI technologies. You can find our work on GitHub and Hugging Face.

Motivation

Hebrew presents unique challenges for speech recognition due to its rich morphology, the omission of vowels in standard written text, and diverse dialectal variation. This leaderboard aims to:

Provide standardized benchmarks for Hebrew ASR evaluation
Track progress in Hebrew speech recognition technology
Foster collaboration in the Hebrew NLP community
Make Hebrew speech technology more accessible

Benchmarks

The following datasets are used in our evaluation:

ivrit-ai/eval-d1

Size: 2 hours
Domain: Manual transcription of a single podcast episode featuring an informal conversation between two speakers (male and female). Audio is segmented into approximately 5-minute chunks.
Source: Part of the ivrit.ai corpus. The selected episode was manually transcribed to gold-standard quality to serve as a high-quality evaluation benchmark.

ivrit-ai/eval-whatsapp

Size: 1 hour 10 minutes
Domain: Freestyle WhatsApp recordings made by volunteers.
Source: ivrit.ai volunteers. Manually transcribed by an expert.

SASpeech

Size: 4 hours (manually corrected portion of the corpus)
Domain: Economic and political podcast content, containing both read speech and conversational segments. Segments are several seconds in length.
Source: Derived from the Robo-Shaul project and published in Sharoni, O., Shenberg, R., Cooper, E. (2023). "SASPEECH: A Hebrew Single Speaker Dataset for Text To Speech and Voice Conversion." Proc. INTERSPEECH 2023.

google/fleurs/he

Size: 2 hours (test set of the corpus)
Domain: Read speech covering common topics and phrases in Hebrew.
Source: Created as part of Google's FLEURS project, designed for multilingual speech tasks and evaluation. Data collected through crowdsourcing from Hebrew speakers.

mozilla-foundation/common_voice_17_0/he

Size: 2 hours (validated set of the corpus)
Domain: Read sentences in Hebrew from various texts.
Source: Collected through Mozilla's Common Voice initiative, where volunteers contribute recordings and validate other speakers' contributions.

imvladikon/hebrew_speech_kan

Size: 1.7 hours (validation set of the corpus)
Domain: Varied content types from the YouTube channel of Kan (the Israeli Public Broadcasting Corporation).
Source: Published by Vladimir Gurevich. Audio and subtitles scraped from the YouTube channel "כאן" (Kan).
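Each benchmark above pairs audio with a reference transcription. The page does not name the metric behind the scores, but assuming they are word error rates (WER), the usual metric for ASR benchmarks like these, the sketch below shows how one hypothesis is scored against one reference: the word-level edit distance divided by the number of reference words. It uses plain whitespace tokenization and no text normalization, which is an assumption; real evaluation pipelines typically normalize punctuation and spelling first. The last example hints at why Hebrew morphology matters: a preposition such as ל is written attached to the following word, so splitting it off costs two errors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Minimal sketch: whitespace tokenization, no normalization (an assumption).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (starts as the i = 0 row).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of reference word r
                curr[j - 1] + 1,     # insertion of hypothesis word h
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


print(word_error_rate("שלום לכולם ברוכים הבאים", "שלום לכולם ברוכים הבא"))  # one substitution in four words
print(word_error_rate("הלכתי לבית הספר", "הלכתי ל בית הספר"))  # prefix split off: two errors in three words
```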


Results

Each row lists an engine and model followed by its score on each benchmark above; scores are error rates, so lower is better.

Engine	Model	ivrit-ai/eval-d1	ivrit-ai/eval-whatsapp	SASpeech	google/fleurs/he	mozilla-foundation/common_voice_17_0/he	imvladikon/hebrew_speech_kan
faster-whisper	ivrit-ai/whisper-large-v3-ct2-20250513	0.051	0.072	0.064	0.174	0.149	0.081
faster-whisper	ivrit-ai/whisper-large-v3-ct2-20250403	0.055	0.075	0.071	0.2	0.171	0.095
faster-whisper	ivrit-ai/whisper-large-v3-ct2-20250209	0.059	0.082	0.074	0.208	0.172	0.094
faster-whisper	ivrit-ai/faster-whisper-v2-d4	0.061	0.098	0.08	0.241	0.207	0.113
faster-whisper	ivrit-ai/faster-whisper-v2-d3-e3	0.068	0.104	0.086	0.255	0.214	0.139
amazon-transcribe	batch	0.066	0.104	0.085	0.23	0.141	0.09
faster-whisper	large-v2	0.077	0.121	0.098	0.266	0.233	0.164
faster-whisper	large-v3-turbo	0.084	0.128	0.104	0.289	0.28	0.156
faster-whisper	ivrit-ai/whisper-large-v3-turbo-ct2-20250513	0.053	0.071	0.066	0.181	0.151	0.082
faster-whisper	ivrit-ai/whisper-large-v3-turbo-ct2-20250403	0.055	0.061	0.074	0.208	0.183	0.1
faster-whisper	ivrit-ai/whisper-large-v3-turbo-ct2-20250209	0.071	0.104	0.077	0.245	0.209	0.1
amazon-transcribe	stream	0.079	0.129	0.09	0.287	0.2	0.131
faster-whisper	large-v3	0.098	0.132	0.094	0.262	0.231	0.134
google-speech	google-speech	0.211	0.352	0.189	0.385	0.38	0.292
openai	gpt-4o-transcribe	0.073	0.126	0.109	0.21	0.169	0.394
openai	gpt-4o-mini-transcribe	0.09	0.158	0.15	0.3	0.237	0.468
elevenlabs	scribe_v1	0.2	0.264	0.068	0.181	0.156	0.109
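The rows are not sorted, and no single model wins every benchmark, so a quick way to compare them is to collapse each row into one number. The sketch below is a hypothetical aggregation, not how this leaderboard ranks entries: it takes an unweighted mean of the six per-benchmark scores for a few rows copied from the table and sorts from lowest (best) to highest. The names `ROWS` and `rank_by_mean` are illustrative only.

```python
from statistics import mean

# A few rows copied verbatim from the results table:
# (engine, model, [six per-benchmark scores in table column order])
ROWS = [
    ("faster-whisper", "ivrit-ai/whisper-large-v3-ct2-20250513",
     [0.051, 0.072, 0.064, 0.174, 0.149, 0.081]),
    ("faster-whisper", "large-v3",
     [0.098, 0.132, 0.094, 0.262, 0.231, 0.134]),
    ("openai", "gpt-4o-transcribe",
     [0.073, 0.126, 0.109, 0.21, 0.169, 0.394]),
    ("elevenlabs", "scribe_v1",
     [0.2, 0.264, 0.068, 0.181, 0.156, 0.109]),
]


def rank_by_mean(rows):
    """Return (mean score, engine, model) tuples, lowest (best) mean first.

    Hypothetical aggregation: every benchmark gets equal weight, regardless
    of how many hours of audio it contains.
    """
    return sorted((mean(scores), engine, model) for engine, model, scores in rows)


for score, engine, model in rank_by_mean(ROWS):
    print(f"{score:.4f}  {engine:<16} {model}")
```

Equal weighting is the simplest choice, but it lets one outlier column dominate: gpt-4o-transcribe's score on the last column pulls its mean below large-v3 even though it beats large-v3 on four of the six benchmarks. Weighting by benchmark duration or comparing per-benchmark ranks are reasonable alternatives.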