AccentFold: A Journey through African Accents for Zero-Shot ASR Adaptation to Target Accents
Conference of the European Chapter of the Association for Computational Linguistics (2024)
Abstract
Despite advancements in speech recognition, accented speech remains
challenging. While previous approaches have focused on modeling techniques or
creating accented speech datasets, gathering sufficient data for the multitude
of accents, particularly in the African context, remains impractical due to
their sheer diversity and associated budget constraints. To address these
challenges, we propose AccentFold, a method that exploits spatial
relationships between learned accent embeddings to improve downstream Automatic
Speech Recognition (ASR). Our exploratory analysis of speech embeddings
representing 100+ African accents reveals interesting spatial accent
relationships highlighting geographic and genealogical similarities, capturing
consistent phonological and morphological regularities, all learned
empirically from speech. Furthermore, we discover accent relationships
previously uncharacterized by the Ethnologue. Through empirical evaluation, we
demonstrate the effectiveness of AccentFold by showing that, for
out-of-distribution (OOD) accents, sampling accent subsets for training based
on AccentFold information outperforms strong baselines, with a relative WER
improvement of 4.6%. [...]
performance on accented speech, particularly in the context of African accents,
where data scarcity and budget constraints pose significant challenges. Our
findings emphasize the potential of leveraging linguistic relationships to
improve zero-shot ASR adaptation to target accents.
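
The subset-sampling idea described above can be illustrated with a minimal sketch. It assumes each accent is summarized by a single centroid embedding and that closeness is measured by cosine similarity; the abstract does not state the distance measure or the selection rule, so both are assumptions here, and the function name `nearest_accents` and the toy two-dimensional vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_accents(centroids, target, k):
    """Return the k accents whose centroids are most similar to the target's.

    centroids: dict mapping accent name -> centroid embedding (hypothetical data).
    The target accent itself is excluded, mimicking a zero-shot setting in which
    no training data is available for it.
    """
    candidates = [name for name in centroids if name != target]
    candidates.sort(key=lambda name: cosine(centroids[name], centroids[target]),
                    reverse=True)
    return candidates[:k]

# Toy embeddings, invented for illustration only.
centroids = {
    "target": [1.0, 0.0],
    "accent_a": [0.9, 0.1],
    "accent_b": [0.0, 1.0],
    "accent_c": [0.7, 0.7],
}
selected = nearest_accents(centroids, "target", k=2)
print(selected)
```

Under these assumptions, training data would then be drawn from the selected accents rather than from a random subset.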