Quality at a Glance: an Audit of Web-Crawled Multilingual Datasets
Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoit Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Ortiz Suarez, Iroro Orife, Kelechi Ogueji, Andre Niyongabo Rubungo, Toan Q. Nguyen, Mathias Mueller, Andre Mueller, Shamsuddeen Hassan Muhammad, Nanda Muhammad, Ayanda Mnyakeni, Jamshidbek Mirzakhalov, Tapiwanashe Matangira, Colin Leong, Nze Lawson, Sneha Kudugunta, Yacine Jernite, Mathias Jenny, Orhan Firat, Bonaventure F. P. Dossou, Sakhile Dlamini, Nisansa de Silva, Sakine Cabuk Balli, Stella Biderman, Alessia Battisti, Ahmed Baruwa, Ankur Bapna, Pallavi Baljekar, Israel Abebe Azime, Ayodele Awokoya, Duygu Ataman, Orevaoghene Ahia, Oghenefego Ahia, Sweta Agrawal, Mofetoluwa Adeyemi
Transactions of the Association for Computational Linguistics (2022)
