On Scaling Up a Multilingual Vision and Language Model

Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lučić, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

CVPR 2024 (2024)

Keywords: vision, multimodal, language, pretraining