Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
Lili Yu, Bowen Shi, Ramakanth Pasunuru, Benjamin Muller, Olga Golovneva, Tianlu Wang, Arun Babu, Binh Tang, Brian Karrer, Shelly Sheynin, Candace Ross, Adam Polyak, Russell Howes, Vasu Sharma, Puxin Xu, Hovhannes Tamoyan, Oron Ashual, Uriel Singer, Shang-Wen Li, Susan Zhang, Richard James, Gargi Ghosh, Yaniv Taigman, Maryam Fazel-Zarandi, Asli Celikyilmaz, Luke Zettlemoyer, Armen Aghajanyan. arXiv.org (2023)
Keywords: Multimodal Fusion, Language Modeling, Image Captioning, Topic Modeling
