Basic Information
Biography
I’ve been writing papers on neural language modeling since 2016. My focus is on identifying which problems in LMs are not solved by scaling, and on improving LMs without increasing parameter count, runtime, or memory usage.
The weight tying method I developed is used today by almost all big language and translation models, including OpenAI’s GPT, Google’s BERT, and the translation models of Google, Microsoft, Meta and Amazon.
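The core idea of weight tying is that one matrix serves as both the input embedding table and the output (softmax) projection. A minimal NumPy sketch of that idea (the names `E`, `hidden`, and `logits` are illustrative, not from any particular implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 1000, 64

# A single matrix E plays both roles, so the output projection
# adds no parameters beyond the embedding table itself.
E = rng.normal(size=(vocab_size, d_model))

token_ids = np.array([3, 17, 42])
hidden = E[token_ids]   # input side: embedding lookup
# (a real model would run its transformer layers on `hidden` here)
logits = hidden @ E.T   # output side: project back onto the vocabulary
```

In an untied model the output projection would be a separate `vocab_size × d_model` matrix, so tying roughly halves the embedding-related parameter count.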
Our ALiBi method showed for the first time how to efficiently enable LMs to handle longer sequences at inference than the ones they were trained on. It has been adopted by BigScience’s 176 billion parameter BLOOM model, by the MPT series of models from MosaicML, by Replit’s models and many others.
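ALiBi replaces learned positional embeddings with a static, head-specific linear penalty added to the attention scores: the further a key is from the query, the larger the penalty. A NumPy sketch of the bias construction, assuming the geometric slope schedule from the ALiBi paper (function names here are illustrative):

```python
import numpy as np

def alibi_slopes(n_heads):
    # Head-specific slopes forming a geometric sequence;
    # e.g. for 8 heads: 1/2, 1/4, ..., 1/256.
    return np.array([2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)])

def alibi_bias(seq_len, n_heads):
    # For query i and key j <= i the bias is -slope * (i - j);
    # future keys (j > i) are masked with -inf for causal attention.
    dist = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]  # j - i
    bias = alibi_slopes(n_heads)[:, None, None] * dist[None, :, :]
    return np.where(dist[None, :, :] <= 0, bias, -np.inf)

b = alibi_bias(4, 8)  # shape: (n_heads, seq_len, seq_len)
```

Because the bias is a fixed function of distance rather than a learned table, extrapolating to a longer sequence at inference only requires computing a larger bias matrix.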
Research Interests
Publications (16 papers; partial list — titles not captured in the source data)

- Minyang Tian, Luyu Gao, Shizhuo Dylan Zhang, Xinan Chen, Cunwei Fan, Xuefei Guo, Roland Haas, Pan Ji, Kittithat Krongchon, Yao Li, Shengyan Liu, Di Luo, … — CoRR (2024). Citations: 0
- John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press — CoRR (2024). Citations: 0
- CoRR (2024). Citations: 0
- EMNLP 2023 (2023): 5687–5711. Citations: 40
- Conference on Empirical Methods in Natural Language Processing (2022)
Data Disclaimer
All page data comes from publicly available internet sources, partner publishers, and automated AI analysis. We make no commitment or guarantee regarding the validity, accuracy, correctness, reliability, completeness, or timeliness of the page data. For questions, contact us by email: report@aminer.cn