[Paper Notes] CPM-2: Large-scale Cost-effective Pre-trained Language Models

Title: CPM-2: Large-scale Cost-effective Pre-trained Language Models

Authors: Zhengyan Zhang, Yuxian Gu, Xu Han, Shengqi Chen, Chaojun Xiao, Zhenbo Sun, Yuan Yao, Fanchao Qi, Jian Guan, Pei Ke, Yanzheng Cai, Guoyang Zeng, Zhixing Tan, Zhiyuan Liu, Minlie Huang, Wentao Han, Yang Liu, Xiaoyan Zhu, Maosong Sun

Affiliations: Tsinghua University, Beijing Academy of Artificial Intelligence (BAAI)

Venue: arXiv

Publication date: 2021.06.20

Quick Summary

The main contributions of this paper are as follows:

Overall, this is a strongly engineering-oriented paper. The three components of its contributions are largely built on existing work; the original contributions are mainly enabling inference of a large-scale MoE model on a single GPU and training a 198B-parameter Chinese pre-trained model.