
Mixture of Experts (MoE) Model Notes
·1388 words·7 mins
MoE · Large Model · AI · Paper Reading
This article organizes the key concepts behind the Mixture of Experts (MoE) model and introduces the architectures, characteristics, and optimization methods of several open-source MoE models, including GShard, Switch Transformers, DeepSeek-MoE, and LLaMA-MoE.
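Before diving into the individual models, a minimal sketch of the core idea may help: an MoE layer replaces a dense feed-forward block with several expert networks plus a router that sends each token to its top-k experts and mixes their outputs by the routing weights. The `SimpleMoELayer` below is an illustrative toy implementation (the class name, sizes, and top-2 routing choice are assumptions for this sketch, not the design of any specific model discussed later).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoELayer(nn.Module):
    """Toy MoE layer: a router picks the top-k experts per token and
    combines their outputs, weighted by the routing probabilities."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.router(x)                                # (num_tokens, num_experts)
        probs = F.softmax(logits, dim=-1)
        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)  # keep the k best experts per token
        topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)  # renormalize gate weights

        out = torch.zeros_like(x)
        for expert_id, expert in enumerate(self.experts):
            # find (token, slot) pairs routed to this expert
            token_ids, slots = (topk_idx == expert_id).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            weight = topk_probs[token_ids, slots].unsqueeze(-1)
            out[token_ids] += weight * expert(x[token_ids])
        return out


# Usage: route 16 tokens of width 32 through 4 experts, top-2 per token.
layer = SimpleMoELayer(d_model=32, d_hidden=64, num_experts=4, top_k=2)
tokens = torch.randn(16, 32)
print(layer(tokens).shape)  # torch.Size([16, 32])
```

Only the selected experts run for each token, which is why MoE models can grow total parameter count without a proportional increase in per-token compute; the models covered below differ mainly in how they design the router, balance the load across experts, and size the experts.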