Sharing Knowledge about Foundation Models

Homepage: https://xuefuzhao.github.io/

Twitter: https://twitter.com/XueFz

Email: xuefuzhao at outlook.com, f.xue at u.nus.edu


Table of Contents

Mar 2024 | Take a Closer Look at the MoE LLM Routing

Sep 2023 | Encoder-Decoder is actually not that different from Decoder-only

Aug 2023 | OpenMoE v0.2 Release

May 2023 | What is the relationship between transformer scaling and training objective?


Mar 2024 | Take a Closer Look at the MoE LLM Routing

Mar 27, 2024

MoE is widely discussed and used now, but what do the experts actually specialize in? In this blog, let's take a closer look at MoE specialization and routing to better understand MoE LLMs.
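
For readers who want a concrete picture of what "routing" means here, below is a minimal sketch of a standard top-k softmax router in the GShard/Switch style. The shapes and names (`route_tokens`, `num_experts`, `top_k`) are my own illustration for this index page, not necessarily the exact router analyzed in the post.

```python
import torch
import torch.nn.functional as F

def route_tokens(hidden: torch.Tensor, router_weight: torch.Tensor, top_k: int = 2):
    """Minimal top-k softmax router sketch.

    hidden:        [num_tokens, d_model] token representations
    router_weight: [d_model, num_experts] learned routing matrix
    """
    logits = hidden @ router_weight                     # [num_tokens, num_experts]
    probs = F.softmax(logits, dim=-1)                   # routing probabilities
    top_probs, top_experts = probs.topk(top_k, dim=-1)  # pick top-k experts per token
    # Renormalize so the selected experts' weights sum to 1 for each token.
    top_probs = top_probs / top_probs.sum(dim=-1, keepdim=True)
    return top_experts, top_probs

# Toy example: 4 tokens, d_model = 8, 4 experts.
hidden = torch.randn(4, 8)
router_weight = torch.randn(8, 4)
experts, weights = route_tokens(hidden, router_weight)
print(experts)  # which experts each token is sent to
print(weights)  # how much each chosen expert's output is weighted
```

Looking at which tokens end up at which experts (the `experts` tensor above, aggregated over a corpus) is exactly the kind of specialization analysis the post digs into.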


Sep 2023 | Encoder-Decoder is actually not that different from Decoder-only

Sep 27, 2023

Do not get caught up in the Encoder-Decoder vs. Decoder-only debate. It doesn't matter much: the gap mainly comes from the difference in trainable parameters, which can easily be closed with MoE. Instead, if you are interested in improving LLM pretraining, spend more effort on data (scale, quality, mixture) and smarter training objectives.
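
To make the trainable-parameter point concrete, here is a rough back-of-the-envelope sketch (my own illustration, not the post's analysis): at the same width, a decoder block inside an encoder-decoder model carries an extra cross-attention sub-layer, so naively matching layer counts does not match parameter counts. Biases and layer norms are ignored for simplicity.

```python
def attn_params(d_model: int) -> int:
    # Q, K, V, and output projections of one attention sub-layer.
    return 4 * d_model * d_model

def ffn_params(d_model: int, d_ff: int) -> int:
    # The two linear maps of the feed-forward sub-layer.
    return 2 * d_model * d_ff

def decoder_only_layer(d_model: int, d_ff: int) -> int:
    # Self-attention + FFN.
    return attn_params(d_model) + ffn_params(d_model, d_ff)

def enc_dec_decoder_layer(d_model: int, d_ff: int) -> int:
    # Self-attention + cross-attention + FFN.
    return 2 * attn_params(d_model) + ffn_params(d_model, d_ff)

d_model, d_ff = 1024, 4096
print(decoder_only_layer(d_model, d_ff))     # 12,582,912 parameters per layer
print(enc_dec_decoder_layer(d_model, d_ff))  # 16,777,216 parameters per layer
```

Once parameter counts are matched (e.g., by adding MoE capacity to the smaller side), the two architectures behave much more similarly than the usual debate suggests.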