Microsoft has introduced Tutel, a high-performance library that facilitates the development of large-scale mixture-of-experts (MoE) models. Tutel has been integrated into Meta's Fairseq toolkit.
MoE is a deep learning model architecture in which computational cost grows sublinearly with the number of parameters. Currently, MoE is the only demonstrated approach to scaling deep learning models beyond a trillion parameters.
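The sublinear scaling can be illustrated with a rough count, sketched here under assumptions that are not taken from the announcement: each expert is a plain two-matrix feed-forward block, a linear router scores every expert, and each token is sent to only a fixed number of experts (`top_k`). The function name and the layer sizes are hypothetical and are not part of Tutel or Fairseq.

```python
def moe_layer_cost(num_experts, top_k, d_model, d_hidden):
    """Return (parameter_count, per_token_multiply_adds) for one MoE layer.

    Assumptions (illustrative only): every expert holds two weight matrices,
    d_model x d_hidden and d_hidden x d_model, biases are ignored, and the
    router is a single d_model x num_experts matrix.
    """
    expert_params = 2 * d_model * d_hidden
    router_params = d_model * num_experts
    # All experts are stored, so parameters grow linearly with num_experts.
    total_params = num_experts * expert_params + router_params
    # A token is scored by the router but processed by only top_k experts.
    per_token_macs = router_params + top_k * expert_params
    return total_params, per_token_macs


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only to show the trend.
    for n in (8, 64, 512):
        params, macs = moe_layer_cost(n, top_k=2, d_model=1024, d_hidden=4096)
        print(f"experts={n:4d}  parameters={params:,}  per-token MACs={macs:,}")
```

Under these assumptions, going from 8 to 64 experts multiplies the parameter count by about eight, while the per-token compute changes by well under one percent, because only the small router term depends on the number of experts.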
Tutel is optimized for the Azure NDm A100 v4 series and makes MoE models simpler and more efficient to use. For a single MoE layer, Tutel provides an 8.49-fold speedup on one NDm A100 v4 node with 8 GPUs and a 2.75-fold speedup on 64 NDm A100 v4 nodes with 512 A100 GPUs, compared to modern MoE implementations such as the one in Meta's Facebook AI Research Sequence-to-Sequence Toolkit (Fairseq).
Microsoft worked on Tutel together with Meta and integrated the library into the Fairseq toolkit.