This repository contains implementations of mixture-of-experts (MoE) models such as the Switch Transformer (Fedus et al., 2021). It explores how conditional computation can be used to scale model parameter count independently of per-token compute, and how this scaling affects model quality and training time.
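As a minimal sketch of the core idea (not this repository's actual code), the snippet below illustrates Switch-style top-1 routing in PyTorch: each token is dispatched to exactly one expert, so adding experts grows parameter count while per-token compute stays roughly constant. The class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchMoE(nn.Module):
    """Sketch of Switch-style top-1 routing: each token goes to one expert."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # token -> expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)  # routing probabilities
        gate, expert_idx = probs.max(dim=-1)       # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Scale by the gate value so the router receives gradients.
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

if __name__ == "__main__":
    moe = SwitchMoE(d_model=16, d_ff=32, num_experts=4)
    tokens = torch.randn(8, 16)
    print(moe(tokens).shape)  # torch.Size([8, 16])
```

Note this sketch omits pieces a production implementation needs, such as the load-balancing auxiliary loss and expert capacity limits described in the Switch Transformer paper.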
For more details, please see ROADMAP.md.
For questions, please contact the authors of this repository directly.