MoE All-to-All communications modeling with simulation and performance modelling

forknay/moe_comms

moe_comms

Models MoE all-to-all communications through routing simulation and performance modeling.

To use, follow these steps:

  1. Set the desired parameters and mode in params.py
  2. Run simulation.py to generate routing data
  3. Run perf_model.py to simulate the communications
  4. Inspect the traces in comm_log.txt. Diagrams detailing the high-level design of simulation.py and perf_model.py can be found in the PDF.

Currently Supports:

  • Parameters for different MoE configurations and hardware restrictions
  • Routing data generation (with load imbalance parameters)
  • Conversion of routing data to bytes for each (dst, src) node pair
  • Full-mesh communication within a cluster (including the host), under certain assumptions, with communication time reported
  • Multiple links per connection between nodes (a load that is > intra_bw can be split into fragments across links, or a link can be used to send in a different direction)
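
The features above can be illustrated with a minimal sketch. It is not the code in simulation.py or perf_model.py: the function names, the parameters (`skew`, `link_bw`, `links_per_pair`), the power-law imbalance model, and the "slowest pair bounds the time" rule are all assumptions chosen for illustration. It generates skewed top-k routing, accumulates the bytes sent for each (dst, src) node pair, and estimates a full-mesh transfer time when each pair has several parallel links.

```python
import random


def routing_to_bytes(num_nodes, tokens_per_node, experts_per_node, top_k,
                     hidden_dim, bytes_per_elem, skew=0.0, seed=0):
    """Return traffic[dst][src] in bytes for one all-to-all dispatch.

    Hypothetical imbalance model: expert e is chosen with weight
    1 / (e + 1) ** skew, so skew=0 is uniform and larger values
    concentrate load on low-index experts.
    """
    num_experts = num_nodes * experts_per_node
    if top_k > num_experts:
        raise ValueError("top_k cannot exceed the number of experts")
    rng = random.Random(seed)
    weights = [1.0 / (e + 1) ** skew for e in range(num_experts)]
    token_bytes = hidden_dim * bytes_per_elem
    traffic = [[0] * num_nodes for _ in range(num_nodes)]
    for src in range(num_nodes):
        for _ in range(tokens_per_node):
            # Draw top_k distinct experts for this token.
            chosen = set()
            while len(chosen) < top_k:
                chosen.add(rng.choices(range(num_experts), weights=weights)[0])
            for expert in chosen:
                dst = expert // experts_per_node  # node hosting the expert
                if dst != src:  # local experts need no network transfer
                    traffic[dst][src] += token_bytes
    return traffic


def full_mesh_time(traffic, link_bw, links_per_pair=1):
    """Estimate transfer time in seconds on a full mesh.

    Assumes every (dst, src) pair has its own links_per_pair links of
    link_bw bytes/s each, all pairs transfer in parallel, and a load is
    split evenly across the links of its pair. The slowest pair then
    determines the total time.
    """
    worst = 0.0
    for dst, row in enumerate(traffic):
        for src, nbytes in enumerate(row):
            if src != dst:
                worst = max(worst, nbytes / (link_bw * links_per_pair))
    return worst


if __name__ == "__main__":
    traffic = routing_to_bytes(num_nodes=4, tokens_per_node=1000,
                               experts_per_node=2, top_k=2,
                               hidden_dim=4096, bytes_per_elem=2, skew=1.0)
    for links in (1, 2):
        seconds = full_mesh_time(traffic, link_bw=50e9, links_per_pair=links)
        print(f"{links} link(s) per pair: {seconds * 1e6:.1f} us")
```

Under these assumptions, doubling `links_per_pair` halves the estimated time, and raising `skew` increases the load on the nodes that host the low-index experts.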

Not supported (yet) / Assumptions:

  • Hierarchical communication (source to cluster, then cluster to node)
  • Visualization
  • Throughput calculation (prefill / decode)
  • Delay/transfer parallelization within each round: each packet needs preparation time, but it can be prepared while the previous packet is being sent. Planned approach: list all packets for a round, pick the link carrying the most packets, and parallelize those to find the critical path.
  • Communications for allocation (where these belong is still an open question)
  • Flag size for packets
  • Round robin is currently applied only within a single node's load: starting at node 0, it tries to finish all of node 0's sends before moving on to node 1. This is inefficient, since a node may receive nothing during the first rounds.
  • PCIe FIFO buffer size (smaller packets would incur inefficiencies)
  • Multiple GPUs per node (easy to add)
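
The round-robin limitation above can be made concrete with a small sketch. It is an illustration only and does not reproduce the scheduler in perf_model.py: the `sends` mapping and all function names are hypothetical. It contrasts finishing one source node's sends before moving to the next with interleaving one send per source node in each round.

```python
from itertools import zip_longest


def per_source_order(sends):
    """Finish all sends of one source node before moving to the next."""
    return [(src, dst) for src in sorted(sends) for dst in sends[src]]


def interleaved_order(sends):
    """Take one send from every source node per round."""
    queues = [[(src, dst) for dst in sends[src]] for src in sorted(sends)]
    order = []
    for round_sends in zip_longest(*queues):
        # Sources that have run out of sends contribute None; skip those.
        order.extend(pair for pair in round_sends if pair is not None)
    return order


def first_receive_index(order, node):
    """Position of the first send addressed to `node`, or None if absent."""
    for index, (_, dst) in enumerate(order):
        if dst == node:
            return index
    return None


if __name__ == "__main__":
    # sends[src] lists the destination nodes of that source's packets.
    sends = {0: [1, 2], 1: [0, 2], 2: [0]}
    for name, order in (("per-source", per_source_order(sends)),
                        ("interleaved", interleaved_order(sends))):
        print(name, order, "node 0 first receives at",
              first_receive_index(order, 0))
```

In this toy input, node 0 waits until the third send in the per-source order but is served by the second send when sources are interleaved, which is the inefficiency the bullet describes.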
