
[Feature] A systematic tensor parallelism verifier #37

Open
comaniac opened this issue Feb 1, 2023 · 0 comments

comaniac (Contributor) commented Feb 1, 2023

We need a systematic approach to verify whether a scheduled model with tensor parallelism still produces the same results as the original model (i.e., identical outputs and gradients). The verifier should check the following:

  • Whether .shard and .sync are correctly specified to maintain shape correctness. For example, sharding the weight of a linear layer by its output feature dimension results in partitioned outputs. In this case, we either need an all-gather right after the linear, or the next linear must shard its weight by its input feature dimension (both cases are demonstrated in the numeric sketch after this list).
  • Whether .shard and .sync are correctly specified to maintain functional correctness. For example, sharding the weight of a linear layer by its input feature dimension results in outputs of the correct shape that are only partial sums. In this case, an all-reduce is required.
  • Whether the random seed of each dropout is properly configured. When the input tensor of a dropout is not partitioned (i.e., it is replicated or a partial sum), the random seed on each device should be the same, because all devices are supposed to perform the same redundant computation. Conversely, if the input tensor is partitioned (i.e., its shape on each device is divided by the TP group size), the random seed on each device should be different, to avoid repeated dropout patterns that hurt convergence (see the seed-policy sketch below).
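Both sharding rules above can be checked numerically without any distributed setup. Below is a minimal sketch (our illustration, not existing code in this repo) that simulates a TP group of size 2 with tensor chunks, using `torch.cat` to stand in for all-gather and a plain sum for all-reduce:

```python
import torch

torch.manual_seed(0)
tp = 2                       # simulated tensor-parallel group size
x = torch.randn(4, 8)        # (batch, in_features)
w = torch.randn(6, 8)        # full linear weight, (out_features, in_features)
ref = x @ w.t()              # reference output, shape (4, 6)

# Case 1: shard by the output feature dimension -> each rank holds a
# partitioned output; an all-gather (here: torch.cat) restores the full tensor.
out_shards = [x @ w_i.t() for w_i in w.chunk(tp, dim=0)]
assert torch.allclose(torch.cat(out_shards, dim=1), ref)

# Case 2: shard by the input feature dimension -> each rank produces a
# full-shape partial sum; an all-reduce (here: sum) restores correctness.
partials = [x_i @ w_i.t()
            for x_i, w_i in zip(x.chunk(tp, dim=1), w.chunk(tp, dim=1))]
assert torch.allclose(sum(partials), ref)
```

A real verifier would run the same comparison on gradients as well, not only on forward outputs.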

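The seed rule in the last bullet can likewise be written down as an explicit policy. The helper below is purely illustrative (the function name and the rank-offset scheme are assumptions, not an existing API):

```python
import torch

def configure_dropout_seed(base_seed: int, rank: int,
                           input_is_partitioned: bool) -> None:
    """Hypothetical seed policy for dropout under tensor parallelism.

    Replicated or partial-sum inputs: every rank must drop the same elements,
    so all ranks share one seed. Partitioned inputs: each rank holds a
    different slice, so seeds must differ to avoid repeated dropout patterns.
    """
    torch.manual_seed(base_seed + rank if input_is_partitioned else base_seed)
```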
We will use this issue to discuss possible solutions and track progress. At first glance, a compiler approach that performs type inference on static graphs looks promising: we could use TorchDynamo or LazyTensor to capture a static graph and run the analysis on it. However, this approach won't work if the module cannot be captured as a static graph due to coding-style limitations.
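To make the compiler approach concrete, here is a minimal sketch using `torch.fx` (one of several capture options; TorchDynamo would work similarly). The `SHARD_BY_*` annotations, the three-value layout lattice, and the propagation rules are simplified assumptions for illustration, not a full design:

```python
import torch
import torch.fx as fx
import torch.nn as nn

# Hypothetical annotations standing in for .shard directives on a schedule.
SHARD_BY_OUTPUT = {"fc1"}  # weight sharded along out-features -> partitioned output
SHARD_BY_INPUT = {"fc2"}   # weight sharded along in-features  -> partial-sum output

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 8)
        self.fc2 = nn.Linear(8, 8)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

def verify(gm: fx.GraphModule) -> list[str]:
    """Propagate a per-tensor layout ("replica" / "partition" / "partial")
    through the captured graph and report rule violations."""
    layout, errors = {}, []
    for node in gm.graph.nodes:
        if node.op == "placeholder":
            layout[node] = "replica"
        elif node.op == "call_module" and node.target in SHARD_BY_OUTPUT:
            layout[node] = "partition"
        elif node.op == "call_module" and node.target in SHARD_BY_INPUT:
            if layout[node.args[0]] != "partition":
                errors.append(f"{node.target}: expects a partitioned input")
            layout[node] = "partial"  # correct only after an all-reduce (.sync)
        elif node.op == "output":
            if layout[node.args[0]] == "partial":
                errors.append("graph output is a partial sum: missing all-reduce")
        else:  # elementwise ops and unsharded modules preserve the layout
            layout[node] = layout[node.args[0]]
    return errors

print(verify(fx.symbolic_trace(MLP())))
```

Running it on the toy `MLP` flags the partial-sum output of `fc2`, i.e., the missing all-reduce.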
