Feature request
Add an option, when sharding a dataset, to make all shards the same size. It would be good to support equalizing both by duplication and by truncation.
Motivation
Currently, sharding behaves as follows: "If n % i == l, then the first l shards will have length (n // i) + 1, and the remaining shards will have length (n // i)." However, when using FSDP we want all shards to have the same size. Today the user has to handle this manually; it would be nice to have an option that shards the dataset into equally sized shards.
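To make the uneven split concrete, here is a small sketch that computes the shard lengths the quoted rule produces. It assumes `n` is the dataset length and `i` is the number of shards; `current_shard_sizes` is a hypothetical helper for illustration, not a library function.

```python
def current_shard_sizes(n: int, num_shards: int) -> list[int]:
    """Shard lengths under the quoted rule (assumes n = dataset length)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # base is n // num_shards, extra is n % num_shards
    base, extra = divmod(n, num_shards)
    # The first `extra` shards get one additional example each.
    return [base + 1 if k < extra else base for k in range(num_shards)]


print(current_shard_sizes(10, 4))  # [3, 3, 2, 2]
```

With 10 examples and 4 shards, two shards have 3 examples and two have 2, which is the mismatch this request is about.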
Your contribution
For now just a PR. I can also add code that does what is needed, though it is probably not efficient.
Shard to equal size by duplication:
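A minimal sketch of one way to do this, not necessarily the code the author had in mind. It works on plain indices so it stays library-independent; `equal_shards_by_duplication` is a hypothetical name, and the choice to pad by reusing examples from the start of the dataset is an assumption.

```python
def equal_shards_by_duplication(n: int, num_shards: int) -> list[list[int]]:
    """Return one index list per shard, all of the same length.

    Assumption: missing examples are filled by wrapping around to the
    start of the dataset, so some early examples appear twice.
    """
    if n <= 0 or num_shards <= 0:
        raise ValueError("n and num_shards must be positive")
    shard_len = -(-n // num_shards)  # ceiling division
    total = shard_len * num_shards
    # Indices 0..n-1 followed by as many wrapped-around indices as needed.
    padded = [i % n for i in range(total)]
    return [padded[k * shard_len:(k + 1) * shard_len] for k in range(num_shards)]


print(equal_shards_by_duplication(10, 4))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 1]]
```

If the dataset object supports index-based selection (for example `Dataset.select` in Hugging Face `datasets`, assuming that is the library in use), each index list could be passed to it to materialize the shard.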
Or by truncation:
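Again a minimal, library-independent sketch rather than the author's code. `equal_shards_by_truncation` is a hypothetical name, and dropping the trailing `n % num_shards` examples (rather than some other subset) is an assumption.

```python
def equal_shards_by_truncation(n: int, num_shards: int) -> list[list[int]]:
    """Return one index list per shard, all of the same length.

    Assumption: the last n % num_shards examples are dropped.
    """
    if n < 0 or num_shards <= 0:
        raise ValueError("n must be non-negative and num_shards positive")
    shard_len = n // num_shards  # floor division discards the remainder
    return [
        list(range(k * shard_len, (k + 1) * shard_len))
        for k in range(num_shards)
    ]


print(equal_shards_by_truncation(10, 4))
# [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Truncation avoids seeing any example twice within an epoch, at the cost of never using the dropped examples unless the dataset is reshuffled between epochs.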