After #5581, torch dynamo can be enabled via `engine.compile`, and there is no longer a config entry for it, which simplifies things a lot.
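For reference, a minimal sketch of the post-#5581 flow, assuming `engine.compile()` with no arguments picks a default backend; the toy model and config values below are illustrative only, not from the issue:

```python
import torch
import deepspeed

# Dynamo is now turned on by calling compile() on the DeepSpeed engine,
# not through a "compile" section in ds_config.
model = torch.nn.Linear(8, 8)  # placeholder model for illustration
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, _, _, _ = deepspeed.initialize(model=model,
                                       model_parameters=model.parameters(),
                                       config=ds_config)
engine.compile()  # wraps the module with torch.compile / dynamo
```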
Internally, we are primarily supporting a few models, specifically LLaMA and its variants. To minimize user effort, we aim to:
- **Automatically set leaf modules**: for example, `LlamaDecoderLayer` for LLaMA.
- **Adjust prefetch arguments**: optimize prefetch settings when dynamo is enabled (`max_live_parameters` and `prefetch_bucket_size`, to be specific); see the sketch after this list.
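As an illustration of the kind of adjustment this means in practice, here is a rough sketch assuming `deepspeed.utils.set_z3_leaf_modules` is the leaf-module hook and the ZeRO-3 keys `stage3_max_live_parameters` / `stage3_prefetch_bucket_size` are the prefetch knobs; the checkpoint name and numeric values are placeholders, not tuned recommendations:

```python
from transformers import AutoModelForCausalLM
from transformers.models.llama.modeling_llama import LlamaDecoderLayer
from deepspeed.utils import set_z3_leaf_modules

# Illustrative checkpoint; any LLaMA-family model would do.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Treat each decoder layer as a ZeRO-3 leaf module so dynamo does not trace
# into the per-parameter gather/partition hooks inside it.
set_z3_leaf_modules(model, [LlamaDecoderLayer])

# Loosen the prefetch-related ZeRO-3 knobs when dynamo will be enabled.
# The values below are placeholders for illustration only.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "stage3_max_live_parameters": 3_000_000_000,
        "stage3_prefetch_bucket_size": 1_000_000_000,
    },
}
```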
Previously, in DeepSpeed, we could detect if dynamo was enabled via ds_config, allowing us to apply these adjustments seamlessly. However, with the recent changes, it is now challenging to determine if dynamo is enabled during the ds_init phase.
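For context, a sketch of the kind of check that used to be possible; the `"compile"`/`"enabled"` keys are an assumption about the removed config schema, and the tuning step is only indicated as a comment, not a DeepSpeed API:

```python
def dynamo_requested(ds_config: dict) -> bool:
    """Return True if the (old-style) config asks for torch.compile / dynamo."""
    return bool(ds_config.get("compile", {}).get("enabled", False))

ds_config = {"zero_optimization": {"stage": 3}, "compile": {"enabled": True}}
if dynamo_requested(ds_config):
    # ...apply the leaf-module and prefetch tuning during ds_init...
    pass
```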
To solve that, IMO there are several options:
1. Just warn when dynamo is enabled later and the tuning I mentioned has not been applied, and ask the user to change their code/config.
We're currently migrating from option 2 to option 3, since that is easier to maintain and adds less ad-hoc logic to DeepSpeed. We'd like to hear your opinions here, including but not limited to: whether you think this is a good idea overall, what you think is the best way to implement it, and whether you would like to get this upstreamed.
Thanks!
@tohtana @loadams cc @SunMarc @tjruwase