Tasks Needed:
- get a toy FSDP loop working on Gaudi (incomplete so far)
- test our code on the cards (we don't expect this to work immediately)
- check whether we can run a toy Accelerate+FSDP loop on Gaudi cards (I'm worried this won't work)
  - if YES, we change our code to accommodate Gaudi+AMD+Nvidia, and then build configs
  - if NO, we have two choices:
    1. implement a Gaudi-only FSDP training loop (the easiest / cheapest option, and probably the one we'll go with)
    2. patch Accelerate to support Gaudi and work on committing this upstream (something I think we should do even if we go with route 1)
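For the "accommodate Gaudi+AMD+Nvidia" branch above, one possible starting point is a small helper that picks the device string and `torch.distributed` backend before the training loop is built. This is only a sketch, not our actual code: `pick_device`, `has_module`, and `BACKENDS` are hypothetical names. It assumes that Gaudi is detected by the presence of the `habana_frameworks` package (device `hpu`, backend `hccl`), and that AMD cards are reached through a ROCm build of PyTorch, which exposes them under `torch.cuda` with the `nccl` backend name.

```python
import importlib.util

# Hypothetical mapping, for illustration only:
# device string -> torch.distributed backend name.
BACKENDS = {
    "hpu": "hccl",   # Intel Gaudi
    "cuda": "nccl",  # Nvidia, and AMD through ROCm builds of PyTorch
    "cpu": "gloo",   # fallback when no accelerator is found
}


def has_module(name):
    """Return True if a top-level module can be found without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def pick_device(has_habana=None, has_cuda=None):
    """Return (device, backend) for the current machine.

    Both flags can be passed explicitly (useful for tests); when left as
    None they are detected from the installed packages.
    """
    if has_habana is None:
        # Assumption: the Gaudi software stack ships `habana_frameworks`.
        has_habana = has_module("habana_frameworks")
    if has_cuda is None:
        has_cuda = False
        if has_module("torch"):
            import torch
            has_cuda = torch.cuda.is_available()

    if has_habana:
        device = "hpu"
    elif has_cuda:
        device = "cuda"
    else:
        device = "cpu"
    return device, BACKENDS[device]


if __name__ == "__main__":
    print(pick_device())
```

Keeping the detection in one place would mean the Gaudi-only loop (route 1) and a future Accelerate-based loop could share it; whether Accelerate's own device handling makes this unnecessary is part of what the toy loop is meant to find out.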
Risks:
The machine currently used for testing has issues that are delaying James from confirming whether we can target this for 1.3. He is discussing them with the Intel team and @tiran to troubleshoot further. This is a major risk, since we want to target this Tech Preview for 1.3.