Merge disc backend to acc 2.3 #3

Merged
merged 12 commits into from
Oct 11, 2024

Conversation

yitongh

@yitongh yitongh commented Aug 8, 2024

Opening this PR to track the modifications to the DISC backend.

@CLAassistant

CLAassistant commented Aug 8, 2024

CLA assistant check
All committers have signed the CLA.

Support Disc as backend
Co-authored-by: yancey.yx <[email protected]>
Co-authored-by: wangang.wa <[email protected]>
Yancey1989 and others added 5 commits August 8, 2024 14:32
* add flag to disable disc backend in bazel workspace
support disc backend debug mode to dump DISC compilation logs
* fix bazel flag when compiling python

* fix lint.
add float-norm pass to support bf16 amp training
@yitongh
Author

yitongh commented Aug 8, 2024

The PRs that have not been merged yet:

  • e5470e3d50d528f49f287cfbb015e0d10897b897
  • cceb7f76fa2c1dcdc3260cef75d3cf34e0744126
  • 3fa4fe284dacef4ae447271356d4719cd223c2e2
  • c820ca030a51429a97654178afdefaf5f88d4485
  • 4c7fbd81383bbd05de2aca7b3c84731f197e9628

cceb7f76fa2c1dcdc3260cef75d3cf34e0744126 should be merged before "support disc debug mode to dump mhlo and logs", because it allows "//torch_xla/csrc/runtime:env_vars" and "//torch_xla/csrc/runtime:sys_util" to be linked.

FA 256 support: #4

@yitongh
Author

yitongh commented Aug 8, 2024

TODO:

  • third_party/nccl/nccl.h not found
  • ENABLE_DISC=0 does not work

yitongh and others added 4 commits August 12, 2024 14:45
* fix build failed on nccl

* using nccl hdrs
* change the device type of disc to cuda to make amp work properly

* Use the value of DISC_DEVICE as the device type of disc backend
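
The last two commit messages describe taking the DISC backend's device type from `DISC_DEVICE` instead of a fixed value. Below is a rough Python sketch of that idea, not the code from this PR. It assumes `DISC_DEVICE` is an environment variable and that `cuda` is the fallback when it is unset. The helper name `disc_device_type` and the default are hypothetical.

```python
import os


def disc_device_type(env=None, default="cuda"):
    """Return a device type string for a DISC-like backend.

    Hypothetical helper: looks up DISC_DEVICE in the given mapping
    (os.environ when none is passed) and falls back to `default`
    if the variable is unset or blank. The "cuda" default is an
    assumption, chosen because the commit above mentions switching
    the device type to cuda so that AMP works.
    """
    env = os.environ if env is None else env
    value = env.get("DISC_DEVICE", "").strip()
    return value if value else default
```

Passing the mapping explicitly, for example `disc_device_type({"DISC_DEVICE": "cpu"})`, keeps the lookup testable without touching the process environment.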
@anw90 anw90 merged commit fab18e0 into acc Oct 11, 2024
2 checks passed
@anw90 anw90 deleted the merge_disc branch October 11, 2024 09:39

7 participants