fix bazel OOM problems #1034


Open
wants to merge 2 commits into
base: main
2 changes: 1 addition & 1 deletion tao/.bazelrc
@@ -37,7 +37,7 @@ build:disc --config=release_base

 build:disc_cpu --config=disc --cxxopt=-DTAO_CPU_ONLY
 build:disc_x86 --config=disc_cpu --config=release_cpu_linux --cxxopt=-DTAO_X86 --define disc_x86=true
-build:disc_aarch64 --config=disc_cpu --cxxopt=-DTAO_AARCH64 --define disc_aarch64=true --linkopt="-Xlinker --stub-group-size -Xlinker 10000000"
+build:disc_aarch64 --config=disc_cpu --cxxopt=-DTAO_AARCH64 --define disc_aarch64=true --linkopt="-Xlinker --stub-group-size -Xlinker 10000000" --experimental_local_memory_estimate --jobs=10
 build:disc_cuda --config=disc --config=cuda
 build:disc_dcu --config=disc --config=dcu
 build:disc_rocm --config=disc --config=rocm
2 changes: 1 addition & 1 deletion tao_compiler/.bazelrc.user
@@ -3,7 +3,7 @@ build:ci_build --noshow_loading_progress --show_progress_rate_limit=600 --jobs=1
 build:disc --define framework_shared_object=false --experimental_multi_threaded_digest
 build:disc_cpu --config=disc --cxxopt=-DTAO_CPU_ONLY
 build:disc_x86 --config=disc_cpu --config=release_cpu_linux --cxxopt=-DTAO_X86 --define disc_x86=true
-build:disc_aarch64 --config=disc_cpu --config=mkl_aarch64 --cxxopt=-DTAO_AARCH64 --define disc_aarch64=true --linkopt="-Xlinker --stub-group-size -Xlinker 10000000"
+build:disc_aarch64 --config=disc_cpu --config=mkl_aarch64 --cxxopt=-DTAO_AARCH64 --define disc_aarch64=true --linkopt="-Xlinker --stub-group-size -Xlinker 10000000" --experimental_local_memory_estimate --jobs=10
 build:disc_cuda --config=disc --config=cuda
 build:disc_dcu --config=disc --config=dcu

2 changes: 2 additions & 0 deletions tensorflow_blade/.bazelrc
@@ -69,6 +69,8 @@ build:disc_aarch64 --cxxopt=-DTAO_AARCH64
 build:disc_aarch64 --define disc_aarch64=true
 build:disc_aarch64 --linkopt="-Xlinker --stub-group-size -Xlinker 10000000"
 build:disc_aarch64 --action_env BUILD_WITH_AARCH64=1
+build:disc_aarch64 --experimental_local_memory_estimate
+build:disc_aarch64 --jobs=10
Collaborator


Is 10 too small for parallelism? @qiuxiafei @Yancey1989

Collaborator


What about determining this number from the number of cores on the machine?
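
A minimal sketch of what a core-based setting could look like, assuming Bazel's documented `HOST_CPUS` keyword for `--jobs`. The `.5` multiplier is an illustrative guess, not a value proposed in this PR, and it would need tuning against the memory footprint of the aarch64 build.

```
# Hypothetical replacement for the fixed --jobs=10:
# scale the job count with the machine's core count.
# The .5 factor is an assumption, not a measured value.
build:disc_aarch64 --jobs=HOST_CPUS*.5
```

Halving the core count only caps parallelism; it does not by itself bound memory use, so an OOM may still occur on machines with many cores and little RAM.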

Collaborator

@qiuxiafei, Feb 23, 2023


Yes, 10 is usually too small. And since the number of cores varies across machines, there won't be a one-size-fits-all number. How about using --local_cpu_resources/--local_ram_resources here? See https://bazel.build/docs/user-manual#local-resources
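
As a rough illustration of that suggestion, the fragment below uses the `HOST_CPUS` and `HOST_RAM` keywords that the linked user manual describes for these two flags. The multipliers are placeholders chosen for the example, not values tested on this project, and newer Bazel releases may prefer a different spelling of these options.

```
# Hypothetical alternative to --jobs=10 for the aarch64 config.
# Let Bazel's scheduler budget local actions against a share of
# the host's cores and memory instead of a fixed job count.
# Both multipliers are assumptions for illustration only.
build:disc_aarch64 --local_cpu_resources=HOST_CPUS*.75
build:disc_aarch64 --local_ram_resources=HOST_RAM*.5
```

Because the limits are expressed relative to the host, the same config line adapts to both small CI runners and large build machines. Whether it actually prevents the OOM depends on how well Bazel's per-action memory estimates match the real compiler and linker usage.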

 build:disc_cuda --config=disc --config=cuda
 build:disc_dcu --config=disc --config=dcu
