Open
Labels
bug (Something isn't working), build (Build process related matters, e.g. build system)
Description
🐛 Bug
GCC fails to compile PyTorch/XLA (163193e -- master branch), ending with an internal compiler error (ICE), apparently caused by a segmentation fault.
ERROR: external/xla/xla/service/spmd/shardy/stablehlo_round_trip/BUILD:44:11: Compiling xla/service/spmd/shardy/stablehlo_round_trip/export_ops.cc failed: (Exit 1): gcc failed: error executing CppCompile command (from target @@xla//xla/service/spmd/shardy/stablehlo_round_trip:export_ops) /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 128 arguments skipped)
In file included from <command-line>:
/usr/include/stdc-predef.h: In substitution of 'template<class _Functor, class, class> std::function<std::unique_ptr<mlir::Pass>()>::function(_Functor) [with _Functor = <missing>; <template-parameter-1-2> = <missing>; <template-parameter-1-3> = <missing>]':
external/xla/xla/service/spmd/shardy/stablehlo_round_trip/export_ops.cc:249:53: required from here
/usr/include/stdc-predef.h:32:70: internal compiler error: Segmentation fault
32 | whether the overall intent is to support these features; otherwise,
| ^
0x7e2852d42ddf ???
./signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0x7e2852d2dd79 __libc_start_main
../csu/libc-start.c:308
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.
[17,006 / 19,133] Compiling xla/mlir_hlo/mhlo/IR/hlo_ops.cc; 99s local ... (24 actions, 23 running)
Target //:_XLAC.so failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 697.676s, Critical Path: 175.00s
INFO: 17030 processes: 10181 internal, 6849 local.
ERROR: Build did NOT complete successfully
error: command '/usr/local/bin/bazel' failed with exit code 1
error: subprocess-exited-with-error
I'm not sure exactly what is causing this, but it could be related to this gcc-10 bug.
Setup
Same image used in this CI run: https://github.com/pytorch/xla/actions/runs/17246611174/job/48937880278?pr=9588
Docker image: us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.12_tpuvm
Docker image sha256: c194788bb5ea6d76806371a80c59c08f33c5ed0186a88e4e65a54245cf0a9014
Additional Context
- Apparently, @qihqi also hit this error, which is why he updated gcc-10 to gcc-11 in "Update XLA pin then fix up to make it compile" #9565 (not on CI).
- It's odd that CI doesn't end with this same error, even though the Docker image used (and thus the compiler) is the same.
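Since the open question is whether CI and the local build really use the same compiler, a quick version check in both environments may help. The following is a minimal sketch, not part of the PyTorch/XLA build: the helper names (`parse_gcc_major`, `is_suspect_gcc`, `installed_gcc_version`) are hypothetical, and treating only the gcc-10 series as suspect is an assumption based on the log above.

```python
import re
import subprocess

# Compiler path taken from the failing CppCompile command in the log above.
GCC_PATH = "/usr/bin/gcc"


def parse_gcc_major(version_text: str) -> int:
    """Return the major version from text such as '10.2.1' or '11'."""
    match = re.match(r"\s*(\d+)", version_text)
    if match is None:
        raise ValueError(f"unrecognized GCC version string: {version_text!r}")
    return int(match.group(1))


def is_suspect_gcc(version_text: str) -> bool:
    """Assumption: only the gcc-10 series is affected, as in this report."""
    return parse_gcc_major(version_text) == 10


def installed_gcc_version(gcc_path: str = GCC_PATH) -> str:
    """Ask the compiler for its version string via `gcc -dumpversion`."""
    result = subprocess.run(
        [gcc_path, "-dumpversion"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Running `is_suspect_gcc(installed_gcc_version())` inside the container, both locally and in a CI step, would show whether the two builds actually pick up the same GCC major version.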