Different DDP ranks have different bn_stats after PreciseBN hook as precise_bn in fvcore does not synchronize batch_size #5398

Open
guzy0324 opened this issue Nov 25, 2024 · 0 comments


Instructions To Reproduce the 🐛 Bug:

  1. Full runnable code or full changes you made:
    https://github.com/facebookresearch/moco/tree/main/detection

  2. What exact command you run:

    python detection/train_net.py \
    --config-file detection/configs/pascal_voc_R_50_C4_24k.yaml \
    --num-gpus 8 \
    OUTPUT_DIR "temp/train" \
    SEED 0 \
    SOLVER.MAX_ITER 1
  3. Full logs or other relevant observations:

    In update_bn_stats (fvcore precise_bn), each rank uses its own local batch_size because the value is not synchronized across ranks. As a result, different ranks end up with different bn_stats after the PreciseBN hook runs.

  4. Please simplify the steps as much as possible so they do not require additional resources to
    run, such as a private dataset.

    No private dataset is required.
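
To illustrate the observation in step 3, here is a minimal single-process sketch. It is not fvcore's code: it assumes that per-iteration BN statistics are combined as a batch-size-weighted average, and that the per-iteration batch means are already identical on every rank (for example because they were synchronized). The rank ids, batch means, and batch sizes are made-up values for illustration only.

```python
from typing import Dict, List


def weighted_mean(batch_means: List[float], batch_sizes: List[int]) -> float:
    """Combine per-iteration means into one estimate, weighted by batch size."""
    total = sum(batch_sizes)
    return sum(m * n for m, n in zip(batch_means, batch_sizes)) / total


# Hypothetical per-iteration batch means, assumed identical on every rank.
batch_means = [1.0, 3.0]

# Hypothetical local batch sizes seen by each rank at each iteration.
local_sizes: Dict[int, List[int]] = {0: [2, 6], 1: [6, 2]}

# Each rank weights by its own local batch size: the results diverge.
per_rank_local = {
    rank: weighted_mean(batch_means, sizes) for rank, sizes in local_sizes.items()
}

# Summing batch sizes across ranks first (what an all-reduce would provide)
# gives every rank the same weights, hence the same result.
global_sizes = [
    sum(sizes[i] for sizes in local_sizes.values()) for i in range(len(batch_means))
]
per_rank_synced = {rank: weighted_mean(batch_means, global_sizes) for rank in local_sizes}

print(per_rank_local)   # {0: 2.5, 1: 1.5}
print(per_rank_synced)  # {0: 2.0, 1: 2.0}
```

In a real DDP run the summation over ranks would presumably be done with a collective such as `torch.distributed.all_reduce`; that part is an assumption about a possible fix, not something taken from fvcore.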

Expected behavior:

All ranks should have the same bn_stats after the PreciseBN hook.
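
One way to check this expectation is to collect the BN buffers from every rank and compare them against rank 0. The sketch below is a plain-Python helper operating on already-gathered values; the function name `find_mismatched_buffers` and the buffer names are hypothetical, and gathering the buffers in an actual DDP job (for instance with `torch.distributed.all_gather_object`) is assumed to happen elsewhere.

```python
from typing import Dict, List, Sequence


def find_mismatched_buffers(
    stats_per_rank: Sequence[Dict[str, List[float]]], tol: float = 0.0
) -> List[str]:
    """Return names of buffers that differ from rank 0 on at least one rank."""
    reference = stats_per_rank[0]
    mismatched = []
    for name, ref_values in reference.items():
        for other in stats_per_rank[1:]:
            values = other[name]
            differs = len(values) != len(ref_values) or any(
                abs(a - b) > tol for a, b in zip(ref_values, values)
            )
            if differs:
                mismatched.append(name)
                break  # one differing rank is enough to flag this buffer
    return mismatched


# Hypothetical buffers gathered from two ranks after the PreciseBN hook.
rank0 = {"res5.bn.running_mean": [2.5, 0.1], "res5.bn.running_var": [1.0, 1.0]}
rank1 = {"res5.bn.running_mean": [1.5, 0.1], "res5.bn.running_var": [1.0, 1.0]}

mismatches = find_mismatched_buffers([rank0, rank1])
print(mismatches)  # ['res5.bn.running_mean']
```

An empty list would mean the expected behavior holds for the buffers that were compared.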

Environment:

Provide your environment information using the following command:

wget -nc -q https://github.com/facebookresearch/detectron2/raw/main/detectron2/utils/collect_env.py && python collect_env.py
-------------------------------  ---------------------------------------------------------------------------------------------
sys.platform                     linux
Python                           3.8.0 | packaged by conda-forge | (default, Nov 22 2019, 19:11:38) [GCC 7.3.0]
numpy                            1.24.4
detectron2                       0.6 @/mdata/guzy0324/anaconda3/envs/DICR_new/lib/python3.8/site-packages/detectron2
Compiler                         GCC 11.4
CUDA compiler                    CUDA 12.4
detectron2 arch flags            8.9
DETECTRON2_ENV_MODULE            <not set>
PyTorch                          2.4.1+cu124 @/mdata/guzy0324/anaconda3/envs/DICR_new/lib/python3.8/site-packages/torch
PyTorch debug build              False
torch._C._GLIBCXX_USE_CXX11_ABI  False
GPU available                    Yes
GPU 0,1,2,3,4,5,6,7              NVIDIA GeForce RTX 4090 D (arch=8.9)
Driver version                   550.54.14
CUDA_HOME                        /usr/local/cuda
Pillow                           9.4.0
torchvision                      0.19.1+cu124 @/mdata/guzy0324/anaconda3/envs/DICR_new/lib/python3.8/site-packages/torchvision
torchvision arch flags           5.0, 6.0, 7.0, 7.5, 8.0, 8.6, 9.0
fvcore                           0.1.5.post20221221
iopath                           0.1.9
cv2                              4.9.0
-------------------------------  ---------------------------------------------------------------------------------------------
PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.4.2 (Git Hash 1137e04ec0b5251ca2b4400a4fd3c667ce843d67)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.4
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 90.1
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.4, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.4.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

Testing NCCL connectivity ... this should not hang.
NCCL succeeded.

If your issue looks like an installation issue / environment issue,
please first try to solve it yourself with the instructions in
https://detectron2.readthedocs.io/tutorials/install.html#common-installation-issues

guzy0324 changed the title from "Different ranks have different bn_stats after PreciseBN hook as precise_bn in fvcore does not synchronize batch_size" to "Different DDP ranks have different bn_stats after PreciseBN hook as precise_bn in fvcore does not synchronize batch_size" on Nov 25, 2024