ModuleNotFoundError: No module named 'scaleoutbridge'. #12

Open
winca opened this issue Jun 11, 2024 · 3 comments

winca commented Jun 11, 2024

When I tried "training_results_v3.1/NVIDIA/benchmarks/resnet/implementations/mxnet", I ran the command [ CONT="mlperf-nvidia:image_classification-mxnet" DATADIR=/test/training_results_v3.1/NVIDIA/benchmarks/resnet/implementations/run_dir/dataset/preprocessed_data bash ./run_with_docker.s] and encountered the error: “ModuleNotFoundError: No module named 'scaleoutbridge'”. Where can I find the 'scaleoutbridge' module?

Abatpool commented Jun 28, 2024

Hi @winca @nv-rborkar

There are two .py files in the common directory:

  1. optimizer.py
  2. fit.py

We found the corresponding change for the scaleoutbridge library inside
/unet3d/implementations/mxnet/runtime/training.py

The changes we made are as follows.
In optimizer.py we commented out the earlier scaleoutbridge imports and replaced them with:
from mlperf_common.scaleoutbridge import ScaleoutBridgeBase as SBridge
Reference screenshot of that change: [image]

In fit.py we commented out the native scaleoutbridge import and replaced it with:

from mlperf_common.scaleoutbridge import init_bridge, ScaleoutBridgeBase as SBridge
from mlperf_common.frameworks.mxnet import MXNetProfilerHandler, MPICommunicationHandler
Reference screenshot of that change in fit.py: [image]

Then we changed the sbridge initialization as follows:

sbridge = init_bridge(MXNetProfilerHandler(), MPICommunicationHandler(), mllogger)
Reference screenshot of that change: [image]
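
The fit.py changes above can be collected into one guarded sketch. The two import lines and the init_bridge call are taken from this comment; the HAVE_MLPERF_COMMON flag and the make_sbridge helper are hypothetical names added for illustration, and the sketch assumes the mlperf_common package (with its MXNet handlers) is installed in the container.

```python
# Sketch only: guards the mlperf_common imports so a missing package
# produces a clear message instead of a bare ModuleNotFoundError.
try:
    from mlperf_common.scaleoutbridge import init_bridge, ScaleoutBridgeBase as SBridge
    from mlperf_common.frameworks.mxnet import MXNetProfilerHandler, MPICommunicationHandler
    HAVE_MLPERF_COMMON = True   # hypothetical flag, not part of the original code
except ImportError:
    HAVE_MLPERF_COMMON = False


def make_sbridge(mllogger):
    """Hypothetical helper wrapping the initialization shown in this comment."""
    if not HAVE_MLPERF_COMMON:
        raise RuntimeError(
            "mlperf_common (or its MXNet handlers) is not importable; "
            "install it or use a local scaleoutbridge.py instead"
        )
    # Same call as in the modified fit.py.
    return init_bridge(MXNetProfilerHandler(), MPICommunicationHandler(), mllogger)
```

In fit.py this would be used as sbridge = make_sbridge(mllogger), where mllogger is the logger object fit.py already has.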

Alternatively, there is another way to bypass the scaleoutbridge error.

A scaleoutbridge.py file exists at training_results_v3.1/Fujitsu/benchmarks/resnet/implementations/mxnet/scaleoutbridge.py in the MLPerf 3.1 results on GitHub.

Here is the URL to find that file: ( )

Create a scaleoutbridge.py file, copied from the Fujitsu one above, at the location
/home/adminserc/training_results_v3.1/NVIDIA/benchmarks/resnet/implementations/mxnet
With that file in place, the import and sbridge initialization changes described above are not required; the existing code base will use it as-is.
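
The copy step for this second approach might look like the sketch below. It assumes a local training_results_v3.1 checkout that contains both the Fujitsu and NVIDIA directories named in this comment; copy_scaleoutbridge, SRC_REL and DST_REL are hypothetical names introduced for illustration.

```python
import shutil
from pathlib import Path

# Relative locations taken from the comment above.
SRC_REL = Path("Fujitsu/benchmarks/resnet/implementations/mxnet/scaleoutbridge.py")
DST_REL = Path("NVIDIA/benchmarks/resnet/implementations/mxnet")


def copy_scaleoutbridge(repo_root):
    """Copy Fujitsu's scaleoutbridge.py next to the NVIDIA MXNet ResNet code.

    `repo_root` is assumed to be a local training_results_v3.1 checkout.
    Returns the path of the copied file.
    """
    root = Path(repo_root)
    src = root / SRC_REL
    dst_dir = root / DST_REL
    if not src.is_file():
        raise FileNotFoundError(f"source not found: {src}")
    if not dst_dir.is_dir():
        raise FileNotFoundError(f"destination directory not found: {dst_dir}")
    # copy2 keeps file metadata and returns the destination file path.
    return Path(shutil.copy2(src, dst_dir))
```

For the layout in this comment the call would be copy_scaleoutbridge("/home/adminserc/training_results_v3.1"). Depending on how the container image is built, it may need rebuilding so the new file is visible inside it.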

@mmarcinkiewicz

Thanks @Abatpool !

@winca does that fix your issue?
