Error when running ./train_mpe_spread.sh #98

Open
ChuangZhang1999 opened this issue Jan 12, 2024 · 2 comments

Comments

@ChuangZhang1999

When I tried to run ./train_mpe_spread.sh, I got the following error:

obs_space:  [Box(18,), Box(18,), Box(18,)]
share_obs_space:  [Box(54,), Box(54,), Box(54,)]
act_space:  [Discrete(5), Discrete(5), Discrete(5)]
Traceback (most recent call last):
  File "../train/train_mpe.py", line 174, in <module>
    main(sys.argv[1:])
  File "../train/train_mpe.py", line 159, in main
    runner.run()
  File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/runner/shared/mpe_runner.py", line 28, in run
    values, actions, action_log_probs, rnn_states, rnn_states_critic, actions_env = self.collect(step)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/runner/shared/mpe_runner.py", line 103, in collect
    np.concatenate(self.buffer.masks[step]))
  File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/algorithms/r_mappo/algorithm/rMAPPOPolicy.py", line 71, in get_actions
    deterministic)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/algorithms/r_mappo/algorithm/r_actor_critic.py", line 64, in forward
    actor_features = self.base(obs)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/algorithms/utils/mlp.py", line 56, in forward
    x = self.mlp(x)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/algorithms/utils/mlp.py", line 27, in forward
    x = self.fc1(x)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/functional.py", line 1610, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
@satpreetsingh

Try running the following and check whether you still get the error:

import torch
print("Is CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
print("cuDNN version:", torch.backends.cudnn.version())

a = torch.randn(1024, 1024, device="cuda:0")
b = torch.randn(1024, 1024, device="cuda:0")
c = torch.matmul(a, b)  # Matrix multiplication
print("Matrix multiplication result shape:", c.shape)

If this snippet also fails, the problem is your PyTorch/CUDA installation rather than this repository, and you need to fix it. Try:

conda install pytorch -c pytorch
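As a follow-up check after reinstalling, the snippet below exercises the same call path that fails in the traceback (`nn.Linear` → `F.linear` → `torch.addmm`, which runs a cuBLAS `sgemm` on the GPU) using the 18-dimensional observations reported in `obs_space`. It is a minimal sketch, not code from this repository: the hidden size of 64 and the batch of 3 rows (one per agent) are assumptions chosen for illustration. It falls back to the CPU when CUDA is unavailable, so it only tests cuBLAS when a GPU is visible.

```python
import torch
import torch.nn as nn

# Assumptions for illustration only: 18-dim observations (from obs_space in the log),
# a hypothetical hidden size of 64, and a batch of 3 rows (one per agent).
OBS_DIM = 18
HIDDEN = 64
BATCH = 3


def run_linear(device: str) -> torch.Tensor:
    """Run one Linear forward pass; on a GPU this goes through cuBLAS sgemm via addmm."""
    fc = nn.Linear(OBS_DIM, HIDDEN).to(device)
    obs = torch.randn(BATCH, OBS_DIM, device=device)
    with torch.no_grad():
        out = fc(obs)
    if device.startswith("cuda"):
        # CUDA kernels run asynchronously; synchronizing makes any cuBLAS error surface here.
        torch.cuda.synchronize()
    return out


device = "cuda:0" if torch.cuda.is_available() else "cpu"
out = run_linear(device)
print(device, tuple(out.shape))
```

If this runs on `cuda:0` without raising `CUBLAS_STATUS_EXECUTION_FAILED`, the first actor layer's matrix multiply works with the reinstalled build.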

@zoeyuchao
Member

Fixed! Try the new code.
