DeepSeek V3 model training error #387
Comments
Please provide the yaml config.
I configured the run with the yaml file provided in the 0.6.5 release examples; the error message is as follows:
[default0]:Traceback (most recent call last):
[default0]: File "/data2/nfs/liyucong/FlagScale-0.6.5/flagscale/train/train_deepseek_v3.py", line 433, in
[default0]: pretrain(
[default0]: File "/data2/nfs/liyucong/FlagScale-0.6.5/flagscale/train/train.py", line 423, in pretrain
[default0]: iteration, num_floating_point_operations_so_far = train(
[default0]: File "/data2/nfs/liyucong/FlagScale-0.6.5/flagscale/train/train.py", line 1659, in train
[default0]: train_step(forward_step_func,
[default0]: File "/data2/nfs/liyucong/FlagScale-0.6.5/flagscale/train/train.py", line 863, in train_step
[default0]: losses_reduced = forward_backward_func(
[default0]: File "/data2/nfs/liyucong/FlagScale-0.6.5/megatron/megatron/core/pipeline_parallel/schedules.py", line 1742, in forward_backward_pipelining_without_interleaving
[default0]: output_tensor, num_tokens = forward_step(
[default0]: File "/data2/nfs/liyucong/FlagScale-0.6.5/megatron/megatron/core/pipeline_parallel/schedules.py", line 275, in forward_step
[default0]: output_tensor, loss_func = forward_step_func(data_iterator, model)
[default0]: File "/data2/nfs/liyucong/FlagScale-0.6.5/flagscale/train/train_deepseek_v3.py", line 322, in forward_step
[default0]: output_tensor = model(tokens, position_ids, attention_mask,
[default0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
[default0]: return self._call_impl(*args, **kwargs)
[default0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
[default0]: return forward_call(*args, **kwargs)
[default0]: File "/data2/nfs/liyucong/FlagScale-0.6.5/megatron/megatron/core/distributed/data_parallel_base.py", line 22, in forward
[default0]: return self.module(*inputs, **kwargs)
[default0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
[default0]: return self._call_impl(*args, **kwargs)
[default0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
[default0]: return forward_call(*args, **kwargs)
[default0]: File "/data2/nfs/liyucong/FlagScale-0.6.5/megatron/megatron/legacy/model/module.py", line 189, in forward
[default0]: outputs = self.module(*inputs, **kwargs)
[default0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
[default0]: return self._call_impl(*args, **kwargs)
[default0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
[default0]: return forward_call(*args, **kwargs)
[default0]: File "/data2/nfs/liyucong/FlagScale-0.6.5/flagscale/train/models/deepseek_v3/deepseek_v3_model.py", line 220, in forward
[default0]: hidden_states = self.decoder(
[default0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
[default0]: return self._call_impl(*args, **kwargs)
[default0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
[default0]: return forward_call(*args, **kwargs)
[default0]: File "/data2/nfs/liyucong/FlagScale-0.6.5/megatron/megatron/core/transformer/transformer_block.py", line 619, in forward
[default0]: hidden_states, context = layer(
[default0]: File "/data2/nfs/liyucong/FlagScale-0.6.5/megatron/megatron/core/transformer/transformer_layer.py", line 503, in call
[default0]: return super(MegatronModule, self).call(*args, **kwargs)
[default0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
[default0]: return self._call_impl(*args, **kwargs)
[default0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
[default0]: return forward_call(*args, **kwargs)
[default0]: File "/data2/nfs/liyucong/FlagScale-0.6.5/megatron/megatron/core/transformer/transformer_layer.py", line 391, in forward
[default0]: attention_output_with_bias = self.self_attention(
[default0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
[default0]: return self._call_impl(*args, **kwargs)
[default0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
[default0]: return forward_call(*args, **kwargs)
[default0]: File "/data2/nfs/liyucong/FlagScale-0.6.5/megatron/megatron/core/transformer/multi_latent_attention.py", line 165, in forward
[default0]: core_attn_out = self.core_attention(
[default0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
[default0]: return self._call_impl(*args, **kwargs)
[default0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
[default0]: return forward_call(*args, **kwargs)
[default0]: File "/data2/nfs/liyucong/FlagScale-0.6.5/megatron/megatron/core/extensions/transformer_engine.py", line 804, in forward
[default0]: core_attn_out = super().forward(
[default0]: File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/attention.py", line 2488, in forward
[default0]: assert (key_layer.shape == value_layer.shape
[default0]:AssertionError: Keys and values must have the same shape!
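For context, the assertion that fires is in Transformer Engine's attention forward (transformer_engine/pytorch/attention.py, line 2488 in the traceback) and simply requires the key and value tensors to have identical shapes. The sketch below only illustrates that check under assumed multi-latent-attention dimensions where the combined q/k head dim differs from the v head dim; the concrete numbers are hypothetical and not read from the failing config:

```python
import torch

# Hypothetical MLA dimensions for illustration only (not from the actual yaml):
# the combined no-PE + RoPE q/k head dim differs from the value head dim.
seq_len, batch, num_heads = 4096, 1, 16
qk_head_dim = 192
v_head_dim = 128

key_layer = torch.empty(seq_len, batch, num_heads, qk_head_dim)
value_layer = torch.empty(seq_len, batch, num_heads, v_head_dim)

# The check from the traceback: trips whenever the last dims differ.
assert key_layer.shape == value_layer.shape, \
    "Keys and values must have the same shape!"
```

Whether such a head-dim mismatch is actually what happens in this run would need the yaml config requested above.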