
0-D Tensor: error in data-parallel gradient processing #71782


Open
wangguan1995 opened this issue Mar 19, 2025 · 1 comment


wangguan1995 commented Mar 19, 2025

Bug description

[Screenshot of the error attached to the original issue]

Minimal reproduction

# https://github.com/PaddlePaddle/PaddleScience/blob/develop/ppsci/arch/activation.py
import paddle
import paddle.nn as nn
import paddle.distributed as dist
from paddle.distributed import fleet
import paddle.nn.functional as F
from paddle.distributed.fleet.utils import hybrid_parallel_util as hpu
dist.init_parallel_env()

class Swish(nn.Layer):
    def __init__(self, beta: float = 1.0):
        super().__init__()
        self.linear = nn.Linear(5, 1)
        # 0-D (scalar) trainable parameter, shape=[]; this is the case the issue is about
        self.beta = self.create_parameter(
            shape=[],
            default_initializer=nn.initializer.Constant(beta),
        )

    def forward(self, x):
        x = self.linear(x)
        return x * F.sigmoid(self.beta * x)

if __name__ == '__main__':
    model = Swish()

    if dist.get_world_size() > 1:
        fleet.init(is_collective=True)
        model = fleet.distributed_model(model)


        y = model(paddle.to_tensor([-2., -1., 0., 1., 2.]))
        loss = y - paddle.to_tensor([-1., 0., 1., 2., 3.])
        loss.backward()

        # fuse + allreduce manually before optimization if use DDP + no_sync
        # details in https://github.com/PaddlePaddle/Paddle/issues/48898#issuecomment-1343838622
        p = list(model.parameters())
        for i, (name, param) in enumerate(model.named_parameters()):
            print(f"Layer: {name} | Size: {param.shape}, {p[i].shape}")
            print(param.trainable)
            print(param._grad_ivar() is not None) # true after backward

        hpu.fused_allreduce_gradients(list(model.parameters()), None)
        print(model(paddle.to_tensor([-2., -1., 0., 1., 2.])))
    else:
        raise NotImplementedError
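
Note: the repro only reaches the failing path when launched with more than one process (for example via python -m paddle.distributed.launch); the single-process branch simply raises NotImplementedError. Judging from the title, the error appears to involve the 0-D beta parameter (shape=[]) when hpu.fused_allreduce_gradients fuses and all-reduces the gradients. The sketch below is a hypothetical workaround, not a confirmed fix: it declares beta with shape [1] so its gradient is at least 1-D. The class name SwishWorkaround and the shape change are assumptions, not part of the original report.

# Hypothetical workaround sketch (assumption, not a confirmed fix):
# give beta a single element instead of the 0-D shape=[] from the report,
# so the fused gradient all-reduce only ever sees gradients with at least one dimension.
import paddle.nn as nn
import paddle.nn.functional as F

class SwishWorkaround(nn.Layer):
    def __init__(self, beta: float = 1.0):
        super().__init__()
        self.linear = nn.Linear(5, 1)
        self.beta = self.create_parameter(
            shape=[1],  # 1-D instead of shape=[]
            default_initializer=nn.initializer.Constant(beta),
        )

    def forward(self, x):
        x = self.linear(x)
        # broadcasting makes the [1]-shaped beta behave like the scalar version
        return x * F.sigmoid(self.beta * x)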

Additional supplementary information

Paddle commit id: e2894ad

@LokeZhou (Contributor)

Hi, I haven't been able to reproduce the error locally yet; I'll keep looking into it.
