I can't reproduce the results of the paper #11

Mowenyii · 2024-06-02T13:47:11Z

Thank you for your impressive work.

I can't reproduce the results of DPO-SD 1.5.

We train on 8 NVIDIA A100 GPUs with a local batch size of 1 pair and 256 gradient accumulation steps. Other experimental settings are the same as those in the paper.
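For reference, the effective batch size implied by this configuration can be worked out as below. This is only a minimal sketch: the function `effective_batch_size` and its parameter names are illustrative and do not come from the repository's training script.

```python
def effective_batch_size(num_gpus: int, local_batch_size: int, grad_accum_steps: int) -> int:
    """Number of preference pairs contributing to each optimizer step."""
    return num_gpus * local_batch_size * grad_accum_steps


# Values taken from the setup described above (hypothetical variable names).
pairs_per_step = effective_batch_size(num_gpus=8, local_batch_size=1, grad_accum_steps=256)
print(pairs_per_step)  # 2048
```

Any combination of GPU count, local batch size, and accumulation steps with the same product gives the same number of pairs per optimizer step, so this is the number to compare against the paper's setting.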

Here are some of the results I sampled during training.

There is also something strange about the loss function during training. The training process took about nine hours.

Can you give me some advice?

DwanZhang-AI · 2024-07-26T05:57:01Z

Hi, why should the local batch size be 1? Would a larger local batch size cause performance degradation?
