Still cannot reproduce the results using the released model #9

lazyGj opened this issue Feb 14, 2023


lazyGj commented Feb 14, 2023

Similar to #5, I still cannot reproduce the results using the released model and the results I got were extremely poor.
{"r_precision": {"top-1": 0.06051829268292683, "top-2": 0.1298780487804878, "top-3": 0.19603658536585367}, "fid": 1481.7516534444785, "clip_score": {"clip_score": 0.14643903637110403}, "mid": -53.25080871582031}

I installed pytorch-lightning and transformers at the correct versions (1.8.6 and 4.19.2). I tested the released model on a GTX 1080Ti using the command python model=diffusion_hml3d.yaml datamodule=humanml3d.yaml ckpt_path=pretrained/flame_hml3d_bc.ckpt. My Python environment is shown as follows:

Can you let me know the number of files in generated_samples under your testing environment? I reproduced numbers close to the benchmark in the paper, so there might be a difference in testing files. All testing samples should be generated from to reproduce the result.

lazyGj commented Feb 15, 2023

The number of files in generated_samples is 6559 (hml3d). The testing samples were generated from There is no random operation in and I am confused about why there is a difference in testing files.

@ToBeCodeCreater On my side, I have 6,557 test samples and generated results from running I know the testing stage takes some time, but can you try it again? Sorry for the inconvenience, but I have made two clean containers, set up the repository from scratch, and got numbers similar to those in the paper.
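The two counts reported in this thread (6559 generated files versus 6,557 test samples) could be checked file by file. What follows is only a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes that each sample is stored as one file and that a generated file shares its name (without extension) with the corresponding test file, which may not match the actual layout.

```python
from pathlib import Path


def sample_stems(directory):
    """Return the set of file names (without extension) found directly in a directory."""
    return {p.stem for p in Path(directory).iterdir() if p.is_file()}


def compare_sample_dirs(generated_dir, reference_dir):
    """Report how many unique sample names each directory holds and which are unmatched.

    Assumption: one file per sample, with matching stems across the two directories.
    """
    generated = sample_stems(generated_dir)
    reference = sample_stems(reference_dir)
    return {
        "n_generated": len(generated),
        "n_reference": len(reference),
        "only_generated": sorted(generated - reference),
        "only_reference": sorted(reference - generated),
    }


# Example call; both paths are placeholders and must be adapted to the local setup:
# report = compare_sample_dirs("generated_samples", "HumanML3D/processed/test_data")
# print(report["n_generated"], report["n_reference"], report["only_generated"])
```

If `only_generated` comes back non-empty, those names would be the first candidates to inspect when the counts disagree.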

lazyGj commented Feb 20, 2023

@jihoonerd I have tested the released model three times and got similar results each time. I'm wondering what is causing this. The size of the testing data in HumanML3D/processed/test_data is 6557 and the size of the training data in HumanML3D/processed/train_data is 34936.

lazyGj commented Feb 20, 2023

I find that a model trained using this repository performs well on the testing dataset (I got numbers similar to those in the paper). But the results of the released model are very poor. Very confusing.

I also got similar results when I used the pretrained weights (for HumanML3D). I would be grateful if the authors could check whether there is a problem with the pretrained weights.

