The eval results from TubeR CSN-152 IG65M+K400 model #16

Open
lemonheadboy opened this issue Jan 31, 2023 · 8 comments

@lemonheadboy

Hi,

First, thanks for your work and for providing the implementation.

Following the steps you provided, I downloaded the pretrained CSN-152 (Kinetics-400 + IG65M) model, TubeR_CSN152_AVA22, from the link you provided. I installed the same versions of PyTorch and the other packages you suggested, and changed only the data and model paths in the config file TubeR_CSN152_AVA22.yaml. I was not able to obtain the expected 31.1 mAP; I only got 27.8 mAP (two runs, same result).

I wonder whether I am doing everything correctly, and how to proceed.

Thank you.

@cifunla

cifunla commented Feb 2, 2023

I did the same as you; perhaps the only difference is that I evaluated on a single GPU, and I got 31.137 mAP.
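For context, one generic way the GPU count can change evaluation numbers in PyTorch is the padding done by `torch.utils.data.DistributedSampler`: with `drop_last=False` it repeats indices so every rank gets the same number of samples, so a multi-GPU run can score a few clips twice unless duplicates are removed before the AVA metric is computed. Whether TubeR's evaluation loop uses this sampler, or deduplicates its gathered predictions, is not established in this thread. The sketch re-implements only the sampler's non-shuffled index logic in plain Python; `sampler_indices` and `all_scored_indices` are hypothetical names.

```python
import math


def sampler_indices(dataset_len, num_replicas, rank):
    """Indices one rank would see, mirroring DistributedSampler's
    shuffle=False, drop_last=False logic (re-implemented, not imported)."""
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    padding = total_size - len(indices)
    # Pad by repeating indices from the start so every rank gets num_samples items.
    if padding <= len(indices):
        indices += indices[:padding]
    else:
        indices += (indices * math.ceil(padding / len(indices)))[:padding]
    # Rank r takes every num_replicas-th index, starting at position r.
    return indices[rank:total_size:num_replicas]


def all_scored_indices(dataset_len, num_replicas):
    """Concatenate what every rank evaluates, as if predictions were gathered."""
    scored = []
    for rank in range(num_replicas):
        scored.extend(sampler_indices(dataset_len, num_replicas, rank))
    return scored


if __name__ == "__main__":
    n_clips = 10  # toy dataset size
    for world_size in (1, 4, 8):
        scored = all_scored_indices(n_clips, world_size)
        extra = len(scored) - len(set(scored))
        print(f"world_size={world_size}: {len(scored)} predictions, {extra} duplicates")
    # world_size=1: 10 predictions, 0 duplicates
    # world_size=4: 12 predictions, 2 duplicates
    # world_size=8: 16 predictions, 6 duplicates
```

Because the padding is at most `num_replicas - 1` samples, on a dataset the size of AVA's validation set its direct effect on mAP should be small, so it is not a confirmed explanation for a gap of several points.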

@huang-chenhai

Epoch: [0][50125/50134]
data_time: 0.005, batch time: 0.083
class_error: 99.894, loss: 147.424, loss_bbox: 0.738, loss_giou: 0.835, loss_ce: 1.515, loss_ce_b: 1.093

{'PascalBoxes_Precision/mAP@0.5IOU': 0.00011119179516651725, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/bend/bow (at the waist)': 0.00011119179516651725}
person AP: 0.00011
testing time 1:47:07

Hi, I used a single 3090 GPU in non-distributed mode. Above is the log from running inference on AVA 2.2. Why are class_error and the loss so high? The final inference result is also wrong. Looking forward to your answer.
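To help read the log above: in DETR, `class_error` is 100 minus the top-1 accuracy (in percent) of the predictions that the Hungarian matcher assigned to ground-truth targets. This sketch assumes TubeR keeps DETR's definition, which the thread does not confirm. Under that assumption, a value of 99.894 means almost none of the matched predictions received the right class, consistent with the near-zero person AP in the same log; it does not by itself say why. The function names are hypothetical and work on plain Python lists instead of tensors.

```python
def top1_accuracy(logits, targets):
    """Percent of rows whose highest-scoring class equals the target class."""
    if not targets:
        return 0.0  # DETR's accuracy() also returns 0 when there are no targets
    correct = sum(
        1
        for row, target in zip(logits, targets)
        if max(range(len(row)), key=row.__getitem__) == target
    )
    return 100.0 * correct / len(targets)


def class_error(matched_logits, matched_targets):
    """DETR-style class_error: 100 minus top-1 accuracy over matched queries."""
    return 100.0 - top1_accuracy(matched_logits, matched_targets)


# Toy example: 4 matched queries, 3 classes; the third query is misclassified.
logits = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.9, 0.1, 0.2], [0.1, 0.2, 3.0]]
targets = [0, 1, 2, 2]
print(class_error(logits, targets))  # 25.0
```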

@cifunla

cifunla commented Feb 7, 2023

Hi, have you commented out lines 423 and 452 of video_action_recognition.py?

@huang-chenhai

Thank you very much for your answer. I have already commented out those two lines, but it made no difference.
The problem does seem to come from the switch away from distributed training, though: the results I got with distributed training are correct. I don't know what I failed to change, so I'll check again. When you switched to single-machine, single-GPU training, did you change anything else?
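As background to the question about other single-GPU changes: DETR's `util/misc.py` wraps `torch.distributed` calls in helpers so the same script runs with or without a process group. The sketch follows that pattern; it is not TubeR's code, and the thread does not say what lines 423 and 452 contain. An unguarded call such as `dist.get_world_size()` raises an error when no process group is initialized, so guards like these address crashes rather than silently wrong results.

```python
# Helpers modeled on the ones in DETR's util/misc.py (illustrative sketch).
try:
    import torch.distributed as dist
except ImportError:  # lets the sketch run even without PyTorch installed
    dist = None


def is_dist_avail_and_initialized():
    """True only when a torch.distributed process group is actually running."""
    return dist is not None and dist.is_available() and dist.is_initialized()


def get_world_size():
    return dist.get_world_size() if is_dist_avail_and_initialized() else 1


def get_rank():
    return dist.get_rank() if is_dist_avail_and_initialized() else 0


def is_main_process():
    return get_rank() == 0


def barrier():
    """Synchronize processes; a no-op in a single-GPU, non-distributed run."""
    if is_dist_avail_and_initialized():
        dist.barrier()


if __name__ == "__main__":
    # Without init_process_group (a plain single-GPU run) these fall back safely.
    print(get_world_size(), get_rank(), is_main_process())  # 1 0 True
    barrier()
```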

@cifunla

cifunla commented Feb 7, 2023

I didn't make any other changes, sorry. I don't know why you get the wrong result.

@lemonheadboy
Author

I tried running with 1 GPU, but the results are still the same. I also see the same drop on AVA 2.1.
I wonder whether the issue comes from something other than the number of GPUs.

@huang-chenhai

Hello, can you train on the JHMDB dataset properly? I ran into the following problem:
When I trained from the pretrained weights, training itself ran fine, but the model did not predict correct results on the validation set.
Surprisingly, everything works fine when I continue training from the already-trained weights provided by the author (TubeR_CSN152_JHMDB.pth).

@huang-chenhai

huang-chenhai commented Apr 11, 2023 via email


3 participants