The eval results from Tuber CSN-152 IG65+K400 model #16

lemonheadboy · 2023-01-31T14:59:04Z

Hi,

First, thanks for your work and for providing the implementation.

Following the steps you provided, I downloaded the pretrained CSN-152 Kinetics-400+IG65M model from the link you provided (TubeR_CSN152_AVA22). I installed the same versions of PyTorch and the other packages as you suggested, and changed only the data and model paths in the config file, TubeR_CSN152_AVA22.yaml. I was not able to obtain 31.1 mAP; I got only 27.8 mAP (two runs, same result).

I wonder if I am doing everything right and how to proceed.

Thank you.

cifunla · 2023-02-02T01:15:00Z

My setup is the same as yours; perhaps the only difference is that I evaluate on a single GPU. I get 31.137 mAP.

huang-chenhai · 2023-02-06T02:39:12Z

Epoch: [0][50125/50134]
data_time: 0.005, batch time: 0.083
class_error: 99.894, loss: 147.424, loss_bbox: 0.738, loss_giou: 0.835, loss_ce: 1.515, loss_ce_b: 1.093

{'PascalBoxes_Precision/mAP@0.5IOU': 0.00011119179516651725, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/bend/bow (at the waist)': 0.00011119179516651725}
person AP: 0.00011
testing time 1:47:07

Hi, I used a single 3090 in non-distributed mode. The output above is from running inference on AVA 2.2. Why are class_error and loss so high? The final inference result is wrong as well. Looking forward to your answer.

cifunla · 2023-02-07T06:23:21Z

Hi, have you commented out lines 423 and 452 of video_action_recognition.py?

huang-chenhai · 2023-02-07T06:33:47Z

Thank you very much for your answer. I have commented out those two lines, but it made no difference.
The problem does seem tied to the distributed setting: the result I get with distributed training is correct, so I must have missed a change somewhere, and I'll check again. When you switched to single-machine, single-GPU training, did you change anything else?
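One thing that may be worth ruling out when a run is correct in distributed mode but near-zero on a single GPU: a model wrapped in PyTorch's DistributedDataParallel prefixes every parameter name with `module.`, so a checkpoint saved from a wrapped model may not match an unwrapped one, and a non-strict load would then leave the weights at their random initialization. Whether this is what happens in this repo is an assumption, not something confirmed in this thread. The sketch below only compares key names; the helper names and the toy dictionaries are hypothetical stand-ins for `model.state_dict()` and the loaded checkpoint.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Prefix that DistributedDataParallel adds to every parameter name.
PREFIX = "module."


def strip_module_prefix(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of state_dict with a leading 'module.' removed from each key."""
    return {
        (key[len(PREFIX):] if key.startswith(PREFIX) else key): value
        for key, value in state_dict.items()
    }


def key_mismatch(
    model_keys: Iterable[str], ckpt_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected): keys the model expects but the checkpoint
    lacks, and keys the checkpoint has but the model does not expect."""
    model_set, ckpt_set = set(model_keys), set(ckpt_keys)
    return sorted(model_set - ckpt_set), sorted(ckpt_set - model_set)


# Toy stand-ins (hypothetical names) for an unwrapped model and a DDP checkpoint.
model_state = {"backbone.conv1.weight": 0, "class_embed.bias": 0}
ckpt_state = {"module.backbone.conv1.weight": 1, "module.class_embed.bias": 1}

missing, unexpected = key_mismatch(model_state, ckpt_state)
print(len(missing), len(unexpected))  # 2 2: nothing lines up

missing, unexpected = key_mismatch(model_state, strip_module_prefix(ckpt_state))
print(len(missing), len(unexpected))  # 0 0: all keys match after stripping
```

With a real model, the same information is available from the return value of `model.load_state_dict(state_dict, strict=False)`, which exposes `missing_keys` and `unexpected_keys`; if both lists are long, the pretrained weights were probably not applied.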

cifunla · 2023-02-07T07:03:18Z

I didn't make any other changes. Sorry, I don't know why you get the wrong result.

lemonheadboy · 2023-03-02T10:44:37Z

I tried running with 1 GPU, but the results are still the same. I also get the same drop on AVA 2.1.
I wonder whether the issue comes from something other than the number of GPUs.
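To compare environments between a run that reaches the reported mAP and one that does not, it may help to post the exact installed versions. Here is a minimal sketch using only the standard library; the package list is an assumption and should be replaced with the repo's actual requirements.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Hypothetical package list; adjust it to the repo's requirements.
for name, version in installed_versions(["torch", "torchvision", "numpy"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output from both machines makes it easier to spot a version difference that could explain the gap.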

huang-chenhai · 2023-03-16T01:31:32Z

Hello, can you train on the JHMDB dataset properly? I ran into the following problem.
I used a pre-trained model: training ran fine, but the model did not predict correct results on the validation set.
To my surprise, everything works fine when I continue training from the author's already-trained weights (TubeR_CSN152_JHMDB.pth).

huang-chenhai · 2023-04-11T03:10:27Z

Hi, have you retrained on the JHMDB dataset? I can't reproduce the author's result by training. Very much looking forward to your reply.
