
Hello, I trained from scratch on the CASIA-WebFace dataset and the results after fitting are very poor #11

Open
imliusu opened this issue Dec 27, 2019 · 6 comments


imliusu commented Dec 27, 2019

Hello, I trained from scratch on the CASIA-WebFace dataset and found that the results after fitting are very poor.
Training reaches a recognition accuracy of 95-97%, but when I run recognition on my own photos, almost every one scores around 0.002-0.003 against the threshold, so they cannot be told apart.
Could you share the dataset and training parameters you used? I would like to retrain.
Thank you.
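For context, the verification step being described usually compares two embeddings against a threshold; here is a minimal sketch, assuming L2-normalized embeddings compared by cosine similarity (the function name and threshold value are illustrative, not this repo's actual API):

```python
import numpy as np

def same_person(emb_a, emb_b, threshold=0.5):
    """Decide whether two L2-normalized face embeddings belong to the same person."""
    similarity = float(np.dot(emb_a, emb_b))  # cosine similarity, in [-1, 1]
    return similarity > threshold
```

The symptom above is that every photo scores around 0.002-0.003, so no threshold can separate genuine pairs from impostor pairs.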

@104503529

Hello, I have the same situation: training on the CASIA dataset, the model only converges to about 85% val acc. I have tried adjusting Dropout, BatchSize, and Epoch without any improvement. Could the OP please share the training parameters? Thanks.


imliusu commented Dec 27, 2019

> Hello, I have the same situation: training on the CASIA dataset, the model only converges to about 85% val acc. I have tried adjusting Dropout, BatchSize, and Epoch without any improvement. Could the OP please share the training parameters? Thanks.

I used the built-in Dropout, a batch size of 128, 50 epochs, and an initial learning rate of 0.1, decayed to 1/10 of its value at epochs [5,10,15,20,25,30]; accuracy lands at roughly 95%.
But when I use the project's built-in method to embed a single image and run face recognition, the results are very poor, nowhere near what the project demonstrates.
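Read as code, that decay recipe is a piecewise-constant schedule; a plain-Python sketch (the repo's own implementation may differ):

```python
def lr_at_epoch(epoch, base_lr=0.1, boundaries=(5, 10, 15, 20, 25, 30), factor=0.1):
    """Drop the learning rate to 1/10 of its value at each boundary epoch."""
    lr = base_lr
    for b in boundaries:
        if epoch >= b:
            lr *= factor
    return lr
```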


I dug out part of my logs. For the run whose accuracy oscillated between 92% and 95%:
batch size 128, initial lr 0.01, decayed to 1/10 at epochs [15,35], 70 epochs in total.

Spending more training time in the lr 0.01 and 0.0001 phases helps val acc a lot, but the model then overfits extremely easily (so embedding a single image and running face recognition works very poorly, nowhere near the project's results). The val acc printed on the command line also differs slightly from the one in TensorBoard; the cause is unknown.

Also, using triplet_loss may give better results for single-image embedding and recognition, but it is harder to get it to converge, and accuracy rises slowly.
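For reference, the standard FaceNet-style triplet loss being referred to (a NumPy sketch; the margin value is an assumption):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(a, p) - d(a, n) + margin), averaged over the batch.
    anchor, positive, negative are (N, D) embedding matrices."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)
    return np.mean(np.maximum(pos_dist - neg_dist + margin, 0.0))
```

Its convergence is known to be sensitive to triplet mining: with randomly sampled triplets, most soon contribute zero loss, which is consistent with the slow accuracy growth described above.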


[screenshot of the training run attached in the original comment]

epoch:1/10
Step: 1/48800, accuracy: 0.988889, center loss: 1.002501, cross loss: 0.605674, Total Loss: 2.053904 ,lr:0.000100
epoch:1/10
Step: 101/48800, accuracy: 0.988889, center loss: 0.759490, cross loss: 0.379286, Total Loss: 1.825077 ,lr:0.000100
epoch:1/10
Step: 201/48800, accuracy: 0.977778, center loss: 0.614323, cross loss: 0.564720, Total Loss: 2.009045 ,lr:0.000100
epoch:1/10
Step: 301/48800, accuracy: 0.977778, center loss: 0.590413, cross loss: 0.545574, Total Loss: 1.989640 ,lr:0.000100
epoch:1/10
Step: 401/48800, accuracy: 0.922222, center loss: 0.564167, cross loss: 0.684513, Total Loss: 2.128296 ,lr:0.000100
epoch:1/10
Step: 501/48800, accuracy: 0.955556, center loss: 0.527559, cross loss: 0.567637, Total Loss: 2.011031 ,lr:0.000100
epoch:1/10
Step: 601/48800, accuracy: 0.988889, center loss: 0.481062, cross loss: 0.408155, Total Loss: 1.851059 ,lr:0.000100
epoch:1/10
Step: 701/48800, accuracy: 0.955556, center loss: 0.481447, cross loss: 0.492881, Total Loss: 1.935760 ,lr:0.000100
epoch:1/10
Step: 801/48800, accuracy: 0.955556, center loss: 0.528111, cross loss: 0.773060, Total Loss: 2.216377 ,lr:0.000100
epoch:1/10
Step: 901/48800, accuracy: 0.977778, center loss: 0.468997, cross loss: 0.460811, Total Loss: 1.903506 ,lr:0.000100
epoch:1/10
Step: 1001/48800, accuracy: 0.977778, center loss: 0.492456, cross loss: 0.578687, Total Loss: 2.021586 ,lr:0.000100
epoch:1/10
Step: 1101/48800, accuracy: 0.988889, center loss: 0.455803, cross loss: 0.457620, Total Loss: 1.900118 ,lr:0.000100
epoch:1/10
Step: 1201/48800, accuracy: 0.955556, center loss: 0.487945, cross loss: 0.681197, Total Loss: 2.123986 ,lr:0.000100
epoch:1/10
Step: 1301/48800, accuracy: 0.966667, center loss: 0.479575, cross loss: 0.539578, Total Loss: 1.982249 ,lr:0.000100
epoch:1/10
Step: 1401/48800, accuracy: 0.977778, center loss: 0.418883, cross loss: 0.398044, Total Loss: 1.840073 ,lr:0.000100
epoch:1/10
Step: 1501/48800, accuracy: 0.977778, center loss: 0.423312, cross loss: 0.395883, Total Loss: 1.837919 ,lr:0.000100
epoch:1/10
Step: 1601/48800, accuracy: 0.933333, center loss: 0.471851, cross loss: 0.710375, Total Loss: 2.152862 ,lr:0.000100
epoch:1/10
Step: 1701/48800, accuracy: 0.944444, center loss: 0.471061, cross loss: 0.568877, Total Loss: 2.011319 ,lr:0.000100
epoch:1/10
Step: 1801/48800, accuracy: 0.977778, center loss: 0.464372, cross loss: 0.586737, Total Loss: 2.029074 ,lr:0.000100
epoch:1/10
Step: 1901/48800, accuracy: 0.988889, center loss: 0.440136, cross loss: 0.391311, Total Loss: 1.833368 ,lr:0.000100
epoch:1/10
Step: 2001/48800, accuracy: 0.966667, center loss: 0.457153, cross loss: 0.540593, Total Loss: 1.982782 ,lr:0.000100
epoch:1/10
Step: 2101/48800, accuracy: 0.977778, center loss: 0.434008, cross loss: 0.478475, Total Loss: 1.920395 ,lr:0.000100
epoch:1/10
Step: 2201/48800, accuracy: 0.988889, center loss: 0.444030, cross loss: 0.384417, Total Loss: 1.826397 ,lr:0.000100
epoch:1/10
Step: 2301/48800, accuracy: 0.988889, center loss: 0.437531, cross loss: 0.510363, Total Loss: 1.952239 ,lr:0.000100
epoch:1/10
Step: 2401/48800, accuracy: 0.933333, center loss: 0.460341, cross loss: 0.698997, Total Loss: 2.141060 ,lr:0.000100
epoch:1/10
Step: 2501/48800, accuracy: 0.966667, center loss: 0.448871, cross loss: 0.524262, Total Loss: 1.966169 ,lr:0.000100
epoch:1/10
Step: 2601/48800, accuracy: 0.966667, center loss: 0.427015, cross loss: 0.516706, Total Loss: 1.958354 ,lr:0.000100
epoch:1/10
Step: 2701/48800, accuracy: 0.955556, center loss: 0.441092, cross loss: 0.431897, Total Loss: 1.873644 ,lr:0.000100
epoch:1/10
Step: 2801/48800, accuracy: 0.988889, center loss: 0.435879, cross loss: 0.447247, Total Loss: 1.888902 ,lr:0.000100
epoch:1/10
Step: 2901/48800, accuracy: 0.977778, center loss: 0.448032, cross loss: 0.512469, Total Loss: 1.954206 ,lr:0.000100
epoch:1/10
Step: 3001/48800, accuracy: 0.955556, center loss: 0.462427, cross loss: 0.556183, Total Loss: 1.998023 ,lr:0.000100
epoch:1/10
Step: 3101/48800, accuracy: 0.933333, center loss: 0.471707, cross loss: 0.574157, Total Loss: 2.016049 ,lr:0.000100
epoch:1/10
Step: 3201/48800, accuracy: 0.977778, center loss: 0.452982, cross loss: 0.531652, Total Loss: 1.973315 ,lr:0.000100
epoch:1/10
Step: 3301/48800, accuracy: 0.977778, center loss: 0.439534, cross loss: 0.385271, Total Loss: 1.826757 ,lr:0.000100
epoch:1/10
Step: 3401/48800, accuracy: 0.944444, center loss: 0.432559, cross loss: 0.506032, Total Loss: 1.947405 ,lr:0.000100
epoch:1/10
Step: 3501/48800, accuracy: 0.966667, center loss: 0.437325, cross loss: 0.480046, Total Loss: 1.921427 ,lr:0.000100
epoch:1/10
Step: 3601/48800, accuracy: 0.977778, center loss: 0.436265, cross loss: 0.539114, Total Loss: 1.980444 ,lr:0.000100
epoch:1/10
Step: 3701/48800, accuracy: 0.966667, center loss: 0.438592, cross loss: 0.500624, Total Loss: 1.941934 ,lr:0.000100
epoch:1/10
Step: 3801/48800, accuracy: 0.944444, center loss: 0.450625, cross loss: 0.597496, Total Loss: 2.038888 ,lr:0.000100
epoch:1/10
Step: 3901/48800, accuracy: 0.955556, center loss: 0.462800, cross loss: 0.661550, Total Loss: 2.103020 ,lr:0.000100
epoch:1/10
Step: 4001/48800, accuracy: 0.977778, center loss: 0.455612, cross loss: 0.443667, Total Loss: 1.885022 ,lr:0.000100
epoch:1/10
Step: 4101/48800, accuracy: 0.977778, center loss: 0.402570, cross loss: 0.434042, Total Loss: 1.874823 ,lr:0.000100
epoch:1/10
Step: 4201/48800, accuracy: 0.933333, center loss: 0.460326, cross loss: 0.582508, Total Loss: 2.023825 ,lr:0.000100
epoch:1/10
Step: 4301/48800, accuracy: 0.977778, center loss: 0.469785, cross loss: 0.485561, Total Loss: 1.926928 ,lr:0.000100
epoch:1/10
Step: 4401/48800, accuracy: 0.955556, center loss: 0.435124, cross loss: 0.594522, Total Loss: 2.035501 ,lr:0.000100
epoch:1/10
Step: 4501/48800, accuracy: 0.977778, center loss: 0.438984, cross loss: 0.534539, Total Loss: 1.975518 ,lr:0.000100
epoch:1/10
Step: 4601/48800, accuracy: 0.977778, center loss: 0.439246, cross loss: 0.499091, Total Loss: 1.940030 ,lr:0.000100
epoch:1/10
Step: 4701/48800, accuracy: 0.922222, center loss: 0.489296, cross loss: 0.836667, Total Loss: 2.278061 ,lr:0.000100
epoch:1/10
Step: 4801/48800, accuracy: 0.988889, center loss: 0.424289, cross loss: 0.412741, Total Loss: 1.853443 ,lr:0.000100
.............................. (log pasted up to here!)
val: accuracy: 0.970586, center loss: 0.440298, cross loss: 0.491939, Total Loss: 1.932766
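The "center loss" / "cross loss" columns in this log correspond to the combined objective from the center-loss paper (Wen et al., 2016); below is a NumPy sketch of the center-loss term (the weighting `lam` is an assumption, and the log's Total Loss evidently also includes a regularization term not reproduced here):

```python
import numpy as np

def center_loss(features, labels, centers):
    """0.5 * mean squared distance from each embedding to its class center.
    centers has shape (num_classes, D) and is maintained outside the loss,
    typically as a per-class moving average of the batch embeddings."""
    batch_centers = centers[labels]  # (N, D): each sample's class center
    return 0.5 * np.mean(np.sum((features - batch_centers) ** 2, axis=1))

# combined objective, with lam as an assumed weighting:
# total = cross_entropy + lam * center_loss(features, labels, centers)
```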

@104503529

Hi, I trained with both parameter sets above, but val acc never exceeded 90%. Judging only from the log you posted, it looks like you stopped training and then resumed. If np.random.seed(int) is not called in utils.py, the validation data after a restart will be contaminated by the earlier run's training data, making the metric unreliable. Since my 85% model already matches the author's demo, that is my guess; please correct me if I'm wrong.
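In other words, the train/val split must be reproducible across restarts. A minimal sketch of the idea (the function name and ratio are illustrative, not the repo's actual utils.py):

```python
import numpy as np

def split_train_val(paths, val_ratio=0.05, seed=42):
    """Seed the RNG so the split is identical on every run; without this,
    a resumed run can end up validating on images it already trained on."""
    np.random.seed(seed)
    idx = np.random.permutation(len(paths))
    n_val = int(len(paths) * val_ratio)
    return [paths[i] for i in idx[n_val:]], [paths[i] for i in idx[:n_val]]
```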


imliusu commented Jan 2, 2020

> Hi, I trained with both parameter sets above, but val acc never exceeded 90%. Judging only from the log you posted, it looks like you stopped training and then resumed. If np.random.seed(int) is not called in utils.py, the validation data after a restart will be contaminated by the earlier run's training data, making the metric unreliable. Since my 85% model already matches the author's demo, that is my guess; please correct me if I'm wrong.

That may well be the problem: training was interrupted unexpectedly and resumed from a reloaded model, and I overlooked this issue.
Could you share training parameters that reproduce the demo's results? Over the last two days I reran training on a Titan V with batch size 150 and everything else essentially unchanged, reaching about 92% val acc. Adding more training iterations at lr 0.01 and 0.0001 raises val acc further, but a method similar to the author's demo still performs poorly.
Also, I cleaned the CASIA dataset, which improved things slightly, but the demo's method still works badly (overfitting, I suspect).
I suspect the default 150 epochs simply overfits.

@104503529

Honestly I set these parameters more or less arbitrarily, with no particular justification: batch_size 64, epoch 120, lr 0.01, LR_EPOCH=[40,70,100], everything else at the defaults, trained to completion without interruption; that should reach about 85% val acc.
https://drive.google.com/open?id=1GJwQ0uEsFOSVAdUi4BAKbY9x0OuAnYje
The model is linked above; you can download it and try it first.
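Collected into one place for convenience (the names are illustrative; map them onto the repo's actual flags):

```python
# parameter set 104503529 reports converging to ~85% val acc
config = {
    "batch_size": 64,
    "epochs": 120,
    "base_lr": 0.01,
    "lr_epochs": [40, 70, 100],  # decay boundaries (1/10 per step in the earlier runs)
}
```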


imliusu commented Jan 2, 2020

> Honestly I set these parameters more or less arbitrarily, with no particular justification: batch_size 64, epoch 120, lr 0.01, LR_EPOCH=[40,70,100], everything else at the defaults, trained to completion without interruption; that should reach about 85% val acc.
> https://drive.google.com/open?id=1GJwQ0uEsFOSVAdUi4BAKbY9x0OuAnYje
> The model is linked above; you can download it and try it first.

OK, thank you.
I'll try running a large batch of training experiments over these parameters soon; thanks for providing them.
