Training Detail for Pretrain #24
Comments
@EasonXiao-888 Yes, during pretraining I use the zero-shot QVHighlights results to monitor training. Sure, I can share the log with you, but it might take a few days to retrieve. Please send me an email if I do not respond in time.
Okay, thanks a lot. One more question: when we use the "Curve" data for pretraining on an A100, training cannot start because of CPU memory problems. Have you encountered this problem?
I think this may be due to the cache option.
I encountered the same problem. I did not use the cache, and when loading the Curve data, num_workers can only be set to 0; otherwise the problem occurs. But with num_workers set to 0, the program is quite slow.
@EasonXiao-888 @RobertLuo1 Can you provide the detailed error output and the corresponding code line so I can better understand the problem? Thank you.
Same problem here. My program always gets stuck when loading the "curve_5_window.jsonl" file into the dataset. I used DDP and tried setting num_workers=0, but it still didn't work. What CPU hardware environment was used for the pretraining? It seems that the pretraining has a very high CPU hardware requirement. Thank you. @QinghongLin
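One possible workaround for the CPU memory problem described above, offered only as a minimal sketch and not as the repository's actual loader: instead of parsing the whole JSONL file into a Python list up front, index the byte offset of each line once and parse a record only when it is requested. This assumes the memory pressure comes from holding every parsed record in RAM (and from each DataLoader worker ending up with its own copy), which has not been confirmed in this thread. The class name `LazyJsonlDataset` and the record fields `qid` and `query` are hypothetical; the object only provides `__len__` and `__getitem__`, which is the interface a map-style dataset needs.

```python
import json
import tempfile
from array import array


class LazyJsonlDataset:
    """Hypothetical map-style dataset: keeps byte offsets in memory, not parsed records."""

    def __init__(self, path):
        self.path = path
        # A typed array is one compact buffer rather than millions of Python objects.
        self.offsets = array("q")
        with open(path, "rb") as f:
            pos = 0
            for line in f:
                if line.strip():  # skip blank lines
                    self.offsets.append(pos)
                pos += len(line)
        # Opened on first access, so each worker process opens its own handle
        # (as long as no item is read in the parent before workers start).
        self._file = None

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        if self._file is None:
            self._file = open(self.path, "rb")
        self._file.seek(self.offsets[idx])
        return json.loads(self._file.readline())


# Tiny demo on a throwaway file with made-up records (not the real Curve data).
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as tmp:
    for i in range(3):
        tmp.write(json.dumps({"qid": i, "query": f"clip {i}"}) + "\n")

ds = LazyJsonlDataset(tmp.name)
print(len(ds), ds[1])
```

The indexing pass still reads the file once, but peak memory then scales with the number of lines rather than with the size of the parsed records. Whether this is enough to allow num_workers > 0 here would need to be checked against the actual error output.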
Hello, thanks for your fancy work. I want to confirm: is the pretrained model validated on the val set of the QVHighlights dataset, and is the checkpoint selected by comparing [email protected]? Also, could you please share the log file for pretraining?