Custom BERT Model outputs #22
Comments
I have a quick first pass at point 1 in a fork. How would I approach point 2?
I'm making more progress on the very beginnings of a transformers-style library for Burn, using traits for pipeline implementations, but in my WIP testing so far I'm having trouble getting training to work correctly. It doesn't seem to be using the pre-trained weights from https://github.com/bkonkle/burn-transformers. This is using my branch with pooled BERT output. The branch doesn't currently build, but I plan to do more work on it this week to fix that and get a good example in place for feedback.
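For context, the pipeline-trait idea is roughly shaped like the sketch below; the trait, associated type, and method names are illustrative assumptions, not the actual burn-transformers API.

```rust
use burn::tensor::backend::Backend;

/// Hypothetical pipeline trait in the spirit of Hugging Face pipelines.
/// `Pipeline`, `Input`, `Output`, and `predict` are illustrative names,
/// not the real burn-transformers API.
pub trait Pipeline<B: Backend> {
    type Input;
    type Output;

    /// Run the underlying model end to end on a single input.
    fn predict(&self, input: Self::Input) -> Self::Output;
}
```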
Awesome @bkonkle! I think the current implementation is using RoBERTa weights instead of BERT, so maybe this isn't compatible with the BERT weights for the classification head. Not sure if this helps, but if you find something not working, make sure to test multiple backends and report a bug if there are differences.
Okay, I believe I understand goal 2 better now. I was initially thinking this meant adding a flag, but if my interpretation is correct, the approach in my fork here addresses this by returning both the last hidden states and the optional pooled output when available. Update: Solved - see the next comment below.
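As a rough sketch of what that output shape could look like in Burn (the struct and field names are assumptions for illustration, not the actual bert-burn API):

```rust
use burn::tensor::{backend::Backend, Tensor};

/// Hypothetical BERT output carrying both pieces of information:
/// the per-token hidden states plus an optional pooled vector that is
/// only present when the checkpoint includes a pooler layer.
pub struct BertOutput<B: Backend> {
    /// Shape: [batch_size, seq_len, hidden_size]
    pub hidden_states: Tensor<B, 3>,
    /// Shape: [batch_size, hidden_size]; `None` when no pooler is loaded.
    pub pooled_output: Option<Tensor<B, 2>>,
}
```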
From what I can tell, the BERT model here should support `bert-base-uncased` without any issues using the additional pooler layer (which is also loaded from the safetensors file), despite having originally been written for RoBERTa. Unfortunately, I'm still getting really poor accuracy when loading from safetensors and then fine-tuning on the Snips dataset.
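For reference, the standard BERT pooler is just a dense layer plus tanh applied to the first ([CLS]) token's hidden state. A minimal Burn sketch, assuming a plain `Linear` pooler (function and parameter names are mine, not the repo's):

```rust
use burn::nn::Linear;
use burn::tensor::{backend::Backend, Tensor};

/// Sketch of the usual BERT pooling step: take the hidden state of the
/// first ([CLS]) token and push it through the pooler dense layer + tanh.
fn pool<B: Backend>(pooler: &Linear<B>, hidden_states: Tensor<B, 3>) -> Tensor<B, 2> {
    let [batch, _seq_len, hidden] = hidden_states.dims();
    // [batch, 1, hidden] -> [batch, hidden]
    let cls: Tensor<B, 2> = hidden_states.slice([0..batch, 0..1, 0..hidden]).squeeze(1);
    pooler.forward(cls).tanh()
}
```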
The learning rate was indeed a hint. I had it set way too low, based on the default value in the JointBERT repo I was learning from. 😅 Setting the learning rate higher fixed it:

======================== Learner Summary ========================
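For anyone hitting the same thing: in Burn the learning rate is passed explicitly at each optimizer step (or as the constant/scheduler given to the learner), so the fix is just supplying a sensible value. A hedged sketch follows, with `5e-5` purely as an illustrative BERT-style fine-tuning value rather than the number from the run above:

```rust
use burn::module::AutodiffModule;
use burn::optim::{GradientsParams, Optimizer};
use burn::tensor::backend::AutodiffBackend;

/// One optimizer step with an explicit learning rate. `optim`, `model`,
/// and `grads` are assumed to come from the surrounding training loop.
fn train_step<B, M, O>(optim: &mut O, model: M, grads: GradientsParams) -> M
where
    B: AutodiffBackend,
    M: AutodiffModule<B>,
    O: Optimizer<M, B>,
{
    let lr = 5e-5; // typical fine-tuning range, illustrative only
    optim.step(lr, model, grads)
}
```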
For flexibility to fine-tune on downstream tasks, we should have the following options in the BERT family model outputs:
1. The last hidden states for the full sequence.
2. A pooled output for classification-style heads.