continue pretraining nomic embed on domain data #70

Open
keyuchen21 opened this issue Dec 18, 2024 · 0 comments

@keyuchen21

  1. Is there any guide/code on how I can do continued pre-training on nomic v1.5 to align it with a specific domain, such as finance or medicine?
  2. Would Masked Language Modeling pretraining on medical data alone work in this case (based on this paper's idea: https://arxiv.org/pdf/2004.10964)? A rough sketch of what I have in mind is below the list.
  • Since Nomic embed has additional unsupervised and supervised contrastive pretraining stages, I'm not sure whether I can still do additional masked language modeling pre-training on domain data.
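For reference, here is a minimal sketch of the domain-adaptive MLM step I have in mind, using Hugging Face Transformers. It assumes the checkpoint exposes a masked-LM head via `trust_remote_code`, and `medical_corpus.txt` is a hypothetical plain-text file with one domain document per line; please correct me if a different base (e.g. the underlying encoder) should be used instead.

```python
# Minimal sketch of domain-adaptive MLM continued pretraining (DAPT-style).
# Assumption: AutoModelForMaskedLM can load this checkpoint via trust_remote_code;
# "medical_corpus.txt" is a hypothetical plain-text domain corpus.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "nomic-ai/nomic-embed-text-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name, trust_remote_code=True)

# Load and tokenize the raw domain corpus.
raw = load_dataset("text", data_files={"train": "medical_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT-style random masking at 15%.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="nomic-dapt-medical",
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=1,
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```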

Best,
Keyu

@keyuchen21 changed the title from "continue pretraining nomic" to "continue pretraining nomic embed on domain data" on Dec 18, 2024