
embedding training config file #21

Open · starry-y opened this issue Mar 5, 2023 · 4 comments
starry-y commented Mar 5, 2023

Thanks for your work!

I cannot find the file train-embeddings-base-1gpu.json mentioned in the README.md, but I did find the bert-wwm-ext_literature file. Does the bert-wwm-ext_literature file replace the former one?

Thanks a lot!

Vimos (Member) commented Mar 5, 2023

Hi, the basic difference between the configurations is the db paths. For embeddings, we use literature data rather than the official data as the training data.

Yes, please use the ext_literature file as the configuration file.
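The difference described above (same model settings, db paths pointing at literature data instead of official data) could look roughly like the following diff; the key names and paths here are illustrative placeholders, not the actual ChengyuBERT config schema:

```diff
 {
   "model": "bert-wwm-ext",
-  "train_db": "data/official/train.db",
-  "val_db":   "data/official/val.db",
+  "train_db": "data/literature/train.db",
+  "val_db":   "data/literature/val.db",
   ...
 }
```

In other words, switching to the ext_literature config should only redirect the data sources; the training hyperparameters stay the same.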

starry-y (Author) commented Mar 5, 2023

OK, thanks for your reply. I have replaced the config file in the terminal.

And I have another question.

In the evaluation stage, what does pretrained/Chinese-word-vector/embeddings refer to?

starry-y (Author) commented Mar 6, 2023

Also, I could not find chengyu_synonym_dict in train_embedding.py ...

Sorry to bother you; I am looking forward to your reply.

Vimos (Member) commented Mar 6, 2023

Please refer to https://github.com/VisualJoyce/ChengyuBERT#learning-and-evaluating-chinese-idiom-embeddings

That part belongs to a different paper, which focuses on embedding learning and evaluation. The data has been shared online.
