
embedding training config file #21

Open · starry-y opened this issue Mar 5, 2023 · 4 comments
starry-y commented Mar 5, 2023

Thanks for your work!

I cannot find the file train-embeddings-base-1gpu.json mentioned in the README.md, but I did find the bert-wwm-ext_literature file. Does the bert-wwm-ext_literature file replace the former one?

Thanks a lot!

Vimos (Member) commented Mar 5, 2023

Hi, the basic difference between the configurations is the db paths. For embeddings, we use literature data rather than the official data as the training data.

Yes, please use the ext_literature file as the configuration file.
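The difference described above (same model settings, db paths pointing at literature data instead of official data) could look roughly like the following diff; the key names and paths here are illustrative placeholders, not the actual ChengyuBERT config schema:

```diff
 {
   "model": "bert-wwm-ext",
-  "train_db": "data/official/train.db",
-  "val_db":   "data/official/val.db",
+  "train_db": "data/literature/train.db",
+  "val_db":   "data/literature/val.db",
   ...
 }
```

In other words, switching to the ext_literature config should only redirect the data sources; the training hyperparameters stay the same.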

starry-y (Author) commented Mar 5, 2023

OK, thanks for your reply. I have replaced the config file in the terminal.

And I have another question.

In the evaluation stage, what does pretrained/Chinese-word-vector/embeddings refer to?

starry-y (Author) commented Mar 6, 2023

Also, I could not find chengyu_synonym_dict in train_embedding.py ...

Sorry to bother you; I am looking forward to your reply.

Vimos (Member) commented Mar 6, 2023

Please refer to https://github.com/VisualJoyce/ChengyuBERT#learning-and-evaluating-chinese-idiom-embeddings

That part belongs to a different paper, which focuses on embedding learning and evaluation. The data has been shared online.
