Hello,

The paper says: "For the general distillation, we set the maximum sequence length to 128 and use English Wikipedia (2,500M words) as the text corpus and perform the intermediate layer distillation for 3 epochs with the supervision from a pre-trained BERT BASE and keep other hyper-parameters the same as BERT pre-training (Devlin et al., 2019)." Could you provide a download link for this pre-training corpus?
Also, during the pre-training (general distillation) stage, is it acceptable to omit the --do_lower_case flag of general_distill.py? The vocab.txt shipped with the released model is a lowercase vocabulary, so I would like to know whether a trained, case-sensitive TinyBERT model (i.e. a "tinybert-cased") is currently available.
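For context, here is a minimal sketch of the tokenizer behaviour I am asking about. It uses the Hugging Face `transformers` tokenizer purely for illustration, and I am assuming that --do_lower_case in general_distill.py maps onto the same lowercasing option in the repo's own tokenizer; the model names below are hypothetical stand-ins for a cased teacher/vocab.

```python
# Illustration only: assuming --do_lower_case controls the same lowercasing
# option as in the standard BERT tokenizer. The Hugging Face tokenizer is used
# here just to show the intended behaviour; the repo's tokenizer class may
# live under a different import path.
from transformers import BertTokenizer

# Hypothetical cased setup: cased vocab/checkpoint with lowercasing disabled
# (what I would like to use for a "tinybert-cased" general distillation).
cased_tokenizer = BertTokenizer.from_pretrained("bert-base-cased", do_lower_case=False)
print(cased_tokenizer.tokenize("TinyBERT keeps Case information"))

# Currently released setup, for comparison: lowercase vocab with lowercasing on.
uncased_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
print(uncased_tokenizer.tokenize("TinyBERT keeps Case information"))
```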
Thanks!