
Loading 'distilbert_6_768_12' is broken #1549

Open
craffel opened this issue Apr 14, 2021 · 2 comments
Labels
bug Something isn't working

Comments

@craffel

craffel commented Apr 14, 2021

Description

The example code at https://nlp.gluon.ai/model_zoo/bert/index.html for the DistilBERT model produces an exception at HEAD.

Error Message

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-1-fbb321631ad8> in <module>()
      2 get_ipython().system('pip install mxnet')
      3 import gluonnlp as nlp; import mxnet as mx;
----> 4 model, vocab = nlp.model.get_model('distilbert_6_768_12', dataset_name='distil_book_corpus_wiki_en_uncased')

5 frames

/usr/local/lib/python3.7/dist-packages/gluonnlp/model/__init__.py in get_model(name, **kwargs)
    154             'Model %s is not supported. Available options are\n\t%s'%(
    155                 name, '\n\t'.join(sorted(models.keys()))))
--> 156     return models[name](**kwargs)

/usr/local/lib/python3.7/dist-packages/gluonnlp/model/bert.py in distilbert_6_768_12(dataset_name, vocab, pretrained, ctx, output_attention, output_all_encodings, root, hparam_allow_override, **kwargs)
   1311 
   1312     from ..vocab import Vocab  # pylint: disable=import-outside-toplevel
-> 1313     bert_vocab = _load_vocab(dataset_name, vocab, root, cls=Vocab)
   1314     # DistilBERT
   1315     net = DistilBERTModel(encoder, len(bert_vocab),

/usr/local/lib/python3.7/dist-packages/gluonnlp/model/utils.py in _load_vocab(dataset_name, vocab, root, cls)
    269                           'Loading vocab based on dataset_name. '
    270                           'Input "vocab" argument will be ignored.')
--> 271         vocab = _load_pretrained_vocab(dataset_name, root, cls)
    272     else:
    273         assert vocab is not None, 'Must specify vocab if not loading from predefined datasets.'

/usr/local/lib/python3.7/dist-packages/gluonnlp/data/utils.py in _load_pretrained_vocab(name, root, cls)
    387         Loaded vocabulary object and Tokenizer for the pre-trained model.
    388     """
--> 389     file_name, file_ext, sha1_hash, special_tokens = _get_vocab_tokenizer_info(name, root)
    390     file_path = os.path.join(root, file_name + file_ext)
    391     if os.path.exists(file_path):

/usr/local/lib/python3.7/dist-packages/gluonnlp/data/utils.py in _get_vocab_tokenizer_info(name, root)
    346 def _get_vocab_tokenizer_info(name, root):
    347     file_name = '{name}-{short_hash}'.format(name=name,
--> 348                                              short_hash=short_hash(name))
    349     root = os.path.expanduser(root)
    350     sha1_hash, file_ext, special_tokens = _vocab_sha1[name]

/usr/local/lib/python3.7/dist-packages/gluonnlp/data/utils.py in short_hash(name)
    340         raise ValueError('Vocabulary for {name} is not available. '
    341                          'Hosted vocabularies include: {vocabs}'.format(name=name,
--> 342                                                                         vocabs=vocabs))
    343     return _vocab_sha1[name][0][:8]
    344 

ValueError: Vocabulary for distil_book_corpus_wiki_en_uncased is not available. Hosted vocabularies include: ['wikitext-2', 'gbw', 'WMT2014_src', 'WMT2014_tgt', 'book_corpus_wiki_en_cased', 'book_corpus_wiki_en_uncased', 'openwebtext_book_corpus_wiki_en_uncased', 'openwebtext_ccnews_stories_books_cased', 'wiki_multilingual_cased', 'distilbert_book_corpus_wiki_en_uncased', 'wiki_cn_cased', 'wiki_multilingual_uncased', 'scibert_scivocab_uncased', 'scibert_scivocab_cased', 'scibert_basevocab_uncased', 'scibert_basevocab_cased', 'biobert_v1.0_pmc_cased', 'biobert_v1.0_pubmed_cased', 'biobert_v1.0_pubmed_pmc_cased', 'biobert_v1.1_pubmed_cased', 'clinicalbert_uncased', 'baidu_ernie_uncased', 'openai_webtext', 'xlnet_126gb', 'kobert_news_wiki_ko_cased']
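For what it's worth, the hosted names in the message above come from the private _vocab_sha1 table in gluonnlp/data/utils.py (visible in the traceback frames). A minimal sketch to dump them, assuming that private dict stays importable (it may change between releases):

import gluonnlp.data.utils as nlp_utils

# _vocab_sha1 maps each hosted vocabulary name to its
# (sha1_hash, file_ext, special_tokens) entry, per the traceback above.
# It is private API, so treat this purely as a debugging aid.
print('\n'.join(sorted(nlp_utils._vocab_sha1.keys())))

Note that 'distilbert_book_corpus_wiki_en_uncased' is in the list, while the documented 'distil_book_corpus_wiki_en_uncased' is not.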

To Reproduce

Here is a colab: https://colab.research.google.com/drive/1PhShfNvXWQIzPbBiSZwo3uwfNzv2n0UJ?usp=sharing
It is as simple as:

!pip install gluonnlp
!pip install mxnet
import gluonnlp as nlp; import mxnet as mx;
model, vocab = nlp.model.get_model('distilbert_6_768_12', dataset_name='distil_book_corpus_wiki_en_uncased')


What have you tried to solve it?

  1. I tried other models; they worked.
  2. I tried replacing the dataset_name with 'book_corpus_wiki_en_uncased', which did not work either.

Environment

We recommend using our script for collecting the diagnostic information. Run the following command and paste the output below:

curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python


This script (https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py) does not exist; the URL returns a 404.

@craffel added the bug label Apr 14, 2021
@szha
Member

szha commented Apr 14, 2021

@craffel thanks for reporting. The above PRs should fix the problem. The correct dataset name is 'distilbert_book_corpus_wiki_en_uncased'.
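That is, the call from the report becomes (a minimal sketch; only the dataset name changes):

import gluonnlp as nlp

# Use the vocabulary name that is actually hosted (see the ValueError above),
# not the 'distil_book_corpus_wiki_en_uncased' name from the docs.
model, vocab = nlp.model.get_model(
    'distilbert_6_768_12',
    dataset_name='distilbert_book_corpus_wiki_en_uncased')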

@craffel
Author

craffel commented Apr 15, 2021

Thanks.
