
ValueError: could not broadcast input array from shape (86935,256) into shape (87008,256) #1

Open
p-null opened this issue Nov 20, 2018 · 5 comments

Comments

@p-null

p-null commented Nov 20, 2018

Hi, when I try to run download.sh, I get the following error:

Prepare for IMDB
Prepare script is running...
Traceback (most recent call last):
  File "preprocess.py", line 79, in <module>
    prepare_imdb()
  File "preprocess.py", line 55, in prepare_imdb
    imdb_validation_pos_start_id)
  File "preprocess.py", line 24, in load_file
    words = read_text(filename.strip())
  File "preprocess.py", line 11, in read_text
    for line in f:
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 399: ordinal not in range(128)

Then I added encoding='utf-8' to every with open() call in preprocessing.py.
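A minimal sketch of that change, assuming the read_text helper named in the traceback (its body here is a guess; only the encoding argument is the actual fix):

```python
# Sketch of preprocess.py's read_text with the encoding fix applied.
def read_text(filename):
    words = []
    # encoding='utf-8' avoids the UnicodeDecodeError without dropping
    # anything: every line is still read, just decoded as UTF-8.
    with open(filename, encoding='utf-8') as f:
        for line in f:
            words.extend(line.split())
    return words
```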

After that, I get the following error:

Namespace(adaptive_softmax=1, add_labeld_to_unlabel=1, alpha=0.001, alpha_decay=0.9998, batchsize=32, batchsize_semi=96, clip=5.0, dataset='imdb', debug_mode=0, dropout=0.5, emb_dim=256, eval=0, freeze_word_emb=0, gpu=0, hidden_cls_dim=30, hidden_dim=1024, ignore_unk=1, load_trained_lstm='', lower=0, min_count=1, n_class=2, n_epoch=30, n_layers=1, nl_factor=1.0, norm_sentence_level=1, pretrained_model='imdb_pretrained_lm.model', random_seed=1234, save_name='imdb_model_vat', use_adv=0, use_exp_decay=1, use_rational=0, use_semi_data=1, use_unlabled_to_vocab=1, word_only=0, xi_var=5.0, xi_var_first=1.0)
train_set:71246
avg word number:242.8615501221121
vocab:87008
avg word number (train_x): 242.43914148545608
avg word number (dev_x):239.861747469366
avg word number (test_x):235.59372
lm_words_num:17297560
train_vocab_size: 66825
vocab_inv: 87008
Traceback (most recent call last):
  File "train.py", line 354, in <module>
    main()
  File "train.py", line 164, in main
    serializers.load_npz(args.pretrained_model, pretrain_model)
  File "/usr/local/lib/python3.6/dist-packages/chainer/serializers/npz.py", line 190, in load_npz
    d.load(obj)
  File "/usr/local/lib/python3.6/dist-packages/chainer/serializer.py", line 83, in load
    obj.serialize(self)
  File "/usr/local/lib/python3.6/dist-packages/chainer/link.py", line 997, in serialize
    d[name].serialize(serializer[name])
  File "/usr/local/lib/python3.6/dist-packages/chainer/link.py", line 651, in serialize
    data = serializer(name, param.data)
  File "/usr/local/lib/python3.6/dist-packages/chainer/serializers/npz.py", line 150, in __call__
    numpy.copyto(value, dataset)
ValueError: could not broadcast input array from shape (86935,256) into shape (87008,256)

I guess my change to the decoding method dropped some lines from the file?
Could you give me a workaround for this issue?
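For reference, the shapes stored in the pretrained snapshot can be inspected with numpy alone; the broadcast error means some stored array has 86935 rows while the freshly built model expects 87008. A hypothetical diagnostic (the key names inside the real .npz are unknown; snapshot.files lists them):

```python
import numpy as np

# List every 2-D array in an .npz snapshot whose row count disagrees
# with the current vocabulary size.
def find_vocab_mismatches(npz_path, vocab_size):
    mismatched = []
    with np.load(npz_path) as snapshot:
        for key in snapshot.files:
            arr = snapshot[key]
            if arr.ndim == 2 and arr.shape[0] != vocab_size:
                mismatched.append((key, arr.shape))
    return mismatched
```

Calling find_vocab_mismatches('imdb_pretrained_lm.model', 87008) should flag the (86935, 256) embedding if the two vocabularies really diverged.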

@aonotas
Owner

aonotas commented Nov 22, 2018

Thank you for your report!

Could you try the following commands?

$ cd data/imdb/
$ wget http://sato-motoki.com/research/vat/imdb_list.zip
$ unzip imdb_list.zip

@p-null
Author

p-null commented Nov 22, 2018

Hi,
I still get the same error. To help reproduce it, I uploaded a notebook here.
Thanks!

@aonotas
Owner

aonotas commented Nov 24, 2018

Thank you for your notebook.

$ cd data/imdb/
$ wget http://sato-motoki.com/research/vat/imdb_list.zip
$ unzip imdb_list.zip

Then add encoding='utf-8' to every with open() call in preprocessing.py.
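Note that the choice of decoding strategy changes which characters (and therefore which tokens) survive, which is why the vocabulary size can drift between runs. A minimal illustration, not from the repo:

```python
# The same UTF-8 bytes produce different tokens depending on decoding:
raw = 'caf\u00e9 menu'.encode('utf-8')               # contains multi-byte lead bytes
print(raw.decode('utf-8').split())                   # ['café', 'menu']
print(raw.decode('ascii', errors='ignore').split())  # ['caf', 'menu']
```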

Please let me know the result!

@longquan0104

> Hi,
> Still got the same error. To help reproduce the error, I uploaded a notebook here
> Thanks!

Did you fix it? May I know how?

@dcetin

dcetin commented Jun 22, 2019

No progress on this one? If I train my own LM, it seems to load the weights fine, but I couldn't get it working with the pretrained weights.

4 participants