
ValueError: could not broadcast input array from shape (86935,256) into shape (87008,256) #1

Open
p-null opened this issue Nov 20, 2018 · 5 comments

Comments

@p-null

p-null commented Nov 20, 2018

Hi, when I try to run download.sh, I get the following error:

Prepare for IMDB
Prepare script is running...
Traceback (most recent call last):
  File "preprocess.py", line 79, in <module>
    prepare_imdb()
  File "preprocess.py", line 55, in prepare_imdb
    imdb_validation_pos_start_id)
  File "preprocess.py", line 24, in load_file
    words = read_text(filename.strip())
  File "preprocess.py", line 11, in read_text
    for line in f:
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 399: ordinal not in range(128)

Then I added encoding='utf-8' to every with open() call in preprocessing.py.
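A minimal sketch of that change, assuming the read_text helper named in the traceback (its body here is a guess; only the encoding argument is the actual fix):

```python
# Sketch of preprocess.py's read_text with the encoding fix applied.
def read_text(filename):
    words = []
    # encoding='utf-8' avoids the UnicodeDecodeError without dropping
    # anything: every line is still read, just decoded as UTF-8.
    with open(filename, encoding='utf-8') as f:
        for line in f:
            words.extend(line.split())
    return words
```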

After that, I get the following error:

Namespace(adaptive_softmax=1, add_labeld_to_unlabel=1, alpha=0.001, alpha_decay=0.9998, batchsize=32, batchsize_semi=96, clip=5.0, dataset='imdb', debug_mode=0, dropout=0.5, emb_dim=256, eval=0, freeze_word_emb=0, gpu=0, hidden_cls_dim=30, hidden_dim=1024, ignore_unk=1, load_trained_lstm='', lower=0, min_count=1, n_class=2, n_epoch=30, n_layers=1, nl_factor=1.0, norm_sentence_level=1, pretrained_model='imdb_pretrained_lm.model', random_seed=1234, save_name='imdb_model_vat', use_adv=0, use_exp_decay=1, use_rational=0, use_semi_data=1, use_unlabled_to_vocab=1, word_only=0, xi_var=5.0, xi_var_first=1.0)
train_set:71246
avg word number:242.8615501221121
vocab:87008
avg word number (train_x): 242.43914148545608
avg word number (dev_x):239.861747469366
avg word number (test_x):235.59372
lm_words_num:17297560
train_vocab_size: 66825
vocab_inv: 87008
Traceback (most recent call last):
  File "train.py", line 354, in <module>
    main()
  File "train.py", line 164, in main
    serializers.load_npz(args.pretrained_model, pretrain_model)
  File "/usr/local/lib/python3.6/dist-packages/chainer/serializers/npz.py", line 190, in load_npz
    d.load(obj)
  File "/usr/local/lib/python3.6/dist-packages/chainer/serializer.py", line 83, in load
    obj.serialize(self)
  File "/usr/local/lib/python3.6/dist-packages/chainer/link.py", line 997, in serialize
    d[name].serialize(serializer[name])
  File "/usr/local/lib/python3.6/dist-packages/chainer/link.py", line 651, in serialize
    data = serializer(name, param.data)
  File "/usr/local/lib/python3.6/dist-packages/chainer/serializers/npz.py", line 150, in __call__
    numpy.copyto(value, dataset)
ValueError: could not broadcast input array from shape (86935,256) into shape (87008,256)

I guess my change to the decoding method dropped some lines from the file?
Could you give me a workaround for this issue?
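For reference, the shapes stored in the pretrained snapshot can be inspected with numpy alone; the broadcast error means some stored array has 86935 rows while the freshly built model expects 87008. A hypothetical diagnostic (the key names inside the real .npz are unknown; snapshot.files lists them):

```python
import numpy as np

# List every 2-D array in an .npz snapshot whose row count disagrees
# with the current vocabulary size.
def find_vocab_mismatches(npz_path, vocab_size):
    mismatched = []
    with np.load(npz_path) as snapshot:
        for key in snapshot.files:
            arr = snapshot[key]
            if arr.ndim == 2 and arr.shape[0] != vocab_size:
                mismatched.append((key, arr.shape))
    return mismatched
```

Calling find_vocab_mismatches('imdb_pretrained_lm.model', 87008) should flag the (86935, 256) embedding if the two vocabularies really diverged.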

@aonotas
Owner

aonotas commented Nov 22, 2018

Thank you for your report!

Could you try the following commands?

$ cd data/imdb/
$ wget http://sato-motoki.com/research/vat/imdb_list.zip
$ unzip imdb_list.zip

@p-null
Author

p-null commented Nov 22, 2018

Hi,
I still get the same error. To help reproduce it, I uploaded a notebook here.
Thanks!

@aonotas
Owner

aonotas commented Nov 24, 2018

Thank you for your notebook.

$ cd data/imdb/
$ wget http://sato-motoki.com/research/vat/imdb_list.zip
$ unzip imdb_list.zip

Then add encoding='utf-8' to every with open() call in preprocessing.py.
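Note that the choice of decoding strategy changes which characters (and therefore which tokens) survive, which is why the vocabulary size can drift between runs. A minimal illustration, not from the repo:

```python
# The same UTF-8 bytes produce different tokens depending on decoding:
raw = 'caf\u00e9 menu'.encode('utf-8')               # contains multi-byte lead bytes
print(raw.decode('utf-8').split())                   # ['café', 'menu']
print(raw.decode('ascii', errors='ignore').split())  # ['caf', 'menu']
```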

Please let me know the result!

@longquan0104

> Hi,
> Still got the same error. To help reproduce the error, I uploaded a notebook here
> Thanks!

Did you fix it? May I know how?

@dcetin

dcetin commented Jun 22, 2019

No progress on this one? If I train my own LM, it seems to load the weights fine, but I couldn't get it working with the pretrained weights.

4 participants