
Problem when running BERT #18

Open

LLawlietc opened this issue Apr 9, 2020 · 7 comments


@LLawlietc

Traceback (most recent call last):
  File "D:/dogtime/mission/BiGRU_crf/bert_data_utils.py", line 41, in load_data
    word, tag = line.split()
ValueError: not enough values to unpack (expected 2, got 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/dogtime/mission/BiGRU_crf/bert_data_utils.py", line 119, in <module>
    bert_data_util = BertDataUtils(tokenizer)
  File "D:/dogtime/mission/BiGRU_crf/bert_data_utils.py", line 26, in __init__
    self.load_data()
  File "D:/dogtime/mission/BiGRU_crf/bert_data_utils.py", line 49, in load_data
    inputs_ids = self.tokenizer.convert_tokens_to_ids(ntokens)
  File "D:\dogtime\mission\BiGRU_crf\bert_base\bert\tokenization.py", line 179, in convert_tokens_to_ids
    return convert_by_vocab(self.vocab, tokens)
  File "D:\dogtime\mission\BiGRU_crf\bert_base\bert\tokenization.py", line 140, in convert_by_vocab
    output.append(vocab[item])
KeyError: 'D'

@LLawlietc
Author

Hello, author. Following your instructions, everything runs fine without BERT, but when the BERT model is enabled the error above is raised. Debugging shows the failure happens once the loop reaches the end of the data; inputs_ids = self.tokenizer.convert_tokens_to_ids(ntokens) seems to be the offending line, but I can't find what is actually wrong.

@yanwii
Owner

yanwii commented Apr 10, 2020

The error can be localized to:

File "D:\dogtime\mission\BiGRU_crf\bert_base\bert\tokenization.py", line 140, in convert_by_vocab
    output.append(vocab[item])
KeyError: 'D'

So vocab has no key 'D'. Change the lookup to:

vocab.get(item, UNK_TOKEN_INDEX)
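
A minimal sketch of that change inside convert_by_vocab, assuming vocab is the token-to-id dict loaded from vocab.txt and that it contains "[UNK]" (as the standard BERT releases do), so the UNK_TOKEN_INDEX named above can be looked up rather than hard-coded:

```python
# Sketch of the fix suggested above, in bert_base/bert/tokenization.py.
# Assumes vocab maps token -> id and that "[UNK]" exists in vocab.txt.
def convert_by_vocab(vocab, items):
    """Convert tokens/ids using the vocab, mapping out-of-vocabulary
    items to [UNK] instead of raising KeyError."""
    unk_index = vocab["[UNK]"]  # stands in for UNK_TOKEN_INDEX above
    output = []
    for item in items:
        output.append(vocab.get(item, unk_index))
    return output
```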

@LLawlietc
Author

train data: 2757
nums of tags: 9
D:\Anaconda\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\ops\gradients_impl.py:97: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2020-04-17 17:21:54.547394: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-04-17 17:21:54.680174: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2020-04-17 17:21:54.688171: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
Traceback (most recent call last):
  File "model.py", line 503, in <module>
    model.train()
  File "model.py", line 329, in train
    ARGS.init_checkpoint)
  File "D:\dogtime\mission\论文相关\资源\BiGRU_crf\bert_base\bert\modeling.py", line 330, in get_assignment_map_from_checkpoint
    init_vars = tf.train.list_variables(init_checkpoint)
  File "D:\Anaconda\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\training\checkpoint_utils.py", line 89, in list_variables
    reader = load_checkpoint(ckpt_dir_or_file)
  File "D:\Anaconda\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\training\checkpoint_utils.py", line 60, in load_checkpoint
    return pywrap_tensorflow.NewCheckpointReader(filename)
  File "D:\Anaconda\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 225, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern), status)
  File "D:\Anaconda\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.DataLossError: file is too short to be an sstable

@LLawlietc
Author

Hello, author. After making the change you suggested, running now fails with "file is too short to be an sstable". I created my directory layout following your earlier issue, as shown in the screenshot below; I searched online and found no good solution. Thanks for your help.
(screenshot: directory layout)
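
"file is too short to be an sstable" typically means init_checkpoint does not point at a valid TensorFlow checkpoint, for example an incomplete download or the wrong file. A minimal sketch for sanity-checking the checkpoint prefix before training; the path below is hypothetical, not the repo's actual layout:

```python
# Hypothetical sanity check: list the variables stored in the BERT
# checkpoint before training. A valid init_checkpoint is the common
# prefix of the bert_model.ckpt.* files (note: no file extension).
import tensorflow as tf

init_checkpoint = "bert_base/chinese_L-12_H-768_A-12/bert_model.ckpt"
for name, shape in tf.train.list_variables(init_checkpoint):
    print(name, shape)
```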

@aleien95

> Hello, author. After making the change you suggested, running now fails with "file is too short to be an sstable". I created my directory layout following your earlier issue, as shown in the screenshot; I searched online and found no good solution. Thanks for your help.

Friend, this BERT vocabulary is uncased, so the uppercase letters in your training set cannot be found in the vocab. A simple fix is to convert the uppercase letters in your dataset to lowercase, as in the sketch below.
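
A minimal sketch of that workaround, assuming the training file uses one whitespace-separated "word tag" pair per line (as the word, tag = line.split() in the first traceback suggests); the file names are hypothetical:

```python
# Hypothetical preprocessing step: lowercase the word column of a
# "word tag" per-line training file so every token can be found in
# an uncased BERT vocab. File names here are examples only.
with open("data/train.txt", encoding="utf-8") as fin, \
        open("data/train_lower.txt", "w", encoding="utf-8") as fout:
    for line in fin:
        line = line.strip()
        if not line:  # keep the blank lines that separate sentences
            fout.write("\n")
            continue
        word, tag = line.split()
        fout.write("%s %s\n" % (word.lower(), tag))
```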

@baiyewww

> Hello, author. After making the change you suggested, running now fails with "file is too short to be an sstable". I created my directory layout following your earlier issue, as shown in the screenshot; I searched online and found no good solution. Thanks for your help.

Hello, have you solved the problem with running it? My run also hits some issues. What files are supposed to be in the bert_base directory?

@hexieojie

Hello, where can I get this bert_base?
