Vocabulary size #183

lubiluk · 2019-06-12T07:37:58Z

Hi, while working with your code (I've rewritten it in Keras), I noticed one small detail about vocab_size:

cnn-text-classification-tf/train.py

Line 88 in 18762b4

vocab_size=len(vocab_processor.vocabulary_),

Since we are padding sentences with zeros, aren't we supposed to count the padding word (0) in the vocabulary size? I think the original code from Yoon Kim does that:

W = np.zeros(shape=(vocab_size+1, k), dtype='float32')

https://github.com/yoonkim/CNN_sentence/blob/23e0e1f7355705bb083043fda05c031b15acb38c/process_data.py#L55

I know it's probably a minor thing, but I wanted to ask to be 100% sure whether or not we should add one to the vocabulary size.
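To make the concern concrete, here is a minimal sketch, assuming real word ids start at 1 and id 0 is used only for padding. The toy vocabulary and the helper names (`pad`, `make_table`, `lookup`) are made up for illustration, and plain lists stand in for `np.zeros` so the snippet runs without dependencies:

```python
# Toy vocabulary (hypothetical): real word ids start at 1, id 0 is reserved for padding.
word_to_id = {"the": 1, "movie": 2, "was": 3, "great": 4}
vocab_size = len(word_to_id)  # 4, the padding id is not counted
embedding_dim = 3


def pad(ids, length, pad_id=0):
    """Right-pad a list of word ids with pad_id up to the given length."""
    return ids + [pad_id] * (length - len(ids))


def make_table(rows, dim):
    """Zero-initialised embedding table, a plain-list stand-in for np.zeros((rows, dim))."""
    return [[0.0] * dim for _ in range(rows)]


def lookup(table, ids):
    """Return one embedding row per id; raises IndexError if an id has no row."""
    return [table[i] for i in ids]


sentence = pad([word_to_id[w] for w in ["the", "movie", "was", "great"]], length=6)

too_small = make_table(vocab_size, embedding_dim)       # valid row indices: 0..3
big_enough = make_table(vocab_size + 1, embedding_dim)  # valid row indices: 0..4

# With only vocab_size rows, the highest word id (4) has no row to look up.
try:
    lookup(too_small, sentence)
    fits_without_plus_one = True
except IndexError:
    fits_without_plus_one = False

# With vocab_size + 1 rows, every id, including the padding id 0, has a row.
embedded = lookup(big_enough, sentence)
```

Under these assumptions the lookup only works with `vocab_size + 1` rows. If the vocabulary object already reserves index 0 (for example for an unknown-word token), its length would already cover that row and the extra one would not be needed; I have not verified which case applies to `vocab_processor.vocabulary_`.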

PS. You can find my code here; I tried to follow your solution as closely as possible. Although it reaches accuracy similar to your code's, the training doesn't behave as nicely as yours does (yet...?).
https://github.com/lubiluk/cnn-sentence/blob/master/cnn_sentence.ipynb
