TensorFlow implementation of Semi-supervised Sequence Learning (https://arxiv.org/abs/1511.01432).
A sequence auto-encoder or a language model is pre-trained first, and its weights are then used to initialize an LSTM text classification model.
- SA-LSTM: Uses a sequence auto-encoder as the pre-trained model.
- LM-LSTM: Uses a language model as the pre-trained model.
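The two pre-training objectives differ mainly in what the network is asked to predict. The sketch below builds the input and target sequences for each objective from a tokenized sentence in plain Python. The function names and the `<s>` / `</s>` markers are illustrative assumptions and are not taken from this repository's code, which may prepare its data differently.

```python
# Illustrative only: helper names and the <s>/</s> markers are assumptions,
# not identifiers from this repository.
BOS, EOS = "<s>", "</s>"


def language_model_pair(tokens):
    """Language model: at every step, the target is the next token."""
    inputs = [BOS] + tokens
    targets = tokens + [EOS]
    return inputs, targets


def auto_encoder_pair(tokens):
    """Sequence auto-encoder: read the whole sequence, then reconstruct it."""
    encoder_inputs = list(tokens)
    decoder_inputs = [BOS] + tokens
    decoder_targets = tokens + [EOS]
    return encoder_inputs, decoder_inputs, decoder_targets


if __name__ == "__main__":
    sentence = "the cat sat".split()
    print(language_model_pair(sentence))
    print(auto_encoder_pair(sentence))
```

In both cases no labels are needed, which is why either objective can be trained on unlabeled text before the classifier is fine-tuned.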
- Python 3
- TensorFlow
- pip install -r requirements.txt
The DBpedia dataset is used for both pre-training and training.
$ python pre_train.py --model="<MODEL>" --model_name="<MODEL_NAME>" --dict_size="<DICT_SIZE>"
- <MODEL>: auto_encoder | language_model
- <MODEL_NAME>: Name to give the model (default: "model")
- <DICT_SIZE>: Maximum size of the vocabulary dictionary (default: 20000)
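To make the effect of the dictionary size limit concrete, here is a minimal sketch of one common way to cap a vocabulary: keep the most frequent words and map everything else to an unknown token. The function names and the `<pad>` / `<unk>` tokens are assumptions made for illustration; the repository's own preprocessing may differ.

```python
from collections import Counter

# Assumed special tokens; the repository may use different ones.
SPECIAL_TOKENS = ["<pad>", "<unk>"]


def build_word_dict(tokenized_texts, dict_size):
    """Map the most frequent words to integer ids.

    dict_size is assumed to be at least len(SPECIAL_TOKENS); the special
    tokens count toward the limit.
    """
    counts = Counter(word for text in tokenized_texts for word in text)
    n_words = max(dict_size - len(SPECIAL_TOKENS), 0)
    vocab = SPECIAL_TOKENS + [word for word, _ in counts.most_common(n_words)]
    return {word: idx for idx, word in enumerate(vocab)}


def encode(tokens, word_dict):
    """Replace each token with its id, falling back to the <unk> id."""
    unk = word_dict["<unk>"]
    return [word_dict.get(token, unk) for token in tokens]


if __name__ == "__main__":
    texts = [["a", "b", "a"], ["a", "c", "b"]]
    word_dict = build_word_dict(texts, dict_size=4)
    print(word_dict)
    print(encode(["a", "c", "b"], word_dict))
```

A smaller limit shrinks the embedding and output layers but sends more rare words to the unknown token.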
$ python train.py --pre_trained="<MODEL>" --model_name="<MODEL_NAME>"
- <MODEL>: none | auto_encoder | language_model
- <MODEL_NAME>: Name of the pre-trained model to load (default: "model")
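Conceptually, initializing the classifier from a pre-trained model means copying the parameters the two models have in common and leaving the rest at their fresh initialization. The toy sketch below shows that idea with plain Python dictionaries standing in for model weights. All parameter names and helper functions here are hypothetical; this does not reproduce how this repository restores TensorFlow checkpoints.

```python
import random


def init_params(names, size=3, seed=0):
    """Create small random 'weights' for each (hypothetical) parameter name."""
    rng = random.Random(seed)
    return {name: [rng.uniform(-0.1, 0.1) for _ in range(size)] for name in names}


def initialize_from_pretrained(classifier, pretrained):
    """Copy every parameter shared by name; leave the others untouched."""
    restored = []
    for name, value in pretrained.items():
        if name in classifier:
            classifier[name] = list(value)
            restored.append(name)
    return restored


if __name__ == "__main__":
    # Hypothetical layout: the embedding and LSTM are shared, the output layers are not.
    pretrained = init_params(["embedding", "lstm", "decoder_output"], seed=1)
    classifier = init_params(["embedding", "lstm", "classifier_output"], seed=2)
    restored = initialize_from_pretrained(classifier, pretrained)
    print(sorted(restored))
```

Only the shared layers are overwritten, so the classification layer still has to be learned from the labeled data.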