A PyTorch implementation of the Hierarchical Attention Network for Sentiment Analysis on the Amazon Product Reviews datasets. The system uses the review text and the summary text to classify the reviews as one of positive, negative or neutral. These classes correspond to ratings 4-5, 1-2 and 3 respectively in the dataset.
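The rating-to-class mapping described above can be sketched as follows. This is a minimal illustration, not necessarily the code in preprocess.py; the function name `rating_to_label` is hypothetical, and it assumes ratings are numeric values from 1 to 5.

```python
def rating_to_label(rating: float) -> str:
    """Map a 1-5 star rating to a sentiment class (hypothetical helper)."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating out of range: {rating}")
    if rating >= 4:
        return "positive"   # ratings 4-5
    if rating <= 2:
        return "negative"   # ratings 1-2
    return "neutral"        # rating 3


if __name__ == "__main__":
    for r in (1, 2, 3, 4, 5):
        print(r, rating_to_label(r))
```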
The code in the repository is organised into the following modules:
- main.py: driver code
- model.py: Hierarchical Attention Network implementation
- train.py: training/validation/testing code
- preprocess.py: data preprocessing code
- vocab.py: code for building vocab
- dataset.py: custom PyTorch dataset for review data
- utils.py: logging, config generation, experiment analysis scripts
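As an illustration of what the vocabulary-building step might involve, here is a minimal sketch. It is not the code in vocab.py: the names `build_vocab` and `encode`, the `<pad>`/`<unk>` tokens, the `min_freq` parameter, and whitespace tokenisation are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical special tokens: padding and out-of-vocabulary.
PAD, UNK = "<pad>", "<unk>"


def build_vocab(texts, min_freq=1):
    """Build a token-to-index map from an iterable of strings."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    # Keep tokens seen at least min_freq times; sort for a stable ordering.
    kept = sorted(tok for tok, c in counts.items() if c >= min_freq)
    return {tok: idx for idx, tok in enumerate([PAD, UNK] + kept)}


def encode(text, vocab):
    """Convert a string to a list of indices, mapping unknown tokens to UNK."""
    return [vocab.get(tok, vocab[UNK]) for tok in text.lower().split()]


if __name__ == "__main__":
    vocab = build_vocab(["good good product", "bad product"], min_freq=2)
    print(vocab)
    print(encode("bad good product", vocab))
```

A mapping like this could then be pickled once from the training data and reloaded at test time, so that both runs share the same indices.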
The following utility scripts are provided for training and testing:
- train.sh: cleans and preprocesses the training data, generates the vocabulary pickles, and then trains the model on the preprocessed data.
- test.sh: cleans and preprocesses the test data, evaluates the model on the preprocessed data, and writes the model predictions to a file.
$ ./train.sh <train_data_json>
$ ./test.sh <test_data_json> <result_file>