Important

There is a bug in the implementation of the ensemble methods that makes the reported ensemble accuracies too high. Most notably, the previously reported accuracy of 97.42% for the DV-ngrams-cosine + NB-weighted BON ensemble is incorrect; a correct implementation gives an accuracy of 93.68%. For now, please refer to https://github.com/bgzh/dv_cosine_revisited for a correct implementation of the ensembles.

In summary, the bug is caused by the incorrect concatenation of documents, so the non-ensemble methods are unaffected.

Code for the ACL-SRW 2019 paper

Code for the ACL-SRW 2019 paper: "Sentiment Classification using Document Embeddings trained with Cosine Similarity".

This repository contains Java code to train document embeddings using cosine similarity. To do so, simply run the project. All hyperparameters that need adjusting are at the top of the file NeuralNetwork.java; the default hyperparameters are the same as in the paper.

There are also options to train them using dot product and L2-regularized dot product.
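As a rough illustration of how these options differ, the sketch below compares the scoring functions on toy vectors. It is not taken from NeuralNetwork.java: the function names `dot`, `cosine` and `l2_penalty`, the toy vectors, and the penalty weight are all assumptions made for the example.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot(u, v) / (norm_u * norm_v)


def l2_penalty(u, lam):
    """Generic L2 penalty lam * ||u||^2 (hypothetical helper, lam is assumed)."""
    return lam * dot(u, u)


# Toy vectors, not real embeddings.
doc = [1.0, 2.0, 2.0]
ngram = [2.0, 0.0, 1.0]
scaled_doc = [10.0 * x for x in doc]

# The dot product grows with the vector's magnitude; cosine does not.
print(dot(doc, ngram), dot(scaled_doc, ngram))
print(cosine(doc, ngram), cosine(scaled_doc, ngram))
print(l2_penalty(doc, 0.5))
```

Scaling the document vector changes the dot product but leaves the cosine score unchanged. An L2 penalty is one generic way to discourage large norms when the dot product is used.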

Run ensemble.py in order to test the combination of document embeddings with NB-weighted bag of ngrams.
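For readers unfamiliar with the idea, here is a minimal sketch of combining a document embedding with NB-weighted n-gram features. It assumes the common log-count-ratio formulation with binarized counts and add-alpha smoothing, and it is not a copy of ensemble.py. All names (`log_count_ratio`, `nb_features`, `ensemble_features`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def log_count_ratio(pos_docs, neg_docs, alpha=1.0):
    """Return (vocab, ratios), where ratios[t] is the smoothed log-count ratio of t.

    Counts are binarized per document; alpha is an assumed smoothing constant.
    """
    vocab = sorted({t for d in pos_docs + neg_docs for t in d})
    p, q = Counter(), Counter()
    for d in pos_docs:
        p.update(set(d))
    for d in neg_docs:
        q.update(set(d))
    p_total = sum(p[t] + alpha for t in vocab)
    q_total = sum(q[t] + alpha for t in vocab)
    ratios = {
        t: math.log(((p[t] + alpha) / p_total) / ((q[t] + alpha) / q_total))
        for t in vocab
    }
    return vocab, ratios


def nb_features(doc, vocab, ratios):
    """NB-weighted bag of n-grams: ratio where the n-gram occurs, else 0."""
    present = set(doc)
    return [ratios[t] if t in present else 0.0 for t in vocab]


def ensemble_features(embedding, doc, vocab, ratios):
    """Concatenate one document's embedding with that same document's NB features."""
    return list(embedding) + nb_features(doc, vocab, ratios)


# Toy data: each document is a list of n-grams.
pos = [["great", "movie"], ["great", "fun"]]
neg = [["bad", "movie"]]
vocab, ratios = log_count_ratio(pos, neg)
row = ensemble_features([0.1, 0.2], ["great", "movie"], vocab, ratios)
print(vocab)
print(row)
```

The resulting rows would then go to a linear classifier. Whatever the classifier, both halves of each row need to come from the same document, in the same order.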

IMDB data: unigrams, unigrams+bigrams, unigrams+bigrams+trigrams

Trained embeddings (using cosine similarity): train vectors, test vectors

