3. Classification using GraphLab

Previously: Crawling and Preparing Training Set

Once the training data is ready, run the following commands:

# Build the classifier
python classification/graphlab/graphlab_train.py --training_dir classification/data/latest/
# Build the dataset
python classification/graphlab/graphlab_train.py --dataset_parsed_dir ~/brazil/all_files_parsed  # alternative input: ~/brazil/2005_parsed/
# Classify the dataset
python classification/graphlab/graphlab_classify.py --dataset_dir graphlab/my_dataset --classified_dir graphlab/result_dataset
# Print the results
python classification/graphlab/graphlab_classify.py --classified_dir result_dataset --print
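
The four commands above can also be chained from a small driver, so that a failing step stops the run. This is only a sketch: the script paths and flags are copied from the commands above, while the function names `build_steps` and `run_steps` are hypothetical. It also assumes the same `--classified_dir` value is used for classifying and for printing, which the commands above do not state.

```python
import os
import subprocess

TRAIN = "classification/graphlab/graphlab_train.py"
CLASSIFY = "classification/graphlab/graphlab_classify.py"


def build_steps(training_dir, parsed_dir, dataset_dir, classified_dir):
    """Return the four commands of this section as argument lists, in order."""
    # subprocess does not go through a shell, so "~" has to be expanded here.
    training_dir, parsed_dir, dataset_dir, classified_dir = [
        os.path.expanduser(p)
        for p in (training_dir, parsed_dir, dataset_dir, classified_dir)
    ]
    return [
        ["python", TRAIN, "--training_dir", training_dir],
        ["python", TRAIN, "--dataset_parsed_dir", parsed_dir],
        ["python", CLASSIFY, "--dataset_dir", dataset_dir,
         "--classified_dir", classified_dir],
        # Assumption: printing reads the directory written by the previous step.
        ["python", CLASSIFY, "--classified_dir", classified_dir, "--print"],
    ]


def run_steps(steps):
    """Run each command in turn; check_call raises on a non-zero exit code."""
    for argv in steps:
        print("running: " + " ".join(argv))
        subprocess.check_call(argv)
```

To run the whole sequence, call `run_steps(build_steps("classification/data/latest/", "~/brazil/all_files_parsed", "graphlab/my_dataset", "graphlab/result_dataset"))` from the repository root.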

Classification by event type:

python classification/graphlab/classify_by_event_type.py --classified_dir graphlab/result_dataset
python classification/graphlab/classify_by_event_type.py --print

Embeddings

Word Embeddings (Portuguese)

  pip install nlpnet
  wget http://nilc.icmc.usp.br/nlpnet/data/embeddings-pt.tgz
  tar xzf embeddings-pt.tgz
  
  python classification/embeddingstotxt.py --type plain --embeddings ~/brazil/w2e-embeddings/types-features.npy -v ~/brazil/w2e-embeddings/vocabulary2.txt -o /tmp/
  mv /tmp/models.txt ~/brazil/portuguese-nlp/word2vec_model.txt
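
The output format of `embeddingstotxt.py` is not shown here. The target file name `word2vec_model.txt` suggests word2vec text format: a header line with the vocabulary size and the vector dimension, then one line per word with the word followed by its vector components. The sketch below illustrates that format under this assumption. The function names are hypothetical and the real script may differ.

```python
def to_word2vec_lines(words, vectors):
    """Format words and vectors as the lines of a word2vec text file."""
    words = list(words)
    vectors = [[float(x) for x in vec] for vec in vectors]
    if len(words) != len(vectors):
        raise ValueError("vocabulary and embedding matrix differ in length")
    dim = len(vectors[0]) if vectors else 0
    lines = ["%d %d" % (len(words), dim)]  # header: vocabulary size, dimension
    for word, vec in zip(words, vectors):
        if len(vec) != dim:
            raise ValueError("inconsistent vector length for %r" % word)
        lines.append(word + " " + " ".join(repr(x) for x in vec))
    return lines


def convert(npy_path, vocab_path, out_path):
    """Read a .npy matrix and a one-word-per-line vocabulary, write text."""
    import numpy as np  # imported lazily: only this helper needs NumPy

    vectors = np.load(npy_path)
    with open(vocab_path, encoding="utf-8") as f:
        words = [line.rstrip("\n") for line in f]
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("\n".join(to_word2vec_lines(words, vectors)) + "\n")
```

For example, `to_word2vec_lines(["casa", "rua"], [[0.5, 1.0], [2.0, -1.5]])` returns `["2 2", "casa 0.5 1.0", "rua 2.0 -1.5"]`.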