Skip to content

davidmrau/dl4nlp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

43 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Deep Learning for Natural Language Technologies

In this paper, a Multilayer Perceptron (MLP) with tf-idf features and n-gram features, a Recurrent Neural Network (RNN) are applied to identify the predominant language of a given paragraph from the WiLI-2018 dataset [Thoma, 2018]. The WiLI-2018 dataset includes 235 distinct languages. We could achieve 90 % accuracy on the test set for the MLP with tf-idf features, 93% for the MLP with n-grams and 91% with the RNN.

Implementation of the three lanugage identification models:

  1. MLP classifier with tf-idf features

  2. MLP classifier with n-grams features

  3. RNN

Each folder contains a readme that explains the parameter of each python script. Example bash scripts show how each python script in the pipeline can be runned.