Deep Learning for Natural Language Technologies

In this paper, a Multilayer Perceptron (MLP) with tf-idf features and n-gram features, a Recurrent Neural Network (RNN) are applied to identify the predominant language of a given paragraph from the WiLI-2018 dataset [Thoma, 2018]. The WiLI-2018 dataset includes 235 distinct languages. We could achieve 90 % accuracy on the test set for the MLP with tf-idf features, 93% for the MLP with n-grams and 91% with the RNN.

Implementation of the three lanugage identification models:

MLP classifier with tf-idf features
MLP classifier with n-grams features
RNN

Each folder contains a readme that explains the parameter of each python script. Example bash scripts show how each python script in the pipeline can be runned.

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
n-gram		n-gram
rnn		rnn
tf_idf		tf_idf
visualization		visualization
.gitignore		.gitignore
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Deep Learning for Natural Language Technologies

About

Releases

Packages

Contributors 2

Languages

davidmrau/dl4nlp

Folders and files

Latest commit

History

Repository files navigation

Deep Learning for Natural Language Technologies

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages