NLP - Bi-LSTM for Hashtag Segmentation

This project uses Keras to implement a bidirectional LSTM for Twitter hashtag segmentation. The task is to take a hashtag and recover the phrase it stands for. This may seem trivial, but a single hashtag can often be segmented in several different ways:

#wordsoftheday => "word soft he day" or "words of the day"
#statefarmisthere => "state far mist here" or "state farm is here"
#brainstorm => "bra in storm" or "brain storm"
#doubledown => "do u bled own" or "double down"
#votedems => "voted ems" or "vote dems"
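To make the ambiguity concrete, here is a minimal sketch that enumerates every dictionary-valid split of a hashtag. It is not part of the model; the `segmentations` helper and the toy `vocab` are hypothetical, chosen only to reproduce the last example above.

```python
def segmentations(text, vocab):
    """Return every way to split `text` into words that all appear in `vocab`."""
    if not text:
        return [[]]  # one valid segmentation of the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in vocab:
            # Keep this prefix and segment the remainder recursively.
            for rest in segmentations(text[end:], vocab):
                results.append([prefix] + rest)
    return results


# Hypothetical toy vocabulary (an assumption for illustration only).
vocab = {"vote", "voted", "dems", "ems"}
candidates = [" ".join(words) for words in segmentations("votedems", vocab)]
print(candidates)  # ['vote dems', 'voted ems']
```

Both candidates are made of valid words, so a dictionary lookup alone cannot pick the intended one. That is the motivation for learning the split from data.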

The approach treats each character as one timestep and assigns it a binary label: 1 if the character should be followed by a space, and 0 otherwise.

#nlprocks
input: [n, l, p, r, o, c, k, s]
label: [0, 0, 1, 0, 0, 0, 0, 0]
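The labeling scheme can be sketched in plain Python as follows. The helper names `encode` and `decode` are assumptions for illustration and are not taken from the notebook; the sketch only shows how a segmented phrase maps to per-character labels and back.

```python
def encode(segmented):
    """Turn a space-separated phrase into (characters, binary labels)."""
    words = segmented.split()
    chars, labels = [], []
    for i, word in enumerate(words):
        for j, ch in enumerate(word):
            chars.append(ch)
            is_word_end = j == len(word) - 1
            is_last_word = i == len(words) - 1
            # Label 1 only where a space follows this character.
            labels.append(1 if is_word_end and not is_last_word else 0)
    return chars, labels


def decode(chars, labels):
    """Rebuild the phrase by inserting a space after every character labeled 1."""
    out = []
    for ch, label in zip(chars, labels):
        out.append(ch)
        if label == 1:
            out.append(" ")
    return "".join(out).strip()


chars, labels = encode("nlp rocks")
print(chars)                  # ['n', 'l', 'p', 'r', 'o', 'c', 'k', 's']
print(labels)                 # [0, 0, 1, 0, 0, 0, 0, 0]
print(decode(chars, labels))  # nlp rocks
```

With this encoding, segmentation becomes a per-character binary classification problem, which is the kind of sequence labeling a bidirectional LSTM is suited to.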

F1 Score: 0.754
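The README does not say how the F1 score was computed. One plausible reading, shown here purely as an assumption, is character-level F1 on the positive class (label 1, "a space follows"). The `boundary_f1` function below is a hypothetical sketch of that metric, not the notebook's evaluation code.

```python
def boundary_f1(gold, pred):
    """F1 over positions labeled 1, given two equal-length label lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing matches
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Labels for #wordsoftheday: gold is "words of the day",
# the prediction is the wrong split "word soft he day".
gold = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0]
pred = [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
score = boundary_f1(gold, pred)
print(round(score, 3))  # 0.333
```

Only one of the three predicted boundaries matches the gold split, so precision and recall are both 1/3 and the score is 1/3.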

Dataset

~700,000 segmented hashtags from Twitter
