
afmdnf/hashtag-lstm


NLP - Bi-LSTM for Hashtag Segmentation

This project uses Keras to implement a bidirectional LSTM for Twitter hashtag segmentation: given a hashtag, recover the phrase it was written from. The task may seem trivial, but a single hashtag can often be split into several plausible-looking phrases:

#wordsoftheday => "word soft he day" or "words of the day"
#statefarmisthere => "state far mist here" or "state farm is here"
#brainstorm => "bra in storm" or "brain storm"
#doubledown => "do u bled own" or "double down"
#votedems => "voted ems" or "vote dems"
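The ambiguity grows quickly: a hashtag of n characters has n-1 gaps, each of which may or may not hold a space, so there are 2^(n-1) possible splits before any dictionary check. The sketch below enumerates only the splits made entirely of known words. The vocabulary is a small hypothetical one chosen for illustration, not something this project uses, yet it still yields several readings of `#wordsoftheday`.

```python
from functools import lru_cache


def segmentations(hashtag, vocab):
    """Return every split of `hashtag` whose pieces all appear in `vocab`."""

    @lru_cache(maxsize=None)
    def split_from(i):
        # Reached the end of the string: exactly one (empty) continuation
        if i == len(hashtag):
            return [[]]
        results = []
        # Try every possible next word starting at position i
        for j in range(i + 1, len(hashtag) + 1):
            word = hashtag[i:j]
            if word in vocab:
                for rest in split_from(j):
                    results.append([word] + rest)
        return results

    return [" ".join(words) for words in split_from(0)]


# Hypothetical toy vocabulary, for illustration only
vocab = {"word", "words", "soft", "of", "oft", "he", "the", "day"}
print(segmentations("wordsoftheday", vocab))
# → ['word soft he day', 'words of the day', 'words oft he day']
```

A dictionary alone cannot choose among these readings, which is why a learned model that looks at context on both sides of each character is useful here.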

The approach treats each character as one timestep and assigns it a binary label: 1 if the character should be followed by a space, 0 otherwise.

#nlprocks
input: [n, l, p, r, o, c, k, s]
label: [0, 0, 1, 0, 0, 0, 0, 0]
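Converting between a segmented phrase and the per-character labels is mechanical. A minimal sketch follows; the helper names `encode` and `decode` are hypothetical and not taken from this repository.

```python
def encode(phrase):
    """Turn a segmented phrase into (characters, labels).

    A label of 1 means the character is followed by a space in the phrase.
    """
    words = phrase.split()
    chars, labels = [], []
    for w_idx, word in enumerate(words):
        for c_idx, ch in enumerate(word):
            chars.append(ch)
            is_last_char = c_idx == len(word) - 1
            is_last_word = w_idx == len(words) - 1
            labels.append(1 if is_last_char and not is_last_word else 0)
    return chars, labels


def decode(chars, labels):
    """Rebuild a phrase by inserting a space after every character labeled 1."""
    out = []
    for ch, lab in zip(chars, labels):
        out.append(ch)
        if lab == 1:
            out.append(" ")
    # strip() drops a trailing space if the final character was labeled 1
    return "".join(out).strip()


chars, labels = encode("nlp rocks")
print(chars)                  # ['n', 'l', 'p', 'r', 'o', 'c', 'k', 's']
print(labels)                 # [0, 0, 1, 0, 0, 0, 0, 0]
print(decode(chars, labels))  # nlp rocks
```

The last character of a hashtag is always labeled 0 during encoding, since nothing follows it.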

F1 Score: 0.754
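The README does not say exactly how the F1 score is computed. One common choice, assumed here, is binary F1 over all character labels with label 1 (a space follows) as the positive class; for 0/1 labels this matches scikit-learn's `f1_score` with its default `average="binary"`. The sketch scores the wrong reading "word soft he day" against the gold "words of the day":

```python
def boundary_f1(y_true, y_pred):
    """F1 for the positive class (1 = space after this character).

    Assumes both arguments are flat 0/1 sequences of equal length.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No correct boundaries: precision or recall is 0 (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# "wordsoftheday": gold "words of the day" vs. predicted "word soft he day"
gold = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0]
pred = [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
print(round(boundary_f1(gold, pred), 3))  # 0.333
```

Only the space after "the" is predicted correctly, so precision and recall are both 1/3. Because most characters are labeled 0, plain per-character accuracy would look much better than this, which is one reason to report F1 instead.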

Dataset

~700,000 segmented hashtags from Twitter
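A minimal sketch of turning (hashtag, phrase) pairs into padded arrays and fitting a character-level bidirectional LSTM in Keras. The three phrases stand in for the real dataset, and the sequence cap, embedding size, LSTM width, and sigmoid-per-timestep output are all assumptions; the repository's actual hyperparameters and architecture are not given here.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical stand-in for the ~700,000 segmented hashtags
phrases = ["nlp rocks", "words of the day", "vote dems"]
MAX_LEN = 32  # assumed cap on hashtag length; longer inputs are truncated

# Character vocabulary; id 0 is reserved for padding
alphabet = sorted({ch for p in phrases for ch in p if ch != " "})
char_to_id = {ch: i + 1 for i, ch in enumerate(alphabet)}


def encode(phrase):
    """Return (character ids, space-after labels), truncated to MAX_LEN."""
    words = phrase.split()
    ids, labels = [], []
    for w_idx, word in enumerate(words):
        for c_idx, ch in enumerate(word):
            ids.append(char_to_id[ch])
            labels.append(int(c_idx == len(word) - 1 and w_idx < len(words) - 1))
    return ids[:MAX_LEN], labels[:MAX_LEN]


# Zero-padded inputs (num_hashtags, MAX_LEN) and targets (num_hashtags, MAX_LEN, 1)
X = np.zeros((len(phrases), MAX_LEN), dtype="int32")
y = np.zeros((len(phrases), MAX_LEN, 1), dtype="float32")
for row, phrase in enumerate(phrases):
    ids, labels = encode(phrase)
    X[row, : len(ids)] = ids
    y[row, : len(labels), 0] = labels

model = keras.Sequential([
    keras.Input(shape=(MAX_LEN,), dtype="int32"),
    # mask_zero=True lets downstream layers and the loss ignore padded timesteps
    layers.Embedding(input_dim=len(char_to_id) + 1, output_dim=32, mask_zero=True),
    # return_sequences=True keeps one output per character (timestep)
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    # Dense acts on the last axis: one probability per character
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=1, verbose=0)

probs = model.predict(X, verbose=0)  # shape: (num_hashtags, MAX_LEN, 1)
pred_labels = (probs[..., 0] > 0.5).astype(int)  # 1 = insert a space here
```

The bidirectional wrapper matters for this task: whether a space belongs after a character depends on the letters that come after it as well as those before it.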
