NLP

LemmoPoSSpacy-NLP

date: 08-11-2022
written by: Wan-Ting Yeh
language: Python
libraries: os, pandas, spaCy

tokenise the text
clean the data (exclude unwanted tokens, e.g., punctuation and symbols)
lemmatisation (talked, talking --> talk)
customised lemmatisation (e.g., peeeeeekaboo --> peekaboo)
count unique words / total words / type-token ratio
- unique word: a word that appears only once in the text
- total words: the number of words in the text
- type-token ratio = unique words / total words
list parts of speech (noun, pronoun, adjective, ...)
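
The tokenising, cleaning, lemmatising and part-of-speech steps listed above could look roughly like the sketch below. It is not taken from the notebook: the model name `en_core_web_sm` and the helper names `load_pipeline`, `keep_token` and `lemmas_and_pos` are assumptions made for illustration.

```python
from typing import List, Tuple


def load_pipeline(model: str = "en_core_web_sm"):
    """Load a spaCy pipeline. The model name is an assumption, not from the notebook."""
    import spacy  # imported lazily so the helpers below do not require spaCy at import time
    return spacy.load(model)


def keep_token(token) -> bool:
    """Return False for punctuation, whitespace and symbol tokens."""
    return not (token.is_punct or token.is_space or token.pos_ == "SYM")


def lemmas_and_pos(doc) -> List[Tuple[str, str]]:
    """Return (lowercased lemma, coarse part-of-speech tag) for every kept token."""
    return [(tok.lemma_.lower(), tok.pos_) for tok in doc if keep_token(tok)]
```

With spaCy and an English model installed (for example via `python -m spacy download en_core_web_sm`), a call such as `lemmas_and_pos(load_pipeline()("She talked while he was talking."))` returns one pair per remaining token, which can then be loaded into a pandas DataFrame.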

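
The customised lemmatisation and the word counts described in the list above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: `squeeze_repeats` and `word_counts` are hypothetical names, and collapsing repeated letters with a regular expression is only one possible way to handle spellings such as peeeeeekaboo. The list above defines a unique word as one that appears only once, while the conventional type-token ratio divides the number of distinct words by the total, so the sketch reports both.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def squeeze_repeats(word: str) -> str:
    """Collapse runs of three or more identical characters to two (assumed rule)."""
    return re.sub(r"(.)\1{2,}", r"\1\1", word.lower())


def word_counts(words: Iterable[str]) -> Dict[str, float]:
    """Count total words, words appearing once, distinct words, and both ratios."""
    freq = Counter(words)
    total = sum(freq.values())
    once = sum(1 for n in freq.values() if n == 1)  # "unique word" as defined in the list above
    distinct = len(freq)                            # word types in the conventional sense
    return {
        "total": total,
        "appear_once": once,
        "distinct": distinct,
        "ratio_once": once / total if total else 0.0,
        "ratio_distinct": distinct / total if total else 0.0,
    }
```

For example, `word_counts(squeeze_repeats(w) for w in ["peeeeeekaboo", "peekaboo", "talk", "walk"])` treats the first two items as the same word before counting.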