TF-IDF is an algorithm for generating keywords (tags) for a document.
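The algorithm can be split into two parts:
- TF (= term frequency): how often a word occurs in a single document
- IDF (= inverse document frequency): how rare a word is across the whole collection of documents; the higher this score, the fewer documents contain the word (common words such as 'a' or 'the' get a low IDF score)
A word's TF-IDF score is the product of the two parts, so the best keywords are words that are frequent in one document but rare in the others.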
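The two parts can be illustrated with a minimal Python sketch. It is not taken from this repo's code: the function names (`tf`, `idf`, `tf_idf`, `keywords`), the sample documents, and the exact formulas (TF as a relative count, IDF as `log(N / df)`) are assumptions chosen for illustration, and other TF-IDF variants weight the terms differently.

```python
import math
from collections import Counter


def tf(term, doc):
    """Term frequency: share of the tokens in `doc` that equal `term`."""
    if not doc:
        return 0.0
    return Counter(doc)[term] / len(doc)


def idf(term, docs):
    """Inverse document frequency: log(N / number of docs containing `term`)."""
    df = sum(1 for doc in docs if term in doc)
    # A term that appears in no document gets a score of 0 (assumed convention).
    return math.log(len(docs) / df) if df else 0.0


def tf_idf(term, doc, docs):
    """Combined score: high when `term` is frequent in `doc` but rare elsewhere."""
    return tf(term, doc) * idf(term, docs)


def keywords(doc, docs, top_n=3):
    """Return the `top_n` highest-scoring distinct terms of `doc`."""
    scores = {term: tf_idf(term, doc, docs) for term in set(doc)}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical sample corpus: each document is a list of lowercase tokens.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang a song".split(),
]

print(keywords(docs[0], docs))
```

In this sample, 'the' occurs in every document, so its IDF is `log(3 / 3) = 0` and it can never rank as a keyword, while words unique to the first document score highest.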
Clone this repo:
$ git clone https://github.com/moritzmitterdorfer/TF-IDF.git