TERNION-1121/YT-Comments-Clustering

💻 YouTube Comments Clustering 👾

An NLP project to cluster YouTube comments on the basis of their similarity of words

📜 Description

An NLP project in Python 3 that clusters the YouTube comments made on a particular video into distinct groups based on the similarity of their words, and visualises the results using word clouds and a bar graph. It relies primarily on tf-idf and k-means clustering.
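
To make the two techniques concrete, here is a minimal, standard-library-only sketch of tf-idf weighting followed by k-means. It is an illustration under assumptions, not this project's actual code: the function names, the sample comments and the naive "first k vectors" initialisation are all hypothetical, and in practice a library such as scikit-learn would normally do this work.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenised comments into L2-normalised tf-idf vectors (as dicts)."""
    n = len(docs)
    # Document frequency: in how many comments does each word appear?
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf, so a word found in every comment keeps a weight of 1.
        vec = {w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({w: v / norm for w, v in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(w, 0.0) - b.get(w, 0.0)) ** 2 for w in set(a) | set(b))


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for w, v in vec.items():
            total[w] = total.get(w, 0.0) + v
    return {w: v / len(vectors) for w, v in total.items()}


def kmeans(vectors, k, iterations=20):
    """Plain k-means; returns one cluster label per vector."""
    k = min(k, len(vectors))
    # Assumption: a naive, deterministic start using the first k vectors.
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each comment joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = centroid(members)
    return labels


# Hypothetical sample comments, already lower-cased.
comments = [
    "great video love it",
    "first comment here",
    "love this great song",
    "who is here first",
]
labels = kmeans(tfidf_vectors([c.split() for c in comments]), k=2)
print(labels)
```

Comments that share words end up with the same label, which is what the word clouds and the bar graph are then built from.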


Image 1 Image 2 Image 3
Image 4 Image 5

Sample word clouds and bar graph plot for analysing the clustered comments; the comments are taken from this video



The "Why" of the project

This video sparked the inspiration in me to create something like this someday. And who knew this would be the best time to start fulfilling that long-held wish!

After pondering for a few days, I hit upon the idea of clustering YouTube comments.

Why, you ask? 🤔

  • Firstly, it could help one identify the kind of comments made most often on a particular video, and
  • Secondly, it could show how many people resonated with them (i.e. which kind of comments were liked the most).

A simple yet effective way to analyse people's reviews and opinions on a particular video. Sounds fair and square?
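
Both questions reduce to a small aggregation once every comment has a cluster label. The sketch below assumes each comment is available as a (cluster label, like count) pair; the data and the `summarise` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical data: (cluster label, like count) for each comment.
clustered = [(0, 2), (0, 1), (0, 3), (1, 50), (1, 10), (2, 7)]


def summarise(pairs):
    """Count the comments and add up the likes for every cluster."""
    stats = defaultdict(lambda: {"comments": 0, "likes": 0})
    for label, likes in pairs:
        stats[label]["comments"] += 1
        stats[label]["likes"] += likes
    return dict(stats)


summary = summarise(clustered)
# Which kind of comment was made the most?
most_common = max(summary, key=lambda c: summary[c]["comments"])
# Which kind of comment did people resonate with the most?
most_liked = max(summary, key=lambda c: summary[c]["likes"])
print(most_common, most_liked)  # 0 1
```

In this made-up data the largest cluster is not the most liked one, which is exactly the kind of difference the bar graph is meant to surface.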



⌨ Usage

Click here to open the USAGE.md file and follow the steps to use this project yourself!



🎯 Learnings

This was my first NLP project, and in Python at that!

It was a nice experience learning the basics: what NLP is, the NLP pipeline, text pre-processing and text representation, and then using these concepts in actual code.

One of the resources I found really helpful was this YouTube playlist (in Hindi). Its videos were really insightful and helped me understand my requirements and plan of action while making this project.

Not only did I get familiar with the basics of pandas, but a major part of this project also focused on fetching YouTube comments using the Google API. Coding that, with the help of documentation, references and other resources available online, turned out to be a profound adventure on its own.
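
As a rough idea of what text pre-processing can look like, here is a small sketch using only the standard library. The steps, the `preprocess` name and the tiny stop-word list are assumptions for illustration; the project's real pipeline may differ.

```python
import re

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "to", "and", "of", "so"}

URL_RE = re.compile(r"https?://\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def preprocess(comment):
    """Lower-case, strip URLs and non-letters, then drop stop words."""
    text = comment.lower()
    text = URL_RE.sub(" ", text)         # remove links
    text = NON_LETTER_RE.sub(" ", text)  # remove punctuation, digits, emoji
    return [tok for tok in text.split() if tok not in STOP_WORDS]


tokens = preprocess("This video is SO good!! Watch it: https://example.com 🔥")
print(tokens)  # ['video', 'good', 'watch']
```

The resulting token lists are what a text representation step such as tf-idf would take as input.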
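
For a flavour of the fetching part, here is a hedged sketch built on the YouTube Data API v3 `commentThreads.list` endpoint through the `google-api-python-client` package. The helper names, the `max_pages` limit and the placeholder key and video ID are assumptions; this is not necessarily how the project does it.

```python
def extract_comments(response):
    """Pull (text, like count) pairs out of one commentThreads.list response."""
    rows = []
    for item in response.get("items", []):
        top = item["snippet"]["topLevelComment"]["snippet"]
        rows.append((top["textDisplay"], top["likeCount"]))
    return rows


def fetch_comments(youtube, video_id, max_pages=5):
    """Page through the top-level comments of one video.

    `youtube` is expected to be a client created with
    googleapiclient.discovery.build("youtube", "v3", developerKey=...).
    """
    rows, token = [], None
    for _ in range(max_pages):
        params = {
            "part": "snippet",
            "videoId": video_id,
            "maxResults": 100,
            "textFormat": "plainText",
        }
        if token:
            params["pageToken"] = token  # continue from the previous page
        response = youtube.commentThreads().list(**params).execute()
        rows.extend(extract_comments(response))
        token = response.get("nextPageToken")
        if not token:  # no more pages
            break
    return rows


# Usage (needs `pip install google-api-python-client` and your own API key):
# from googleapiclient.discovery import build
# youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")
# rows = fetch_comments(youtube, "VIDEO_ID")
```

The list of pairs drops straight into a pandas DataFrame, for example with `pd.DataFrame(rows, columns=["text", "likes"])`.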



✏️ On Contributions

I have tried to structure the code as neatly as I could, and also spent considerable time speeding up the text pre-processing. However, if anyone could help with better code, better overall project organisation, or more optimised methods in any part of the project, that would be highly appreciated!

Even README contributions would be of profound help!


I hope you found this project and its explanation valuable. Let me know about anything that could be made better. Thanks for your time!