💻 YouTube Comments Clustering 👾

An NLP project that clusters YouTube comments based on the similarity of their words

📜 Description

An NLP project in Python 3 that clusters the comments made on a particular YouTube video into distinct groups based on the similarity of their words, and visualises the results using word clouds and a bar graph. It primarily relies on k-means clustering and tf-idf.

Sample word clouds and bar graph used to analyse the clustered comments; the comments are from this video
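To make the tf-idf plus k-means idea concrete, here is a minimal sketch using only the Python standard library. It is not the project's actual code (see the src folder for that); the function names `tfidf_vectors` and `kmeans`, the smoothed idf formula, and the toy comments are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised tf-idf vectors) for tokenised docs."""
    vocab = sorted({word for doc in docs for word in doc})
    n_docs = len(docs)
    # Document frequency: number of comments that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    # Smoothed idf (an assumed variant), so very common words keep some weight.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def kmeans(vectors, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per vector."""
    rng = random.Random(seed)
    # Start from k randomly chosen comments as the initial centroids.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy comments, invented for this example.
comments = [
    "great song love it",
    "love this song so much",
    "great song great voice",
    "first comment here",
    "who is here first",
    "first one here",
]
docs = [c.lower().split() for c in comments]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
for label, comment in zip(labels, comments):
    print(label, comment)
```

The grouping depends on the random initial centroids, so different seeds can give different clusters. In practice a library such as scikit-learn would likely replace both functions.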

The "Why" of the project

This video sparked the inspiration in me to create something like this sometime in the future. And who knew this would be the best time to begin fulfilling this long-held longing!

After pondering for a few days, I hit upon the idea of clustering YouTube comments.

Why, you ask? 🤔

Firstly, it could help identify which kinds of comments were made the most on a particular video, and
Secondly, it could show how many people resonated with them (i.e. which kinds of comments were liked the most).

A simple yet effective way to analyse people's reviews and opinions on a particular video. Sounds fair and square?

⌨ Usage

See the USAGE.md file for the steps to use this project yourself!

🎯 Learnings

This was my first NLP project, and in Python at that!

It was a nice experience learning the basics of what NLP is, the NLP pipeline, and text pre-processing and representation, and then using these concepts in actual code.

One of the resources (in Hindi) I found really helpful was this YouTube playlist. Its videos were really insightful and helped me understand my requirements and plan of action while making this project.

Not only did I get familiar with the basics of pandas, but a large part of this project also focused on fetching YouTube comments using the Google API. Coding that, with the help of documentation, references and other resources available online, turned out to be a profound adventure of its own.
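As a rough illustration of the comment-fetching step, the sketch below pulls the text and like count out of one page of a YouTube Data API v3 `commentThreads.list` response. It assumes the page has already been fetched, for example with the google-api-python-client via `youtube.commentThreads().list(part="snippet", videoId=..., pageToken=...).execute()`. The helper name `extract_comments` and the sample page are hypothetical, and this is not necessarily how the project does it.

```python
def extract_comments(response):
    """Return ([(text, like_count), ...], next_page_token) for one page."""
    rows = []
    for item in response.get("items", []):
        # Each thread carries its top-level comment under snippet.topLevelComment.
        top = item["snippet"]["topLevelComment"]["snippet"]
        rows.append((top["textDisplay"], top.get("likeCount", 0)))
    # nextPageToken is absent on the last page, so this yields None there.
    return rows, response.get("nextPageToken")


# Hand-written sample in the shape of an API response (not real data).
sample_page = {
    "nextPageToken": "TOKEN_FOR_NEXT_PAGE",
    "items": [
        {"snippet": {"topLevelComment": {"snippet": {
            "textDisplay": "Great video!", "likeCount": 12}}}},
        {"snippet": {"topLevelComment": {"snippet": {
            "textDisplay": "First!", "likeCount": 0}}}},
    ],
}

rows, next_token = extract_comments(sample_page)
for text, likes in rows:
    print(likes, text)
```

Calling the API again with `pageToken` set to the returned token, until it comes back as `None`, would walk through all comment pages. The collected rows could then be loaded into a pandas DataFrame for pre-processing.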

✏️ On Contributions

I have tried my best to structure the code nicely, and also spent considerable time speeding up the text pre-processing. However, if anyone could help with better code, better overall project organisation, or more optimised methods in various parts of the project, that would be highly appreciated!

Even README contributions would be of profound help!

I hope you found this project and its explanation valuable. Let me know about anything that could be made better. Thanks for your time!

TERNION-1121/YT-Comments-Clustering
