Social Media Scraper

Arnav Kumar Behera, Vedanta Mohapatra

October, 2022

In this project, we have implemented a social media scraper for a select set of supported websites, together with pre-processing of the extracted content. The project is built using snscrape. Refer to the Download section of the README.md file on the linked GitHub page for help installing it.

Features

Supported Websites: Twitter, Reddit
Supported Features for Twitter:
- Scrape tweets from a particular user or from any search query.
- For searches, tweets can be extracted in either latest or top order.
- The following fields are extracted: ['Unique ID', 'Date', 'User', 'Tweet', 'Preproccesed Tweet']
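As a rough illustration of the Twitter features above, the sketch below maps one snscrape tweet object to the listed fields. The helper names `tweet_to_row` and `scrape_tweets` are hypothetical and not taken from this project's code. The text attribute is assumed to be `content` or, in later snscrape versions, `rawContent`. Whether the scraper still returns data depends on the upstream service.

```python
import itertools


def tweet_to_row(tweet):
    """Map one snscrape tweet object to the fields listed above (before pre-processing)."""
    # Assumption: the tweet text lives in `rawContent` (newer snscrape) or `content` (older).
    text = getattr(tweet, "rawContent", None) or getattr(tweet, "content", "")
    return {
        "Unique ID": tweet.id,
        "Date": tweet.date,
        "User": tweet.user.username,
        "Tweet": text,
    }


def scrape_tweets(query=None, username=None, limit=100):
    """Collect up to `limit` tweets from a user timeline or a search query."""
    # Imported lazily so tweet_to_row can be used without snscrape installed.
    import snscrape.modules.twitter as sntwitter

    if username is not None:
        scraper = sntwitter.TwitterUserScraper(username)
    else:
        scraper = sntwitter.TwitterSearchScraper(query)
    # get_items() is a generator, so islice stops it after `limit` tweets.
    return [tweet_to_row(t) for t in itertools.islice(scraper.get_items(), limit)]
```

A call such as `scrape_tweets(username="some_user", limit=50)` would return a list of dictionaries, one per tweet, ready for pre-processing.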
Supported Features for Reddit:
- Scrape comments or posts from a particular user, subreddit, or search query.
- For comments and posts, the following fields are extracted:
- ['Unique ID', 'Date', 'Sub-reddit', 'Author', 'Title/Comment', 'Preprocessed Title/Comment']
- For posts the title is extracted; for comments the comment body is extracted.
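The Reddit features above could be sketched along the same lines. The helper names `reddit_item_to_row` and `scrape_reddit` are hypothetical. The sketch assumes snscrape's Reddit items expose `id`, `date`, `subreddit`, and `author`, with `title` on posts and `body` on comments. Whether the scrapers still return data depends on the upstream service.

```python
import itertools


def reddit_item_to_row(item):
    """Map one snscrape Reddit item (post or comment) to the fields listed above."""
    # Posts carry a `title`; comments carry a `body` instead.
    text = getattr(item, "title", None)
    if text is None:
        text = getattr(item, "body", "")
    return {
        "Unique ID": item.id,
        "Date": item.date,
        "Sub-reddit": item.subreddit,
        "Author": item.author,
        "Title/Comment": text,
    }


def scrape_reddit(name, kind="user", limit=100):
    """Collect up to `limit` items for a user, a subreddit, or a search term."""
    # Imported lazily so reddit_item_to_row can be used without snscrape installed.
    import snscrape.modules.reddit as snreddit

    scrapers = {
        "user": snreddit.RedditUserScraper,
        "subreddit": snreddit.RedditSubredditScraper,
        "search": snreddit.RedditSearchScraper,
    }
    scraper = scrapers[kind](name)
    return [reddit_item_to_row(i) for i in itertools.islice(scraper.get_items(), limit)]
```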
Preprocessing the data: The pre-processing involves removing URLs, expanding contractions, lower-casing all text, removing punctuation, removing numbers, removing extra white space, removing stop words, replacing emojis with words (implemented using the emoji package), and lemmatizing the words. Any combination of these pre-processing steps can be used, depending on the use case.
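A minimal sketch of how such switchable steps might look, using only the standard library. The function name `preprocess`, its flags, and the tiny stop-word set are illustrative assumptions, not this project's code. Contraction expansion, emoji replacement, and lemmatization are left out because they need third-party packages.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption; a real run would likely use a fuller one).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "at"}


def preprocess(text, *, remove_urls=True, lowercase=True, remove_punctuation=True,
               remove_numbers=True, remove_stop_words=True):
    """Apply the selected cleaning steps in order; each flag toggles one step."""
    if remove_urls:
        # URLs go first, before punctuation removal breaks them apart.
        text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    if lowercase:
        text = text.lower()
    if remove_punctuation:
        text = text.translate(str.maketrans("", "", string.punctuation))
    if remove_numbers:
        text = re.sub(r"\d+", " ", text)
    # Splitting and re-joining collapses extra white space (always on in this sketch).
    words = text.split()
    if remove_stop_words:
        words = [w for w in words if w.lower() not in STOP_WORDS]
    return " ".join(words)
```

For example, `preprocess("Check THIS out: https://example.com  it is 100% great!")` returns `"check this out it great"`, while passing `lowercase=False, remove_stop_words=False` keeps the original casing and the word "is".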
Storing the extracted data: The extracted and pre-processed data are written to .csv files.
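Writing the rows to a .csv file could look roughly like the sketch below, which uses Python's built-in csv module. The helper name `save_rows` is hypothetical, and the project itself may use a different library for this step.

```python
import csv


def save_rows(rows, path, columns):
    """Write a list of row dictionaries to `path` with a header line."""
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
```

The `columns` argument would be one of the field lists above, for example `['Unique ID', 'Date', 'User', 'Tweet', 'Preproccesed Tweet']` for tweets.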

Software used

Contact Us

Arnav Kumar Behera ([email protected])
Vedanta Mohapatra ([email protected])


Repository: vedanta28/social-media-scraper