GitHub - kaushik-iiserm/institutional-scrapping: IDC 409 web scraping project

Faculty Web Page Scraping Repository

This repository contains Python scripts for scraping research data from three leading Indian research institutions:

IISER Mohali
IISER Pune
IISER Kolkata

We analysed the research areas and academic backgrounds of the faculty members and drew conclusions from them. The data and plots for the analysis are presented on the webpage and in the corresponding folders.

Overview

The main script, main.py, serves the following purposes:

Data Scraping: It scrapes the faculty web pages of the three institutions, extracting information on research projects, publications, and other relevant data.
Data Storage: The scraped data is stored in an SQLite3 database, which keeps it structured and easy to query.
Data Analysis: The script also includes functionality to analyse the scraped data, so you can query and summarise the collected information.
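The three stages above can be sketched with the Python standard library alone. This is a minimal illustration, not the code in main.py: the sample HTML, the FacultyParser class, the faculty table, and its column names are all assumptions made for this example, and the real faculty pages will have different markup.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded faculty page.
SAMPLE_HTML = """
<div class="faculty"><h3>Dr. A</h3><p class="area">Quantum Optics</p></div>
<div class="faculty"><h3>Dr. B</h3><p class="area">Number Theory</p></div>
<div class="faculty"><h3>Dr. C</h3><p class="area">Quantum Optics</p></div>
"""


class FacultyParser(HTMLParser):
    """Collects (name, research_area) pairs from the assumed markup."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # which field the next text node belongs to
        self._name = None

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._field = "name"
        elif tag == "p" and ("class", "area") in attrs:
            self._field = "area"

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "name":
            self._name = text
        else:
            self.rows.append((self._name, text))
        self._field = None


def scrape(html):
    """Stage 1: extract rows from an HTML string."""
    parser = FacultyParser()
    parser.feed(html)
    return parser.rows


def store(conn, rows):
    """Stage 2: write the rows to an SQLite table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS faculty (name TEXT, research_area TEXT)"
    )
    conn.executemany("INSERT INTO faculty VALUES (?, ?)", rows)
    conn.commit()


def count_by_area(conn):
    """Stage 3: a simple analysis query, faculty count per research area."""
    cur = conn.execute(
        "SELECT research_area, COUNT(*) FROM faculty "
        "GROUP BY research_area ORDER BY COUNT(*) DESC, research_area"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
store(conn, scrape(SAMPLE_HTML))
area_counts = count_by_area(conn)
print(area_counts)  # [('Quantum Optics', 2), ('Number Theory', 1)]
```

Passing a file path instead of ":memory:" to sqlite3.connect would persist the data to disk, in the manner of a file such as iiserm.db.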

The script is commented so that readers can follow along.

Webpage

We also built a webpage that explains the methods we used for scraping and analysis.

Folders of IISER_Pune and IISER_Kolkata

These folders contain the separate analyses for these institutes, including code, database files, and plots. The analysis for IISER Mohali is done by main.py itself.

Usage

To use the main.py script, follow these steps:

Clone this repository to your local machine:

git clone https://github.com/kaushik-iiserm/institutional-scrapping/
