Skip to content

A small data analysis project which gathers data using APIs and performs analysis on the data using Python and SQL

Notifications You must be signed in to change notification settings

najeeb-ur-rahaman/Car_Data_Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Car Data Analysis Project

Project Overview

The Car Data Analysis project is designed to collect and analyze data about various cars using an external API. The project is divided into several steps, including data collection, storage in a PostgreSQL database, and performing data analysis using Python and SQL. This documentation will provide a step-by-step explanation of each component of the project.

Key Features

  • Data Collection: Data is retrieved from the car information API available at carapi.app.
  • Data Storage: The collected data is stored in a PostgreSQL database.
  • Data Analysis: We perform data analysis on the collected data using Python's pandas library and SQL queries.

Project Structure

The project consists of the following key files:

1. carAPI.py

  • Purpose: This script is responsible for connecting to the external API and retrieving car data.
  • Key Functionality:
    • Establishes a connection to the car API.
    • Sends requests to retrieve car-related data such as make, model, year, and other specifications.
    • Formats the retrieved data into a structured format for easy storage.

2. database.py

  • Purpose: This script handles database operations, including connecting to PostgreSQL and inserting the data.
  • Key Functionality:
    • Connects to a PostgreSQL database.
    • Creates necessary tables to store the car data.
    • Inserts the car data retrieved by carAPI.py into the database for further analysis.

3. main.py

  • Purpose: This is the main script that coordinates the entire process of data collection and storage.
  • Key Functionality:
    • Calls functions from carAPI.py to retrieve car data.
    • Calls functions from database.py to store the data in the PostgreSQL database.
    • Acts as the central point of execution for the project.

4. car_data_analysis.ipynb

  • Purpose: This Jupyter Notebook file is used to analyze the stored car data.
  • Key Functionality:
    • Connects to the PostgreSQL database to retrieve the data.
    • Uses Python's pandas library to perform various analyses, such as:
      • Summarizing car models by make and year.
      • Identifying trends in car specifications.
      • Analyzing the distribution of car prices, fuel types, and other metrics.
    • SQL queries are also used to perform more complex database operations directly on the stored data.

How It Works

1. Data Collection

  • The project uses Python to send requests to the carapi.app API to collect car-related data.
  • This data is then cleaned and organized to ensure it can be easily inserted into the database.

2. Data Storage

  • The project connects to a PostgreSQL database.
  • Tables are created to store the different types of data (e.g., car make, model, year, price).
  • Data from the API is inserted into these tables for future analysis.

3. Data Analysis

  • Using pandas in Python, we analyze the data to extract valuable insights.
  • The analysis focuses on identifying trends, summarizing data, and answering specific questions related to the car industry.
  • SQL is used to query the data in the database for more detailed analysis.

Getting Started

Prerequisites

To run this project, you'll need the following:

  • Python 3.x
  • PostgreSQL database
  • Required Python libraries (can be installed via requirements.txt):
    • pandas
    • requests
    • psycopg2 (for connecting to PostgreSQL)

Installation and Setup

1. Clone the Repository

First, clone the project repository to your local machine. This will create a local copy of the project files.

git clone https://github.com/najeeb-ur-rahaman/Car_Data_Analysis.git

2. Install Python and Required Libraries

Ensure you have Python 3.x installed on your machine. Next, install the required Python libraries. These libraries are essential for running the project scripts and performing data analysis.

  • Python Libraries:
    • pandas for data manipulation and analysis.
    • requests for making HTTP requests to the car API.
    • psycopg2 for connecting to PostgreSQL.
    • Other dependencies listed in the requirements.txt file.

3. Set Up PostgreSQL Database

You need to set up a PostgreSQL database to store the car data.

  • Create a Database:

    • Log in to your PostgreSQL instance and create a new database for storing car data.
  • Update Connection Details:

    • In the database.py file, update the connection details with your PostgreSQL database credentials (host, port, database name, username, and password).

4. Configure Car API

Before running the project, ensure that the car API connection details are properly configured in the carAPI.py file.

  • Update API Credentials:
    • If the car API requires authentication, update the carAPI.py file with the necessary API keys or credentials.

5. Run the Data Collection and Storage

Execute the main script to fetch data from the car API and store it in the PostgreSQL database.

  • Execute Main Script:
    • The main.py script orchestrates the data collection and storage process. Run this script to initiate the data fetching and storage operations.
python main.py

6. Perform Data Analysis

Once the data is stored in PostgreSQL, you can analyze it using the provided Jupyter Notebook.

  • Open Jupyter Notebook:

    • Launch Jupyter Notebook and open the car_data_analysis.ipynb file.
  • Run Analysis Cells:

    • Execute the cells in the notebook to perform data analysis using pandas and SQL. The notebook contains code for various analyses, such as summarizing car models, identifying trends, and analyzing distributions.

Usage

1. Data Collection

To collect and store data, run the main.py script. This script:

  • Connects to the car API using carAPI.py.
  • Retrieves car-related data.
  • Connects to the PostgreSQL database using database.py.
  • Creates tables and inserts the data into the database.

2. Data Analysis

  • Access the Jupyter Notebook:

    • Open car_data_analysis.ipynb in Jupyter Notebook.
  • Perform Analysis:

    • Run the cells to analyze the data. The notebook provides various analysis techniques, such as:
      • Summarizing car models by make and year.
      • Identifying trends in car specifications.
      • Analyzing the distribution of car prices, fuel types, and other metrics.

Conclusion

The Car Data Analysis project provides a complete solution for collecting, storing, and analyzing car data. By integrating data from the car API with a PostgreSQL database and using Python and SQL for analysis, the project offers valuable insights into various aspects of car data.

With this setup, you can:

  • Collect and store detailed information about cars.
  • Perform comprehensive data analysis to uncover trends and insights.
  • Use the analysis results to make informed decisions or generate reports.

This project serves as a robust foundation for further enhancements, such as adding more data sources, implementing advanced analysis techniques, or developing additional features for improved data handling and visualization.

Future Work

1. Expand Data Sources

  • Additional APIs: Integrate more car data APIs for a broader dataset.
  • Web Scraping: Implement scraping for other car-related websites to enrich data.

2. Enhanced Data Analysis

  • Advanced Metrics: Include more metrics and visualizations, like car performance trends over time.
  • Predictive Analysis: Add machine learning models to predict car prices or trends.

3. Improved User Interface

  • Interactive Dashboards: Develop interactive dashboards for better data visualization.
  • User Input: Allow users to filter and query data dynamically through a web interface.

4. Data Quality and Management

  • Data Cleaning: Implement more robust data cleaning processes to handle missing or inconsistent data.
  • Real-Time Updates: Set up a pipeline for real-time data updates from the API.

5. Automation and Deployment

  • Automation: Automate data collection and analysis tasks with scheduled scripts.
  • Deployment: Create a user-friendly deployment process for easy setup and use.

About

A small data analysis project which gathers data using APIs and performs analysis on the data using Python and SQL

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published