Skip to content

Latest commit

 

History

History
50 lines (39 loc) · 4.1 KB

README.md

File metadata and controls

50 lines (39 loc) · 4.1 KB

Realtime Climate Analysis Pipeline

Overview

This repository contains a powerful realtime climate analysis pipeline leveraging various AWS services and Snowflake, along with Power BI for insightful visualizations. The pipeline is designed to ingest data from a weather API, store it in Dynamo DB, process it through AWS Lambda functions, store it in S3, and finally load it into Snowflake for further analysis. Python scripts are utilized for carrying out advanced analytics such as linear regression on the collected data. Realtime_Weather_Analalysis

Features

  • Data Ingestion: Utilizes AWS Lambda functions to ingest data from the Climate API into Dynamo DB.

  • Data Storage: Dynamo DB is used as the primary data storage, with Dynamo DB streams used to trigger the ingestion of data into S3. Screenshot 2024-04-10 170332

  • Data Processing: Data stored in S3 is processed and loaded into Snowflake stages, followed by ingestion into Snowflake tables using Snowpipe.

  • Screenshot 2024-04-10 170449

  • Insightful Visualizations: Leverages Power BI for building insightful visualizations comparing parameters such as humidity, temperature, pressure, and wind speed across multiple cities. Screenshot 2024-04-11 003924 Screenshot 2024-04-10 235552_PB2

  • Advanced Analytics: Python scripts are employed for carrying out advanced analytics like linear regression on the collected data. image

  • instance prediction of Temprature with XGBoost Model image

  • Optimization and Scaling: Provides best practices for optimizing and scaling the data pipeline to enhance performance, ensure data integrity, and accommodate evolving project needs.

  • Troubleshooting and Best Practices: Equips users with troubleshooting skills and industry best practices to overcome common challenges in real-world scenarios, ensuring the reliability and resilience of the data pipeline.

  • Advanced Concepts and Future Trends: Explores advanced concepts and emerging trends in AWS Data Engineering, keeping users updated with the latest technologies and potential impacts on data engineering practices.

Installation

  1. Clone this repository to your local machine.
  2. Install the required dependencies mentioned in requirements.txt.
  3. Configure AWS credentials and permissions for accessing AWS services.
  4. Set up Snowflake credentials for connecting to Snowflake.

Usage

  1. Run the Lambda functions to ingest data from the Climate API into Dynamo DB.
  2. Monitor Dynamo DB streams for changes and trigger the ingestion of data into S3.
  3. Process the data stored in S3 and load it into Snowflake stages.
  4. Ingest the data from Snowflake stages into Snowflake tables using Snowpipe.
  5. Utilize Power BI for building insightful visualizations and analytics.
  6. Execute Python scripts for advanced analytics like linear regression & XGboost Models for prediction.

Contributing

Contributions are welcome! Feel free to open issues or pull requests for any improvements or new features.

Acknowledgements

  • Thanks to Wheather API for providing the weather data.
  • This project was inspired by the need for real-time climate analysis and the power of AWS and Snowflake in handling large-scale data pipelines.

Contact

For any inquiries or support, please contact https://www.linkedin.com/in/ritikdhame/ .