  • 10Anlytics
  • Edinburgh

mariambadmusk/README.md

Junior Data Engineer | AWS Certified | Python, SQL, ETL, Cloud Data Platforms | Building Scalable Data Pipelines | 2+ Years of Experience in Data

Hi, I'm Mariam, a Junior Data Engineer with over 2 years of experience in data engineering, ETL processes, and cloud platforms. In that time I have built pipelines, analysed large datasets, and implemented cloud-based solutions that support data-driven decision-making and business outcomes.

Tools:

AWS Airflow Docker PySpark SQL Pandas Python Kafka

Projects:

AWS Driven News Aggregation And Sentiment Data Pipeline

AWS EC2 RDS IAM AWS Systems Manager Newsdata.io API PySpark Hugging Face Transformers Airflow PostgreSQL

A fully automated data pipeline that collects, analyses, and stores daily news articles. It fetches relevant articles from the Newsdata.io API, runs sentiment analysis on them, and stores the processed data in a PostgreSQL database deployed on Amazon RDS.
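The fetch, score, and store flow described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's code: `fetch` is a hypothetical callable standing in for the Newsdata.io API request, `score_sentiment` is a toy keyword rule standing in for a real sentiment model, and an in-memory SQLite database stands in for PostgreSQL on Amazon RDS so the example runs with the standard library alone. The table and column names are also assumptions.

```python
import sqlite3
from typing import Callable, Iterable


def score_sentiment(text: str) -> str:
    """Toy keyword scorer (assumption): a stand-in for a real sentiment model."""
    positive = {"gain", "growth", "win"}
    negative = {"loss", "crash", "fail"}
    words = set(text.lower().split())
    score = len(words & positive) - len(words & negative)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def run_pipeline(fetch: Callable[[], Iterable[dict]], conn: sqlite3.Connection) -> int:
    """Extract articles, label each with a sentiment, and load them into a table."""
    # Hypothetical schema; the real project may store more fields.
    conn.execute("CREATE TABLE IF NOT EXISTS articles (title TEXT, sentiment TEXT)")
    rows = []
    for article in fetch():
        # Score the title plus the description, treating a missing description as empty.
        text = f'{article["title"]} {article.get("description") or ""}'
        rows.append((article["title"], score_sentiment(text)))
    conn.executemany("INSERT INTO articles (title, sentiment) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def sample_fetch() -> list:
    """Hard-coded articles (assumption) used in place of a live API call."""
    return [
        {"title": "Markets gain on growth hopes", "description": "Stocks win again"},
        {"title": "Tech shares crash", "description": None},
        {"title": "Council meets on Tuesday", "description": "Agenda published"},
    ]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    loaded = run_pipeline(sample_fetch, connection)
    print(f"Loaded {loaded} articles")
```

Passing the fetch step in as a callable keeps the scoring and loading logic testable without network access; a scheduler such as Airflow would then call `run_pipeline` once per day.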

End-to-End Airbnb Data Pipeline with AWS and GCP

AWS Glue Amazon S3 VPC Amazon Athena Amazon QuickSight IAM PySpark GCP

This project is a data pipeline for Airbnb data built on AWS and GCP. The pipeline extracts, transforms, and loads (ETL) data from multiple sources. It uses AWS services such as S3, Lambda, and Redshift for storage and processing, and GCP's BigQuery and Dataflow for analytics and real-time data processing.

Pinned

  1. aliexpress_data_engineering_pipeline_webscraping_etl_for_products_and_reviews Public

    Jupyter Notebook

  2. aws_driven_news_aggregation_and_sentiment_data_pipeline Public

    This project automates a data pipeline that collects daily news articles from the Newsdata.io API, performs sentiment analysis, and stores the processed data in PostgreSQL deployed on Amazon RDS.

    Python

  3. news_etl_automation_with_airflow_and_docker Public

    A pipeline for extracting news articles, analysing sentiment using machine learning models, and storing the results in a structured database. This project leverages Apache Airflow for workflow auto…

    Python

  4. end_to_end_airbnb_data_pipeline_with_aws_and_gcp Public