Shegzimus/README.md


🌟 About Me

  • ⚙️ Data Engineer skilled in building robust and scalable ETL pipelines for cloud platforms like AWS and GCP.
  • 🔢 Academic background in Mathematics & Data Science with proven abilities in problem-solving and analogical thinking.
  • 💼 Brief career in Management & Public Health Consulting where I delivered data-driven strategies across various industries.
  • 💻 Passionate about data pipeline development and containerized workflows.

🛠️ Key Skills

  • Programming Languages: Python, SQL, PySpark, HCL.
  • Tools & Technologies: Docker, Airflow, Terraform, AWS (S3, Glue, EC2), GCP (BigQuery, Storage, Cloud Run).
  • Behavioral Strengths: Seeing the "bigger picture", abstracting ideas and integrating client expectations into my solution design process.
  • Tech Specialties: Writing readable & refactorable ETL modules and configuring Docker images/containers.

💡 My Design Philosophy (what you can expect)

  • Scalable and Delegable Pipelines: I believe in designing pipelines that are easy to operate and maintain so that teams can focus on creating new solutions and solving other challenges. This approach contrasts with the common belief that only the author can maintain their code. Good pipelines require little to no author intervention after deployment.

  • Functionality before refinement: I believe in getting things off the ground and starting with a pipeline that works—delivering value quickly—before refining it into a more polished, future-proof version. This saves time and lets me adapt designs based on real-world feedback.

  • Security and Modularity as Cornerstones: Secure and modular designs are fundamental to my work. I focus (too much sometimes) on implementing best practices like secret management, non-hardcoded paths, and modular structures to ensure pipelines are robust, compliant, and easy to maintain.
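As a rough illustration of the secret-management and non-hardcoded-path practices mentioned above, a minimal Python sketch could look like the following. The environment variable names (`PIPELINE_API_KEY`, `PIPELINE_DATA_DIR`), the `PipelineConfig` class, and the helper functions are hypothetical and are not taken from any specific repository.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PipelineConfig:
    """Runtime settings resolved from the environment (hypothetical names)."""
    data_dir: Path
    api_key: str


def load_config(env=None) -> PipelineConfig:
    """Build the config from environment variables instead of hardcoded values."""
    env = os.environ if env is None else env
    # Fail fast if the secret is missing rather than falling back to a default.
    api_key = env.get("PIPELINE_API_KEY")
    if not api_key:
        raise RuntimeError("PIPELINE_API_KEY is not set")
    # The data directory is configurable; the default is relative, not machine-specific.
    data_dir = Path(env.get("PIPELINE_DATA_DIR", "data"))
    return PipelineConfig(data_dir=data_dir, api_key=api_key)


def raw_path(config: PipelineConfig, filename: str) -> Path:
    """Derive file locations from the config so modules never embed absolute paths."""
    return config.data_dir / "raw" / filename
```

Passing the environment in as an argument keeps the loader easy to test, and individual ETL modules only ever see a `PipelineConfig`, never the secret store itself.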

🔭 What I’m Working On

  • Building a weekly ETL pipeline for Near Earth Objects (asteroids) and Coronal Mass Ejections using NASA's API.
  • Developing an AI model that uses metaheuristic algorithms to plan and optimize grocery lists based on customer location and product proxy/availability, maximizing or minimizing variables according to fitness goals, nutrition density, and budget.
  • Building a pipeline with dynamic workload management capabilities using Airflow sensors and Spark to adjust resource allocation based on data volume.
  • Designing a player database for a League of Legends discord group to track in-house scrims, tournaments and player stats.
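The idea of adjusting resource allocation based on data volume can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the size thresholds, and the chosen executor counts are hypothetical, and how the measured volume reaches the function (for example, from an Airflow sensor) is left out. The two keys are standard Spark configuration properties.

```python
def spark_conf_for_volume(input_bytes: int) -> dict:
    """Pick Spark executor settings from the input size (thresholds are hypothetical)."""
    gib = input_bytes / 1024**3
    if gib < 1:
        executors, memory = 1, "2g"    # small batch: a single light executor
    elif gib < 50:
        executors, memory = 4, "4g"    # medium batch
    else:
        executors, memory = 16, "8g"   # large batch: scale out
    return {
        "spark.executor.instances": str(executors),
        "spark.executor.memory": memory,
    }
```

The returned dictionary could then be handed to whatever submits the Spark job, so the sizing rule lives in one small, testable function rather than being scattered across task definitions.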

📈 Recent Projects

  • Fashion Image ETL Pipeline: A modular pipeline for extracting, transforming, and loading 23+ GB of images and their respective metadata for hypothetical ML use.

📫 Connect With Me

If you think I'd be a good addition to your team, feel free to contact me using the links below, and let's discuss your next solution!

Pinned

  1. DE_Fashion_Product_Images (Public)

    Apache Airflow powered ETL Pipeline for moving about 133k images from Kaggle to GCS and BigQuery

    Python 1

  2. DE_NASA_NeoW_Pipeline (Public)

    Airflow powered ETL pipeline for moving Near-Earth-Object data from NASA to Google Cloud

    Python

  3. ML-Video-Game-Sales-Prediction (Public)

    Jupyter Notebook

  4. Masters-Thesis (Public)

    Python