This project demonstrates a complete end-to-end data science pipeline using the Asthma Disease Dataset.
The goal is to predict asthma diagnosis and identify which factors are most strongly associated with a positive diagnosis, following the typical data science lifecycle:
- Data exploration & visualization
- Feature engineering
- Model building & evaluation
- Packaging reusable code
- Testing and documentation
This project is part of the SoftUni Data Science course.
asthma_project/
│
├── data/ # Dataset(s)
│ └── asthma_disease_data.csv
│
├── notebooks/ # Jupyter notebooks (EDA, experiments)
│ ├── 01_exploration.ipynb
│ ├── 02_feature_engineering.ipynb
│ └── 03_modeling.ipynb
│
├── src/ # Python source code (reusable functions)
│ ├── __init__.py
│ ├── data_prep.py
│ ├── features.py
│ └── model.py
│
├── tests/ # Unit tests
│ └── test_data_prep.py
│
├── outputs/ # Plots, results, model artifacts
│
├── requirements.txt # Dependencies
├── README.md # Project description
└── .gitignore # Ignore rules
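The contents of src/features.py are not shown here. As an illustration of the kind of reusable helper that module could hold, the sketch below derives a categorical BMI band from a numeric BMI value. The function names, the `BMI` column name, the new `BMICategory` field, and the row-as-dict representation are all hypothetical assumptions, not the project's actual code; the cut-offs are the standard WHO adult BMI ranges.

```python
from typing import Dict, List

# Standard WHO adult BMI cut-offs; each lower bound is inclusive.
# Checked from highest to lowest so the first match wins.
BMI_BANDS = [
    (30.0, "obese"),
    (25.0, "overweight"),
    (18.5, "normal"),
]


def bmi_category(bmi: float) -> str:
    """Map a numeric BMI value to a WHO category label."""
    for lower, label in BMI_BANDS:
        if bmi >= lower:
            return label
    return "underweight"


def add_bmi_category(rows: List[Dict[str, str]], column: str = "BMI") -> List[Dict[str, str]]:
    """Hypothetical features.py helper: return copies of the rows with a 'BMICategory' field."""
    enriched = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is not mutated
        new_row["BMICategory"] = bmi_category(float(row[column]))
        enriched.append(new_row)
    return enriched
```

For example, `add_bmi_category([{"BMI": "22.4"}])` returns `[{"BMI": "22.4", "BMICategory": "normal"}]`. Keeping the helper free of notebook state is what makes it importable from both the notebooks and the tests.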
The dataset is provided by the course and is located in data/asthma_disease_data.csv.
It contains patient health data and labels indicating asthma diagnosis.
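A first sanity check on a labelled dataset is the class balance of the target, because if positive diagnoses are rare, plain accuracy can look high for a model that never predicts asthma. The sketch below uses only the standard library and assumes the label column is named `Diagnosis` with values `0`/`1`; treat that column name and the inline sample rows as assumptions, not facts about the course file.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO


def load_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Read a CSV file with a header row into a list of dicts (all values are strings)."""
    return list(csv.DictReader(handle))


def label_distribution(rows: List[Dict[str, str]], target: str = "Diagnosis") -> Dict[str, float]:
    """Return the share of each label value in the target column."""
    counts = Counter(row[target] for row in rows)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


if __name__ == "__main__":
    # With the real file you would use:
    #   with open("data/asthma_disease_data.csv", newline="") as f: rows = load_rows(f)
    # Here a tiny made-up sample stands in for it.
    sample = io.StringIO("Age,Diagnosis\n34,0\n51,1\n12,0\n45,0\n")
    rows = load_rows(sample)
    print(len(rows), label_distribution(rows))
```

On the made-up sample this prints `4 {'0': 0.75, '1': 0.25}`.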
Clone the repository (currently private):
git clone https://github.com/YOUR-USERNAME/asthma-prediction.git
cd asthma-prediction
Create a virtual environment and install dependencies:
python -m venv venv
venv\Scripts\activate # On Windows
source venv/bin/activate # On macOS/Linux
pip install -r requirements.txt
Launch Jupyter:
jupyter lab
Run the tests:
python -m pytest -v
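The contents of tests/test_data_prep.py are not reproduced in this README. As a sketch of the kind of check such a file might contain, the block below tests a hypothetical `drop_columns` helper (an assumption, not taken from src/data_prep.py) with plain `assert` statements; pytest collects any function whose name starts with `test_`.

```python
from typing import Dict, Iterable, List


def drop_columns(rows: List[Dict[str, str]], columns: Iterable[str]) -> List[Dict[str, str]]:
    """Hypothetical data_prep helper: remove identifier or free-text columns."""
    to_drop = set(columns)
    # Build new dicts so the input rows are left untouched.
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]


def test_drop_columns_removes_only_requested_fields():
    rows = [{"PatientID": "1", "Age": "34", "Diagnosis": "0"}]
    assert drop_columns(rows, ["PatientID"]) == [{"Age": "34", "Diagnosis": "0"}]


def test_drop_columns_does_not_mutate_input():
    rows = [{"PatientID": "1", "Age": "34"}]
    drop_columns(rows, ["PatientID"])
    assert rows == [{"PatientID": "1", "Age": "34"}]


def test_drop_columns_ignores_missing_names():
    rows = [{"Age": "34"}]
    assert drop_columns(rows, ["NotAColumn"]) == [{"Age": "34"}]
```

In the actual test file the helper would be imported from the src package rather than defined inline; the helper is defined here only so the example runs on its own.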
- Explore dataset properties and key patterns.
- Engineer useful features for prediction.
- Train and evaluate classification models.
- Package the workflow into reusable modules (src/).
- Add unit tests for reproducibility.
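When evaluating a classifier, it helps to compare it against a trivial baseline and to report precision and recall alongside accuracy. The sketch below is a standard-library illustration of that idea, not the project's model.py: it assumes binary labels where `1` means a positive asthma diagnosis, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive diagnosis)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    # Define a metric as 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def majority_baseline(y_train: Sequence[int], n: int) -> List[int]:
    """Predict the most frequent training label for each of n test examples."""
    majority = Counter(y_train).most_common(1)[0][0]
    return [majority] * n
```

With made-up labels `y_test = [0, 1, 0, 0, 1]` and a training set dominated by `0`, the majority baseline reaches 0.6 accuracy but 0.0 recall. This is why a model should beat the baseline on recall and F1, not only on accuracy.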
- This project is for educational purposes as part of the SoftUni Data Science course.
- The workflow is simplified compared to real-world projects, which are more iterative and complex.