This project focuses on building a machine learning model to predict liver disease stages using clinical data. The workflow involves:
- Preprocessing the dataset: Handling missing values, scaling numerical features, and removing outliers to improve model performance.
- Feature selection: Ranking features by RandomForest importance and keeping the most influential ones for model training.
- Hyperparameter tuning: Utilizing GridSearchCV for optimization.
- Model evaluation: Using metrics like accuracy, precision, recall, and F1 score to assess the model's performance.
Various visualizations such as feature importance plots, correlation heatmaps, and box plots are generated to offer insights into the data and model behavior.
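The tuning and evaluation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic (generated with `make_classification`), and the parameter grid, the 5-fold split, the 75/25 train/test split, and the weighted averaging of the multi-class metrics are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

# Synthetic three-class data standing in for the clinical dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative grid only; the values searched in the project are not stated.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

# Exhaustive search over the grid, scored by cross-validated accuracy.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out split.
y_pred = search.best_estimator_.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, average="weighted", zero_division=0),
    "recall": recall_score(y_test, y_pred, average="weighted", zero_division=0),
    "f1": f1_score(y_test, y_pred, average="weighted", zero_division=0),
}
print(search.best_params_)
print(metrics)
```

Weighted averaging is used here because the stage labels are multi-class; macro averaging would be an equally reasonable choice if every stage should count the same regardless of its frequency.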
The objective is to develop a machine learning model that accurately predicts liver disease stages by leveraging RandomForest and feature importance analysis, with the aim of supporting clinical decision-making.
- Python (Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn)
- Data Preprocessing and Cleaning (Handling missing values, outliers)
- Machine Learning (RandomForest, GridSearchCV, KFold Cross-Validation)
- Model Performance Evaluation (Accuracy, Precision, Recall, F1 Score)
- Data Visualization (Box plots, Heatmaps, Feature Importance plots)
- 🔍 Problem-Solving & Critical Thinking
- 🎯 Attention to Detail
- ⏱️ Time Management
- 🧪 Preprocessed Liver Disease Dataset: Missing values imputed, outliers removed.
- 🌲 Trained RandomForest Model: Optimized hyperparameters through GridSearchCV.
- 📊 Feature Importance Analysis: Important features selected for prediction.
- 📈 Performance Metrics Report: Including accuracy, precision, recall, and F1 score.
- 🔥 Correlation Heatmap
- 📦 Box Plots of Features
- 🌟 Feature Importance Plots
- 📉 Performance Metrics Bar Plots
- 🎨 Pair Plots of Important Features with the Target Variable
- Dataset Source: Liver Cirrhosis Stage Classification
- Preprocessing: Applied data scaling, outlier removal using Interquartile Range (IQR), and missing value imputation using the mode.
- Model Selection: RandomForest was chosen for its ability to rank feature importance.
- Hyperparameter Tuning: GridSearchCV used for optimal model configuration.
- Interpretability: Emphasis on model interpretability using cumulative feature importance to provide insights into the most influential features for predicting liver disease stages.
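The preprocessing steps listed above (mode imputation, IQR-based outlier removal, scaling) could look something like the sketch below. The helper names `impute_mode`, `remove_iqr_outliers`, and `scale_features`, the column names in the toy frame, the conventional 1.5 x IQR fence, the use of `StandardScaler`, and the order of the steps are all assumptions made for illustration; the project's `src/data.py` may do this differently.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def impute_mode(df):
    """Fill missing values in each column with that column's most frequent value."""
    out = df.copy()
    for col in out.columns:
        modes = out[col].mode(dropna=True)
        if not modes.empty:
            out[col] = out[col].fillna(modes.iloc[0])
    return out


def remove_iqr_outliers(df, columns, k=1.5):
    """Drop rows where any listed column lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[columns].quantile(0.25)
    q3 = df[columns].quantile(0.75)
    iqr = q3 - q1
    within = ((df[columns] >= q1 - k * iqr) & (df[columns] <= q3 + k * iqr)).all(axis=1)
    return df[within]


def scale_features(df, columns):
    """Standardize the listed columns to zero mean and unit variance."""
    out = df.copy()
    out[columns] = StandardScaler().fit_transform(out[columns])
    return out


# Toy frame with hypothetical column names, two missing values and one extreme value.
toy = pd.DataFrame({
    "feature_a": [1.0, 1.2, np.nan, 1.2, 0.9, 1.1, 50.0],
    "feature_b": [3.5, 3.6, 3.4, np.nan, 3.5, 3.5, 3.6],
    "stage": [1, 2, 2, 3, 1, 2, 3],
})
feature_cols = ["feature_a", "feature_b"]

cleaned = impute_mode(toy)
cleaned = remove_iqr_outliers(cleaned, feature_cols)
cleaned = scale_features(cleaned, feature_cols)
print(cleaned)
```

Note that IQR filtering drops whole rows, so on a small dataset it is worth checking how many samples survive before training.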
- Data preprocessing and cleaning
- Feature selection using RandomForest importance
- Model training with hyperparameter tuning
- Comprehensive evaluation metrics
- Visualization tools for model insights
- Unit tests with pytest
- CI/CD pipeline with GitHub Actions
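As an illustration of feature selection using RandomForest importance, here is one possible way to keep features by cumulative importance. The function name `select_by_cumulative_importance` and the 0.95 threshold are hypothetical, and the data is synthetic; the selection rule actually used in `src/features.py` is not documented here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def select_by_cumulative_importance(importances, threshold=0.95):
    """Return indices of the top-ranked features whose importances sum to at least `threshold`."""
    importances = np.asarray(importances, dtype=float)
    order = np.argsort(importances)[::-1]          # most important first
    cumulative = np.cumsum(importances[order])
    # First position where the running total reaches the threshold, plus one to include it.
    n_keep = int(np.searchsorted(cumulative, threshold)) + 1
    n_keep = min(n_keep, len(order))               # guard against rounding in the total
    return order[:n_keep]


# Synthetic data standing in for the clinical features (assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, n_classes=3, random_state=0
)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

selected = select_by_cumulative_importance(forest.feature_importances_, threshold=0.95)
X_selected = X[:, selected]
print(f"kept {len(selected)} of {X.shape[1]} features: {sorted(selected.tolist())}")
```

Impurity-based importances from a random forest sum to 1, which is what makes a cumulative threshold meaningful; they can be biased toward high-cardinality features, so permutation importance is a common cross-check.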
- Clone the repository:
git clone https://github.com/AliNikoo73/Liver-Disease-Stage-Classification.git
cd Liver-Disease-Stage-Classification
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Download the dataset as described in data/README.md
- Run the example notebook:
jupyter notebook notebooks/liver_disease_classification.ipynb
- Import and use the package in your own code:
from src import data, features, model, evaluate
# Load and preprocess data
df = data.load_data('data/liver_cirrhosis.csv')
X, y = data.preprocess_data(df)
# Train model
best_model, best_params = model.train_model(X, y)
# Make predictions
predictions = model.predict(best_model, X)
├── src/ # Source code
│ ├── data.py # Data loading and preprocessing
│ ├── features.py # Feature selection and engineering
│ ├── model.py # Model training and tuning
│ └── evaluate.py # Evaluation and visualization
├── tests/ # Unit tests
├── notebooks/ # Jupyter notebooks
├── data/ # Dataset directory
├── results/ # Output directory for plots
├── .github/ # GitHub Actions workflows
├── requirements.txt # Project dependencies
├── setup.py # Package setup file
├── LICENSE # MIT License
└── README.md # This file
- Install development dependencies:
pip install -e ".[dev]"
- Run tests:
pytest tests/
- Run linting:
black .
flake8 .
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
See CHANGELOG.md for a list of changes.