🧬 Liver Disease Detection Using RandomForest and Feature Importance Analysis

📜 Summary

This project builds a machine learning model that predicts liver disease stages from clinical data. The workflow involves:

Preprocessing the dataset: Handling missing values, scaling numerical features, and removing outliers to improve model performance.
Feature selection: Ranking features by RandomForest importance and training the model on the selected features.
Hyperparameter tuning: Utilizing GridSearchCV for optimization.
Model evaluation: Using metrics like accuracy, precision, recall, and F1 score to assess the model's performance.

Various visualizations such as feature importance plots, correlation heatmaps, and box plots are generated to offer insights into the data and model behavior.

🎯 Objective

To develop a machine learning model that predicts liver disease stages from clinical data using RandomForest and feature importance analysis, with the aim of supporting clinical decision-making.

🛠 Skills Required

Technical Skills

Python (Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn)
Data Preprocessing and Cleaning (Handling missing values, outliers)
Machine Learning (RandomForest, GridSearchCV, KFold Cross-Validation)
Model Performance Evaluation (Accuracy, Precision, Recall, F1 Score)
Data Visualization (Box plots, Heatmaps, Feature Importance plots)

Soft Skills

🔍 Problem-Solving & Critical Thinking
🎯 Attention to Detail
⏱️ Time Management

📊 Deliverables

Key Outputs

🧪 Preprocessed Liver Disease Dataset: Missing values imputed, outliers removed.
🌲 Trained RandomForest Model: Optimized hyperparameters through GridSearchCV.
📊 Feature Importance Analysis: Important features selected for prediction.
📈 Performance Metrics Report: Including accuracy, precision, recall, and F1 score.
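The performance metrics report listed above could be produced along the following lines. This is a minimal sketch, not the repository's own code: the function name `metrics_report` is hypothetical, and weighted averaging across stages is an assumption, since the README does not say how per-class scores are combined.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def metrics_report(y_true, y_pred, average="weighted"):
    """Return accuracy, precision, recall, and F1 as a plain dict.

    `average="weighted"` is an assumption: it weights each class's score
    by its support, which suits a multi-class stage label.
    """
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=average, zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

A dict like this is easy to log, serialize, or feed into a bar plot of the four metrics.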

Visualizations

🔥 Correlation Heatmap
📦 Box Plots of Features
🌟 Feature Importance Plots
📉 Performance Metrics Bar Plots
🎨 Pair Plots of Important Features with the Target Variable

🔍 Additional Information

Dataset Source: Liver Cirrhosis Stage Classification
Preprocessing: Applied data scaling, outlier removal using Interquartile Range (IQR), and missing value imputation using the mode.
Model Selection: RandomForest was chosen for its ability to rank feature importance.
Hyperparameter Tuning: GridSearchCV used for optimal model configuration.
Interpretability: Cumulative feature importance is used to identify the most influential features for predicting liver disease stages.
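The preprocessing steps named above (mode imputation, IQR-based outlier removal, and scaling) can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: the function names are hypothetical, the 1.5 IQR multiplier is the conventional default and is not confirmed by the README, and `StandardScaler` stands in for whichever scaler the project actually uses.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def impute_with_mode(df):
    """Fill missing values in each column with that column's most frequent value."""
    out = df.copy()
    for col in out.columns:
        modes = out[col].mode(dropna=True)
        if not modes.empty:
            out[col] = out[col].fillna(modes.iloc[0])
    return out


def remove_iqr_outliers(df, columns, k=1.5):
    """Drop rows where any listed column falls outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is an assumed default, not a value taken from the project.
    """
    q1 = df[columns].quantile(0.25)
    q3 = df[columns].quantile(0.75)
    iqr = q3 - q1
    within = ((df[columns] >= q1 - k * iqr) & (df[columns] <= q3 + k * iqr)).all(axis=1)
    return df[within]


def scale_features(df, columns):
    """Standardize the listed columns to zero mean and unit variance."""
    out = df.copy()
    out[columns] = StandardScaler().fit_transform(out[columns].astype(float))
    return out
```

Note that fitting the scaler on the full dataset before a train/test split leaks information from the test set; in practice the scaler would be fit on the training portion only.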

Features

Data preprocessing and cleaning
Feature selection using RandomForest importance
Model training with hyperparameter tuning
Comprehensive evaluation metrics
Visualization tools for model insights
Unit tests with pytest
CI/CD pipeline with GitHub Actions
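As an illustration of the "feature selection using RandomForest importance" item above, the sketch below keeps the smallest set of top-ranked features whose importances sum to a chosen threshold. The function name, the 0.95 threshold, and the forest settings are assumptions made for this example; the repository's `features.py` may work differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def select_by_cumulative_importance(X, y, feature_names, threshold=0.95, random_state=0):
    """Return the top-ranked feature names whose cumulative importance reaches `threshold`."""
    forest = RandomForestClassifier(n_estimators=100, random_state=random_state)
    forest.fit(X, y)

    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1]        # most important first
    cumulative = np.cumsum(importances[order])

    # First position where the running total reaches the threshold, inclusive.
    n_keep = int(np.searchsorted(cumulative, threshold)) + 1
    n_keep = min(n_keep, len(order))             # guard against float rounding
    return [feature_names[i] for i in order[:n_keep]]
```

Impurity-based importances can favor high-cardinality features, so permutation importance is a common cross-check when the ranking drives clinical interpretation.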

Installation

Clone the repository:

git clone https://github.com/AliNikoo73/Liver-Disease-Stage-Classification.git
cd Liver-Disease-Stage-Classification

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Usage

Download the dataset as described in data/README.md
Run the example notebook:

jupyter notebook notebooks/liver_disease_classification.ipynb

Import and use the package in your own code:

from src import data, features, model, evaluate

# Load and preprocess data
df = data.load_data('data/liver_cirrhosis.csv')
X, y = data.preprocess_data(df)

# Train model
best_model, best_params = model.train_model(X, y)

# Make predictions
predictions = model.predict(best_model, X)
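For readers curious what a function with the same return shape as `model.train_model` might do internally, here is a hedged sketch that tunes a RandomForest with GridSearchCV over K-fold splits. It is not the repository's code: the parameter grid, the `n_splits` and `random_state` defaults, and the accuracy scoring are all assumptions chosen for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold


def train_model(X, y, param_grid=None, n_splits=5, random_state=42):
    """Grid-search a RandomForest and return (best_estimator, best_params)."""
    if param_grid is None:
        # Hypothetical grid; the project's actual search space is not documented here.
        param_grid = {"n_estimators": [100, 200], "max_depth": [None, 10]}

    cv = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    search = GridSearchCV(
        RandomForestClassifier(random_state=random_state),
        param_grid=param_grid,
        cv=cv,
        scoring="accuracy",
    )
    search.fit(X, y)
    return search.best_estimator_, search.best_params_
```

Scores reported by the search are cross-validated, so a separate held-out test set is still needed for a final, unbiased estimate.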

Project Structure

├── src/                    # Source code
│   ├── data.py             # Data loading and preprocessing
│   ├── features.py         # Feature selection and engineering
│   ├── model.py            # Model training and tuning
│   └── evaluate.py         # Evaluation and visualization
├── tests/                  # Unit tests
├── notebooks/              # Jupyter notebooks
├── data/                   # Dataset directory
├── results/                # Output directory for plots
├── .github/                # GitHub Actions workflows
├── requirements.txt        # Project dependencies
├── setup.py                # Package setup file
├── LICENSE                 # MIT License
└── README.md               # This file

Development

Install development dependencies:

pip install -e ".[dev]"

Run tests:

pytest tests/

Run linting:

black .
flake8 .

Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for a list of changes.
