PCOS Symptom Network Analysis Pipeline

🎯 Project Objective

This project builds an integrated analytical pipeline that transforms raw clinical data from PCOS (Polycystic Ovary Syndrome) patients into meaningful network-based insights. By analyzing symptom co-occurrence patterns, the project helps researchers uncover hidden relationships, community structures, and potential subtypes within PCOS symptomatology.

📁 Project Structure

- config.yaml: Central configuration file specifying data paths, filenames, threshold values, network settings, and visualization parameters.
- PCOS_data_without_infertility.xlsx: Raw clinical dataset of PCOS patients (excluding infertility cases).
- data_cleaning.py: Preprocesses the raw dataset by handling missing data, cleaning columns, and transforming or binarizing features as defined in config.yaml.
- symptom_coocurence.py: Computes symptom co-occurrence matrices by analyzing how frequently symptoms appear together among patients.
- network_utils.py: Contains helper functions for network construction, filtering, and computing network statistics to support the visualization and analysis processes.
- symptom_network_visuals.py: Generates various visualizations of the symptom network, including static images, interactive HTML-based networks, heatmaps, and community detection plots.
Output Files (generated during execution):
- cleaned_pcos_data.csv and binary_transformed_pcos_data.csv
- Co-occurrence matrices
- Visualizations: network_static.png, interactive_network.html, heatmap.png, and communities.png

⚙️ How to Run

Environment Setup:
Ensure you have Python 3.x installed, along with the required libraries. Install dependencies using:
    pip install pandas numpy matplotlib networkx pyyaml openpyxl plotly python-louvain scipy
Adjust the package list if additional dependencies are required.
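To confirm the environment before running anything, a small check like the one below can help. It is a minimal sketch, not part of the project: the function name `missing_packages` is hypothetical, and the mapping only reflects the packages in the install command above. Note that two packages import under a different name than they install under (pyyaml as `yaml`, python-louvain as `community`).

```python
from importlib.util import find_spec

# pip package name -> import name (they differ for pyyaml and python-louvain)
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "networkx": "networkx",
    "pyyaml": "yaml",
    "openpyxl": "openpyxl",
    "plotly": "plotly",
    "python-louvain": "community",
    "scipy": "scipy",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All dependencies are importable.")
```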
File Setup:
Place all project files (i.e., config.yaml, PCOS_data_without_infertility.xlsx, and the Python scripts) in your working directory, ensuring that directory paths match those specified in the configuration file.
Configuration Adjustments:
Modify config.yaml if necessary, for example to update file paths, threshold values, or visualization settings.
Execution Order:
Run the scripts in the following sequence:
    python data_cleaning.py
    python symptom_coocurence.py
    python symptom_network_visuals.py
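Because each script depends on files written by the previous one, the sequence can also be wrapped in a small runner that stops at the first failure. This is a sketch under assumptions: `run_pipeline` is a hypothetical helper that does not exist in the repository, and it assumes the scripts sit in the current working directory.

```python
import subprocess
import sys

# Script names and order are taken from the execution sequence above.
PIPELINE = [
    "data_cleaning.py",
    "symptom_coocurence.py",
    "symptom_network_visuals.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script with the current interpreter, stopping at the first failure."""
    for script in scripts:
        print(f"Running {script} ...")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on stale or missing intermediate files.
        runner([sys.executable, script], check=True)
```

Calling `run_pipeline()` from the project directory would then replace the three manual commands. The `runner` parameter exists only so the order can be checked without executing anything.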

🧠 Methodology

Data Cleaning:
The data_cleaning.py script processes the raw dataset in three steps: it removes columns whose missing values exceed a 50% threshold, handles the remaining missing data, and transforms or binarizes relevant features according to the settings in config.yaml.
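The cleaning step can be illustrated with a dependency-free sketch. Everything here is an assumption made for illustration: the column names, the cutoff of 25.0, and the helper names `drop_sparse_columns` and `binarize` are hypothetical, and the actual script may treat the 50% threshold differently. Missing values are represented as `None`.

```python
def drop_sparse_columns(rows, max_missing=0.5):
    """Drop columns whose fraction of missing (None) values exceeds max_missing."""
    columns = list(rows[0].keys())
    n = len(rows)
    keep = [
        c for c in columns
        if sum(r[c] is None for r in rows) / n <= max_missing
    ]
    return [{c: r[c] for c in keep} for r in rows]


def binarize(rows, column, cutoff):
    """Map a numeric column to 1 if value > cutoff else 0; missing stays None."""
    out = []
    for r in rows:
        new = dict(r)
        value = r[column]
        new[column] = None if value is None else int(value > cutoff)
        out.append(new)
    return out


# Toy records with hypothetical column names.
rows = [
    {"bmi": 31.0, "hair_growth": 1, "notes": None},
    {"bmi": 22.5, "hair_growth": 0, "notes": None},
    {"bmi": None, "hair_growth": 1, "notes": "x"},
]
cleaned = drop_sparse_columns(rows)       # "notes" is 2/3 missing, so it is dropped
cleaned = binarize(cleaned, "bmi", 25.0)  # hypothetical cutoff
```

With pandas, which the install command lists, the column filter could be written along the lines of `df.loc[:, df.isna().mean() <= 0.5]`.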
Symptom Co-occurrence Analysis:
The symptom_coocurence.py script calculates pairwise symptom relationships to generate co-occurrence matrices, quantifying how frequently symptoms appear together across the patient records.
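As a minimal sketch of what a co-occurrence count means, the example below counts, for each unordered pair of symptoms, how many patients show both. The symptom names and the function `cooccurrence_counts` are hypothetical. For a binary patient-by-symptom matrix X, these counts correspond to the off-diagonal entries of XᵀX.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(patients):
    """Count, for every unordered symptom pair, the patients showing both."""
    counts = Counter()
    for record in patients:
        # Sort so each pair is always stored in the same (a, b) order.
        present = sorted(s for s, flag in record.items() if flag == 1)
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts


# Toy binary records with hypothetical symptom names.
patients = [
    {"acne": 1, "hair_loss": 1, "weight_gain": 0},
    {"acne": 1, "hair_loss": 0, "weight_gain": 1},
    {"acne": 1, "hair_loss": 1, "weight_gain": 1},
]
counts = cooccurrence_counts(patients)
```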
Network Construction:
Using helper functions from network_utils.py, the pipeline builds a network graph from the co-occurrence data and filters it. Settings such as a minimum edge weight and the top N nodes (defined in config.yaml) ensure that only the strongest relationships are visualized.
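The two filters can be sketched as follows. This is an illustration under assumptions, not the code in network_utils.py: the README does not say how the top N nodes are ranked, so the sketch ranks them by weighted degree (the sum of incident edge weights) and applies the node filter before the edge filter. The function name and the toy data are hypothetical.

```python
def filter_network(edge_weights, min_weight, top_n):
    """Keep the top_n nodes by weighted degree, then the edges among them
    whose weight is at least min_weight."""
    strength = {}
    for (a, b), w in edge_weights.items():
        strength[a] = strength.get(a, 0) + w
        strength[b] = strength.get(b, 0) + w
    # Sort by strength descending; break ties alphabetically for reproducibility.
    ranked = sorted(strength, key=lambda node: (-strength[node], node))
    keep = set(ranked[:top_n])
    return {
        (a, b): w
        for (a, b), w in edge_weights.items()
        if a in keep and b in keep and w >= min_weight
    }


# Toy co-occurrence counts with hypothetical symptom names.
edges = {
    ("acne", "hair_loss"): 12,
    ("acne", "weight_gain"): 9,
    ("hair_loss", "weight_gain"): 2,
    ("acne", "fatigue"): 1,
}
filtered = filter_network(edges, min_weight=3, top_n=3)
```

Here "fatigue" falls outside the top three nodes, and the weak edge between "hair_loss" and "weight_gain" is removed by the weight threshold.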
Visualization & Community Detection:
The symptom_network_visuals.py script produces:
- A static overview image of the symptom network,
- An interactive HTML version for in-depth exploration,
- A heatmap showing the intensity of symptom co-occurrences, and
- Community detection visualizations (using the Louvain algorithm) to highlight clusters within the network.
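The Louvain algorithm searches for a partition of the nodes that maximizes modularity, a score comparing the edge weight inside communities with what would be expected at random. The sketch below is not a Louvain implementation. It only computes the modularity of a given partition, to show what the algorithm optimizes. The function name and the toy graph are hypothetical, and the graph is assumed to be undirected with no self-loops.

```python
def modularity(edge_weights, community_of):
    """Weighted modularity of a partition of an undirected graph without self-loops.

    Uses Q = sum over communities c of  L_c / m - (d_c / (2m))^2,
    where m is the total edge weight, L_c the edge weight inside c,
    and d_c the summed weighted degree of the nodes in c.
    """
    m = sum(edge_weights.values())
    internal = {}
    degree_sum = {}
    for (a, b), w in edge_weights.items():
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + w
        degree_sum[cb] = degree_sum.get(cb, 0) + w
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + w
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridge edge, all weights 1.
edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
single = {node: 0 for node in split}
q_split = modularity(edges, split)
q_single = modularity(edges, single)
```

Splitting the graph into its two triangles scores higher than placing every node in one community, which is the kind of partition Louvain would favor. The python-louvain package from the install list provides the full algorithm through `best_partition`.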

📈 Output Visuals

network_static.png:
A static image capturing the overall symptom relationship network.
interactive_network.html:
An interactive, zoomable network visualization for detailed exploration.
heatmap.png:
A heatmap that visualizes the intensity and frequency of symptom co-occurrences.
communities.png:
A visual representation of community structures within the network, revealing potential clusters of related symptoms.

💡 Future Improvements

Integrate all pipeline steps into a unified CLI or main script for streamlined execution.
Enhance logging and error handling to increase robustness.
Develop additional visualizations and statistical summaries.
Package the application with Docker or a pinned virtual environment to ensure reproducibility.
Extend the analysis to incorporate additional clinical data or multi-omics datasets for comprehensive research applications.

📚 Credits

Created by Iroda Ulmasboeva as part of a Network Science and Telematics Lab. For questions/collaboration: [email protected]
