
lab-rasool/MINDS

MINDS is a framework for integrating multimodal oncology data. It queries and integrates clinical, genomic, and imaging data from multiple sources, including the NIH National Cancer Institute (NCI) Cancer Research Data Commons (CRDC) and The Cancer Imaging Archive (TCIA).

Note

We are actively updating MINDS to add more data sources and improve the user experience. If you have suggestions or would like to contribute, please reach out to us. The table below lists the projects to be included in MINDS (115,974 patients in total).

Projects in MINDS
Project Name Cases Clinical Radiology Histopathology Molecular
Foundation Medicine (FM) 18,004 ✓ ✓
The Cancer Genome Atlas (TCGA) 11,428 ✓ ✓ ✓ ✓
Therapeutically Applicable Research to Generate Effective Treatments (TARGET) 6,543 ✓ ✓
Clinical Proteomic Tumor Analysis Consortium (CPTAC) 1,656 ✓ ✓ ✓
The Molecular Profiling to Predict Response to Treatment (MP2PRT) 1,562 ✓ ✓
Multiple Myeloma Research Foundation (MMRF) 995 ✓ ✓
BEATAML1.0 882 ✓ ✓
Cancer Genome Characterization Initiatives (CGCI) 645 ✓ ✓ ✓
NCI Center for Cancer Research (NCICCR) 489 ✓ ✓
REBC 449 ✓ ✓
MATCH 448 ✓ ✓
Ukrainian National Research Center for Radiation Medicine Trio Study (TRIO) 339 ✓ ✓
Count Me In (CMI) 299 ✓ ✓
Human Cancer Model Initiative (HCMI) 278 ✓ ✓ ✓
West Coast Prostate Cancer Dream Team (WCDT) 101 ✓ ✓
Oregon Health and Science University (OHSU) 176 ✓ ✓
Applied Proteogenomics OrganizationaL Learning and Outcomes (APOLLO) 87 ✓ ✓
EXCEPTIONAL RESPONDERS 84 ✓ ✓
Environment And Genetics in Lung Cancer Etiology (EAGLE) 50 ✓ ✓
ORGANOID 70 ✓ ✓
Clinical Trials Sequencing Project (CTSP) 45 ✓ ✓
VA Research Precision Oncology Program (VAREPOP) 7 ✓ ✓
4D-Lung 20 ✓
A091105 83 ✓
AAPM-RT-MAC 55 ✓
ACNS0332 85 ✓
ACRIN-6698 385 ✓
ACRIN-Contralateral-Breast-MR 984 ✓
ACRIN-DSC-MR-Brain 123 ✓
ACRIN-FLT-Breast 83 ✓ ✓
ACRIN-FMISO-Brain 45 ✓
ACRIN-HNSCC-FDG-PET-CT 260 ✓
ACRIN-NSCLC-FDG-PET 242 ✓
Adrenal-ACC-Ki67-Seg 53 ✓ ✓
Advanced-MRI-Breast-Lesions 632 ✓ ✓ ✓
AHEP0731 80 ✓
AHOD0831 165 ✓
AML-Cytomorphology_LMU 200 ✓
AML-Cytomorphology_MLL_Helmholtz 189 ✓
Anti-PD-1_Lung 46 ✓
Anti-PD-1_MELANOMA 47 ✓
APOLLO-5 414 ✓
ARAR0331 108 ✓
AREN0532 544 ✓
AREN0533 294 ✓
AREN0534 239 ✓
B-mode-and-CEUS-Liver 120 ✓
Bone-Marrow-Cytomorphology_MLL_Helmholtz_Fraunhofer 945 ✓
Brain-TR-GammaKnife 47 ✓
Brain-Tumor-Progression 20 ✓
Breast-Cancer-Screening-DBT 5,060 ✓
BREAST-DIAGNOSIS 88 ✓
Breast-Lesions-USG 256 ✓
Breast-MRI-NACT-Pilot 64 ✓
Burdenko-GBM-Progression 180 ✓
C-NMC 2019 118 ✓
C4KC-KiTS 210 ✓
CALGB50303 155 ✓
CBIS-DDSM 1,566 ✓
CC-Radiomics-Phantom 17 ✓
CC-Radiomics-Phantom-2 251 ✓
CC-Tumor-Heterogeneity 23 ✓
CDD-CESM 326 ✓
CMB-AML 8 ✓ ✓
CMB-CRC 49 ✓ ✓
CMB-GEC 7 ✓ ✓
CMB-LCA 61 ✓ ✓
CMB-MEL 44 ✓ ✓
CMB-MML 64 ✓ ✓
CMB-PCA 12 ✓ ✓
CMMD 1,775 ✓ ✓ ✓
CODEX imaging of HCC 15 ✓
Colorectal-Liver-Metastases 197 ✓
COVID-19-AR 105 ✓
COVID-19-NY-SBU 1,384 ✓
CRC_FFPE-CODEX_CellNeighs 35 ✓
CT COLONOGRAPHY 825 ✓ ✓
CT Images in COVID-19 661 ✓
CT Lymph Nodes 176 ✓
CT-ORG 140 ✓
CT-Phantom4Radiomics 1 ✓
CT-vs-PET-Ventilation-Imaging 20 ✓
CTpred-Sunitinib-panNET 38 ✓
DFCI-BCH-BWH-PEDs-HGG 61 ✓
DLBCL-Morphology 209 ✓
DRO-Toolkit 32 ✓
Duke-Breast-Cancer-MRI 922 ✓
EA1141 500 ✓
ExACT 30 ✓
FDG-PET-CT-Lesions 900 ✓
GammaKnife-Hippocampal 390 ✓
GBM-DSC-MRI-DRO 3 ✓
GLIS-RT 230 ✓
HCC-TACE-Seg 105 ✓
HE-vs-MPM 12 ✓
Head-Neck Cetuximab 111 ✓
Head-Neck-PET-CT 298 ✓
HEAD-NECK-RADIOMICS-HN1 137 ✓
Healthy-Total-Body-CTs 30 ✓
HER2 tumor ROIs 273 ✓
HistologyHSI-GB 13 ✓
HNC-IMRT-70-33 211 ✓
HNSCC 627 ✓
HNSCC-3DCT-RT 31 ✓
HNSCC-mIF-mIHC-comparison 8 ✓
Hungarian-Colorectal-Screening 200 ✓
ISPY1 222 ✓
ISPY2 719 ✓
IvyGAP 39 ✓
LCTSC 60 ✓
LDCT-and-Projection-data 299 ✓
LGG-1p19qDeletion 159 ✓
LIDC-IDRI 1,010 ✓
Lung Phantom 1 ✓
Lung-Fused-CT-Pathology 6 ✓
Lung-PET-CT-Dx 355 ✓
LungCT-Diagnosis 61 ✓
Meningioma-SEG-CLASS 96 ✓
MIDRC-RICORD-1A 110 ✓
MIDRC-RICORD-1B 117 ✓
MIDRC-RICORD-1C 361 ✓
MiMM_SBILab 5 ✓
NADT-Prostate 37 ✓
NaF PROSTATE 9 ✓
NLST 26,254 ✓ ✓
NRG-1308 12 ✓
NSCLC Radiogenomics 211 ✓
NSCLC-Cetuximab 490 ✓
NSCLC-Radiomics 422 ✓
NSCLC-Radiomics-Genomics 89 ✓
NSCLC-Radiomics-Interobserver1 22 ✓
OPC-Radiomics 606 ✓
Osteosarcoma-Tumor-Assessment 4 ✓
Ovarian Bevacizumab Response 78 ✓
Pancreas-CT 82 ✓
Pancreatic-CT-CBCT-SEG 40 ✓
PCa_Bx_3Dpathology 50 ✓ ✓
Pediatric-CT-SEG 359 ✓
Pelvic-Reference-Data 58 ✓
Phantom FDA 7 ✓
Post-NAT-BRCA 64 ✓
Pretreat-MetsToBrain-Masks 200 ✓ ✓
Prostate Fused-MRI-Pathology 28 ✓
Prostate-3T 64 ✓
Prostate-Anatomical-Edge-Cases 131 ✓
PROSTATE-DIAGNOSIS 92 ✓
PROSTATE-MRI 26 ✓
Prostate-MRI-US-Biopsy 1,151 ✓
PROSTATEx 346 ✓
Pseudo-PHI-DICOM-Data 21 ✓
PTRC-HGSOC 174 ✓
QIBA CT-1C 1 ✓
QIBA-CT-Liver-Phantom 3 ✓
QIN Breast DCE-MRI 10 ✓
QIN GBM Treatment Response 54 ✓
QIN LUNG CT 47 ✓
QIN PET Phantom 2 ✓
QIN PROSTATE 22 ✓
QIN-BRAIN-DSC-MRI 49 ✓
QIN-BREAST 67 ✓
QIN-BREAST-02 13 ✓
QIN-HEADNECK 279 ✓
QIN-PROSTATE-Repeatability 15 ✓
QIN-SARCOMA 15 ✓
RADCURE 3,346 ✓ ✓
REMBRANDT 130 ✓
ReMIND 114 ✓
RHUH-GBM 40 ✓
RIDER Breast MRI 5 ✓
RIDER Lung CT 32 ✓
RIDER Lung PET-CT 244 ✓
RIDER NEURO MRI 19 ✓
RIDER PHANTOM MRI 10 ✓
RIDER PHANTOM PET-CT 20 ✓
RIDER Pilot 8 ✓
S0819 1,299 ✓
SLN-Breast 78 ✓
SN-AM 60 ✓
Soft-tissue-Sarcoma 51 ✓
SPIE-AAPM Lung CT Challenge 70 ✓
StageII-Colorectal-CT 230 ✓
UCSF-PDGM 495 ✓
UPENN-GBM 630 ✓
Vestibular-Schwannoma-MC-RC 124 ✓
Vestibular-Schwannoma-SEG 242 ✓
VICTRE 2,994 ✓

Installation

The cloud version of MINDS is currently in closed beta, but you can recreate the MINDS database locally. To do so, set up a MySQL database and populate it with the MINDS schema; a Docker container is the easiest way to run MySQL. First, install Docker by following the installation instructions for your operating system in the Docker documentation. Next, run the following command, which pulls the official MySQL image if needed and starts a container.

Note

Replace my-secret-pw with your desired password and port with the host port you want to use to access the database. MySQL listens on port 3306 by default. The command below will not work until port is replaced with a valid port number.

docker run -d --name minds -e MYSQL_ROOT_PASSWORD=my-secret-pw -e MYSQL_DATABASE=minds -p port:3306 mysql
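If you prefer to script this step, the sketch below assembles the same `docker run` arguments in Python and rejects an invalid host port before Docker is ever called. The helper `build_docker_run_command` is hypothetical and not part of the MINDS package; it only reproduces the command shown above.

```python
from __future__ import annotations

import shlex


def build_docker_run_command(
    password: str,
    host_port: int,
    name: str = "minds",
    database: str = "minds",
) -> list[str]:
    """Hypothetical helper: build the docker run arguments shown above.

    Raises ValueError when host_port is not a usable TCP port, for example
    when the placeholder string "port" is passed instead of a number.
    """
    if not isinstance(host_port, int) or not 1 <= host_port <= 65535:
        raise ValueError(f"host_port must be an integer in 1-65535, got {host_port!r}")
    return [
        "docker", "run", "-d",
        "--name", name,
        "-e", f"MYSQL_ROOT_PASSWORD={password}",
        "-e", f"MYSQL_DATABASE={database}",
        # Map the chosen host port to MySQL's default port inside the container.
        "-p", f"{host_port}:3306",
        "mysql",
    ]


if __name__ == "__main__":
    # Print the shell-escaped command so it can be inspected or copied.
    print(shlex.join(build_docker_run_command("my-secret-pw", 3306)))
```

To actually start the container, the returned list can be passed to `subprocess.run(args, check=True)`; passing a list instead of a single shell string avoids quoting problems if the password contains special characters.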

Finally, install the MINDS Python package with pip:

pip install git+https://github.com/lab-rasool/MINDS.git

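After installing the package, create a .env file in the root directory of your project with the following variables. PORT must match the host port you chose in the docker run command, and PASSWORD must match MYSQL_ROOT_PASSWORD.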

HOST=127.0.0.1
PORT=3306
DB_USER=root
PASSWORD=my-secret-pw
DATABASE=minds   
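To sanity-check the .env file before calling into MINDS, the sketch below parses simple `KEY=VALUE` lines with the standard library and confirms that the five keys above are present. `parse_env`, `load_env`, and `check_env` are hypothetical helpers, not part of MINDS, and they deliberately do not handle quoted values or `export` prefixes.

```python
from __future__ import annotations

from pathlib import Path

# The keys listed in the README's .env example.
REQUIRED_KEYS = {"HOST", "PORT", "DB_USER", "PASSWORD", "DATABASE"}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; skip blanks and '#' comments, strip whitespace."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed .env line: {raw!r}")
        # partition splits on the first '=' only, so passwords may contain '='.
        settings[key.strip()] = value.strip()
    return settings


def load_env(path: str | Path = ".env") -> dict[str, str]:
    """Read and parse a .env file from disk."""
    return parse_env(Path(path).read_text())


def check_env(settings: dict[str, str]) -> None:
    """Fail early if a key is missing or PORT is not numeric."""
    missing = REQUIRED_KEYS.difference(settings)
    if missing:
        raise KeyError(f"missing .env keys: {sorted(missing)}")
    if not settings["PORT"].isdigit():
        raise ValueError("PORT must be the numeric host port passed to `docker run -p`")
```

Typical use is `check_env(load_env(".env"))` right after creating the file; stripping whitespace means stray trailing spaces after a value do not end up in the database name or password.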

Usage

Initial setup and automated updates

If you have set up the MINDS database locally, you need to populate it with data. To do so, or to update an existing database with the latest data, run:

# Import the minds package
import minds

# Update the database with the latest data
minds.update()

Querying the MINDS database

The MINDS Python package provides a Python interface to the MINDS database. You can use it to query the database and get the results back as a pandas DataFrame.

import minds

# get a list of all the tables in the database
tables = minds.get_tables()

# get a list of all the columns in a table
columns = minds.get_columns("clinical")

# Query the database directly
query = "SELECT * FROM minds.clinical WHERE project_id = 'TCGA-LUAD' LIMIT 10"
df = minds.query(query)
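Because `minds.query` returns a pandas DataFrame, results can be summarized with ordinary pandas operations. The sketch below counts distinct cases per project. It runs on a small stand-in DataFrame rather than a live database, and the `case_submitter_id` column name is an assumption; check the real column names with `minds.get_columns("clinical")`. Counting distinct IDs rather than rows guards against tables that may hold more than one row per case.

```python
import pandas as pd


def cases_per_project(df: pd.DataFrame, id_col: str = "case_submitter_id") -> pd.Series:
    """Count distinct case IDs per project, largest project first.

    id_col is an assumed column name; substitute the case identifier
    reported by minds.get_columns("clinical").
    """
    return (
        df.groupby("project_id")[id_col]
        .nunique()
        .sort_values(ascending=False)
    )


# Stand-in for a DataFrame returned by minds.query(...): case "A" appears twice,
# as it might if a table stores several records per case.
sample = pd.DataFrame(
    {
        "project_id": ["TCGA-LUAD", "TCGA-LUAD", "TCGA-LUAD", "TCGA-LUSC"],
        "case_submitter_id": ["A", "A", "B", "C"],
    }
)
print(cases_per_project(sample))
```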

Building the cohort and downloading the data

# Build a cohort from the query above; its files will be downloaded to output_dir
query_cohort = minds.build_cohort(query=query, output_dir="./data")

# Alternatively, build a cohort directly from a cohort file exported from the GDC portal
gdc_cohort = minds.build_cohort(gdc_cohort="cohort_Unsaved_Cohort.2024-02-12.tsv", output_dir="./data")

# Show the cohort details
gdc_cohort.stats()

# Download the cohort's data to the output directory specified above.
# You can also set the number of threads to use and the modalities to include or exclude.
gdc_cohort.download(threads=12, exclude=["Slide Image"])
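The `threads`, `include`, and `exclude` options correspond to a common pattern: filter the file list by modality, then fetch the remaining files concurrently. The sketch below illustrates that pattern with the standard library's `ThreadPoolExecutor`. It is not MINDS's implementation; the file records, their `data_type` key, and the `fetch` callable are hypothetical stand-ins, and MINDS may combine `include` and `exclude` differently.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def select_files(
    files: Iterable[dict],
    include: Iterable[str] | None = None,
    exclude: Iterable[str] | None = None,
) -> list[dict]:
    """Keep files whose data_type is in include (if given) and not in exclude."""
    include_set = set(include) if include is not None else None
    exclude_set = set(exclude or ())
    return [
        f for f in files
        if (include_set is None or f["data_type"] in include_set)
        and f["data_type"] not in exclude_set
    ]


def download_all(
    files: Iterable[dict],
    fetch: Callable[[dict], str],
    threads: int = 12,
    include: Iterable[str] | None = None,
    exclude: Iterable[str] | None = None,
) -> list[str]:
    """Fetch the selected files on a thread pool; results keep input order."""
    selected = select_files(files, include, exclude)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fetch, selected))


if __name__ == "__main__":
    # Hypothetical file records; real records would come from the cohort.
    files = [
        {"file_id": "f1", "data_type": "Slide Image"},
        {"file_id": "f2", "data_type": "Clinical Supplement"},
        {"file_id": "f3", "data_type": "Gene Expression Quantification"},
    ]

    def fake_fetch(record: dict) -> str:
        # A real fetch would stream the file into the output directory.
        return f"downloaded {record['file_id']}"

    print(download_all(files, fake_fetch, threads=4, exclude=["Slide Image"]))
```

Threads fit this workload because downloads are I/O-bound: each worker spends most of its time waiting on the network, so the GIL is not a bottleneck.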

Please cite our work

@Article{s24051634,
    AUTHOR = {Tripathi, Aakash and Waqas, Asim and Venkatesan, Kavya and Yilmaz, Yasin and Rasool, Ghulam},
    TITLE = {Building Flexible, Scalable, and Machine Learning-Ready Multimodal Oncology Datasets},
    JOURNAL = {Sensors},
    VOLUME = {24},
    YEAR = {2024},
    NUMBER = {5},
    ARTICLE-NUMBER = {1634},
    URL = {https://www.mdpi.com/1424-8220/24/5/1634},
    ISSN = {1424-8220},
    DOI = {10.3390/s24051634}
}

Contributing

We welcome contributions from the community. If you would like to contribute to the MINDS project, please read our contributing guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.