
Commit

Merge pull request #74 from MStarmans91/development
Release version 3.6.1
MStarmans91 committed Feb 15, 2023
2 parents 0658775 + b1284b8 commit 4801a6f
Showing 136 changed files with 4,138 additions and 2,182 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -127,10 +127,14 @@ dmypy.json
# Pyre type checker
.pyre/

# Visual studio code config
.vscode

# Example data
WORC/exampledata/*.hdf5
WORC/external/*
WORC/exampledata/ICCvalues.csv
WORC/tests/*.png
WORC/tests/*.mat
WORC/tests/performance*.json
WORC/tests/WORC_Example_STWStrategyHN_*
32 changes: 32 additions & 0 deletions CHANGELOG
@@ -6,6 +6,38 @@ All notable changes to this project will be documented in this file.
The format is based on `Keep a Changelog <http://keepachangelog.com/>`_
and this project adheres to `Semantic Versioning <http://semver.org/>`_

3.6.1 - 2023-02-15
------------------

Fixed
~~~~~
- Bug when using elastix: the dimensionality got the wrong name in the fastr network.
- Bug in BasicWORC when starting from features.
- For statistical test thresholding, if the refit does not work during ensembling, skip this
  method instead of returning NaN.
- The createfixedsplits function was outdated; it has been updated to the newest conventions
  and added to the documentation.
- When using Dummys, segmentix now still copies metadata information from the image
  to the segmentation when required.

Changed
~~~~~~~
- When part of fitting and scoring a workflow fails, return NaN as performance during
  optimization, but skip that step during refitting. If we skipped the step during
  optimization, the relation between the hyperparameters and the performance would be
  disturbed; during refitting, we need to have a model, so the best option is to skip
  the step. Previously, the step was always skipped.
- Set the default of the XGB estimator parallelization to a single job.
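The refitting-versus-optimization policy described above can be sketched as follows (an illustrative sketch with made-up step names, not WORC's actual implementation):

```python
import math

def run_workflow(steps, refitting=False):
    """Sketch of the policy: a failing step yields NaN during
    optimization, but is skipped during refitting so that a model
    is still produced."""
    executed = []
    for name, step in steps:
        try:
            step()  # fit-and-score this workflow step; may raise
            executed.append(name)
        except RuntimeError:
            if refitting:
                continue  # skip the broken step, keep the rest
            return float("nan")  # penalize this hyperparameter set
    return executed

def broken_step():
    raise RuntimeError("step failed")

steps = [("scale", lambda: None),
         ("select", broken_step),
         ("classify", lambda: None)]
```

During optimization the NaN keeps the hyperparameter-to-performance relation intact; during refitting the skip guarantees a usable model.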

Added
~~~~~
- Documentation updates.
- Option to save the workflows trained on the train-validation training datasets, besides
  the option to save workflows trained on the full training dataset. Not available for SMAC
  due to the implementation of SMAC.
- Validate function for the fastr HDF5 datatype, as fastr previously deemed empty HDF5
  files valid.
- Option to eliminate features which are all NaN during imputation.
- Preflight check on the number of image types provided.
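These new options correspond to plain string fields in the WORC configuration, read back with configparser; a minimal sketch (the field names are taken from this diff, the values shown are the defaults it introduces):

```python
import configparser

config = configparser.ConfigParser()
config["General"] = {"DoTestNRSNEns": "False"}
# Drop features that are all NaN during imputation:
config["Imputation"] = {"skipallNaN": "True"}
config["HyperOptimization"] = {
    # Save workflows refit on the full training dataset
    # (renamed from the earlier refit_workflows field):
    "refit_training_workflows": "False",
    # Also save the workflows trained on the train-validation splits:
    "refit_validation_workflows": "False",
}

# Values are stored as strings and converted with getboolean():
assert config["Imputation"].getboolean("skipallNaN") is True
```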


3.6.0 - 2022-04-05
------------------

6 changes: 2 additions & 4 deletions README.md
@@ -1,4 +1,4 @@
# WORC v3.6.0
# WORC v3.6.1
## Workflow for Optimal Radiomics Classification

## Information
@@ -70,14 +70,12 @@ The official documentation can be found at [https://worc.readthedocs.io](https:/
The publicly released WORC database is described in the following paper:

```bibtex
@article {Starmans2021.08.19.21262238,
@article {Starmans2021WORCDatabase,
author = {Starmans, Martijn P.A. and Timbergen, Milea J.M. and Vos, Melissa and Padmos, Guillaume A. and Gr{\"u}nhagen, Dirk J. and Verhoef, Cornelis and Sleijfer, Stefan and van Leenders, Geert J.L.H. and Buisman, Florian E. and Willemssen, Francois E.J.A. and Koerkamp, Bas Groot and Angus, Lindsay and van der Veldt, Astrid A.M. and Rajicic, Ana and Odink, Arlette E. and Renckens, Michel and Doukas, Michail and de Man, Rob A. and IJzermans, Jan N.M. and Miclea, Razvan L. and Vermeulen, Peter B. and Thomeer, Maarten G. and Visser, Jacob J. and Niessen, Wiro J. and Klein, Stefan},
title = {The WORC database: MRI and CT scans, segmentations, and clinical labels for 930 patients from six radiomics studies},
elocation-id = {2021.08.19.21262238},
year = {2021},
doi = {10.1101/2021.08.19.21262238},
publisher = {Cold Spring Harbor Laboratory Press},
abstract = {The WORC database consists in total of 930 patients composed of six datasets gathered at the Erasmus MC, consisting of patients with: 1) well-differentiated liposarcoma or lipoma (115 patients); 2) desmoid-type fibromatosis or extremity soft-tissue sarcomas (203 patients); 3) primary solid liver tumors, either malignant (hepatocellular carcinoma or intrahepatic cholangiocarcinoma) or benign (hepatocellular adenoma or focal nodular hyperplasia) (186 patients); 4) gastrointestinal stromal tumors (GISTs) and intra-abdominal gastrointestinal tumors radiologically resembling GISTs (246 patients); 5) colorectal liver metastases (77 patients); and 6) lung metastases of metastatic melanoma (103 patients). For each patient, either a magnetic resonance imaging (MRI) or computed tomography (CT) scan, collected from routine clinical care, one or multiple (semi-)automatic lesion segmentations, and ground truth labels from a gold standard (e.g., pathologically proven) are available. All datasets are multicenter imaging datasets, as patients referred to our institute often received imaging at their referring hospital. The dataset can be used to validate or develop radiomics methods, i.e., using machine or deep learning to relate the visual appearance to the ground truth labels, and automatic segmentation methods. The data is publicly available at https://xnat.bmia.nl/data/projects/worc; the code to download the data and reproduce the experiments can be found at https://github.com/MStarmans91/WORCDatabase.},
URL = {https://www.medrxiv.org/content/early/2021/08/25/2021.08.19.21262238},
eprint = {https://www.medrxiv.org/content/early/2021/08/25/2021.08.19.21262238.full.pdf},
journal = {medRxiv}
64 changes: 47 additions & 17 deletions README.rst
@@ -1,4 +1,4 @@
WORC v3.6.0
WORC v3.6.1
===========

Workflow for Optimal Radiomics Classification
@@ -16,8 +16,8 @@ Information
Introduction
============

WORC is an open-source python package for the easy execution of full
radiomics pipelines.
WORC is an open-source python package for the easy execution and fully
automatic construction and optimization of radiomics workflows.

We aim to establish a general radiomics platform supporting easy
integration of other tools. With our modular build and support of
@@ -33,9 +33,28 @@ License
This package is covered by the open source `APACHE 2.0
License <APACHE-LICENSE-2.0>`__.

When using WORC, please cite this repository as following:

``Martijn P.A. Starmans, Sebastian R. van der Voort, Thomas Phil and Stefan Klein. Workflow for Optimal Radiomics Classification (WORC). Zenodo (2018). Available from: https://github.com/MStarmans91/WORC. DOI: http://doi.org/10.5281/zenodo.3840534.``
When using WORC, please cite this repository and the paper describing
WORC as follows:

.. code:: bibtex
@article{starmans2021reproducible,
title={Reproducible radiomics through automated machine learning validated on twelve clinical applications},
author={Martijn P. A. Starmans and Sebastian R. van der Voort and Thomas Phil and Milea J. M. Timbergen and Melissa Vos and Guillaume A. Padmos and Wouter Kessels and David Hanff and Dirk J. Grunhagen and Cornelis Verhoef and Stefan Sleijfer and Martin J. van den Bent and Marion Smits and Roy S. Dwarkasing and Christopher J. Els and Federico Fiduzi and Geert J. L. H. van Leenders and Anela Blazevic and Johannes Hofland and Tessa Brabander and Renza A. H. van Gils and Gaston J. H. Franssen and Richard A. Feelders and Wouter W. de Herder and Florian E. Buisman and Francois E. J. A. Willemssen and Bas Groot Koerkamp and Lindsay Angus and Astrid A. M. van der Veldt and Ana Rajicic and Arlette E. Odink and Mitchell Deen and Jose M. Castillo T. and Jifke Veenland and Ivo Schoots and Michel Renckens and Michail Doukas and Rob A. de Man and Jan N. M. IJzermans and Razvan L. Miclea and Peter B. Vermeulen and Esther E. Bron and Maarten G. Thomeer and Jacob J. Visser and Wiro J. Niessen and Stefan Klein},
year={2021},
eprint={2108.08618},
archivePrefix={arXiv},
primaryClass={eess.IV}
}
@software{starmans2018worc,
author = {Martijn P. A. Starmans and Thomas Phil and Sebastian R. van der Voort and Stefan Klein},
title = {Workflow for Optimal Radiomics Classification (WORC)},
year = {2018},
publisher = {Zenodo},
doi = {10.5281/zenodo.3840534},
url = {https://github.com/MStarmans91/WORC}
}
For the DOI, visit |image4|.

@@ -48,14 +67,32 @@ occur. Please contact us through the channels below if you find any and
we will try to fix them as soon as possible, or create an issue on this
Github.

Tutorial and Documentation
--------------------------
Tutorial, documentation and dataset
-----------------------------------

The WORC tutorial is hosted in a `separate
repository <https://github.com/MStarmans91/WORCTutorial>`__.
The WORC tutorial is hosted at
https://github.com/MStarmans91/WORCTutorial.

The official documentation can be found at https://worc.readthedocs.io.

The publicly released WORC database is described in the following paper:

.. code:: bibtex
@article {Starmans2021WORCDatabase,
author = {Starmans, Martijn P.A. and Timbergen, Milea J.M. and Vos, Melissa and Padmos, Guillaume A. and Gr{\"u}nhagen, Dirk J. and Verhoef, Cornelis and Sleijfer, Stefan and van Leenders, Geert J.L.H. and Buisman, Florian E. and Willemssen, Francois E.J.A. and Koerkamp, Bas Groot and Angus, Lindsay and van der Veldt, Astrid A.M. and Rajicic, Ana and Odink, Arlette E. and Renckens, Michel and Doukas, Michail and de Man, Rob A. and IJzermans, Jan N.M. and Miclea, Razvan L. and Vermeulen, Peter B. and Thomeer, Maarten G. and Visser, Jacob J. and Niessen, Wiro J. and Klein, Stefan},
title = {The WORC database: MRI and CT scans, segmentations, and clinical labels for 930 patients from six radiomics studies},
elocation-id = {2021.08.19.21262238},
year = {2021},
doi = {10.1101/2021.08.19.21262238},
URL = {https://www.medrxiv.org/content/early/2021/08/25/2021.08.19.21262238},
eprint = {https://www.medrxiv.org/content/early/2021/08/25/2021.08.19.21262238.full.pdf},
journal = {medRxiv}
}
The code to download the WORC database and reproduce our experiments can
be found at https://github.com/MStarmans91/WORCDatabase.

Installation
------------

@@ -109,13 +146,6 @@ Tutorial <https://github.com/MStarmans91/WORCTutorial>`__. Besides a
Jupyter notebook with instructions, we also provide an example
script there to get you started.

WIP
---

- We are writing the paper on WORC.
- We are expanding the example experiments of WORC with open source
datasets.

Contact
-------

14 changes: 11 additions & 3 deletions WORC/IOparser/config_io_classifier.py
@@ -59,6 +59,9 @@ def load_config(config_file_path):
settings_dict['General']['tempsave'] =\
settings['General'].getboolean('tempsave')

settings_dict['General']['DoTestNRSNEns'] =\
settings['General'].getboolean('DoTestNRSNEns')

# Feature Scaling
settings_dict['FeatureScaling']['scale_features'] =\
settings['FeatureScaling'].getboolean('scale_features')
@@ -154,6 +157,9 @@ def load_config(config_file_path):
[int(str(item).strip()) for item in
settings['Imputation']['n_neighbors'].split(',')]

settings_dict['Imputation']['skipallNaN'] =\
[str(settings['Imputation']['skipallNaN'])]

# OneHotEncoding
settings_dict['OneHotEncoding']['Use'] =\
[str(item).strip() for item in
@@ -392,11 +398,13 @@ def load_config(config_file_path):
settings['HyperOptimization'].getint('maxlen')
settings_dict['HyperOptimization']['ranking_score'] = \
str(settings['HyperOptimization']['ranking_score'])
settings_dict['HyperOptimization']['refit_workflows'] =\
settings['HyperOptimization'].getboolean('refit_workflows')
settings_dict['HyperOptimization']['refit_training_workflows'] =\
settings['HyperOptimization'].getboolean('refit_training_workflows')
settings_dict['HyperOptimization']['refit_validation_workflows'] =\
settings['HyperOptimization'].getboolean('refit_validation_workflows')
settings_dict['HyperOptimization']['memory'] = \
str(settings['HyperOptimization']['memory'])

# Settings for SMAC
settings_dict['SMAC']['use'] =\
settings['SMAC'].getboolean('use')
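The parsing pattern used throughout load_config above can be illustrated in isolation (a simplified sketch, not the module itself): plain booleans are read with getboolean(), while search-space fields are split into lists of candidate values for the optimizer:

```python
import configparser

settings = configparser.ConfigParser()
settings["General"] = {"DoTestNRSNEns": "False"}
settings["Imputation"] = {"skipallNaN": "True", "n_neighbors": "5, 5"}

settings_dict = {"General": {}, "Imputation": {}}

# Plain booleans are converted directly:
settings_dict["General"]["DoTestNRSNEns"] = \
    settings["General"].getboolean("DoTestNRSNEns")

# Search-space fields become lists of candidate values:
settings_dict["Imputation"]["n_neighbors"] = \
    [int(item.strip())
     for item in settings["Imputation"]["n_neighbors"].split(",")]
settings_dict["Imputation"]["skipallNaN"] = \
    [settings["Imputation"]["skipallNaN"]]
```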
30 changes: 17 additions & 13 deletions WORC/WORC.py
@@ -200,6 +200,7 @@ def defaultconfig(self):
config['General']['AssumeSameImageAndMaskMetadata'] = 'False'
config['General']['ComBat'] = 'False'
config['General']['Fingerprint'] = 'True'
config['General']['DoTestNRSNEns'] = 'False'

# Fingerprinting
config['Fingerprinting'] = dict()
@@ -352,6 +353,7 @@ def defaultconfig(self):
config['Imputation']['use'] = 'True'
config['Imputation']['strategy'] = 'mean, median, most_frequent, constant, knn'
config['Imputation']['n_neighbors'] = '5, 5'
config['Imputation']['skipallNaN'] = 'True'

# Feature scaling options
config['FeatureScaling'] = dict()
@@ -496,7 +498,8 @@ def defaultconfig(self):
config['HyperOptimization']['maxlen'] = '100'
config['HyperOptimization']['ranking_score'] = 'test_score'
config['HyperOptimization']['memory'] = '3G'
config['HyperOptimization']['refit_workflows'] = 'False'
config['HyperOptimization']['refit_training_workflows'] = 'False'
config['HyperOptimization']['refit_validation_workflows'] = 'False'

# SMAC options
config['SMAC'] = dict()
@@ -662,9 +665,9 @@ def build_training(self):

# Optional SMAC output
if self.configs[0]['SMAC']['use'] == 'True':
self.sink_smac_results = self.network.create_sink('JsonFile', id='smac_results',
step_id='general_sinks')
self.sink_smac_results.input = self.classify.outputs['smac_results']
self.sink_smac_results = self.network.create_sink('JsonFile', id='smac_results',
step_id='general_sinks')
self.sink_smac_results.input = self.classify.outputs['smac_results']

if self.TrainTest:
# FIXME: the naming here is ugly
@@ -942,9 +945,15 @@ def build_training(self):
elif self.segmode == 'Register':
# ---------------------------------------------
# Registration nodes: Align segmentation of first
# modality to others using registration ith Elastix
# modality to others using registration with Elastix
self.add_elastix(label, nmod)

# Add to fingerprinting if required
if self.configs[0]['General']['Fingerprint'] == 'True':
# Since there are no segmentations yet of this modality, just use those of the first, provided modality
self.links_fingerprinting[f'{label}_segmentations'] = self.network.create_link(self.converters_seg_train[self.modlabels[0]].outputs['image'], self.node_fingerprinters[label].inputs['segmentations_train'])
self.links_fingerprinting[f'{label}_segmentations'].collapse = 'train'

# -----------------------------------------------------
# Optionally, add segmentix, the in-house segmentation
# processor of WORC
@@ -1077,7 +1086,7 @@ def build_training(self):
self.links_fingerprinting['classification'].collapse = 'train'

else:
raise WORCexceptions.WORCIOError("Please provide labels.")
raise WORCexceptions.WORCIOError("Please provide labels for training, i.e., WORC.labels_train or SimpleWORC.labels_from_this_file.")
else:
raise WORCexceptions.WORCIOError("Please provide either images or features.")

@@ -1401,7 +1410,7 @@ def add_elastix(self, label, nmod):
self.sources_segmentations_train[label] =\
self.network.create_source('ITKImageFile',
id='segmentations_train_' + label,
node_group='input',
node_group='train',
step_id='train_sources')

self.converters_seg_train[label] =\
@@ -1418,7 +1427,7 @@ def add_elastix(self, label, nmod):
self.sources_segmentations_test[label] =\
self.network.create_source('ITKImageFile',
id='segmentations_test_' + label,
node_group='input',
node_group='test',
step_id='test_sources')

self.converters_seg_test[label] =\
@@ -1641,11 +1650,6 @@ def add_elastix(self, label, nmod):
self.calcfeatures_test[label][i_node].inputs['segmentation'] =\
self.transformix_seg_nodes_test[label].outputs['image']

# Add to fingerprinting if required
if self.configs[0]['General']['Fingerprint'] == 'True':
self.links_fingerprinting[f'{label}_segmentations'] = self.network.create_link(self.transformix_seg_nodes_train[label].outputs['image'], self.node_fingerprinters[label].inputs['segmentations_train'])
self.links_fingerprinting[f'{label}_segmentations'].collapse = 'train'

# Save outputfor the training set
self.sinks_transformations_train[label] =\
self.network.create_sink('ElastixTransformFile',