
Commit

Merge pull request #74 from MStarmans91/development
Release version 3.6.1
MStarmans91 committed Feb 15, 2023
2 parents 0658775 + b1284b8 commit 4801a6f
Showing 136 changed files with 4,138 additions and 2,182 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -127,10 +127,14 @@ dmypy.json
# Pyre type checker
.pyre/

# Visual studio code config
.vscode

# Example data
WORC/exampledata/*.hdf5
WORC/external/*
WORC/exampledata/ICCvalues.csv
WORC/tests/*.png
WORC/tests/*.mat
WORC/tests/performance*.json
WORC/tests/WORC_Example_STWStrategyHN_*
32 changes: 32 additions & 0 deletions CHANGELOG
@@ -6,6 +6,38 @@ All notable changes to this project will be documented in this file.
The format is based on `Keep a Changelog <http://keepachangelog.com/>`_
and this project adheres to `Semantic Versioning <http://semver.org/>`_

3.6.1 - 2023-02-15
------------------

Fixed
~~~~~
- Bug when using elastix: the dimensionality got the wrong name in the fastr network.
- Bug in BasicWORC when starting from features.
- For statistical test thresholding, if the refit does not work during ensembling, skip this
  method instead of returning NaN.
- The createfixedsplits function was outdated; it has been updated to the newest conventions
  and added to the documentation.
- When using Dummys, segmentix now still copies metadata information from the image
  to the segmentation when required.

Changed
~~~~~~~
- When part of fitting and scoring a workflow fails, return NaN as performance during
  optimization, but skip that step during refitting. If we skipped the step during
  optimization, the relation between the hyperparameters and the performance would be
  disturbed; during refitting, we need to have a model, so the best option is to skip
  the step. Previously, the step was always skipped.
- Set the default of the XGB estimator parallelization to a single job.
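The refitting-versus-optimization policy described above can be sketched as follows (an illustrative sketch with made-up step names, not WORC's actual implementation):

```python
import math

def run_workflow(steps, refitting=False):
    """Sketch of the policy: a failing step yields NaN during
    optimization, but is skipped during refitting so that a model
    is still produced."""
    executed = []
    for name, step in steps:
        try:
            step()  # fit-and-score this workflow step; may raise
            executed.append(name)
        except RuntimeError:
            if refitting:
                continue  # skip the broken step, keep the rest
            return float("nan")  # penalize this hyperparameter set
    return executed

def broken_step():
    raise RuntimeError("step failed")

steps = [("scale", lambda: None),
         ("select", broken_step),
         ("classify", lambda: None)]
```

During optimization the NaN keeps the hyperparameter-to-performance relation intact; during refitting the skip guarantees a usable model.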

Added
~~~~~
- Documentation updates.
- Option to save the workflows trained on the train-validation training datasets, besides
  the option to save workflows trained on the full training dataset. Not available for SMAC
  due to the implementation of SMAC.
- Validate function for the fastr HDF5 datatype, as fastr previously deemed empty HDF5
  files valid.
- Option to eliminate features which are all NaN during imputation.
- Preflight check on the number of image types provided.
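These new options correspond to plain string fields in the WORC configuration, read back with configparser; a minimal sketch (the field names are taken from this diff, the values shown are the defaults it introduces):

```python
import configparser

config = configparser.ConfigParser()
config["General"] = {"DoTestNRSNEns": "False"}
# Drop features that are all NaN during imputation:
config["Imputation"] = {"skipallNaN": "True"}
config["HyperOptimization"] = {
    # Save workflows refit on the full training dataset
    # (renamed from the earlier refit_workflows field):
    "refit_training_workflows": "False",
    # Also save the workflows trained on the train-validation splits:
    "refit_validation_workflows": "False",
}

# Values are stored as strings and converted with getboolean():
assert config["Imputation"].getboolean("skipallNaN") is True
```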


3.6.0 - 2022-04-05
------------------

6 changes: 2 additions & 4 deletions README.md
@@ -1,4 +1,4 @@
# WORC v3.6.0
# WORC v3.6.1
## Workflow for Optimal Radiomics Classification

## Information
@@ -70,14 +70,12 @@ The official documentation can be found at [https://worc.readthedocs.io](https:/
The publicly released WORC database is described in the following paper:

```bibtex
@article {Starmans2021.08.19.21262238,
@article {Starmans2021WORCDatabase,
author = {Starmans, Martijn P.A. and Timbergen, Milea J.M. and Vos, Melissa and Padmos, Guillaume A. and Gr{\"u}nhagen, Dirk J. and Verhoef, Cornelis and Sleijfer, Stefan and van Leenders, Geert J.L.H. and Buisman, Florian E. and Willemssen, Francois E.J.A. and Koerkamp, Bas Groot and Angus, Lindsay and van der Veldt, Astrid A.M. and Rajicic, Ana and Odink, Arlette E. and Renckens, Michel and Doukas, Michail and de Man, Rob A. and IJzermans, Jan N.M. and Miclea, Razvan L. and Vermeulen, Peter B. and Thomeer, Maarten G. and Visser, Jacob J. and Niessen, Wiro J. and Klein, Stefan},
title = {The WORC database: MRI and CT scans, segmentations, and clinical labels for 930 patients from six radiomics studies},
elocation-id = {2021.08.19.21262238},
year = {2021},
doi = {10.1101/2021.08.19.21262238},
publisher = {Cold Spring Harbor Laboratory Press},
abstract = {The WORC database consists in total of 930 patients composed of six datasets gathered at the Erasmus MC, consisting of patients with: 1) well-differentiated liposarcoma or lipoma (115 patients); 2) desmoid-type fibromatosis or extremity soft-tissue sarcomas (203 patients); 3) primary solid liver tumors, either malignant (hepatocellular carcinoma or intrahepatic cholangiocarcinoma) or benign (hepatocellular adenoma or focal nodular hyperplasia) (186 patients); 4) gastrointestinal stromal tumors (GISTs) and intra-abdominal gastrointestinal tumors radiologically resembling GISTs (246 patients); 5) colorectal liver metastases (77 patients); and 6) lung metastases of metastatic melanoma (103 patients). For each patient, either a magnetic resonance imaging (MRI) or computed tomography (CT) scan, collected from routine clinical care, one or multiple (semi-)automatic lesion segmentations, and ground truth labels from a gold standard (e.g., pathologically proven) are available. All datasets are multicenter imaging datasets, as patients referred to our institute often received imaging at their referring hospital. The dataset can be used to validate or develop radiomics methods, i.e., using machine or deep learning to relate the visual appearance to the ground truth labels, and automatic segmentation methods. The data is publicly available at https://xnat.bmia.nl/data/projects/worc; the code to download the data and reproduce the experiments can be found at https://github.com/MStarmans91/WORCDatabase.},
URL = {https://www.medrxiv.org/content/early/2021/08/25/2021.08.19.21262238},
eprint = {https://www.medrxiv.org/content/early/2021/08/25/2021.08.19.21262238.full.pdf},
journal = {medRxiv}
64 changes: 47 additions & 17 deletions README.rst
@@ -1,4 +1,4 @@
WORC v3.6.0
WORC v3.6.1
===========

Workflow for Optimal Radiomics Classification
@@ -16,8 +16,8 @@ Information
Introduction
============

WORC is an open-source python package for the easy execution of full
radiomics pipelines.
WORC is an open-source python package for the easy execution and fully
automatic construction and optimization of radiomics workflows.

We aim to establish a general radiomics platform supporting easy
integration of other tools. With our modular build and support of
@@ -33,9 +33,28 @@ License
This package is covered by the open source `APACHE 2.0
License <APACHE-LICENSE-2.0>`__.

When using WORC, please cite this repository as following:

``Martijn P.A. Starmans, Sebastian R. van der Voort, Thomas Phil and Stefan Klein. Workflow for Optimal Radiomics Classification (WORC). Zenodo (2018). Available from: https://github.com/MStarmans91/WORC. DOI: http://doi.org/10.5281/zenodo.3840534.``
When using WORC, please cite this repository and the paper describing
WORC as follows:

.. code:: bibtex
@article{starmans2021reproducible,
title={Reproducible radiomics through automated machine learning validated on twelve clinical applications},
author={Martijn P. A. Starmans and Sebastian R. van der Voort and Thomas Phil and Milea J. M. Timbergen and Melissa Vos and Guillaume A. Padmos and Wouter Kessels and David Hanff and Dirk J. Grunhagen and Cornelis Verhoef and Stefan Sleijfer and Martin J. van den Bent and Marion Smits and Roy S. Dwarkasing and Christopher J. Els and Federico Fiduzi and Geert J. L. H. van Leenders and Anela Blazevic and Johannes Hofland and Tessa Brabander and Renza A. H. van Gils and Gaston J. H. Franssen and Richard A. Feelders and Wouter W. de Herder and Florian E. Buisman and Francois E. J. A. Willemssen and Bas Groot Koerkamp and Lindsay Angus and Astrid A. M. van der Veldt and Ana Rajicic and Arlette E. Odink and Mitchell Deen and Jose M. Castillo T. and Jifke Veenland and Ivo Schoots and Michel Renckens and Michail Doukas and Rob A. de Man and Jan N. M. IJzermans and Razvan L. Miclea and Peter B. Vermeulen and Esther E. Bron and Maarten G. Thomeer and Jacob J. Visser and Wiro J. Niessen and Stefan Klein},
year={2021},
eprint={2108.08618},
archivePrefix={arXiv},
primaryClass={eess.IV}
}
@software{starmans2018worc,
author = {Martijn P. A. Starmans and Thomas Phil and Sebastian R. van der Voort and Stefan Klein},
title = {Workflow for Optimal Radiomics Classification (WORC)},
year = {2018},
publisher = {Zenodo},
doi = {10.5281/zenodo.3840534},
url = {https://github.com/MStarmans91/WORC}
}
For the DOI, visit |image4|.

@@ -48,14 +67,32 @@ occur. Please contact us through the channels below if you find any and
we will try to fix them as soon as possible, or create an issue on this
Github.

Tutorial and Documentation
--------------------------
Tutorial, documentation and dataset
-----------------------------------

The WORC tutorial is hosted in a `separate
repository <https://github.com/MStarmans91/WORCTutorial>`__.
The WORC tutorial is hosted at
https://github.com/MStarmans91/WORCTutorial.

The official documentation can be found at https://worc.readthedocs.io.

The publicly released WORC database is described in the following paper:

.. code:: bibtex
@article {Starmans2021WORCDatabase,
author = {Starmans, Martijn P.A. and Timbergen, Milea J.M. and Vos, Melissa and Padmos, Guillaume A. and Gr{\"u}nhagen, Dirk J. and Verhoef, Cornelis and Sleijfer, Stefan and van Leenders, Geert J.L.H. and Buisman, Florian E. and Willemssen, Francois E.J.A. and Koerkamp, Bas Groot and Angus, Lindsay and van der Veldt, Astrid A.M. and Rajicic, Ana and Odink, Arlette E. and Renckens, Michel and Doukas, Michail and de Man, Rob A. and IJzermans, Jan N.M. and Miclea, Razvan L. and Vermeulen, Peter B. and Thomeer, Maarten G. and Visser, Jacob J. and Niessen, Wiro J. and Klein, Stefan},
title = {The WORC database: MRI and CT scans, segmentations, and clinical labels for 930 patients from six radiomics studies},
elocation-id = {2021.08.19.21262238},
year = {2021},
doi = {10.1101/2021.08.19.21262238},
URL = {https://www.medrxiv.org/content/early/2021/08/25/2021.08.19.21262238},
eprint = {https://www.medrxiv.org/content/early/2021/08/25/2021.08.19.21262238.full.pdf},
journal = {medRxiv}
}
The code to download the WORC database and reproduce our experiments can
be found at https://github.com/MStarmans91/WORCDatabase.

Installation
------------

@@ -109,13 +146,6 @@ Tutorial <https://github.com/MStarmans91/WORCTutorial>`__. Besides a
Jupyter notebook with instructions, we also provide an example
script there to get you started.

WIP
---

- We are writing the paper on WORC.
- We are expanding the example experiments of WORC with open source
datasets.

Contact
-------

14 changes: 11 additions & 3 deletions WORC/IOparser/config_io_classifier.py
@@ -59,6 +59,9 @@ def load_config(config_file_path):
settings_dict['General']['tempsave'] =\
settings['General'].getboolean('tempsave')

settings_dict['General']['DoTestNRSNEns'] =\
settings['General'].getboolean('DoTestNRSNEns')

# Feature Scaling
settings_dict['FeatureScaling']['scale_features'] =\
settings['FeatureScaling'].getboolean('scale_features')
@@ -154,6 +157,9 @@ def load_config(config_file_path):
[int(str(item).strip()) for item in
settings['Imputation']['n_neighbors'].split(',')]

settings_dict['Imputation']['skipallNaN'] =\
[str(settings['Imputation']['skipallNaN'])]

# OneHotEncoding
settings_dict['OneHotEncoding']['Use'] =\
[str(item).strip() for item in
@@ -392,11 +398,13 @@ def load_config(config_file_path):
settings['HyperOptimization'].getint('maxlen')
settings_dict['HyperOptimization']['ranking_score'] = \
str(settings['HyperOptimization']['ranking_score'])
settings_dict['HyperOptimization']['refit_workflows'] =\
settings['HyperOptimization'].getboolean('refit_workflows')
settings_dict['HyperOptimization']['refit_training_workflows'] =\
settings['HyperOptimization'].getboolean('refit_training_workflows')
settings_dict['HyperOptimization']['refit_validation_workflows'] =\
settings['HyperOptimization'].getboolean('refit_validation_workflows')
settings_dict['HyperOptimization']['memory'] = \
str(settings['HyperOptimization']['memory'])

# Settings for SMAC
settings_dict['SMAC']['use'] =\
settings['SMAC'].getboolean('use')
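The parsing pattern used throughout load_config above can be illustrated in isolation (a simplified sketch, not the module itself): plain booleans are read with getboolean(), while search-space fields are split into lists of candidate values for the optimizer:

```python
import configparser

settings = configparser.ConfigParser()
settings["General"] = {"DoTestNRSNEns": "False"}
settings["Imputation"] = {"skipallNaN": "True", "n_neighbors": "5, 5"}

settings_dict = {"General": {}, "Imputation": {}}

# Plain booleans are converted directly:
settings_dict["General"]["DoTestNRSNEns"] = \
    settings["General"].getboolean("DoTestNRSNEns")

# Search-space fields become lists of candidate values:
settings_dict["Imputation"]["n_neighbors"] = \
    [int(item.strip())
     for item in settings["Imputation"]["n_neighbors"].split(",")]
settings_dict["Imputation"]["skipallNaN"] = \
    [settings["Imputation"]["skipallNaN"]]
```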
30 changes: 17 additions & 13 deletions WORC/WORC.py
@@ -200,6 +200,7 @@ def defaultconfig(self):
config['General']['AssumeSameImageAndMaskMetadata'] = 'False'
config['General']['ComBat'] = 'False'
config['General']['Fingerprint'] = 'True'
config['General']['DoTestNRSNEns'] = 'False'

# Fingerprinting
config['Fingerprinting'] = dict()
@@ -352,6 +353,7 @@ def defaultconfig(self):
config['Imputation']['use'] = 'True'
config['Imputation']['strategy'] = 'mean, median, most_frequent, constant, knn'
config['Imputation']['n_neighbors'] = '5, 5'
config['Imputation']['skipallNaN'] = 'True'

# Feature scaling options
config['FeatureScaling'] = dict()
@@ -496,7 +498,8 @@ def defaultconfig(self):
config['HyperOptimization']['maxlen'] = '100'
config['HyperOptimization']['ranking_score'] = 'test_score'
config['HyperOptimization']['memory'] = '3G'
config['HyperOptimization']['refit_workflows'] = 'False'
config['HyperOptimization']['refit_training_workflows'] = 'False'
config['HyperOptimization']['refit_validation_workflows'] = 'False'

# SMAC options
config['SMAC'] = dict()
@@ -662,9 +665,9 @@ def build_training(self):

# Optional SMAC output
if self.configs[0]['SMAC']['use'] == 'True':
self.sink_smac_results = self.network.create_sink('JsonFile', id='smac_results',
step_id='general_sinks')
self.sink_smac_results.input = self.classify.outputs['smac_results']
self.sink_smac_results = self.network.create_sink('JsonFile', id='smac_results',
step_id='general_sinks')
self.sink_smac_results.input = self.classify.outputs['smac_results']

if self.TrainTest:
# FIXME: the naming here is ugly
@@ -942,9 +945,15 @@ def build_training(self):
elif self.segmode == 'Register':
# ---------------------------------------------
# Registration nodes: Align segmentation of first
# modality to others using registration ith Elastix
# modality to others using registration with Elastix
self.add_elastix(label, nmod)

# Add to fingerprinting if required
if self.configs[0]['General']['Fingerprint'] == 'True':
# Since there are no segmentations yet of this modality, just use those of the first, provided modality
self.links_fingerprinting[f'{label}_segmentations'] = self.network.create_link(self.converters_seg_train[self.modlabels[0]].outputs['image'], self.node_fingerprinters[label].inputs['segmentations_train'])
self.links_fingerprinting[f'{label}_segmentations'].collapse = 'train'

# -----------------------------------------------------
# Optionally, add segmentix, the in-house segmentation
# processor of WORC
@@ -1077,7 +1086,7 @@ def build_training(self):
self.links_fingerprinting['classification'].collapse = 'train'

else:
raise WORCexceptions.WORCIOError("Please provide labels.")
raise WORCexceptions.WORCIOError("Please provide labels for training, i.e., WORC.labels_train or SimpleWORC.labels_from_this_file.")
else:
raise WORCexceptions.WORCIOError("Please provide either images or features.")

@@ -1401,7 +1410,7 @@ def add_elastix(self, label, nmod):
self.sources_segmentations_train[label] =\
self.network.create_source('ITKImageFile',
id='segmentations_train_' + label,
node_group='input',
node_group='train',
step_id='train_sources')

self.converters_seg_train[label] =\
@@ -1418,7 +1427,7 @@ def add_elastix(self, label, nmod):
self.sources_segmentations_test[label] =\
self.network.create_source('ITKImageFile',
id='segmentations_test_' + label,
node_group='input',
node_group='test',
step_id='test_sources')

self.converters_seg_test[label] =\
@@ -1641,11 +1650,6 @@ def add_elastix(self, label, nmod):
self.calcfeatures_test[label][i_node].inputs['segmentation'] =\
self.transformix_seg_nodes_test[label].outputs['image']

# Add to fingerprinting if required
if self.configs[0]['General']['Fingerprint'] == 'True':
self.links_fingerprinting[f'{label}_segmentations'] = self.network.create_link(self.transformix_seg_nodes_train[label].outputs['image'], self.node_fingerprinters[label].inputs['segmentations_train'])
self.links_fingerprinting[f'{label}_segmentations'].collapse = 'train'

# Save outputfor the training set
self.sinks_transformations_train[label] =\
self.network.create_sink('ElastixTransformFile',