diff --git a/CHANGELOG b/CHANGELOG
index 895652c7..9f1e290e 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -6,6 +6,40 @@ All notable changes to this project will be documented in this file.
The format is based on `Keep a Changelog `_
and this project adheres to `Semantic Versioning `_
+3.6.3 - 2023-08-15
+------------------
+
+Fixed
+~~~~~
+- Bug in computing confidence intervals when performance was always the same;
+  this now results in a np.nan confidence interval.
+- Error caught and an informative message added when you provide images and/or
+  features in the test set, but no labels.
+- SimpleWORC and BasicWORC now detect whether the user has provided a separate
+  training and test set, in which case bootstrapping should be used.
+- Bug in PREDICT that mixed up the mode in shape feature extraction (2D / 2.5D).
+- Bug in performance calculation of multiclass classification.
+- Bugs in statistical feature testing.
+
+Changed
+~~~~~~~
+- Statistical test feature selection is now applied before PCA: otherwise, when
+  combined, PCA components would be selected instead of the original features.
+
+Added
+~~~~~
+- Histogram equalization to preprocessing.
+- Recursive feature elimination (RFE) feature selection.
+- Workflow to evaluate a trained model on new data.
+- Option to set a fixed random seed in the hyperoptimization
+ for reproducibility.
+- Various FAQs.
+- Updated user manual with more extensive debugging guide.
+- Thoroughly updated user manual documentation on different data flows, e.g.
+ train-test setups, multiple segmentations per patient.
+- Function in SimpleWORC to add parameter files for elastix.
+
+
3.6.2 - 2023-03-14
------------------
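The recursive feature elimination (RFE) option added in this release can be sketched with scikit-learn, which WORC builds on. The estimator and parameter values below are illustrative only, not WORC's internal defaults:

```python
# Minimal sketch of recursive feature elimination (RFE): repeatedly fit an
# estimator and drop the weakest features until the target count remains.
# LogisticRegression and the counts here are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=20, random_state=42)

# Keep the 5 strongest features, eliminating 1 feature per iteration.
selector = RFE(LogisticRegression(max_iter=1000),
               n_features_to_select=5, step=1)
selector.fit(X, y)

X_reduced = selector.transform(X)
print(X_reduced.shape)  # (100, 5)
```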
diff --git a/README.md b/README.md
index f0023f3f..d7ad8dcd 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# WORC v3.6.2
+# WORC v3.6.3
## Workflow for Optimal Radiomics Classification
## Information
@@ -37,12 +37,12 @@ When using WORC, please cite this repository and the paper describing WORC as as
```bibtex
@article{starmans2021reproducible,
- title={Reproducible radiomics through automated machine learning validated on twelve clinical applications},
- author={Martijn P. A. Starmans and Sebastian R. van der Voort and Thomas Phil and Milea J. M. Timbergen and Melissa Vos and Guillaume A. Padmos and Wouter Kessels and David Hanff and Dirk J. Grunhagen and Cornelis Verhoef and Stefan Sleijfer and Martin J. van den Bent and Marion Smits and Roy S. Dwarkasing and Christopher J. Els and Federico Fiduzi and Geert J. L. H. van Leenders and Anela Blazevic and Johannes Hofland and Tessa Brabander and Renza A. H. van Gils and Gaston J. H. Franssen and Richard A. Feelders and Wouter W. de Herder and Florian E. Buisman and Francois E. J. A. Willemssen and Bas Groot Koerkamp and Lindsay Angus and Astrid A. M. van der Veldt and Ana Rajicic and Arlette E. Odink and Mitchell Deen and Jose M. Castillo T. and Jifke Veenland and Ivo Schoots and Michel Renckens and Michail Doukas and Rob A. de Man and Jan N. M. IJzermans and Razvan L. Miclea and Peter B. Vermeulen and Esther E. Bron and Maarten G. Thomeer and Jacob J. Visser and Wiro J. Niessen and Stefan Klein},
- year={2021},
- eprint={2108.08618},
- archivePrefix={arXiv},
- primaryClass={eess.IV}
+ title = {Reproducible radiomics through automated machine learning validated on twelve clinical applications},
+ author = {Martijn P. A. Starmans and Sebastian R. van der Voort and Thomas Phil and Milea J. M. Timbergen and Melissa Vos and Guillaume A. Padmos and Wouter Kessels and David Hanff and Dirk J. Grunhagen and Cornelis Verhoef and Stefan Sleijfer and Martin J. van den Bent and Marion Smits and Roy S. Dwarkasing and Christopher J. Els and Federico Fiduzi and Geert J. L. H. van Leenders and Anela Blazevic and Johannes Hofland and Tessa Brabander and Renza A. H. van Gils and Gaston J. H. Franssen and Richard A. Feelders and Wouter W. de Herder and Florian E. Buisman and Francois E. J. A. Willemssen and Bas Groot Koerkamp and Lindsay Angus and Astrid A. M. van der Veldt and Ana Rajicic and Arlette E. Odink and Mitchell Deen and Jose M. Castillo T. and Jifke Veenland and Ivo Schoots and Michel Renckens and Michail Doukas and Rob A. de Man and Jan N. M. IJzermans and Razvan L. Miclea and Peter B. Vermeulen and Esther E. Bron and Maarten G. Thomeer and Jacob J. Visser and Wiro J. Niessen and Stefan Klein},
+ year = {2021},
+ eprint = {2108.08618},
+ archivePrefix = {arXiv},
+ primaryClass = {eess.IV}
}
@software{starmans2018worc,
@@ -140,9 +140,9 @@ Make sure you add the executable to the PATH when prompted.
### Elastix
Image registration is included in WORC through [elastix and transformix](http://elastix.isi.uu.nl/).
In order to use elastix, please download the binaries and place them in your
-fastr.config.mounts['apps'] path. Check the elastix tool description for the correct
-subdirectory structure. For example, on Linux, the binaries and libraries should be in "../apps/elastix/4.8/install/" and
-"../apps/elastix/4.8/install/lib" respectively.
+``fastr.config.mounts['apps']`` path. Check the elastix tool description for the correct
+subdirectory structure. For example, on Linux, the binaries and libraries should be in ``"../apps/elastix/4.8/install/"`` and
+``"../apps/elastix/4.8/install/lib"`` respectively.
Note: optionally, you can tell WORC to copy the metadata from the image file
to the segmentation file before applying the deformation field. This requires
diff --git a/README.rst b/README.rst
index 20115d67..fcc82303 100644
--- a/README.rst
+++ b/README.rst
@@ -1,4 +1,4 @@
-WORC v3.6.2
+WORC v3.6.3
===========
Workflow for Optimal Radiomics Classification
@@ -7,11 +7,12 @@ Workflow for Optimal Radiomics Classification
Information
-----------
-+---------------------+---------------------+---------------------+---------------+
-| Unit test | Documentation | PyPi | Citing WORC |
-+=====================+=====================+=====================+===============+
-| |image0| | |image1| | |image2| | |image3| |
-+---------------------+---------------------+---------------------+---------------+
++-------------------+------------------+------------------+------------+
+| Unit test | Documentation | PyPi | Citing |
+| | | | WORC |
++===================+==================+==================+============+
+| |image1| | |image2| | |image3| | |image4| |
++-------------------+------------------+------------------+------------+
Introduction
============
@@ -38,25 +39,25 @@ WORC as as follows:
.. code:: bibtex
- @article{starmans2021reproducible,
- title={Reproducible radiomics through automated machine learning validated on twelve clinical applications},
- author={Martijn P. A. Starmans and Sebastian R. van der Voort and Thomas Phil and Milea J. M. Timbergen and Melissa Vos and Guillaume A. Padmos and Wouter Kessels and David Hanff and Dirk J. Grunhagen and Cornelis Verhoef and Stefan Sleijfer and Martin J. van den Bent and Marion Smits and Roy S. Dwarkasing and Christopher J. Els and Federico Fiduzi and Geert J. L. H. van Leenders and Anela Blazevic and Johannes Hofland and Tessa Brabander and Renza A. H. van Gils and Gaston J. H. Franssen and Richard A. Feelders and Wouter W. de Herder and Florian E. Buisman and Francois E. J. A. Willemssen and Bas Groot Koerkamp and Lindsay Angus and Astrid A. M. van der Veldt and Ana Rajicic and Arlette E. Odink and Mitchell Deen and Jose M. Castillo T. and Jifke Veenland and Ivo Schoots and Michel Renckens and Michail Doukas and Rob A. de Man and Jan N. M. IJzermans and Razvan L. Miclea and Peter B. Vermeulen and Esther E. Bron and Maarten G. Thomeer and Jacob J. Visser and Wiro J. Niessen and Stefan Klein},
- year={2021},
- eprint={2108.08618},
- archivePrefix={arXiv},
- primaryClass={eess.IV}
- }
-
- @software{starmans2018worc,
- author = {Martijn P. A. Starmans and Thomas Phil and Sebastian R. van der Voort and Stefan Klein},
- title = {Workflow for Optimal Radiomics Classification (WORC)},
- year = {2018},
- publisher = {Zenodo},
- doi = {10.5281/zenodo.3840534},
- url = {https://github.com/MStarmans91/WORC}
- }
-
-For the DOI, visit |image4|.
+ @article{starmans2021reproducible,
+ title = {Reproducible radiomics through automated machine learning validated on twelve clinical applications},
+ author = {Martijn P. A. Starmans and Sebastian R. van der Voort and Thomas Phil and Milea J. M. Timbergen and Melissa Vos and Guillaume A. Padmos and Wouter Kessels and David Hanff and Dirk J. Grunhagen and Cornelis Verhoef and Stefan Sleijfer and Martin J. van den Bent and Marion Smits and Roy S. Dwarkasing and Christopher J. Els and Federico Fiduzi and Geert J. L. H. van Leenders and Anela Blazevic and Johannes Hofland and Tessa Brabander and Renza A. H. van Gils and Gaston J. H. Franssen and Richard A. Feelders and Wouter W. de Herder and Florian E. Buisman and Francois E. J. A. Willemssen and Bas Groot Koerkamp and Lindsay Angus and Astrid A. M. van der Veldt and Ana Rajicic and Arlette E. Odink and Mitchell Deen and Jose M. Castillo T. and Jifke Veenland and Ivo Schoots and Michel Renckens and Michail Doukas and Rob A. de Man and Jan N. M. IJzermans and Razvan L. Miclea and Peter B. Vermeulen and Esther E. Bron and Maarten G. Thomeer and Jacob J. Visser and Wiro J. Niessen and Stefan Klein},
+ year = {2021},
+ eprint = {2108.08618},
+ archivePrefix = {arXiv},
+ primaryClass = {eess.IV}
+ }
+
+ @software{starmans2018worc,
+ author = {Martijn P. A. Starmans and Thomas Phil and Sebastian R. van der Voort and Stefan Klein},
+ title = {Workflow for Optimal Radiomics Classification (WORC)},
+ year = {2018},
+ publisher = {Zenodo},
+ doi = {10.5281/zenodo.3840534},
+ url = {https://github.com/MStarmans91/WORC}
+ }
+
+For the DOI, visit |image5|.
Disclaimer
----------
@@ -79,16 +80,16 @@ The publicly released WORC database is described in the following paper:
.. code:: bibtex
- @article {Starmans2021WORCDatabase,
- author = {Starmans, Martijn P.A. and Timbergen, Milea J.M. and Vos, Melissa and Padmos, Guillaume A. and Gr{\"u}nhagen, Dirk J. and Verhoef, Cornelis and Sleijfer, Stefan and van Leenders, Geert J.L.H. and Buisman, Florian E. and Willemssen, Francois E.J.A. and Koerkamp, Bas Groot and Angus, Lindsay and van der Veldt, Astrid A.M. and Rajicic, Ana and Odink, Arlette E. and Renckens, Michel and Doukas, Michail and de Man, Rob A. and IJzermans, Jan N.M. and Miclea, Razvan L. and Vermeulen, Peter B. and Thomeer, Maarten G. and Visser, Jacob J. and Niessen, Wiro J. and Klein, Stefan},
- title = {The WORC database: MRI and CT scans, segmentations, and clinical labels for 930 patients from six radiomics studies},
- elocation-id = {2021.08.19.21262238},
- year = {2021},
- doi = {10.1101/2021.08.19.21262238},
- URL = {https://www.medrxiv.org/content/early/2021/08/25/2021.08.19.21262238},
- eprint = {https://www.medrxiv.org/content/early/2021/08/25/2021.08.19.21262238.full.pdf},
- journal = {medRxiv}
- }
+ @article {Starmans2021WORCDatabase,
+ author = {Starmans, Martijn P.A. and Timbergen, Milea J.M. and Vos, Melissa and Padmos, Guillaume A. and Gr{\"u}nhagen, Dirk J. and Verhoef, Cornelis and Sleijfer, Stefan and van Leenders, Geert J.L.H. and Buisman, Florian E. and Willemssen, Francois E.J.A. and Koerkamp, Bas Groot and Angus, Lindsay and van der Veldt, Astrid A.M. and Rajicic, Ana and Odink, Arlette E. and Renckens, Michel and Doukas, Michail and de Man, Rob A. and IJzermans, Jan N.M. and Miclea, Razvan L. and Vermeulen, Peter B. and Thomeer, Maarten G. and Visser, Jacob J. and Niessen, Wiro J. and Klein, Stefan},
+ title = {The WORC database: MRI and CT scans, segmentations, and clinical labels for 930 patients from six radiomics studies},
+ elocation-id = {2021.08.19.21262238},
+ year = {2021},
+ doi = {10.1101/2021.08.19.21262238},
+ URL = {https://www.medrxiv.org/content/early/2021/08/25/2021.08.19.21262238},
+ eprint = {https://www.medrxiv.org/content/early/2021/08/25/2021.08.19.21262238.full.pdf},
+ journal = {medRxiv}
+ }
The code to download the WORC database and reproduce our experiments can
be found at https://github.com/MStarmans91/WORCDatabase.
@@ -107,19 +108,19 @@ The package can be installed through pip:
::
- pip install WORC
+ pip install WORC
Alternatively, you can directly install WORC from this repository:
::
- python setup.py install
+ python setup.py install
Make sure you install the requirements first:
::
- pip install -r requirements.txt
+ pip install -r requirements.txt
3rd-party packages used in WORC:
--------------------------------
@@ -172,7 +173,7 @@ sure you install graphviz. On Ubuntu, simply run
::
- apt install graphiv
+   apt install graphviz
On Windows, follow the installation instructions provided on the
graphviz website. Make sure you add the executable to the PATH when
@@ -184,10 +185,10 @@ Elastix
Image registration is included in WORC through `elastix and
transformix `__. In order to use elastix,
please download the binaries and place them in your
-fastr.config.mounts['apps'] path. Check the elastix tool description for
-the correct subdirectory structure. For example, on Linux, the binaries
-and libraries should be in "../apps/elastix/4.8/install/" and
-"../apps/elastix/4.8/install/lib" respectively.
+``fastr.config.mounts['apps']`` path. Check the elastix tool description
+for the correct subdirectory structure. For example, on Linux, the
+binaries and libraries should be in ``"../apps/elastix/4.8/install/"``
+and ``"../apps/elastix/4.8/install/lib"`` respectively.
Note: optionally, you can tell WORC to copy the metadata from the image
file to the segmentation file before applying the deformation field.
@@ -205,13 +206,13 @@ to XNAT. We advise you to specify your account settings in a .netrc file
when using this feature for your own datasets, such that you do not need
to input them on every request.
-.. |image0| image:: https://github.com/MStarmans91/WORC/workflows/Unit%20test/badge.svg
+.. |image1| image:: https://github.com/MStarmans91/WORC/workflows/Unit%20test/badge.svg
:target: https://github.com/MStarmans91/WORC/actions?query=workflow%3A%22Unit+test%22
-.. |image1| image:: https://readthedocs.org/projects/worc/badge/?version=latest
+.. |image2| image:: https://readthedocs.org/projects/worc/badge/?version=latest
:target: https://worc.readthedocs.io/en/latest/?badge=latest
-.. |image2| image:: https://badge.fury.io/py/WORC.svg
+.. |image3| image:: https://badge.fury.io/py/WORC.svg
:target: https://badge.fury.io/py/WORC
-.. |image3| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.3840534.svg
- :target: https://zenodo.org/badge/latestdoi/92295542
.. |image4| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.3840534.svg
:target: https://zenodo.org/badge/latestdoi/92295542
+.. |image5| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.3840534.svg
+ :target: https://zenodo.org/badge/latestdoi/92295542
diff --git a/WORC/IOparser/config_WORC.py b/WORC/IOparser/config_WORC.py
index 6736d9b5..903af045 100644
--- a/WORC/IOparser/config_WORC.py
+++ b/WORC/IOparser/config_WORC.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2023 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -37,14 +37,15 @@ def load_config(config_file_path):
settings = configparser.ConfigParser()
settings.read(config_file_path)
- settings_dict = {'ImageFeatures': dict(), 'General': dict(),
- 'SVMFeatures': dict()}
+ settings_dict = {'Preprocessing': dict(), 'ImageFeatures': dict(), 'General': dict(),
+ 'SVMFeatures': dict(), 'Ensemble': dict(),
+ 'Labels': dict()}
settings_dict['ImageFeatures']['image_type'] =\
str(settings['ImageFeatures']['image_type'])
settings_dict['General']['FeatureCalculators'] =\
- [str(item).strip() for item in
+ [str(item).strip('[]') for item in
settings['General']['FeatureCalculators'].split(',')]
settings_dict['General']['Preprocessing'] =\
@@ -55,5 +56,29 @@ def load_config(config_file_path):
settings_dict['General']['Segmentix'] =\
settings['General'].getboolean('Segmentix')
+
+ # Settings for ensembling
+ settings_dict['Ensemble']['Method'] =\
+ str(settings['Ensemble']['Method'])
+ settings_dict['Ensemble']['Size'] =\
+ int(settings['Ensemble']['Size'])
+
+ # Label settings
+ settings_dict['Labels']['label_names'] =\
+ [str(item).strip() for item in
+ settings['Labels']['label_names'].split(',')]
+
+ settings_dict['Labels']['modus'] =\
+ str(settings['Labels']['modus'])
+
+ # Whether to use some methods or not
+ settings_dict['General']['ComBat'] =\
+ str(settings['General']['ComBat'])
+
+ settings_dict['General']['Fingerprint'] =\
+ str(settings['General']['Fingerprint'])
+ settings_dict['Preprocessing']['Resampling'] =\
+ settings['Preprocessing'].getboolean('Resampling')
+
return settings_dict
diff --git a/WORC/IOparser/config_io_classifier.py b/WORC/IOparser/config_io_classifier.py
index a8c2d8c1..98724f58 100644
--- a/WORC/IOparser/config_io_classifier.py
+++ b/WORC/IOparser/config_io_classifier.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2022 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2023 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -93,6 +93,29 @@ def load_config(config_file_path):
[int(str(item).strip()) for item in
settings['Featsel']['SelectFromModel_n_trees'].split(',')]
+ settings_dict['Featsel']['RFE'] =\
+ settings['Featsel'].getfloat('RFE')
+
+ settings_dict['Featsel']['RFE_lasso_alpha'] =\
+ [float(str(item).strip()) for item in
+ settings['Featsel']['RFE_lasso_alpha'].split(',')]
+
+ settings_dict['Featsel']['RFE_estimator'] =\
+ [str(item).strip() for item in
+ settings['Featsel']['RFE_estimator'].split(',')]
+
+ settings_dict['Featsel']['RFE_n_trees'] =\
+ [int(str(item).strip()) for item in
+ settings['Featsel']['RFE_n_trees'].split(',')]
+
+ settings_dict['Featsel']['RFE_n_features_to_select'] =\
+ [float(str(item).strip()) for item in
+ settings['Featsel']['RFE_n_features_to_select'].split(',')]
+
+ settings_dict['Featsel']['RFE_step'] =\
+ [int(str(item).strip()) for item in
+ settings['Featsel']['RFE_step'].split(',')]
+
settings_dict['Featsel']['GroupwiseSearch'] =\
[str(item).strip() for item in
settings['Featsel']['GroupwiseSearch'].split(',')]
@@ -375,7 +398,7 @@ def load_config(config_file_path):
settings_dict['CrossValidation']['fixed_seed'] =\
settings['CrossValidation'].getboolean('fixed_seed')
- # Genetic settings
+ # Label settings
settings_dict['Labels']['label_names'] =\
[str(item).strip() for item in
settings['Labels']['label_names'].split(',')]
@@ -404,6 +427,8 @@ def load_config(config_file_path):
settings['HyperOptimization'].getboolean('refit_validation_workflows')
settings_dict['HyperOptimization']['memory'] = \
str(settings['HyperOptimization']['memory'])
+ settings_dict['HyperOptimization']['fix_random_seed'] = \
+ settings['HyperOptimization'].getboolean('fix_random_seed')
# Settings for SMAC
settings_dict['SMAC']['use'] =\
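The new ``fix_random_seed`` option makes the hyperparameter search reproducible. As an illustration of what seeding buys (the search space and seed below are made up, not WORC's):

```python
# Sketch of why fixing a random seed in hyperparameter optimization matters:
# identical random search draws across runs. Space and seed are illustrative.
import random

def sample_configs(n, seed=None):
    rng = random.Random(seed)
    space = {'C': [0.01, 0.1, 1, 10], 'kernel': ['linear', 'rbf']}
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n)]

run_a = sample_configs(5, seed=42)
run_b = sample_configs(5, seed=42)
assert run_a == run_b  # same seed, same draws: the experiment is repeatable
```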
diff --git a/WORC/IOparser/config_io_combat.py b/WORC/IOparser/config_io_combat.py
index 398540bb..2508ddc8 100644
--- a/WORC/IOparser/config_io_combat.py
+++ b/WORC/IOparser/config_io_combat.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2023 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
diff --git a/WORC/IOparser/config_preprocessing.py b/WORC/IOparser/config_preprocessing.py
index c7d0f504..338e0706 100644
--- a/WORC/IOparser/config_preprocessing.py
+++ b/WORC/IOparser/config_preprocessing.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2023 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -63,6 +63,19 @@ def load_config(config_file_path):
if len(settings_dict['Preprocessing']['Clipping_Range']) != 2:
raise ae.WORCValueError(f"Clipping range should be two floats split by a comma, got {settings['Preprocessing']['Clipping_Range']}.")
+ # Histogram equalization
+ settings_dict['Preprocessing']['HistogramEqualization'] =\
+ settings['Preprocessing'].getboolean('HistogramEqualization')
+
+ settings_dict['Preprocessing']['HistogramEqualization_Alpha'] =\
+ float(settings['Preprocessing']['HistogramEqualization_Alpha'])
+
+ settings_dict['Preprocessing']['HistogramEqualization_Beta'] =\
+ float(settings['Preprocessing']['HistogramEqualization_Beta'])
+
+ settings_dict['Preprocessing']['HistogramEqualization_Radius'] =\
+ int(settings['Preprocessing']['HistogramEqualization_Radius'])
+
# Normalization
settings_dict['Preprocessing']['Normalize'] =\
settings['Preprocessing'].getboolean('Normalize')
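The Alpha/Beta/Radius parameters parsed above suggest an adaptive (windowed) histogram equalization filter; as a conceptual sketch only, the global variant can be written with NumPy. This does not reproduce the windowed filter WORC actually applies:

```python
# Conceptual sketch of histogram equalization, the preprocessing step
# configured above: remap intensities through the cumulative histogram so
# they spread over the available range. Global variant for illustration only.
import numpy as np

def equalize(image, n_bins=256):
    hist, bin_edges = np.histogram(image.ravel(), bins=n_bins)
    cdf = hist.cumsum().astype(float)
    cdf /= cdf[-1]  # normalize the CDF to [0, 1]
    # Map each pixel intensity to its CDF value.
    return np.interp(image.ravel(), bin_edges[:-1], cdf).reshape(image.shape)

rng = np.random.default_rng(0)
img = rng.normal(100, 10, size=(32, 32))  # synthetic low-contrast image
out = equalize(img)
print(out.min(), out.max())  # intensities now spread over roughly [0, 1]
```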
diff --git a/WORC/WORC.py b/WORC/WORC.py
index 08f66fe1..21c0ac4a 100644
--- a/WORC/WORC.py
+++ b/WORC/WORC.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2022 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2023 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -142,6 +142,8 @@ def __init__(self, name='test'):
self.masks_normalize_test = list()
self.features_test = list()
self.metadata_test = list()
+
+ self.trained_model = None
self.Elastix_Para = list()
self.label_names = 'Label1, Label2'
@@ -159,6 +161,7 @@ def __init__(self, name='test'):
self.segmode = []
self._add_evaluation = False
self.TrainTest = False
+ self.OnlyTest = False
# Memory settings for all fastr nodes
self.fastr_memory_parameters = dict()
@@ -230,6 +233,11 @@ def defaultconfig(self):
config['Preprocessing']['BiasCorrection_Mask'] = 'False'
config['Preprocessing']['CheckOrientation'] = 'False'
config['Preprocessing']['OrientationPrimaryAxis'] = 'axial'
+ config['Preprocessing']['HistogramEqualization'] = 'False'
+ config['Preprocessing']['HistogramEqualization_Alpha'] = '0.3'
+ config['Preprocessing']['HistogramEqualization_Beta'] = '0.3'
+ config['Preprocessing']['HistogramEqualization_Radius'] = '5'
+
# Segmentix
config['Segmentix'] = dict()
@@ -384,6 +392,12 @@ def defaultconfig(self):
config['Featsel']['ReliefSampleSize'] = '0.75, 0.2'
config['Featsel']['ReliefDistanceP'] = '1, 3'
config['Featsel']['ReliefNumFeatures'] = '10, 40'
+ config['Featsel']['RFE'] = '0.0'
+ config['Featsel']['RFE_estimator'] = config['Featsel']['SelectFromModel_estimator']
+ config['Featsel']['RFE_lasso_alpha'] = config['Featsel']['SelectFromModel_lasso_alpha']
+ config['Featsel']['RFE_n_trees'] = config['Featsel']['SelectFromModel_n_trees']
+ config['Featsel']['RFE_n_features_to_select'] = '10, 90'
+ config['Featsel']['RFE_step'] = '1, 9'
# Groupwise Featureselection options
config['SelectFeatGroup'] = dict()
@@ -500,6 +514,7 @@ def defaultconfig(self):
config['HyperOptimization']['memory'] = '3G'
config['HyperOptimization']['refit_training_workflows'] = 'False'
config['HyperOptimization']['refit_validation_workflows'] = 'False'
+ config['HyperOptimization']['fix_random_seed'] = 'False'
# SMAC options
config['SMAC'] = dict()
@@ -531,30 +546,35 @@ def add_tools(self):
"""Add several tools to the WORC object."""
self.Tools = Tools()
- def build(self, wtype='training'):
+ def build(self, buildtype='training'):
"""Build the network based on the given attributes.
Parameters
----------
- wtype: string, default 'training'
+ buildtype: string, default 'training'
Specify the WORC execution type.
-        - testing: use if you have a trained classifier and want to
-          train it on some new images.
+        - inference: use if you have a trained classifier and want to
+          apply it to some new images.
- training: use if you want to train a classifier from a dataset.
"""
- self.wtype = wtype
- if wtype == 'training':
+ if buildtype == 'training':
self.build_training()
- elif wtype == 'testing':
- self.build_testing()
-
+ elif buildtype == 'inference':
+ raise WORCexceptions.WORCValueError("Inference workflow is still WIP and does not fully work yet.")
+ self.TrainTest = True
+ self.OnlyTest = True
+ self.build_inference()
+
def build_training(self):
"""Build the training network based on the given attributes."""
# We either need images or features for Radiomics
if self.images_test or self.features_test:
+ if not self.labels_test:
+ m = "You provided images and/or features for a test set, but not ground truth labels. Please also provide labels for the test set."
+ raise WORCexceptions.WORCValueError(m)
self.TrainTest = True
-
+
if self.images_train or self.features_train:
print('Building training network...')
# We currently require labels for supervised learning
@@ -571,9 +591,6 @@ def build_training(self):
# NOTE: We currently use the first configuration as general config
image_types = list()
for c in range(len(self.configs)):
- if type(self.configs[c]) == str:
- # Probably, c is a configuration file
- self.configs[c] = config_io.load_config(self.configs[c])
image_types.append(self.configs[c]['ImageFeatures']['image_type'])
if self.configs[0]['General']['Fingerprint'] == 'True' and any(imt not in all_modalities for imt in image_types):
@@ -1090,6 +1107,358 @@ def build_training(self):
else:
raise WORCexceptions.WORCIOError("Please provide either images or features.")
+ def build_inference(self):
+ """Build a network to test an already trained model on a test dataset based on the given attributes."""
+ #FIXME WIP
+ if self.images_test or self.features_test:
+ if not self.labels_test:
+ m = "You provided images and/or features for a test set, but not ground truth labels. Please also provide labels for the test set."
+ raise WORCexceptions.WORCValueError(m)
+ else:
+ m = "Please provide either images and/or features for your test set."
+ raise WORCexceptions.WORCValueError(m)
+
+ if not self.configs:
+ m = 'For a testing workflow, you need to provide a WORC config.ini file'
+ raise WORCexceptions.WORCValueError(m)
+
+ self.network = fastr.create_network(self.name)
+
+ # Add trained model node
+ memory = self.fastr_memory_parameters['Classification']
+ self.source_trained_model = self.network.create_source('HDF5',
+ id='trained_model',
+ node_group='trained_model', step_id='general_sources')
+
+ if self.images_test or self.features_test:
+ print('Building testing network...')
+ # We currently require labels for supervised learning
+ if self.labels_test:
+ self.network = fastr.create_network(self.name)
+
+ # Extract some information from the configs
+ image_types = list()
+ for conf_it in range(len(self.configs)):
+ if type(self.configs[conf_it]) == str:
+ # Config is a .ini file, load
+ temp_conf = config_io.load_config(self.configs[conf_it])
+ else:
+ temp_conf = self.configs[conf_it]
+
+ image_type = temp_conf['ImageFeatures']['image_type']
+ image_types.append(image_type)
+
+ # NOTE: We currently use the first configuration as general config
+ if conf_it == 0:
+ print(temp_conf)
+ ensemble_method = [temp_conf['Ensemble']['Method']]
+ ensemble_size = [temp_conf['Ensemble']['Size']]
+ label_names = [temp_conf['Labels']['label_names']]
+ use_ComBat = temp_conf['General']['ComBat']
+ use_segmentix = temp_conf['General']['Segmentix']
+
+ # Create various input sources
+ self.source_patientclass_test =\
+ self.network.create_source('PatientInfoFile',
+ id='patientclass_test',
+ node_group='pctest', step_id='test_sources')
+
+ self.source_ensemble_method =\
+ self.network.create_constant('String', ensemble_method,
+ id='ensemble_method',
+ step_id='Evaluation')
+
+ self.source_ensemble_size =\
+ self.network.create_constant('String', ensemble_size,
+ id='ensemble_size',
+ step_id='Evaluation')
+
+ self.source_LabelType =\
+ self.network.create_constant('String', label_names,
+ id='LabelType',
+ step_id='Evaluation')
+
+ memory = self.fastr_memory_parameters['PlotEstimator']
+ self.plot_estimator =\
+ self.network.create_node('worc/PlotEstimator:1.0', tool_version='1.0',
+ id='plot_Estimator',
+ resources=ResourceLimit(memory=memory),
+ step_id='Evaluation')
+
+ # Links to performance creator
+ self.plot_estimator.inputs['ensemble_method'] = self.source_ensemble_method.output
+ self.plot_estimator.inputs['ensemble_size'] = self.source_ensemble_size.output
+ self.plot_estimator.inputs['label_type'] = self.source_LabelType.output
+ pinfo = self.source_patientclass_test.output
+ self.plot_estimator.inputs['prediction'] = self.source_trained_model.output
+ self.plot_estimator.inputs['pinfo'] = pinfo
+
+ # Performance output
+ self.sink_performance = self.network.create_sink('JsonFile', id='performance', step_id='general_sinks')
+ self.sink_performance.input = self.plot_estimator.outputs['output_json']
+
+ if self.masks_normalize_test:
+ self.sources_masks_normalize_test = dict()
+
+ # -----------------------------------------------------
+ # Optionally, add ComBat Harmonization. Currently done
+ # on full dataset, not in a cross-validation
+ if use_ComBat == 'True':
+ message = '[ERROR] If you want to use ComBat, you need to provide training images or features as well.'
+ raise WORCexceptions.WORCNotImplementedError(message)
+
+ if not self.features_test:
+ # Create nodes to compute features
+ # General
+ self.sources_parameters = dict()
+ self.source_config_pyradiomics = dict()
+ self.source_toolbox_name = dict()
+
+ # testing only
+ self.calcfeatures_test = dict()
+ self.featureconverter_test = dict()
+ self.preprocessing_test = dict()
+ self.sources_images_test = dict()
+ self.sinks_features_test = dict()
+ self.sinks_configs = dict()
+ self.converters_im_test = dict()
+ self.converters_seg_test = dict()
+ self.links_C1_test = dict()
+
+ self.featurecalculators = dict()
+
+ # Check which nodes are necessary
+ if not self.segmentations_test:
+ message = "No automatic segmentation method is yet implemented."
+ raise WORCexceptions.WORCNotImplementedError(message)
+
+ elif len(self.segmentations_test) == len(image_types):
+ # Segmentations provided
+ self.sources_segmentations_test = dict()
+ self.segmode = 'Provided'
+
+ elif len(self.segmentations_test) == 1:
+ # Assume segmentations need to be registered to other modalities
+ print('\t - Adding Elastix node for image registration.')
+ self.add_elastix_sourcesandsinks()
+ pass
+
+ else:
+ nseg = len(self.segmentations_test)
+ nim = len(image_types)
+ m = f'Length of segmentations for testing is ' +\
+ f'{nseg}: should be equal to number of images' +\
+ f' ({nim}) or 1 when using registration.'
+ raise WORCexceptions.WORCValueError(m)
+
+ if use_segmentix == 'True':
+ # Use the segmentix toolbox for segmentation processing
+ print('\t - Adding segmentix node for segmentation preprocessing.')
+ self.sinks_segmentations_segmentix_test = dict()
+ self.sources_masks_test = dict()
+ self.converters_masks_test = dict()
+ self.nodes_segmentix_test = dict()
+
+ if self.semantics_test:
+ # Semantic features are supplied
+ self.sources_semantics_test = dict()
+
+ if self.metadata_test:
+ # Metadata to extract patient features from is supplied
+ self.sources_metadata_test = dict()
+
+ # Create a part of the pipeline for each modality
+ self.modlabels = list()
+ for nmod, mod in enumerate(image_types):
+ # Extract some modality specific config info
+            if type(self.configs[nmod]) == str:
+ # Config is a .ini file, load
+ temp_conf = config_io.load_config(self.configs[nmod])
+ else:
+ temp_conf = self.configs[nmod]
+
+ # Create label for each modality/image
+ num = 0
+ label = mod + '_' + str(num)
+ while label in self.calcfeatures_test.keys():
+ # if label already exists, add number to label
+ num += 1
+ label = mod + '_' + str(num)
+ self.modlabels.append(label)
+
+ # Create required sources and sinks
+ self.sources_parameters[label] = self.network.create_source('ParameterFile', id=f'config_{label}', step_id='general_sources')
+ self.sources_images_test[label] = self.network.create_source('ITKImageFile', id='images_test_' + label, node_group='test', step_id='test_sources')
+
+ if self.metadata_test and len(self.metadata_test) >= nmod + 1:
+ self.sources_metadata_test[label] = self.network.create_source('DicomImageFile', id='metadata_test_' + label, node_group='test', step_id='test_sources')
+
+ if self.masks_test and len(self.masks_test) >= nmod + 1:
+ # Create mask source and convert
+ self.sources_masks_test[label] = self.network.create_source('ITKImageFile', id='mask_test_' + label, node_group='test', step_id='test_sources')
+ memory = self.fastr_memory_parameters['WORCCastConvert']
+ self.converters_masks_test[label] =\
+ self.network.create_node('worc/WORCCastConvert:0.3.2',
+ tool_version='0.1',
+ id='convert_mask_test_' + label,
+ node_group='test',
+ resources=ResourceLimit(memory=memory),
+ step_id='FileConversion')
+
+ self.converters_masks_test[label].inputs['image'] = self.sources_masks_test[label].output
+
+ # First convert the images
+ if any(modality in mod for modality in all_modalities):
+ # Use WORCCastConvert for converting image formats
+ memory = self.fastr_memory_parameters['WORCCastConvert']
+ self.converters_im_test[label] =\
+ self.network.create_node('worc/WORCCastConvert:0.3.2',
+ tool_version='0.1',
+ id='convert_im_test_' + label,
+ resources=ResourceLimit(memory=memory),
+ step_id='FileConversion')
+
+ else:
+ raise WORCexceptions.WORCTypeError(('No valid image type for modality {}: {} provided.').format(str(nmod), mod))
+
+ # Create required links
+ self.converters_im_test[label].inputs['image'] = self.sources_images_test[label].output
+
+ # -----------------------------------------------------
+ # Preprocessing
+ preprocess_node = str(temp_conf['General']['Preprocessing'])
+ print('\t - Adding preprocessing node for image preprocessing.')
+ self.add_preprocessing(preprocess_node, label, nmod)
+
+ # -----------------------------------------------------
+ # Feature calculation
+ feature_calculators =\
+ temp_conf['General']['FeatureCalculators']
+ if not isinstance(feature_calculators, list):
+ # Configparser object, need to split string
+ feature_calculators = feature_calculators.strip('][').split(', ')
+ self.featurecalculators[label] = [f.split('/')[0] for f in feature_calculators]
+ else:
+ self.featurecalculators[label] = feature_calculators
+
+ # Add lists for feature calculation and converter objects
+ self.calcfeatures_test[label] = list()
+ self.featureconverter_test[label] = list()
+
+ for f in feature_calculators:
+ print(f'\t - Adding feature calculation node: {f}.')
+ self.add_feature_calculator(f, label, nmod)
+
+ # -----------------------------------------------------
+ # Create the necessary nodes for the segmentation
+ if self.segmode == 'Provided':
+ # Segmentation ----------------------------------------------------
+ # Use the provided segmentations for each modality
+ memory = self.fastr_memory_parameters['WORCCastConvert']
+ self.sources_segmentations_test[label] =\
+ self.network.create_source('ITKImageFile',
+ id='segmentations_test_' + label,
+ node_group='test',
+ step_id='test_sources')
+
+ self.converters_seg_test[label] =\
+ self.network.create_node('worc/WORCCastConvert:0.3.2',
+ tool_version='0.1',
+ id='convert_seg_test_' + label,
+ resources=ResourceLimit(memory=memory),
+ step_id='FileConversion')
+
+ self.converters_seg_test[label].inputs['image'] =\
+ self.sources_segmentations_test[label].output
+
+ elif self.segmode == 'Register':
+ # ---------------------------------------------
+ # Registration nodes: Align segmentation of first
+ # modality to others using registration with Elastix
+ self.add_elastix(label, nmod)
+
+ # -----------------------------------------------------
+ # Optionally, add segmentix, the in-house segmentation
+ # processor of WORC
+ if temp_conf['General']['Segmentix'] == 'True':
+ self.add_segmentix(label, nmod)
+ elif temp_conf['Preprocessing']['Resampling'] == 'True':
+ raise WORCexceptions.WORCValueError('If you use resampling, ' +
+ 'you have to use segmentix ' +
+ 'to make sure the mask is ' +
+ 'also resampled. Please set ' +
+ 'config["General"]["Segmentix"] ' +
+ 'to "True".')
+
+ else:
+ # Provide source or elastix segmentations to
+ # feature calculator
+ for i_node in range(len(self.calcfeatures_test[label])):
+ if self.segmode == 'Provided':
+ self.calcfeatures_test[label][i_node].inputs['segmentation'] =\
+ self.converters_seg_test[label].outputs['image']
+ elif self.segmode == 'Register':
+ if nmod > 0:
+ self.calcfeatures_test[label][i_node].inputs['segmentation'] =\
+ self.transformix_seg_nodes_test[label].outputs['image']
+ else:
+ self.calcfeatures_test[label][i_node].inputs['segmentation'] =\
+ self.converters_seg_test[label].outputs['image']
+
+ # -----------------------------------------------------
+ # Optionally, add ComBat Harmonization
+ if use_ComBat == 'True':
+ # Link features to ComBat
+ self.links_Combat1_test[label] = list()
+ for i_node, fname in enumerate(self.featurecalculators[label]):
+ self.links_Combat1_test[label].append(self.ComBat.inputs['features_test'][f'{label}_{self.featurecalculators[label][i_node]}'] << self.featureconverter_test[label][i_node].outputs['feat_out'])
+ self.links_Combat1_test[label][i_node].collapse = 'test'
+
+ # -----------------------------------------------------
+ # Output the features
+ # Add the features from this modality to the classifier node input
+ self.links_C1_test[label] = list()
+ self.sinks_features_test[label] = list()
+
+ for i_node, fname in enumerate(self.featurecalculators[label]):
+ # Create sink for feature outputs
+ node_id = 'features_test_' + label + '_' + fname
+ node_id = node_id.replace(':', '_').replace('.', '_').replace('/', '_')
+ self.sinks_features_test[label].append(self.network.create_sink('HDF5', id=node_id, step_id='test_sinks'))
+
+ # Save output
+ self.sinks_features_test[label][i_node].input = self.featureconverter_test[label][i_node].outputs['feat_out']
+
+ else:
+ # Features already provided: hence we can skip numerous nodes
+ self.sources_features_train = dict()
+ self.links_C1_train = dict()
+
+ if self.features_test:
+ self.sources_features_test = dict()
+ self.links_C1_test = dict()
+
+ # Create label for each modality/image
+ self.modlabels = list()
+ for mod in image_types:
+ num = 0
+ label = mod + str(num)
+ while label in self.sources_features_train.keys():
+ # if label exists, add number to label
+ num += 1
+ label = mod + str(num)
+ self.modlabels.append(label)
+
+ # Create a node for the features
+ self.sources_features_test[label] = self.network.create_source('HDF5', id='features_test_' + label, node_group='test', step_id='test_sources')
+
+ else:
+ raise WORCexceptions.WORCIOError("Please provide labels for training, i.e., WORC.labels_train or SimpleWORC.labels_from_this_file.")
+ else:
+ raise WORCexceptions.WORCIOError("Please provide either images or features.")
+
def add_fingerprinter(self, id, type, config_source):
"""Add WORC Fingerprinter to the network.
@@ -1155,7 +1524,7 @@ def add_ComBat(self):
self.links_Combat_out_train.collapse = 'ComBat'
self.sinks_features_train_ComBat.input = self.ComBat.outputs['features_train_out']
- if self.TrainTest:
+ if self.TrainTest or self.OnlyTest:
# Create sink for ComBat output
self.sinks_features_test_ComBat = self.network.create_sink('HDF5', id='features_test_ComBat', step_id='ComBat')
@@ -1170,21 +1539,32 @@ def add_ComBat(self):
def add_preprocessing(self, preprocess_node, label, nmod):
"""Add nodes required for preprocessing of images."""
+
+ # Extract some general information on the setup
+ if type(self.configs[nmod]) == str:
+ # Config is a .ini file, load
+ temp_conf = config_io.load_config(self.configs[nmod])
+ else:
+ temp_conf = self.configs[nmod]
+
memory = self.fastr_memory_parameters['Preprocessing']
- self.preprocessing_train[label] = self.network.create_node(preprocess_node, tool_version='1.0', id='preprocessing_train_' + label, resources=ResourceLimit(memory=memory), step_id='Preprocessing')
+ if not self.OnlyTest:
+ self.preprocessing_train[label] = self.network.create_node(preprocess_node, tool_version='1.0', id='preprocessing_train_' + label, resources=ResourceLimit(memory=memory), step_id='Preprocessing')
+
if self.TrainTest:
self.preprocessing_test[label] = self.network.create_node(preprocess_node, tool_version='1.0', id='preprocessing_test_' + label, resources=ResourceLimit(memory=memory), step_id='Preprocessing')
# Create required links
- if self.configs[0]['General']['Fingerprint'] == 'True':
- self.preprocessing_train[label].inputs['parameters'] = self.node_fingerprinters[label].outputs['config']
- else:
- self.preprocessing_train[label].inputs['parameters'] = self.sources_parameters[label].output
+ if not self.OnlyTest:
+ if temp_conf['General']['Fingerprint'] == 'True':
+ self.preprocessing_train[label].inputs['parameters'] = self.node_fingerprinters[label].outputs['config']
+ else:
+ self.preprocessing_train[label].inputs['parameters'] = self.sources_parameters[label].output
- self.preprocessing_train[label].inputs['image'] = self.converters_im_train[label].outputs['image']
+ self.preprocessing_train[label].inputs['image'] = self.converters_im_train[label].outputs['image']
if self.TrainTest:
- if self.configs[0]['General']['Fingerprint'] == 'True':
+ if temp_conf['General']['Fingerprint'] == 'True' and not self.OnlyTest:
self.preprocessing_test[label].inputs['parameters'] = self.node_fingerprinters[label].outputs['config']
else:
self.preprocessing_test[label].inputs['parameters'] = self.sources_parameters[label].output
@@ -1214,12 +1594,13 @@ def add_feature_calculator(self, calcfeat_node, label, nmod):
label])
memory = self.fastr_memory_parameters['FeatureCalculator']
- node_train =\
- self.network.create_node(calcfeat_node,
- tool_version='1.0',
- id='calcfeatures_train_' + node_ID,
- resources=ResourceLimit(memory=memory),
- step_id='Feature_Extraction')
+ if not self.OnlyTest:
+ node_train =\
+ self.network.create_node(calcfeat_node,
+ tool_version='1.0',
+ id='calcfeatures_train_' + node_ID,
+ resources=ResourceLimit(memory=memory),
+ step_id='Feature_Extraction')
if self.TrainTest:
node_test =\
@@ -1246,8 +1627,9 @@ def add_feature_calculator(self, calcfeat_node, label, nmod):
id='format_pyradiomics_' + label,
node_group='train',
step_id='Feature_Extraction')
- node_train.inputs['format'] =\
- self.source_format_pyradiomics.output
+ if not self.OnlyTest:
+ node_train.inputs['format'] =\
+ self.source_format_pyradiomics.output
if self.TrainTest:
node_test.inputs['format'] =\
@@ -1255,25 +1637,37 @@ def add_feature_calculator(self, calcfeat_node, label, nmod):
# Create required links
# We can have a different config for different tools
- if 'pyradiomics' in calcfeat_node.lower():
- if self.configs[0]['General']['Fingerprint'] != 'True':
- node_train.inputs['parameters'] =\
- self.source_config_pyradiomics[label].output
+ if not self.OnlyTest:
+ if 'pyradiomics' in calcfeat_node.lower():
+ if self.configs[0]['General']['Fingerprint'] != 'True':
+ node_train.inputs['parameters'] =\
+ self.source_config_pyradiomics[label].output
+ else:
+ node_train.inputs['parameters'] =\
+ self.node_fingerprinters[label].outputs['config_pyradiomics']
else:
- node_train.inputs['parameters'] =\
- self.node_fingerprinters[label].outputs['config_pyradiomics']
- else:
- if self.configs[0]['General']['Fingerprint'] == 'True':
- node_train.inputs['parameters'] =\
- self.node_fingerprinters[label].outputs['config']
+ if self.configs[0]['General']['Fingerprint'] == 'True':
+ node_train.inputs['parameters'] =\
+ self.node_fingerprinters[label].outputs['config']
+ else:
+ node_train.inputs['parameters'] =\
+ self.sources_parameters[label].output
+
+ node_train.inputs['image'] =\
+ self.preprocessing_train[label].outputs['image']
+
+ if self.OnlyTest:
+ if 'pyradiomics' in calcfeat_node.lower():
+ node_test.inputs['parameters'] =\
+ self.source_config_pyradiomics[label].output
else:
- node_train.inputs['parameters'] =\
+ node_test.inputs['parameters'] =\
self.sources_parameters[label].output
- node_train.inputs['image'] =\
- self.preprocessing_train[label].outputs['image']
-
- if self.TrainTest:
+ node_test.inputs['image'] =\
+ self.preprocessing_test[label].outputs['image']
+
+ elif self.TrainTest:
if 'pyradiomics' in calcfeat_node.lower():
if self.configs[0]['General']['Fingerprint'] != 'True':
node_test.inputs['parameters'] =\
@@ -1321,14 +1715,15 @@ def add_feature_calculator(self, calcfeat_node, label, nmod):
self.sources_semantics_test[label].output
# Add feature converter to make features WORC compatible
- conv_train =\
- self.network.create_node('worc/FeatureConverter:1.0',
- tool_version='1.0',
- id='featureconverter_train_' + node_ID,
- resources=ResourceLimit(memory='4G'),
- step_id='Feature_Extraction')
+ if not self.OnlyTest:
+ conv_train =\
+ self.network.create_node('worc/FeatureConverter:1.0',
+ tool_version='1.0',
+ id='featureconverter_train_' + node_ID,
+ resources=ResourceLimit(memory='4G'),
+ step_id='Feature_Extraction')
- conv_train.inputs['feat_in'] = node_train.outputs['features']
+ conv_train.inputs['feat_in'] = node_train.outputs['features']
# Add source to tell converter which toolbox we use
if 'pyradiomics' in calcfeat_node.lower():
@@ -1344,12 +1739,13 @@ def add_feature_calculator(self, calcfeat_node, label, nmod):
id=f'toolbox_name_{toolbox}_{label}',
step_id='Feature_Extraction')
- conv_train.inputs['toolbox'] = self.source_toolbox_name[label].output
- if self.configs[0]['General']['Fingerprint'] == 'True':
- conv_train.inputs['config'] =\
- self.node_fingerprinters[label].outputs['config']
- else:
- conv_train.inputs['config'] = self.sources_parameters[label].output
+ if not self.OnlyTest:
+ conv_train.inputs['toolbox'] = self.source_toolbox_name[label].output
+ if self.configs[0]['General']['Fingerprint'] == 'True':
+ conv_train.inputs['config'] =\
+ self.node_fingerprinters[label].outputs['config']
+ else:
+ conv_train.inputs['config'] = self.sources_parameters[label].output
if self.TrainTest:
conv_test =\
@@ -1361,7 +1757,10 @@ def add_feature_calculator(self, calcfeat_node, label, nmod):
conv_test.inputs['feat_in'] = node_test.outputs['features']
conv_test.inputs['toolbox'] = self.source_toolbox_name[label].output
- if self.configs[0]['General']['Fingerprint'] == 'True':
+ if self.OnlyTest:
+ conv_test.inputs['config'] =\
+ self.sources_parameters[label].output
+ elif self.configs[0]['General']['Fingerprint'] == 'True':
conv_test.inputs['config'] =\
self.node_fingerprinters[label].outputs['config']
else:
@@ -1369,8 +1768,10 @@ def add_feature_calculator(self, calcfeat_node, label, nmod):
self.sources_parameters[label].output
# Append to nodes to list
- self.calcfeatures_train[label].append(node_train)
- self.featureconverter_train[label].append(conv_train)
+ if not self.OnlyTest:
+ self.calcfeatures_train[label].append(node_train)
+ self.featureconverter_train[label].append(conv_train)
+
if self.TrainTest:
self.calcfeatures_test[label].append(node_test)
self.featureconverter_test[label].append(conv_test)
@@ -1381,25 +1782,28 @@ def add_elastix_sourcesandsinks(self):
self.segmode = 'Register'
self.source_Elastix_Parameters = dict()
- self.elastix_nodes_train = dict()
- self.transformix_seg_nodes_train = dict()
- self.sources_segmentations_train = dict()
- self.sinks_transformations_train = dict()
- self.sinks_segmentations_elastix_train = dict()
- self.sinks_images_elastix_train = dict()
- self.converters_seg_train = dict()
- self.edittransformfile_nodes_train = dict()
- self.transformix_im_nodes_train = dict()
-
- self.elastix_nodes_test = dict()
- self.transformix_seg_nodes_test = dict()
- self.sources_segmentations_test = dict()
- self.sinks_transformations_test = dict()
- self.sinks_segmentations_elastix_test = dict()
- self.sinks_images_elastix_test = dict()
- self.converters_seg_test = dict()
- self.edittransformfile_nodes_test = dict()
- self.transformix_im_nodes_test = dict()
+
+ if not self.OnlyTest:
+ self.elastix_nodes_train = dict()
+ self.transformix_seg_nodes_train = dict()
+ self.sources_segmentations_train = dict()
+ self.sinks_transformations_train = dict()
+ self.sinks_segmentations_elastix_train = dict()
+ self.sinks_images_elastix_train = dict()
+ self.converters_seg_train = dict()
+ self.edittransformfile_nodes_train = dict()
+ self.transformix_im_nodes_train = dict()
+
+ if self.TrainTest:
+ self.elastix_nodes_test = dict()
+ self.transformix_seg_nodes_test = dict()
+ self.sources_segmentations_test = dict()
+ self.sinks_transformations_test = dict()
+ self.sinks_segmentations_elastix_test = dict()
+ self.sinks_images_elastix_test = dict()
+ self.converters_seg_test = dict()
+ self.edittransformfile_nodes_test = dict()
+ self.transformix_im_nodes_test = dict()
def add_elastix(self, label, nmod):
""" Add image registration through elastix to network."""
@@ -1407,21 +1811,22 @@ def add_elastix(self, label, nmod):
# which should be on the first modality
if nmod == 0:
memory = self.fastr_memory_parameters['WORCCastConvert']
- self.sources_segmentations_train[label] =\
- self.network.create_source('ITKImageFile',
- id='segmentations_train_' + label,
- node_group='train',
- step_id='train_sources')
-
- self.converters_seg_train[label] =\
- self.network.create_node('worc/WORCCastConvert:0.3.2',
- tool_version='0.1',
- id='convert_seg_train_' + label,
- resources=ResourceLimit(memory=memory),
- step_id='FileConversion')
+ if not self.OnlyTest:
+ self.sources_segmentations_train[label] =\
+ self.network.create_source('ITKImageFile',
+ id='segmentations_train_' + label,
+ node_group='train',
+ step_id='train_sources')
+
+ self.converters_seg_train[label] =\
+ self.network.create_node('worc/WORCCastConvert:0.3.2',
+ tool_version='0.1',
+ id='convert_seg_train_' + label,
+ resources=ResourceLimit(memory=memory),
+ step_id='FileConversion')
- self.converters_seg_train[label].inputs['image'] =\
- self.sources_segmentations_train[label].output
+ self.converters_seg_train[label].inputs['image'] =\
+ self.sources_segmentations_train[label].output
if self.TrainTest:
self.sources_segmentations_test[label] =\
@@ -1451,27 +1856,28 @@ def add_elastix(self, label, nmod):
str(self.configs[0]['General']['TransformationNode'])
memory_elastix = self.fastr_memory_parameters['Elastix']
- self.elastix_nodes_train[label] =\
- self.network.create_node(elastix_node,
- tool_version='0.2',
- id='elastix_train_' + label,
- resources=ResourceLimit(memory=memory_elastix),
- step_id='Image_Registration')
-
- memory_transformix = self.fastr_memory_parameters['Elastix']
- self.transformix_seg_nodes_train[label] =\
- self.network.create_node(transformix_node,
- tool_version='0.2',
- id='transformix_seg_train_' + label,
- resources=ResourceLimit(memory=memory_transformix),
- step_id='Image_Registration')
-
- self.transformix_im_nodes_train[label] =\
- self.network.create_node(transformix_node,
- tool_version='0.2',
- id='transformix_im_train_' + label,
- resources=ResourceLimit(memory=memory_transformix),
- step_id='Image_Registration')
+ if not self.OnlyTest:
+ self.elastix_nodes_train[label] =\
+ self.network.create_node(elastix_node,
+ tool_version='0.2',
+ id='elastix_train_' + label,
+ resources=ResourceLimit(memory=memory_elastix),
+ step_id='Image_Registration')
+
+ memory_transformix = self.fastr_memory_parameters['Elastix']
+ self.transformix_seg_nodes_train[label] =\
+ self.network.create_node(transformix_node,
+ tool_version='0.2',
+ id='transformix_seg_train_' + label,
+ resources=ResourceLimit(memory=memory_transformix),
+ step_id='Image_Registration')
+
+ self.transformix_im_nodes_train[label] =\
+ self.network.create_node(transformix_node,
+ tool_version='0.2',
+ id='transformix_im_train_' + label,
+ resources=ResourceLimit(memory=memory_transformix),
+ step_id='Image_Registration')
if self.TrainTest:
self.elastix_nodes_test[label] =\
@@ -1497,15 +1903,16 @@ def add_elastix(self, label, nmod):
# Create sources_segmentation
# M1 = moving, others = fixed
- self.elastix_nodes_train[label].inputs['fixed_image'] =\
- self.converters_im_train[label].outputs['image']
+ if not self.OnlyTest:
+ self.elastix_nodes_train[label].inputs['fixed_image'] =\
+ self.converters_im_train[label].outputs['image']
- self.elastix_nodes_train[label].inputs['moving_image'] =\
- self.converters_im_train[self.modlabels[0]].outputs['image']
+ self.elastix_nodes_train[label].inputs['moving_image'] =\
+ self.converters_im_train[self.modlabels[0]].outputs['image']
# Add node that copies metadata from the image to the
# segmentation if required
- if self.CopyMetadata:
+ if self.CopyMetadata and not self.OnlyTest:
# Copy metadata from the image which was registered to
# the segmentation, if it is not created yet
if not hasattr(self, "copymetadata_nodes_train"):
@@ -1567,12 +1974,12 @@ def add_elastix(self, label, nmod):
id='Elastix_Para_' + label,
node_group='elpara',
step_id='Image_Registration')
+ if not self.OnlyTest:
+ self.link_elparam_train =\
+ self.network.create_link(self.source_Elastix_Parameters[label].output,
+ self.elastix_nodes_train[label].inputs['parameters'])
- self.link_elparam_train =\
- self.network.create_link(self.source_Elastix_Parameters[label].output,
- self.elastix_nodes_train[label].inputs['parameters'])
-
- self.link_elparam_train.collapse = 'elpara'
+ self.link_elparam_train.collapse = 'elpara'
if self.TrainTest:
self.link_elparam_test =\
@@ -1596,17 +2003,18 @@ def add_elastix(self, label, nmod):
self.converters_masks_test[self.modlabels[0]].outputs['image']
# Change the FinalBSplineInterpolationOrder to 0 as required for binary images: see https://github.com/SuperElastix/elastix/wiki/FAQ
- self.edittransformfile_nodes_train[label] =\
- self.network.create_node('elastixtools/EditElastixTransformFile:0.1',
- tool_version='0.1',
- id='EditElastixTransformFile_train_' + label,
- step_id='Image_Registration')
+ if not self.OnlyTest:
+ self.edittransformfile_nodes_train[label] =\
+ self.network.create_node('elastixtools/EditElastixTransformFile:0.1',
+ tool_version='0.1',
+ id='EditElastixTransformFile_train_' + label,
+ step_id='Image_Registration')
- self.edittransformfile_nodes_train[label].inputs['set'] =\
- ["FinalBSplineInterpolationOrder=0"]
+ self.edittransformfile_nodes_train[label].inputs['set'] =\
+ ["FinalBSplineInterpolationOrder=0"]
- self.edittransformfile_nodes_train[label].inputs['transform'] =\
- self.elastix_nodes_train[label].outputs['transform'][-1]
+ self.edittransformfile_nodes_train[label].inputs['transform'] =\
+ self.elastix_nodes_train[label].outputs['transform'][-1]
if self.TrainTest:
self.edittransformfile_nodes_test[label] =\
@@ -1622,14 +2030,15 @@ def add_elastix(self, label, nmod):
self.elastix_nodes_test[label].outputs['transform'][-1]
# Link data and transformation to transformix and source
- self.transformix_seg_nodes_train[label].inputs['transform'] =\
- self.edittransformfile_nodes_train[label].outputs['transform']
+ if not self.OnlyTest:
+ self.transformix_seg_nodes_train[label].inputs['transform'] =\
+ self.edittransformfile_nodes_train[label].outputs['transform']
- self.transformix_im_nodes_train[label].inputs['transform'] =\
- self.elastix_nodes_train[label].outputs['transform'][-1]
+ self.transformix_im_nodes_train[label].inputs['transform'] =\
+ self.elastix_nodes_train[label].outputs['transform'][-1]
- self.transformix_im_nodes_train[label].inputs['image'] =\
- self.converters_im_train[self.modlabels[0]].outputs['image']
+ self.transformix_im_nodes_train[label].inputs['image'] =\
+ self.converters_im_train[self.modlabels[0]].outputs['image']
if self.TrainTest:
self.transformix_seg_nodes_test[label].inputs['transform'] =\
@@ -1642,38 +2051,44 @@ def add_elastix(self, label, nmod):
self.converters_im_test[self.modlabels[0]].outputs['image']
if self.configs[nmod]['General']['Segmentix'] != 'True':
- # These segmentations serve as input for the feature calculation
- for i_node in range(len(self.calcfeatures_train[label])):
- self.calcfeatures_train[label][i_node].inputs['segmentation'] =\
- self.transformix_seg_nodes_train[label].outputs['image']
- if self.TrainTest:
+ if not self.OnlyTest:
+ # These segmentations serve as input for the feature calculation
+ for i_node in range(len(self.calcfeatures_train[label])):
+ self.calcfeatures_train[label][i_node].inputs['segmentation'] =\
+ self.transformix_seg_nodes_train[label].outputs['image']
+ if self.TrainTest:
+ self.calcfeatures_test[label][i_node].inputs['segmentation'] =\
+ self.transformix_seg_nodes_test[label].outputs['image']
+ else:
+ for i_node in range(len(self.calcfeatures_test[label])):
self.calcfeatures_test[label][i_node].inputs['segmentation'] =\
self.transformix_seg_nodes_test[label].outputs['image']
# Save output for the training set
- self.sinks_transformations_train[label] =\
- self.network.create_sink('ElastixTransformFile',
- id='transformations_train_' + label,
- step_id='train_sinks')
+ if not self.OnlyTest:
+ self.sinks_transformations_train[label] =\
+ self.network.create_sink('ElastixTransformFile',
+ id='transformations_train_' + label,
+ step_id='train_sinks')
- self.sinks_segmentations_elastix_train[label] =\
- self.network.create_sink('ITKImageFile',
- id='segmentations_out_elastix_train_' + label,
- step_id='train_sinks')
+ self.sinks_segmentations_elastix_train[label] =\
+ self.network.create_sink('ITKImageFile',
+ id='segmentations_out_elastix_train_' + label,
+ step_id='train_sinks')
- self.sinks_images_elastix_train[label] =\
- self.network.create_sink('ITKImageFile',
- id='images_out_elastix_train_' + label,
- step_id='train_sinks')
+ self.sinks_images_elastix_train[label] =\
+ self.network.create_sink('ITKImageFile',
+ id='images_out_elastix_train_' + label,
+ step_id='train_sinks')
- self.sinks_transformations_train[label].input =\
- self.elastix_nodes_train[label].outputs['transform']
+ self.sinks_transformations_train[label].input =\
+ self.elastix_nodes_train[label].outputs['transform']
- self.sinks_segmentations_elastix_train[label].input =\
- self.transformix_seg_nodes_train[label].outputs['image']
+ self.sinks_segmentations_elastix_train[label].input =\
+ self.transformix_seg_nodes_train[label].outputs['image']
- self.sinks_images_elastix_train[label].input =\
- self.transformix_im_nodes_train[label].outputs['image']
+ self.sinks_images_elastix_train[label].input =\
+ self.transformix_im_nodes_train[label].outputs['image']
# Save output for the test set
if self.TrainTest:
@@ -1702,53 +2117,56 @@ def add_segmentix(self, label, nmod):
# Segmentix nodes -------------------------------------------------
# Use segmentix node to convert input segmentation into
# correct contour
- if label not in self.sinks_segmentations_segmentix_train:
- self.sinks_segmentations_segmentix_train[label] =\
- self.network.create_sink('ITKImageFile',
- id='segmentations_out_segmentix_train_' + label,
- step_id='train_sinks')
+ if not self.OnlyTest:
+ if label not in self.sinks_segmentations_segmentix_train:
+ self.sinks_segmentations_segmentix_train[label] =\
+ self.network.create_sink('ITKImageFile',
+ id='segmentations_out_segmentix_train_' + label,
+ step_id='train_sinks')
- memory = self.fastr_memory_parameters['Segmentix']
- self.nodes_segmentix_train[label] =\
- self.network.create_node('segmentix/Segmentix:1.0',
- tool_version='1.0',
- id='segmentix_train_' + label,
- resources=ResourceLimit(memory=memory),
- step_id='Preprocessing')
+ memory = self.fastr_memory_parameters['Segmentix']
+ self.nodes_segmentix_train[label] =\
+ self.network.create_node('segmentix/Segmentix:1.0',
+ tool_version='1.0',
+ id='segmentix_train_' + label,
+ resources=ResourceLimit(memory=memory),
+ step_id='Preprocessing')
- # Input the image
- self.nodes_segmentix_train[label].inputs['image'] =\
- self.converters_im_train[label].outputs['image']
+ # Input the image
+ self.nodes_segmentix_train[label].inputs['image'] =\
+ self.converters_im_train[label].outputs['image']
# Input the metadata
if self.metadata_train and len(self.metadata_train) >= nmod + 1:
self.nodes_segmentix_train[label].inputs['metadata'] = self.sources_metadata_train[label].output
# Input the segmentation
- if hasattr(self, 'transformix_seg_nodes_train'):
- if label in self.transformix_seg_nodes_train.keys():
- # Use output of registration in segmentix
- self.nodes_segmentix_train[label].inputs['segmentation_in'] =\
- self.transformix_seg_nodes_train[label].outputs['image']
+ if not self.OnlyTest:
+ if hasattr(self, 'transformix_seg_nodes_train'):
+ if label in self.transformix_seg_nodes_train.keys():
+ # Use output of registration in segmentix
+ self.nodes_segmentix_train[label].inputs['segmentation_in'] =\
+ self.transformix_seg_nodes_train[label].outputs['image']
+ else:
+ # Use original segmentation
+ self.nodes_segmentix_train[label].inputs['segmentation_in'] =\
+ self.converters_seg_train[label].outputs['image']
else:
# Use original segmentation
self.nodes_segmentix_train[label].inputs['segmentation_in'] =\
self.converters_seg_train[label].outputs['image']
- else:
- # Use original segmentation
- self.nodes_segmentix_train[label].inputs['segmentation_in'] =\
- self.converters_seg_train[label].outputs['image']
# Input the parameters
- if self.configs[0]['General']['Fingerprint'] == 'True':
- self.nodes_segmentix_train[label].inputs['parameters'] =\
- self.node_fingerprinters[label].outputs['config']
- else:
- self.nodes_segmentix_train[label].inputs['parameters'] =\
- self.sources_parameters[label].output
+ if not self.OnlyTest:
+ if self.configs[0]['General']['Fingerprint'] == 'True':
+ self.nodes_segmentix_train[label].inputs['parameters'] =\
+ self.node_fingerprinters[label].outputs['config']
+ else:
+ self.nodes_segmentix_train[label].inputs['parameters'] =\
+ self.sources_parameters[label].output
- self.sinks_segmentations_segmentix_train[label].input =\
- self.nodes_segmentix_train[label].outputs['segmentation_out']
+ self.sinks_segmentations_segmentix_train[label].input =\
+ self.nodes_segmentix_train[label].outputs['segmentation_out']
if self.TrainTest:
self.sinks_segmentations_segmentix_test[label] =\
@@ -1785,7 +2203,7 @@ def add_segmentix(self, label, nmod):
self.nodes_segmentix_test[label].inputs['segmentation_in'] =\
self.converters_seg_test[label].outputs['image']
- if self.configs[0]['General']['Fingerprint'] == 'True':
+ if self.configs[0]['General']['Fingerprint'] == 'True' and not self.OnlyTest:
self.nodes_segmentix_test[label].inputs['parameters'] =\
self.node_fingerprinters[label].outputs['config']
else:
@@ -1795,14 +2213,19 @@ def add_segmentix(self, label, nmod):
self.sinks_segmentations_segmentix_test[label].input =\
self.nodes_segmentix_test[label].outputs['segmentation_out']
- for i_node in range(len(self.calcfeatures_train[label])):
- self.calcfeatures_train[label][i_node].inputs['segmentation'] =\
- self.nodes_segmentix_train[label].outputs['segmentation_out']
+ if not self.OnlyTest:
+ for i_node in range(len(self.calcfeatures_train[label])):
+ self.calcfeatures_train[label][i_node].inputs['segmentation'] =\
+ self.nodes_segmentix_train[label].outputs['segmentation_out']
- if self.TrainTest:
+ if self.TrainTest:
+ self.calcfeatures_test[label][i_node].inputs['segmentation'] =\
+ self.nodes_segmentix_test[label].outputs['segmentation_out']
+ else:
+ for i_node in range(len(self.calcfeatures_test[label])):
self.calcfeatures_test[label][i_node].inputs['segmentation'] =\
self.nodes_segmentix_test[label].outputs['segmentation_out']
-
+
if self.masks_train and len(self.masks_train) >= nmod + 1:
# Use masks
self.nodes_segmentix_train[label].inputs['mask'] =\
@@ -1820,7 +2243,10 @@ def set(self):
self.sink_data = dict()
# Save the configurations as files
- self.save_config()
+ if not self.OnlyTest:
+ self.save_config()
+ else:
+ self.fastrconfigs = self.configs
# fixed splits
if self.fixedsplits:
@@ -1829,6 +2255,7 @@ def set(self):
# Set source and sink data
self.source_data['patientclass_train'] = self.labels_train
self.source_data['patientclass_test'] = self.labels_test
+ self.source_data['trained_model'] = self.trained_model
self.sink_data['classification'] = ("vfs://output/{}/estimator_{{sample_id}}_{{cardinality}}{{ext}}").format(self.name)
self.sink_data['performance'] = ("vfs://output/{}/performance_{{sample_id}}_{{cardinality}}{{ext}}").format(self.name)
@@ -1837,12 +2264,19 @@ def set(self):
self.sink_data['features_train_ComBat'] = ("vfs://output/{}/ComBat/features_ComBat_{{sample_id}}_{{cardinality}}{{ext}}").format(self.name)
self.sink_data['features_test_ComBat'] = ("vfs://output/{}/ComBat/features_ComBat_{{sample_id}}_{{cardinality}}{{ext}}").format(self.name)
+ # Get info from the first config file
+ if type(self.configs[0]) == str:
+ # Config is a .ini file, load
+ temp_conf = config_io.load_config(self.configs[0])
+ else:
+ temp_conf = self.configs[0]
+
# Set the source data from the WORC objects you created
for num, label in enumerate(self.modlabels):
self.source_data['config_' + label] = self.fastrconfigs[num]
self.sink_data[f'config_{label}_sink'] = f"vfs://output/{self.name}/config_{label}_{{sample_id}}_{{cardinality}}{{ext}}"
- if 'pyradiomics' in self.configs[0]['General']['FeatureCalculators'] and self.configs[0]['General']['Fingerprint'] != 'True':
+ if 'pyradiomics' in temp_conf['General']['FeatureCalculators'] and temp_conf['General']['Fingerprint'] != 'True':
self.source_data['config_pyradiomics_' + label] = self.pyradiomics_configs[num]
# Add train data sources
@@ -1912,6 +2346,7 @@ def set(self):
self.sink_data['images_out_elastix_test_' + label] = ("vfs://output/{}/Images/im_{}_elastix_{{sample_id}}_{{cardinality}}{{ext}}").format(self.name, label)
if hasattr(self, 'featurecalculators'):
for f in self.featurecalculators[label]:
+ f = f.replace(':', '_').replace('.', '_').replace('/', '_')
self.sink_data['features_test_' + label + '_' + f] = ("vfs://output/{}/Features/features_{}_{}_{{sample_id}}_{{cardinality}}{{ext}}").format(self.name, f, label)
# Add elastix sinks if used
@@ -1942,11 +2377,12 @@ def execute(self):
except graphviz.backend.CalledProcessError as e:
print(f'[WORC WARNING] Graphviz executable gave an error: not drawing network diagram. Original error: {e}')
- # export hyper param. search space to LaTeX table
- for config in self.fastrconfigs:
- config_path = Path(url2pathname(urlparse(config).path))
- tex_path = f'{config_path.parent.absolute() / config_path.stem}_hyperparams_space.tex'
- export_hyper_params_to_latex(config_path, tex_path)
+ # export hyper param. search space to LaTeX table. Only for training models.
+ if not self.OnlyTest:
+ for config in self.fastrconfigs:
+ config_path = Path(url2pathname(urlparse(config).path))
+ tex_path = f'{config_path.parent.absolute() / config_path.stem}_hyperparams_space.tex'
+ export_hyper_params_to_latex(config_path, tex_path)
if DebugDetector().do_detection():
print("Source Data:")
diff --git a/WORC/classification/SearchCV.py b/WORC/classification/SearchCV.py
index 80f89092..2cf33f08 100644
--- a/WORC/classification/SearchCV.py
+++ b/WORC/classification/SearchCV.py
@@ -436,7 +436,7 @@ def score(self, X, y=None):
def _check_is_fitted(self, method_name):
if not self.refit:
- raise NotFittedError(('This GridSearchCV instance was initialized '
+ raise NotFittedError(('This SearchCV instance was initialized '
'with refit=False. %s is '
'available only after refitting on the best '
'parameters. ') % method_name)
@@ -625,12 +625,15 @@ def preprocess(self, X, y=None, training=False):
if self.best_modelsel is not None:
X = self.best_modelsel.transform(X)
- if self.best_pca is not None:
- X = self.best_pca.transform(X)
-
if self.best_statisticalsel is not None:
X = self.best_statisticalsel.transform(X)
+ if self.best_rfesel is not None:
+ X = self.best_rfesel.transform(X)
+
+ if self.best_pca is not None:
+ X = self.best_pca.transform(X)
+
# Only resampling in training phase, i.e. if we have the labels
if y is not None:
if self.best_Sampler is not None:
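The reordering above runs the statistical-test and RFE selectors before PCA, so selection happens on the original named features rather than on PCA components. A minimal standalone sketch of that transform chain, using synthetic data and sklearn stand-ins for WORC's fitted `best_*` objects (all names below are illustrative, not WORC's API):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold

def preprocess(X, steps):
    """Apply fitted transformers in order; None entries are skipped."""
    for step in steps:
        if step is not None:
            X = step.transform(X)
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))

# Selector fitted first on the raw features, PCA fitted on its output,
# matching the corrected order: selection before dimensionality reduction.
sel = VarianceThreshold().fit(X)
pca = PCA(n_components=3).fit(sel.transform(X))

out = preprocess(X, [sel, None, pca])   # None mimics an unused selector
print(out.shape)  # (30, 3)
```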
@@ -932,7 +935,7 @@ def refit_and_score(self, X, y, parameters_all,
# Associate best options with new fits
(save_data, GroupSel, VarSel, SelectModel, feature_labels, scalers,
- encoders, Imputers, PCAs, StatisticalSel, ReliefSel, Sampler) = out
+ encoders, Imputers, PCAs, StatisticalSel, RFESel, ReliefSel, Sampler) = out
fitted_estimator = save_data[-2]
self.best_groupsel = GroupSel
self.best_scaler = scalers
@@ -944,6 +947,7 @@ def refit_and_score(self, X, y, parameters_all,
self.best_pca = PCAs
self.best_featlab = feature_labels
self.best_statisticalsel = StatisticalSel
+ self.best_rfesel = RFESel
self.best_reliefsel = ReliefSel
self.best_Sampler = Sampler
self.best_estimator_ = fitted_estimator
@@ -1067,7 +1071,7 @@ def getpredictions():
train, valid, p_all,
return_all=True)
(save_data, GroupSel, VarSel, SelectModel, feature_labels, scalers,
- encoders, Imputers, PCAs, StatisticalSel, ReliefSel, Sampler) = out
+ encoders, Imputers, PCAs, StatisticalSel, RFESel, ReliefSel, Sampler) = out
new_estimator.best_groupsel = GroupSel
new_estimator.best_scaler = scalers
new_estimator.best_varsel = VarSel
@@ -1078,6 +1082,7 @@ def getpredictions():
new_estimator.best_pca = PCAs
new_estimator.best_featlab = feature_labels
new_estimator.best_statisticalsel = StatisticalSel
+ new_estimator.best_rfesel = RFESel
new_estimator.best_reliefsel = ReliefSel
new_estimator.best_Sampler = Sampler
@@ -1672,7 +1677,7 @@ def _fit(self, X, y, groups, parameter_iterable):
estimatordata = f"vfs://tmp/GS/{name}/{fname}"
# Create the fastr network
- network = fastr.create_network('WORC_GridSearch_' + name)
+ network = fastr.create_network('WORC_CASH_' + name)
estimator_data = network.create_source('HDF5', id='estimator_source', resources=ResourceLimit(memory='4G'))
traintest_data = network.create_source('HDF5', id='traintest', resources=ResourceLimit(memory='4G'))
parameter_data = network.create_source('JsonFile', id='parameters', resources=ResourceLimit(memory='4G'))
diff --git a/WORC/classification/fitandscore.py b/WORC/classification/fitandscore.py
index bad49f3b..bb88a4cc 100644
--- a/WORC/classification/fitandscore.py
+++ b/WORC/classification/fitandscore.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2022 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2023 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -18,7 +18,7 @@
from sklearn.model_selection._validation import _fit_and_score
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression
-from sklearn.feature_selection import SelectFromModel
+from sklearn.feature_selection import SelectFromModel, RFE
from sklearn.decomposition import PCA
from sklearn.multiclass import OneVsRestClassifier
from sklearn.ensemble import RandomForestClassifier
@@ -37,6 +37,7 @@
import WORC
import WORC.addexceptions as ae
import time
+from xgboost.sklearn import XGBRegressor
# Specific imports for error management
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
@@ -191,6 +192,10 @@ def fit_and_score(X, y, scoring,
Either None if the statistical test feature selection is not used, or
the fitted object.
+ RFESel: WORC RFESel Object
+ Either None if the recursive feature elimination feature selection is not used, or
+ the fitted object.
+
ReliefSel: WORC ReliefSel Object
Either None if the RELIEF feature selection is not used, or
the fitted object.
@@ -239,6 +244,7 @@ def fit_and_score(X, y, scoring,
SelectModel = None
pca = None
StatisticalSel = None
+ RFESel = None
VarSel = None
ReliefSel = None
if isinstance(scorers, dict):
@@ -449,7 +455,7 @@ def fit_and_score(X, y, scoring,
ret[2] = runtime
if return_all:
- return ret, GroupSel, VarSel, SelectModel, feature_labels[0], scaler, encoder, imputer, pca, StatisticalSel, ReliefSel, Sampler
+ return ret, GroupSel, VarSel, SelectModel, feature_labels[0], scaler, encoder, imputer, pca, StatisticalSel, RFESel, ReliefSel, Sampler
else:
return ret
@@ -490,7 +496,7 @@ def fit_and_score(X, y, scoring,
if return_all:
return ret, GroupSel, VarSel, SelectModel,\
feature_labels[0], scaler, encoder, imputer, pca,\
- StatisticalSel, ReliefSel, Sampler
+ StatisticalSel, RFESel, ReliefSel, Sampler
else:
return ret
@@ -588,7 +594,7 @@ def fit_and_score(X, y, scoring,
if return_all:
return ret, GroupSel, VarSel, SelectModel,\
feature_labels[0], scaler, encoder, imputer, pca,\
- StatisticalSel, ReliefSel, Sampler
+ StatisticalSel, RFESel, ReliefSel, Sampler
else:
return ret
else:
@@ -674,7 +680,7 @@ def fit_and_score(X, y, scoring,
if return_all:
return ret, GroupSel, VarSel, SelectModel,\
feature_labels[0], scaler, encoder, imputer, pca,\
- StatisticalSel, ReliefSel, Sampler
+ StatisticalSel, RFESel, ReliefSel, Sampler
else:
return ret
@@ -698,6 +704,198 @@ def fit_and_score(X, y, scoring,
if not return_all:
del SelectModel
+ # --------------------------------------------------------------------
+ # Feature selection based on a statistical test
+ if 'StatisticalTestUse' in para_estimator.keys():
+ if para_estimator['StatisticalTestUse'] == 'True':
+ metric = para_estimator['StatisticalTestMetric']
+ threshold = para_estimator['StatisticalTestThreshold']
+ if verbose:
+ print(f"Selecting features based on statistical test. Method {metric}, threshold {round(threshold, 5)}.")
+ print("\t Original Length: " + str(len(X_train[0])))
+
+ StatisticalSel = StatisticalTestThreshold(metric=metric,
+ threshold=threshold)
+
+ StatisticalSel.fit(X_train, y)
+ X_train_temp = StatisticalSel.transform(X_train)
+ if len(X_train_temp[0]) == 0:
+ if verbose:
+ print('[WARNING] No features are selected! Probably your statistical test feature selection was too strict.')
+
+ StatisticalSel = None
+ if skip:
+ if verbose:
+ print('[WARNING] Refitting, so we need an estimator, thus skipping this step.')
+ parameters['StatisticalTestUse'] = 'False'
+ else:
+ if verbose:
+ print('[WARNING] Returning NaN as performance.')
+
+ # return NaN as performance
+ para_estimator = delete_nonestimator_parameters(para_estimator)
+
+ # Update the runtime
+ end_time = time.time()
+ runtime = end_time - start_time
+ if return_train_score:
+ ret[3] = runtime
+ else:
+ ret[2] = runtime
+ if return_all:
+ return ret, GroupSel, VarSel, SelectModel,\
+ feature_labels[0], scaler, encoder, imputer, pca,\
+ StatisticalSel, RFESel, ReliefSel, Sampler
+ else:
+ return ret
+
+ else:
+ X_train = StatisticalSel.transform(X_train)
+ X_test = StatisticalSel.transform(X_test)
+ feature_labels = StatisticalSel.transform(feature_labels)
+
+ if verbose:
+ print("\t New Length: " + str(len(X_train[0])))
+
+ # Delete the statistical test keys
+ del para_estimator['StatisticalTestUse']
+ if 'StatisticalTestMetric' in para_estimator.keys():
+ del para_estimator['StatisticalTestMetric']
+
+ if 'StatisticalTestThreshold' in para_estimator.keys():
+ del para_estimator['StatisticalTestThreshold']
+
+ # Delete the object if we do not need to return it
+ if not return_all:
+ del StatisticalSel
+
+ # --------------------------------------------------------------------
+ # Feature selection through recursive feature elimination
+ if 'RFE' in para_estimator.keys():
+ model = para_estimator['RFE_estimator']
+ if para_estimator['RFE'] == 'True':
+ if verbose:
+ print(f"Selecting features using recursive feature elimination using model {model}.")
+
+ if model == 'Lasso':
+ # Use lasso model for feature selection
+ alpha = para_estimator['RFE_lasso_alpha']
+ selectestimator = Lasso(alpha=alpha, random_state=random_seed)
+
+ elif model == 'LR':
+ # Use logistic regression model for feature selection
+ selectestimator = LogisticRegression(random_state=random_seed)
+
+ elif model == 'RF':
+ # Use random forest model for feature selection
+ n_estimators = para_estimator['RFE_n_trees']
+ selectestimator = RandomForestClassifier(n_estimators=n_estimators,
+ random_state=random_seed)
+ else:
+ raise ae.WORCKeyError(f'Model {model} is not known for RFE. Use Lasso, LR, or RF.')
+
+ if len(y_train.shape) >= 2:
+ # Multilabel or regression. Regression: second dimension has length 1
+ if y_train.shape[1] > 1 and model != 'RF':
+ raise ae.WORCValueError(f'Model {model} is not suitable for multiclass classification. Please use RF or do not use RFE.')
+
+ # Prefit model
+ selectestimator.fit(X_train, y_train)
+
+ # Use fit to select optimal features
+ n_features_to_select = para_estimator['RFE_n_features_to_select']
+ step = para_estimator['RFE_step']
+ RFESel = RFE(selectestimator,
+ n_features_to_select=n_features_to_select,
+ step=step)
+ try:
+ RFESel.fit(X_train, y_train)
+ except ValueError:
+ if skip:
+ if verbose:
+ print('[WARNING] Refitting, so we need an estimator, thus skipping this step.')
+ parameters['RFE'] = 'False'
+
+ else:
+ if verbose:
+ print('[WARNING] RFE cannot be fitted with these settings, too few features left, returning NaN as performance.')
+
+ # return NaN as performance
+ para_estimator = delete_nonestimator_parameters(para_estimator)
+ RFESel = None
+
+ # Update the runtime
+ end_time = time.time()
+ runtime = end_time - start_time
+ if return_train_score:
+ ret[3] = runtime
+ else:
+ ret[2] = runtime
+ if return_all:
+ return ret, GroupSel, VarSel, SelectModel,\
+ feature_labels[0], scaler, encoder, imputer, pca,\
+ StatisticalSel, RFESel, ReliefSel, Sampler
+ else:
+ return ret
+ else:
+ if verbose:
+ print("\t Original Length: " + str(len(X_train[0])))
+
+ X_train_temp = RFESel.transform(X_train)
+ if len(X_train_temp[0]) == 0:
+ if verbose:
+                        print('[WARNING] No features are selected! Probably your data is too noisy or the selection too strict.')
+
+ RFESel = None
+ if skip:
+ if verbose:
+ print('[WARNING] Refitting, so we need an estimator, thus skipping this step.')
+ parameters['RFE'] = 'False'
+ else:
+ if verbose:
+ print('[WARNING] Returning NaN as performance.')
+
+ # return NaN as performance
+ para_estimator = delete_nonestimator_parameters(para_estimator)
+
+ # Update the runtime
+ end_time = time.time()
+ runtime = end_time - start_time
+ if return_train_score:
+ ret[3] = runtime
+ else:
+ ret[2] = runtime
+ if return_all:
+ return ret, GroupSel, VarSel, SelectModel,\
+ feature_labels[0], scaler, encoder, imputer, pca,\
+ StatisticalSel, RFESel, ReliefSel, Sampler
+ else:
+ return ret
+
+ else:
+ X_train = RFESel.transform(X_train)
+ X_test = RFESel.transform(X_test)
+ feature_labels = RFESel.transform(feature_labels)
+
+ if verbose:
+ print("\t New Length: " + str(len(X_train[0])))
+
+ del para_estimator['RFE']
+ if 'RFE_lasso_alpha' in para_estimator.keys():
+ del para_estimator['RFE_lasso_alpha']
+ if 'RFE_estimator' in para_estimator.keys():
+ del para_estimator['RFE_estimator']
+ if 'RFE_n_trees' in para_estimator.keys():
+ del para_estimator['RFE_n_trees']
+ if 'RFE_n_features_to_select' in para_estimator.keys():
+ del para_estimator['RFE_n_features_to_select']
+            if 'RFE_step' in para_estimator.keys():
+                del para_estimator['RFE_step']
+
+ # Delete the object if we do not need to return it
+ if not return_all:
+ del RFESel
+
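The RFE branch above wraps a prefit estimator in scikit-learn's `RFE` and uses the fitted selector to transform the feature matrix. A self-contained sketch of the same pattern on synthetic data (the hyperparameter values here are arbitrary, not WORC defaults):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the radiomics feature matrix
X, y = make_classification(n_samples=50, n_features=20, random_state=42)

# Same pattern as the diff: wrap an estimator in RFE, fit, then transform
selector = RFE(RandomForestClassifier(n_estimators=10, random_state=42),
               n_features_to_select=5, step=1)
selector.fit(X, y)
X_selected = selector.transform(X)
print(X_selected.shape)  # (50, 5)
```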
# ----------------------------------------------------------------
# PCA dimensionality reduction
# Principle Component Analysis
@@ -736,7 +934,7 @@ def fit_and_score(X, y, scoring,
if return_all:
return ret, GroupSel, VarSel, SelectModel,\
feature_labels[0], scaler, encoder, imputer, pca,\
- StatisticalSel, ReliefSel, Sampler
+ StatisticalSel, RFESel, ReliefSel, Sampler
else:
return ret
@@ -778,7 +976,7 @@ def fit_and_score(X, y, scoring,
if return_all:
return ret, GroupSel, VarSel, SelectModel,\
feature_labels[0], scaler, encoder, imputer, pca,\
- StatisticalSel, ReliefSel, Sampler
+ StatisticalSel, RFESel, ReliefSel, Sampler
else:
return ret
else:
@@ -825,7 +1023,7 @@ def fit_and_score(X, y, scoring,
if return_all:
return ret, GroupSel, VarSel, SelectModel,\
feature_labels[0], scaler, encoder, imputer, pca,\
- StatisticalSel, ReliefSel, Sampler
+ StatisticalSel, RFESel, ReliefSel, Sampler
else:
return ret
@@ -841,71 +1039,6 @@ def fit_and_score(X, y, scoring,
if 'PCAType' in para_estimator.keys():
del para_estimator['PCAType']
- # --------------------------------------------------------------------
- # Feature selection based on a statistical test
- if 'StatisticalTestUse' in para_estimator.keys():
- if para_estimator['StatisticalTestUse'] == 'True':
- metric = para_estimator['StatisticalTestMetric']
- threshold = para_estimator['StatisticalTestThreshold']
- if verbose:
- print(f"Selecting features based on statistical test. Method {metric}, threshold {round(threshold, 5)}.")
- print("\t Original Length: " + str(len(X_train[0])))
-
- StatisticalSel = StatisticalTestThreshold(metric=metric,
- threshold=threshold)
-
- StatisticalSel.fit(X_train, y)
- X_train_temp = StatisticalSel.transform(X_train)
- if len(X_train_temp[0]) == 0:
- if verbose:
- print('[WARNING] No features are selected! Probably your statistical test feature selection was too strict.')
-
- StatisticalSel = None
- if skip:
- if verbose:
- print('[WARNING] Refitting, so we need an estimator, thus skipping this step.')
- parameters['StatisticalTestUse'] = 'False'
- else:
- if verbose:
- print('[WARNING] Returning NaN as performance.')
-
- # return NaN as performance
- para_estimator = delete_nonestimator_parameters(para_estimator)
-
- # Update the runtime
- end_time = time.time()
- runtime = end_time - start_time
- if return_train_score:
- ret[3] = runtime
- else:
- ret[2] = runtime
- if return_all:
- return ret, GroupSel, VarSel, SelectModel,\
- feature_labels[0], scaler, encoder, imputer, pca,\
- StatisticalSel, ReliefSel, Sampler
- else:
- return ret
-
- else:
- X_train = StatisticalSel.transform(X_train)
- X_test = StatisticalSel.transform(X_test)
- feature_labels = StatisticalSel.transform(feature_labels)
-
- if verbose:
- print("\t New Length: " + str(len(X_train[0])))
-
- # Delete the statistical test keys
- del para_estimator['StatisticalTestUse']
- if 'StatisticalTestMetric' in para_estimator.keys():
- del para_estimator['StatisticalTestMetric']
-
- if 'StatisticalTestThreshold' in para_estimator.keys():
- del para_estimator['StatisticalTestThreshold']
-
- # Delete the object if we do not need to return it
- if not return_all:
- del StatisticalSel
-
# ------------------------------------------------------------------------
# Use object resampling
if 'Resampling_Use' in para_estimator.keys():
@@ -969,7 +1102,9 @@ def fit_and_score(X, y, scoring,
ret[2] = runtime
if return_all:
- return ret, GroupSel, VarSel, SelectModel, feature_labels[0], scaler, encoder, imputer, pca, StatisticalSel, ReliefSel, Sampler
+ return ret, GroupSel, VarSel, SelectModel,\
+ feature_labels[0], scaler, encoder, imputer,\
+ pca, StatisticalSel, RFESel, ReliefSel, Sampler
else:
return ret
else:
@@ -1054,7 +1189,7 @@ def fit_and_score(X, y, scoring,
feature_values = np.concatenate((X_train, X_test), axis=0)
y_all = np.concatenate((y_train, y_test), axis=0)
para_estimator = None
-
+
try:
ret = _fit_and_score(estimator, feature_values, y_all,
scorers, new_train,
@@ -1080,7 +1215,7 @@ def fit_and_score(X, y, scoring,
ret[2] = runtime
if return_all:
- return ret, GroupSel, VarSel, SelectModel, feature_labels[0], scaler, encoder, imputer, pca, StatisticalSel, ReliefSel, Sampler
+ return ret, GroupSel, VarSel, SelectModel, feature_labels[0], scaler, encoder, imputer, pca, StatisticalSel, RFESel, ReliefSel, Sampler
else:
return ret
else:
@@ -1113,7 +1248,7 @@ def fit_and_score(X, y, scoring,
ret[2] = runtime
if return_all:
- return ret, GroupSel, VarSel, SelectModel, feature_labels[0], scaler, encoder, imputer, pca, StatisticalSel, ReliefSel, Sampler
+ return ret, GroupSel, VarSel, SelectModel, feature_labels[0], scaler, encoder, imputer, pca, StatisticalSel, RFESel, ReliefSel, Sampler
else:
return ret
@@ -1141,6 +1276,12 @@ def delete_nonestimator_parameters(parameters):
'SelectFromModel_lasso_alpha',
'SelectFromModel_estimator',
'SelectFromModel_n_trees',
+ 'RFE',
+ 'RFE_lasso_alpha',
+ 'RFE_estimator',
+ 'RFE_n_trees',
+ 'RFE_n_features_to_select',
+ 'RFE_step',
'Featsel_Variance',
'FeatPreProcess',
'FeatureScaling',
diff --git a/WORC/classification/parameter_optimization.py b/WORC/classification/parameter_optimization.py
index 2eb1144d..8852194c 100644
--- a/WORC/classification/parameter_optimization.py
+++ b/WORC/classification/parameter_optimization.py
@@ -56,9 +56,8 @@ def random_search_parameters(features, labels, N_iter, test_size,
random_search: sklearn randomsearch object containing the results.
"""
if random_seed is None:
- #random_seed = np.random.randint(1, 5000)
- # Fix the random seed for testing
- random_seed = 42
+ random_seed = np.random.randint(1, 5000)
+
random_state = check_random_state(random_seed)
regressors = ['SVR', 'RFR', 'SGDR', 'Lasso', 'ElasticNet']
diff --git a/WORC/classification/trainclassifier.py b/WORC/classification/trainclassifier.py
index 6dc4bfc7..f5340eb0 100644
--- a/WORC/classification/trainclassifier.py
+++ b/WORC/classification/trainclassifier.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2022 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2023 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -148,6 +148,9 @@ def trainclassifier(feat_train, patientinfo_train, config,
# Add non-classifier parameters
param_grid = add_parameters_to_grid(param_grid, config)
+
+ # Delete parameters for hyperoptimization which already have been used
+ del config['HyperOptimization']['fix_random_seed']
# For N_iter, perform k-fold crossvalidation
outputfolder = os.path.dirname(output_hdf)
@@ -245,6 +248,28 @@ def add_parameters_to_grid(param_grid, config):
discrete_uniform(loc=config['Featsel']['SelectFromModel_n_trees'][0],
scale=config['Featsel']['SelectFromModel_n_trees'][1])
+ param_grid['RFE'] =\
+ boolean_uniform(threshold=config['Featsel']['RFE'])
+
+ param_grid['RFE_lasso_alpha'] =\
+ uniform(loc=config['Featsel']['RFE_lasso_alpha'][0],
+ scale=config['Featsel']['RFE_lasso_alpha'][1])
+
+ param_grid['RFE_estimator'] =\
+ config['Featsel']['RFE_estimator']
+
+ param_grid['RFE_n_trees'] =\
+ discrete_uniform(loc=config['Featsel']['RFE_n_trees'][0],
+ scale=config['Featsel']['RFE_n_trees'][1])
+
+ param_grid['RFE_n_features_to_select'] =\
+ discrete_uniform(loc=config['Featsel']['RFE_n_features_to_select'][0],
+ scale=config['Featsel']['RFE_n_features_to_select'][1])
+
+ param_grid['RFE_step'] =\
+ discrete_uniform(loc=config['Featsel']['RFE_step'][0],
+ scale=config['Featsel']['RFE_step'][1])
+
param_grid['UsePCA'] =\
boolean_uniform(threshold=config['Featsel']['UsePCA'])
param_grid['PCAType'] = config['Featsel']['PCAType']
@@ -278,7 +303,11 @@ def add_parameters_to_grid(param_grid, config):
scale=config['Featsel']['ReliefNumFeatures'][1])
# Add a random seed, which is required for many methods
- param_grid['random_seed'] =\
- discrete_uniform(loc=0, scale=2**32 - 1)
+ if config['HyperOptimization']['fix_random_seed']:
+ # Fix the random seed
+ param_grid['random_seed'] = [22]
+ else:
+ param_grid['random_seed'] =\
+ discrete_uniform(loc=0, scale=2**32 - 1)
return param_grid
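The seed handling added above either pins every sampled workflow to a single seed or draws a fresh seed per candidate. A rough standalone sketch, substituting `scipy.stats.randint` for WORC's `discrete_uniform` helper (the function name `seed_grid` and the bounds are illustrative):

```python
from scipy.stats import randint

def seed_grid(fix_random_seed):
    # Mirrors the diff: a fixed seed becomes a one-element list (every
    # candidate gets the same seed), otherwise each sampled candidate
    # draws its own seed from a discrete uniform distribution.
    if fix_random_seed:
        return [22]
    return randint(0, 2**32 - 1)

print(seed_grid(True))  # [22]
sampled = seed_grid(False).rvs(random_state=0)
print(0 <= sampled < 2**32 - 1)  # True
```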
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.IOparser.doctree b/WORC/doc/_build/doctrees/autogen/WORC.IOparser.doctree
index 1a65a597..15fb7b78 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.IOparser.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.IOparser.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.classification.doctree b/WORC/doc/_build/doctrees/autogen/WORC.classification.doctree
index 7c5cd3d6..a22da7e6 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.classification.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.classification.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.config.doctree b/WORC/doc/_build/doctrees/autogen/WORC.config.doctree
index f4c0b516..09bbe4b0 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.config.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.config.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.detectors.doctree b/WORC/doc/_build/doctrees/autogen/WORC.detectors.doctree
index 1123c4e6..768d9865 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.detectors.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.detectors.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.doctree b/WORC/doc/_build/doctrees/autogen/WORC.doctree
index d9eec74f..94d1bb75 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.exampledata.doctree b/WORC/doc/_build/doctrees/autogen/WORC.exampledata.doctree
index cd080ad5..d9148c23 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.exampledata.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.exampledata.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.facade.doctree b/WORC/doc/_build/doctrees/autogen/WORC.facade.doctree
index 6248df52..291074d1 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.facade.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.facade.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.featureprocessing.doctree b/WORC/doc/_build/doctrees/autogen/WORC.featureprocessing.doctree
index 3ac12155..f70696f7 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.featureprocessing.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.featureprocessing.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.plotting.doctree b/WORC/doc/_build/doctrees/autogen/WORC.plotting.doctree
index 949c55a6..1cf2b89e 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.plotting.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.plotting.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.processing.doctree b/WORC/doc/_build/doctrees/autogen/WORC.processing.doctree
index df935957..5c643243 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.processing.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.processing.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.resources.doctree b/WORC/doc/_build/doctrees/autogen/WORC.resources.doctree
index 9cb804a5..b3c3dd26 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.resources.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.resources.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.resources.fastr_tests.doctree b/WORC/doc/_build/doctrees/autogen/WORC.resources.fastr_tests.doctree
index 3b15532b..6a3a51a3 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.resources.fastr_tests.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.resources.fastr_tests.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.resources.fastr_tools.doctree b/WORC/doc/_build/doctrees/autogen/WORC.resources.fastr_tools.doctree
index 5f50414d..6cad2427 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.resources.fastr_tools.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.resources.fastr_tools.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.tools.doctree b/WORC/doc/_build/doctrees/autogen/WORC.tools.doctree
index 27120bc0..e8a2c764 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.tools.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.tools.doctree differ
diff --git a/WORC/doc/_build/doctrees/environment.pickle b/WORC/doc/_build/doctrees/environment.pickle
index 45c3f13c..a9a2b869 100644
Binary files a/WORC/doc/_build/doctrees/environment.pickle and b/WORC/doc/_build/doctrees/environment.pickle differ
diff --git a/WORC/doc/_build/doctrees/index.doctree b/WORC/doc/_build/doctrees/index.doctree
index 6c01390b..e217e49e 100644
Binary files a/WORC/doc/_build/doctrees/index.doctree and b/WORC/doc/_build/doctrees/index.doctree differ
diff --git a/WORC/doc/_build/doctrees/static/changelog.doctree b/WORC/doc/_build/doctrees/static/changelog.doctree
index e1a5d542..456447a2 100644
Binary files a/WORC/doc/_build/doctrees/static/changelog.doctree and b/WORC/doc/_build/doctrees/static/changelog.doctree differ
diff --git a/WORC/doc/_build/doctrees/static/configuration.doctree b/WORC/doc/_build/doctrees/static/configuration.doctree
index cc67cde8..bee77b67 100644
Binary files a/WORC/doc/_build/doctrees/static/configuration.doctree and b/WORC/doc/_build/doctrees/static/configuration.doctree differ
diff --git a/WORC/doc/_build/doctrees/static/introduction.doctree b/WORC/doc/_build/doctrees/static/introduction.doctree
index 36565909..cc8fd726 100644
Binary files a/WORC/doc/_build/doctrees/static/introduction.doctree and b/WORC/doc/_build/doctrees/static/introduction.doctree differ
diff --git a/WORC/doc/_build/doctrees/static/quick_start.doctree b/WORC/doc/_build/doctrees/static/quick_start.doctree
index bf2f9ab5..1b40639a 100644
Binary files a/WORC/doc/_build/doctrees/static/quick_start.doctree and b/WORC/doc/_build/doctrees/static/quick_start.doctree differ
diff --git a/WORC/doc/_build/doctrees/static/user_manual.doctree b/WORC/doc/_build/doctrees/static/user_manual.doctree
index 3aab0d6a..9f28c6d3 100644
Binary files a/WORC/doc/_build/doctrees/static/user_manual.doctree and b/WORC/doc/_build/doctrees/static/user_manual.doctree differ
diff --git a/WORC/doc/_build/html/.buildinfo b/WORC/doc/_build/html/.buildinfo
index e526dc33..e87b83f2 100644
--- a/WORC/doc/_build/html/.buildinfo
+++ b/WORC/doc/_build/html/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 385a6cf65021e68246010618f9e139d5
+config: 9f4f6a5f34b465687f7ac7c47dd3b2c4
tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/WORC/doc/_build/html/_modules/WORC/IOparser/config_WORC.html b/WORC/doc/_build/html/_modules/WORC/IOparser/config_WORC.html
index 936b6821..f8685ee3 100644
--- a/WORC/doc/_build/html/_modules/WORC/IOparser/config_WORC.html
+++ b/WORC/doc/_build/html/_modules/WORC/IOparser/config_WORC.html
@@ -1,39 +1,42 @@
 #!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2023 Biomedical Imaging Group Rotterdam, Departments of
 # Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -195,20 +202,21 @@
 Source code for WORC.IOparser.config_WORC
     settings_dict: dictionary containing all parsed settings.
     """
     if not os.path.exists(config_file_path):
-        e = f'File {config_file_path} does not exist!'
+        e = f'File {config_file_path} does not exist!'
         raise ae.WORCKeyError(e)

     settings = configparser.ConfigParser()
     settings.read(config_file_path)

-    settings_dict = {'ImageFeatures': dict(), 'General': dict(),
-                     'SVMFeatures': dict()}
+    settings_dict = {'Preprocessing': dict(), 'ImageFeatures': dict(), 'General': dict(),
+                     'SVMFeatures': dict(), 'Ensemble': dict(),
+                     'Labels': dict()}

     settings_dict['ImageFeatures']['image_type'] =\
         str(settings['ImageFeatures']['image_type'])

     settings_dict['General']['FeatureCalculators'] =\
-        [str(item).strip() for item in
+        [str(item).strip('[]') for item in
          settings['General']['FeatureCalculators'].split(',')]

     settings_dict['General']['Preprocessing'] =\
@@ -219,7 +227,31 @@
Source code for WORC.IOparser.config_WORC
settings_dict['General']['Segmentix']=\
settings['General'].getboolean('Segmentix')
+
+    # Settings for ensembling
+    settings_dict['Ensemble']['Method'] =\
+        str(settings['Ensemble']['Method'])
+    settings_dict['Ensemble']['Size'] =\
+        int(settings['Ensemble']['Size'])
+
+    # Label settings
+    settings_dict['Labels']['label_names'] =\
+        [str(item).strip() for item in
+         settings['Labels']['label_names'].split(',')]
+    settings_dict['Labels']['modus'] =\
+        str(settings['Labels']['modus'])
+
+    # Whether to use some methods or not
+    settings_dict['General']['ComBat'] =\
+        str(settings['General']['ComBat'])
+
+    settings_dict['General']['Fingerprint'] =\
+        str(settings['General']['Fingerprint'])
+
+    settings_dict['Preprocessing']['Resampling'] =\
+        settings['Preprocessing'].getboolean('Resampling')
+
    return settings_dict
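The parsing above boils down to plain `configparser` lookups with explicit casts per section. A minimal, self-contained sketch of the new Ensemble/Labels handling (the ini content below is a hypothetical example, not a full WORC configuration):

```python
# Sketch of how the Ensemble and Labels sections are parsed with configparser.
# Section and key names mirror the WORC config; values here are made up.
import configparser

raw = """
[Ensemble]
Method = top_N
Size = 100

[Labels]
label_names = imaginary_label_1, imaginary_label_2
modus = singlelabel
"""

settings = configparser.ConfigParser()
settings.read_string(raw)

settings_dict = {'Ensemble': dict(), 'Labels': dict()}
settings_dict['Ensemble']['Method'] = str(settings['Ensemble']['Method'])
settings_dict['Ensemble']['Size'] = int(settings['Ensemble']['Size'])
settings_dict['Labels']['label_names'] =\
    [str(item).strip() for item in settings['Labels']['label_names'].split(',')]

print(settings_dict['Ensemble']['Size'])       # 100
print(settings_dict['Labels']['label_names'])  # ['imaginary_label_1', 'imaginary_label_2']
```

Note the explicit `int(...)` cast: `configparser` returns every value as a string, so each consumer must convert types itself.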
Source code for WORC.IOparser.config_io_classifier
#!/usr/bin/env python
-# Copyright 2016-2022 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2023 Biomedical Imaging Group Rotterdam, Departments of
+# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
@@ -196,7 +203,7 @@
    """
    if not os.path.exists(config_file_path):
        e = f'File {config_file_path} does not exist!'
        raise ae.WORCKeyError(e)

    settings = configparser.ConfigParser()
@@ -257,6 +264,29 @@
Source code for WORC.IOparser.config_preprocessing
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2023 Biomedical Imaging Group Rotterdam, Departments of
+# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
@@ -195,7 +202,7 @@
        settings_dict: dictionary containing all parsed settings.

    """
    if not os.path.exists(config_file_path):
        e = f'File {config_file_path} does not exist!'
        raise ae.WORCKeyError(e)

    settings = configparser.ConfigParser()
@@ -225,8 +232,21 @@
         settings['Preprocessing']['Clipping_Range'].split(',')]

    if len(settings_dict['Preprocessing']['Clipping_Range']) != 2:
        raise ae.WORCValueError(f"Clipping range should be two floats split by a comma, got {settings['Preprocessing']['Clipping_Range']}.")
+
+    # Histogram equalization
+    settings_dict['Preprocessing']['HistogramEqualization'] =\
+        settings['Preprocessing'].getboolean('HistogramEqualization')
+
+    settings_dict['Preprocessing']['HistogramEqualization_Alpha'] =\
+        float(settings['Preprocessing']['HistogramEqualization_Alpha'])
+    settings_dict['Preprocessing']['HistogramEqualization_Beta'] =\
+        float(settings['Preprocessing']['HistogramEqualization_Beta'])
+
+    settings_dict['Preprocessing']['HistogramEqualization_Radius'] =\
+        int(settings['Preprocessing']['HistogramEqualization_Radius'])
+
    # Normalization
    settings_dict['Preprocessing']['Normalize'] =\
        settings['Preprocessing'].getboolean('Normalize')
@@ -270,7 +290,7 @@
    if len(settings_dict['Preprocessing']['Resampling_spacing']) != 3:
        s = settings_dict['Preprocessing']['Resampling_spacing']
        raise ae.WORCValueError(f'Resampling spacing should be three elements, got {s}')

    return settings_dict
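The new `HistogramEqualization` options parsed above configure an adaptive equalization step (parameterised by `Alpha`, `Beta`, and `Radius`) in WORC's preprocessing. As a concept check only, here is a minimal global histogram equalization in plain NumPy; it is not the adaptive variant WORC applies, and the function name is illustrative:

```python
# Conceptual sketch: global histogram equalization maps intensities through
# the normalized cumulative histogram, flattening the intensity distribution.
import numpy as np

def equalize(image, n_bins=256):
    """Map intensities through the normalized cumulative histogram (CDF)."""
    hist, bin_edges = np.histogram(image.ravel(), bins=n_bins)
    cdf = hist.cumsum().astype(float)
    cdf /= cdf[-1]  # normalize CDF to [0, 1]
    # Interpolate each pixel's intensity on the CDF
    return np.interp(image.ravel(), bin_edges[:-1], cdf).reshape(image.shape)

rng = np.random.default_rng(42)
img = rng.normal(loc=100, scale=10, size=(64, 64))  # synthetic "image"
eq = equalize(img)
print(eq.min() >= 0.0 and eq.max() <= 1.0)  # True: output lands in [0, 1]
```

The adaptive variant differs in that it computes the mapping in a local window (`Radius`) and blends it with the identity mapping (`Alpha`, `Beta`), which preserves local contrast better in medical images.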
@@ -279,20 +299,25 @@
Source code for WORC.WORC
#!/usr/bin/env python
-# Copyright 2016-2022 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2023 Biomedical Imaging Group Rotterdam, Departments of
+# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
@@ -184,20 +191,20 @@
[docs]    def build(self, buildtype='training'):
        """Build the network based on the given attributes.

        Parameters
        ----------
-        wtype: string, default 'training'
+        buildtype: string, default 'training'
            Specify the WORC execution type.
-            - testing: use if you have a trained classifier and want to
+            - inference: use if you have a trained classifier and want to
              evaluate it on some new images.
            - training: use if you want to train a classifier from a dataset.

        """
-        self.wtype = wtype
-        if wtype == 'training':
+        if buildtype == 'training':
            self.build_training()
-        elif wtype == 'testing':
-            self.build_testing()
+        elif buildtype == 'inference':
+            raise WORCexceptions.WORCValueError("Inference workflow is still WIP and does not fully work yet.")
+            self.TrainTest = True
+            self.OnlyTest = True
+            self.build_inference()

[docs]    def build_training(self):
        """Build the training network based on the given attributes."""
        # We either need images or features for Radiomics
        if self.images_test or self.features_test:
+            if not self.labels_test:
+                m = "You provided images and/or features for a test set, but not ground truth labels. Please also provide labels for the test set."
+                raise WORCexceptions.WORCValueError(m)
            self.TrainTest = True

        if self.images_train or self.features_train:
            print('Building training network...')
            # We currently require labels for supervised learning
@@ -735,13 +762,10 @@
Source code for WORC.WORC
            # NOTE: We currently use the first configuration as general config
            image_types = list()
            for c in range(len(self.configs)):
-                if type(self.configs[c]) == str:
-                    # Probably, c is a configuration file
-                    self.configs[c] = config_io.load_config(self.configs[c])
                image_types.append(self.configs[c]['ImageFeatures']['image_type'])

            if self.configs[0]['General']['Fingerprint'] == 'True' and any(imt not in all_modalities for imt in image_types):
                m = f'One of your image types {image_types} is not one of the valid image types {quantitative_modalities + qualitative_modalities}. This is mandatory to set when performing fingerprinting, see the WORC Documentation (https://worc.readthedocs.io/en/latest/static/configuration.html#imagefeatures).'
                raise WORCexceptions.WORCValueError(m)

            # Create config source
@@ -904,9 +928,9 @@
            else:
                nseg = len(self.segmentations_train)
                nim = len(image_types)
                m = f'Length of segmentations for training is ' +\
                    f'{nseg}: should be equal to number of images' +\
                    f' ({nim}) or 1 when using registration.'
                raise WORCexceptions.WORCValueError(m)

        # BUG: We assume that first type defines if we use segmentix
@@ -954,8 +978,8 @@
                # Add to fingerprinting if required
                if self.configs[0]['General']['Fingerprint'] == 'True':
                    self.links_fingerprinting[f'{label}_segmentations'] = self.network.create_link(self.converters_seg_train[label].outputs['image'], self.node_fingerprinters[label].inputs['segmentations_train'])
                    self.links_fingerprinting[f'{label}_segmentations'].collapse = 'train'

        elif self.segmode == 'Register':
            # ---------------------------------------------
@@ -1115,8 +1139,8 @@
                # Add to fingerprinting if required
                if self.configs[0]['General']['Fingerprint'] == 'True':
                    # Since there are no segmentations yet of this modality, just use those of the first, provided modality
                    self.links_fingerprinting[f'{label}_segmentations'] = self.network.create_link(self.converters_seg_train[self.modlabels[0]].outputs['image'], self.node_fingerprinters[label].inputs['segmentations_train'])
                    self.links_fingerprinting[f'{label}_segmentations'].collapse = 'train'

        # -----------------------------------------------------
        # Optionally, add segmentix, the in-house segmentation
@@ -1165,13 +1189,13 @@
                # Link features to ComBat
                self.links_Combat1_train[label] = list()
                for i_node, fname in enumerate(self.featurecalculators[label]):
                    self.links_Combat1_train[label].append(self.ComBat.inputs['features_train'][f'{label}_{self.featurecalculators[label][i_node]}'] << self.featureconverter_train[label][i_node].outputs['feat_out'])
                    self.links_Combat1_train[label][i_node].collapse = 'train'

                if self.TrainTest:
                    self.links_Combat1_test[label] = list()
                    for i_node, fname in enumerate(self.featurecalculators[label]):
                        self.links_Combat1_test[label].append(self.ComBat.inputs['features_test'][f'{label}_{self.featurecalculators[label][i_node]}'] << self.featureconverter_test[label][i_node].outputs['feat_out'])
                        self.links_Combat1_test[label][i_node].collapse = 'test'

        # -----------------------------------------------------
@@ -1189,7 +1213,7 @@
                # Append features to the classification
                if not self.configs[0]['General']['ComBat'] == 'True':
                    self.links_C1_train[label].append(self.classify.inputs['features_train'][f'{label}_{self.featurecalculators[label][i_node]}'] << self.featureconverter_train[label][i_node].outputs['feat_out'])
                    self.links_C1_train[label][i_node].collapse = 'train'

                # Save output
@@ -1202,7 +1226,7 @@
                # Append features to the classification
                if not self.configs[0]['General']['ComBat'] == 'True':
                    self.links_C1_test[label].append(self.classify.inputs['features_test'][f'{label}_{self.featurecalculators[label][i_node]}'] << self.featureconverter_test[label][i_node].outputs['feat_out'])
                    self.links_C1_test[label][i_node].collapse = 'test'

                # Save output
@@ -1254,6 +1278,358 @@
        else:
            raise WORCexceptions.WORCIOError("Please provide either images or features.")
+
[docs]    def build_inference(self):
+        """Build a network to test an already trained model on a test dataset based on the given attributes."""
+        # FIXME: WIP
+        if self.images_test or self.features_test:
+            if not self.labels_test:
+                m = "You provided images and/or features for a test set, but not ground truth labels. Please also provide labels for the test set."
+                raise WORCexceptions.WORCValueError(m)
+        else:
+            m = "Please provide either images and/or features for your test set."
+            raise WORCexceptions.WORCValueError(m)
+
+        if not self.configs:
+            m = 'For a testing workflow, you need to provide a WORC config.ini file'
+            raise WORCexceptions.WORCValueError(m)
+
+ self.network=fastr.create_network(self.name)
+
+        # Add trained model node
+        memory = self.fastr_memory_parameters['Classification']
+        self.source_trained_model =\
+            self.network.create_source('HDF5',
+                                       id='trained_model',
+                                       node_group='trained_model',
+                                       step_id='general_sources')
+
+        if self.images_test or self.features_test:
+            print('Building testing network...')
+            # We currently require labels for supervised learning
+            if self.labels_test:
+                self.network = fastr.create_network(self.name)
+
+                # Extract some information from the configs
+                image_types = list()
+                for conf_it in range(len(self.configs)):
+                    if type(self.configs[conf_it]) == str:
+                        # Config is a .ini file, load
+                        temp_conf = config_io.load_config(self.configs[conf_it])
+                    else:
+                        temp_conf = self.configs[conf_it]
+
+                    image_type = temp_conf['ImageFeatures']['image_type']
+                    image_types.append(image_type)
+
+                    # NOTE: We currently use the first configuration as general config
+                    if conf_it == 0:
+                        print(temp_conf)
+                        ensemble_method = [temp_conf['Ensemble']['Method']]
+                        ensemble_size = [temp_conf['Ensemble']['Size']]
+                        label_names = [temp_conf['Labels']['label_names']]
+                        use_ComBat = temp_conf['General']['ComBat']
+                        use_segmentix = temp_conf['General']['Segmentix']
+
+                # Create various input sources
+                self.source_patientclass_test =\
+                    self.network.create_source('PatientInfoFile',
+                                               id='patientclass_test',
+                                               node_group='pctest',
+                                               step_id='test_sources')
+
+                self.source_ensemble_method =\
+                    self.network.create_constant('String', ensemble_method,
+                                                 id='ensemble_method',
+                                                 step_id='Evaluation')
+
+                self.source_ensemble_size =\
+                    self.network.create_constant('String', ensemble_size,
+                                                 id='ensemble_size',
+                                                 step_id='Evaluation')
+
+                self.source_LabelType =\
+                    self.network.create_constant('String', label_names,
+                                                 id='LabelType',
+                                                 step_id='Evaluation')
+
+                memory = self.fastr_memory_parameters['PlotEstimator']
+                self.plot_estimator =\
+                    self.network.create_node('worc/PlotEstimator:1.0',
+                                             tool_version='1.0',
+                                             id='plot_Estimator',
+                                             resources=ResourceLimit(memory=memory),
+                                             step_id='Evaluation')
+
+                # Links to performance creator
+                self.plot_estimator.inputs['ensemble_method'] = self.source_ensemble_method.output
+                self.plot_estimator.inputs['ensemble_size'] = self.source_ensemble_size.output
+                self.plot_estimator.inputs['label_type'] = self.source_LabelType.output
+                pinfo = self.source_patientclass_test.output
+                self.plot_estimator.inputs['prediction'] = self.source_trained_model.output
+                self.plot_estimator.inputs['pinfo'] = pinfo
+
+                # Performance output
+                self.sink_performance = self.network.create_sink('JsonFile', id='performance', step_id='general_sinks')
+                self.sink_performance.input = self.plot_estimator.outputs['output_json']
+
+                if self.masks_normalize_test:
+                    self.sources_masks_normalize_test = dict()
+
+                # -----------------------------------------------------
+                # Optionally, add ComBat Harmonization. Currently done
+                # on full dataset, not in a cross-validation
+                if use_ComBat == 'True':
+                    message = '[ERROR] If you want to use ComBat, you need to provide training images or features as well.'
+                    raise WORCexceptions.WORCNotImplementedError(message)
+
+                if not self.features_test:
+                    # Create nodes to compute features
+                    # General
+                    self.sources_parameters = dict()
+                    self.source_config_pyradiomics = dict()
+                    self.source_toolbox_name = dict()
+
+                    # Testing only
+                    self.calcfeatures_test = dict()
+                    self.featureconverter_test = dict()
+                    self.preprocessing_test = dict()
+                    self.sources_images_test = dict()
+                    self.sinks_features_test = dict()
+                    self.sinks_configs = dict()
+                    self.converters_im_test = dict()
+                    self.converters_seg_test = dict()
+                    self.links_C1_test = dict()
+
+                    self.featurecalculators = dict()
+
+                    # Check which nodes are necessary
+                    if not self.segmentations_test:
+                        message = "No automatic segmentation method is yet implemented."
+                        raise WORCexceptions.WORCNotImplementedError(message)
+
+                    elif len(self.segmentations_test) == len(image_types):
+                        # Segmentations provided
+                        self.sources_segmentations_test = dict()
+                        self.segmode = 'Provided'
+
+                    elif len(self.segmentations_test) == 1:
+                        # Assume segmentations need to be registered to other modalities
+                        print('\t - Adding Elastix node for image registration.')
+                        self.add_elastix_sourcesandsinks()
+
+                    else:
+                        nseg = len(self.segmentations_test)
+                        nim = len(image_types)
+                        m = f'Length of segmentations for testing is ' +\
+                            f'{nseg}: should be equal to number of images' +\
+                            f' ({nim}) or 1 when using registration.'
+                        raise WORCexceptions.WORCValueError(m)
+
+                    if use_segmentix == 'True':
+                        # Use the segmentix toolbox for segmentation processing
+                        print('\t - Adding segmentix node for segmentation preprocessing.')
+                        self.sinks_segmentations_segmentix_test = dict()
+                        self.sources_masks_test = dict()
+                        self.converters_masks_test = dict()
+                        self.nodes_segmentix_test = dict()
+
+                    if self.semantics_test:
+                        # Semantic features are supplied
+                        self.sources_semantics_test = dict()
+
+                    if self.metadata_test:
+                        # Metadata to extract patient features from is supplied
+                        self.sources_metadata_test = dict()
+
+                    # Create a part of the pipeline for each modality
+                    self.modlabels = list()
+                    for nmod, mod in enumerate(image_types):
+                        # Extract some modality specific config info
+                        if type(self.configs[nmod]) == str:
+                            # Config is a .ini file, load
+                            temp_conf = config_io.load_config(self.configs[nmod])
+                        else:
+                            temp_conf = self.configs[nmod]
+
+                        # Create label for each modality/image
+                        num = 0
+                        label = mod + '_' + str(num)
+                        while label in self.calcfeatures_test.keys():
+                            # If label already exists, add number to label
+                            num += 1
+                            label = mod + '_' + str(num)
+                        self.modlabels.append(label)
+
+                        # Create required sources and sinks
+                        self.sources_parameters[label] = self.network.create_source('ParameterFile', id=f'config_{label}', step_id='general_sources')
+                        self.sources_images_test[label] = self.network.create_source('ITKImageFile', id='images_test_' + label, node_group='test', step_id='test_sources')
+
+                        if self.metadata_test and len(self.metadata_test) >= nmod + 1:
+                            self.sources_metadata_test[label] = self.network.create_source('DicomImageFile', id='metadata_test_' + label, node_group='test', step_id='test_sources')
+
+                        if self.masks_test and len(self.masks_test) >= nmod + 1:
+                            # Create mask source and convert
+                            self.sources_masks_test[label] = self.network.create_source('ITKImageFile', id='mask_test_' + label, node_group='test', step_id='test_sources')
+                            memory = self.fastr_memory_parameters['WORCCastConvert']
+                            self.converters_masks_test[label] =\
+                                self.network.create_node('worc/WORCCastConvert:0.3.2',
+                                                         tool_version='0.1',
+                                                         id='convert_mask_test_' + label,
+                                                         node_group='test',
+                                                         resources=ResourceLimit(memory=memory),
+                                                         step_id='FileConversion')
+
+                            self.converters_masks_test[label].inputs['image'] = self.sources_masks_test[label].output
+
+                        # First convert the images
+                        if any(modality in mod for modality in all_modalities):
+                            # Use WORC PXCastConvert for converting image formats
+                            memory = self.fastr_memory_parameters['WORCCastConvert']
+                            self.converters_im_test[label] =\
+                                self.network.create_node('worc/WORCCastConvert:0.3.2',
+                                                         tool_version='0.1',
+                                                         id='convert_im_test_' + label,
+                                                         resources=ResourceLimit(memory=memory),
+                                                         step_id='FileConversion')
+
+                        else:
+                            raise WORCexceptions.WORCTypeError(('No valid image type for modality {}: {} provided.').format(str(nmod), mod))
+
+                        # Create required links
+                        self.converters_im_test[label].inputs['image'] = self.sources_images_test[label].output
+
+                        # -----------------------------------------------------
+                        # Preprocessing
+                        preprocess_node = str(temp_conf['General']['Preprocessing'])
+                        print('\t - Adding preprocessing node for image preprocessing.')
+                        self.add_preprocessing(preprocess_node, label, nmod)
+
+                        # -----------------------------------------------------
+                        # Feature calculation
+                        feature_calculators =\
+                            temp_conf['General']['FeatureCalculators']
+                        if not isinstance(feature_calculators, list):
+                            # Configparser object, need to split string
+                            feature_calculators = feature_calculators.strip('][').split(', ')
+                            self.featurecalculators[label] = [f.split('/')[0] for f in feature_calculators]
+                        else:
+                            self.featurecalculators[label] = feature_calculators
+
+                        # Add lists for feature calculation and converter objects
+                        self.calcfeatures_test[label] = list()
+                        self.featureconverter_test[label] = list()
+
+                        for f in feature_calculators:
+                            print(f'\t - Adding feature calculation node: {f}.')
+                            self.add_feature_calculator(f, label, nmod)
+
+                        # -----------------------------------------------------
+                        # Create the necessary nodes for the segmentation
+                        if self.segmode == 'Provided':
+                            # Segmentation ----------------------------------------------------
+                            # Use the provided segmentations for each modality
+                            memory = self.fastr_memory_parameters['WORCCastConvert']
+                            self.sources_segmentations_test[label] =\
+                                self.network.create_source('ITKImageFile',
+                                                           id='segmentations_test_' + label,
+                                                           node_group='test',
+                                                           step_id='test_sources')
+
+                            self.converters_seg_test[label] =\
+                                self.network.create_node('worc/WORCCastConvert:0.3.2',
+                                                         tool_version='0.1',
+                                                         id='convert_seg_test_' + label,
+                                                         resources=ResourceLimit(memory=memory),
+                                                         step_id='FileConversion')
+
+                            self.converters_seg_test[label].inputs['image'] =\
+                                self.sources_segmentations_test[label].output
+
+                        elif self.segmode == 'Register':
+                            # ---------------------------------------------
+                            # Registration nodes: Align segmentation of first
+                            # modality to others using registration with Elastix
+                            self.add_elastix(label, nmod)
+
+                        # -----------------------------------------------------
+                        # Optionally, add segmentix, the in-house segmentation
+                        # processor of WORC
+                        if temp_conf['General']['Segmentix'] == 'True':
+                            self.add_segmentix(label, nmod)
+                        elif temp_conf['Preprocessing']['Resampling'] == 'True':
+                            raise WORCexceptions.WORCValueError('If you use resampling, ' +
+                                                                'you have to use segmentix to ' +
+                                                                'make sure the mask is ' +
+                                                                'also resampled. Please set ' +
+                                                                'config["General"]["Segmentix"] ' +
+                                                                'to "True".')
+
+                        else:
+                            # Provide source or elastix segmentations to
+                            # feature calculator
+                            for i_node in range(len(self.calcfeatures_test[label])):
+                                if self.segmode == 'Provided':
+                                    self.calcfeatures_test[label][i_node].inputs['segmentation'] =\
+                                        self.converters_seg_test[label].outputs['image']
+                                elif self.segmode == 'Register':
+                                    if nmod > 0:
+                                        self.calcfeatures_test[label][i_node].inputs['segmentation'] =\
+                                            self.transformix_seg_nodes_test[label].outputs['image']
+                                    else:
+                                        self.calcfeatures_test[label][i_node].inputs['segmentation'] =\
+                                            self.converters_seg_test[label].outputs['image']
+
+                        # -----------------------------------------------------
+                        # Optionally, add ComBat Harmonization
+                        if use_ComBat == 'True':
+                            # Link features to ComBat
+                            self.links_Combat1_test[label] = list()
+                            for i_node, fname in enumerate(self.featurecalculators[label]):
+                                self.links_Combat1_test[label].append(self.ComBat.inputs['features_test'][f'{label}_{self.featurecalculators[label][i_node]}'] << self.featureconverter_test[label][i_node].outputs['feat_out'])
+                                self.links_Combat1_test[label][i_node].collapse = 'test'
+
+                        # -----------------------------------------------------
+                        # Output the features
+                        # Add the features from this modality to the classifier node input
+                        self.links_C1_test[label] = list()
+                        self.sinks_features_test[label] = list()
+
+                        for i_node, fname in enumerate(self.featurecalculators[label]):
+                            # Create sink for feature outputs
+                            node_id = 'features_test_' + label + '_' + fname
+                            node_id = node_id.replace(':', '_').replace('.', '_').replace('/', '_')
+                            self.sinks_features_test[label].append(self.network.create_sink('HDF5', id=node_id, step_id='test_sinks'))
+
+                            # Save output
+                            self.sinks_features_test[label][i_node].input = self.featureconverter_test[label][i_node].outputs['feat_out']
+
+                else:
+                    # Features already provided: hence we can skip numerous nodes
+                    self.sources_features_train = dict()
+                    self.links_C1_train = dict()
+
+                    if self.features_test:
+                        self.sources_features_test = dict()
+                        self.links_C1_test = dict()
+
+                    # Create label for each modality/image
+                    self.modlabels = list()
+                    for num, mod in enumerate(image_types):
+                        num = 0
+                        label = mod + str(num)
+                        while label in self.sources_features_train.keys():
+                            # If label exists, add number to label
+                            num += 1
+                            label = mod + str(num)
+                        self.modlabels.append(label)
+
+                        # Create a node for the features
+                        self.sources_features_test[label] = self.network.create_source('HDF5', id='features_test_' + label, node_group='test', step_id='test_sources')
+
+            else:
+                raise WORCexceptions.WORCIOError("Please provide labels for training, i.e., WORC.labels_train or SimpleWORC.labels_from_this_file.")
+        else:
+            raise WORCexceptions.WORCIOError("Please provide either images or features.")
+
[docs]    def add_fingerprinter(self, id, type, config_source):
        """Add WORC Fingerprinter to the network.
@@ -1264,7 +1640,7 @@
        # Add type input
        valid_types = ['classification', 'images']
        if type not in valid_types:
            raise WORCexceptions.WORCValueError(f'Type {type} is not valid for fingerprinting. Should be one of {valid_types}.')

        type_node = self.network.create_constant('String', type,
                                                 id=f'type_fingerprint_{id}',
                                                 node_group='train',
                                                 step_id='FingerPrinting')

        fingerprinter_node.inputs['type'] = type_node.output
@@ -1319,7 +1695,7 @@
            # Create sources_segmentation
            # M1 = moving, others = fixed
-            self.elastix_nodes_train[label].inputs['fixed_image'] =\
-                self.converters_im_train[label].outputs['image']
+            if not self.OnlyTest:
+                self.elastix_nodes_train[label].inputs['fixed_image'] =\
+                    self.converters_im_train[label].outputs['image']

-            self.elastix_nodes_train[label].inputs['moving_image'] =\
-                self.converters_im_train[self.modlabels[0]].outputs['image']
+                self.elastix_nodes_train[label].inputs['moving_image'] =\
+                    self.converters_im_train[self.modlabels[0]].outputs['image']

            # Add node that copies metadata from the image to the
            # segmentation if required
-            if self.CopyMetadata:
+            if self.CopyMetadata and not self.OnlyTest:
                # Copy metadata from the image which was registered to
                # the segmentation, if it is not created yet
                if not hasattr(self, "copymetadata_nodes_train"):
@@ -1731,12 +2145,12 @@
        self.sink_data = dict()

        # Save the configurations as files
-        self.save_config()
+        if not self.OnlyTest:
+            self.save_config()
+        else:
+            self.fastrconfigs = self.configs

        # Fixed splits
        if self.fixedsplits:
@@ -1993,6 +2426,7 @@
Source code for WORC.WORC
        # Set source and sink data
        self.source_data['patientclass_train'] = self.labels_train
        self.source_data['patientclass_test'] = self.labels_test
+        self.source_data['trained_model'] = self.trained_model

        self.sink_data['classification'] = ("vfs://output/{}/estimator_{{sample_id}}_{{cardinality}}{{ext}}").format(self.name)
        self.sink_data['performance'] = ("vfs://output/{}/performance_{{sample_id}}_{{cardinality}}{{ext}}").format(self.name)
@@ -2001,12 +2435,19 @@
Source code for WORC.WORC
        self.sink_data['features_train_ComBat'] = ("vfs://output/{}/ComBat/features_ComBat_{{sample_id}}_{{cardinality}}{{ext}}").format(self.name)
        self.sink_data['features_test_ComBat'] = ("vfs://output/{}/ComBat/features_ComBat_{{sample_id}}_{{cardinality}}{{ext}}").format(self.name)

+        # Get info from the first config file
+        if type(self.configs[0]) == str:
+            # Config is a .ini file, load
+            temp_conf = config_io.load_config(self.configs[0])
+        else:
+            temp_conf = self.configs[0]
+
        # Set the source data from the WORC objects you created
        for num, label in enumerate(self.modlabels):
            self.source_data['config_' + label] = self.fastrconfigs[num]
            self.sink_data[f'config_{label}_sink'] = f"vfs://output/{self.name}/config_{label}_{{sample_id}}_{{cardinality}}{{ext}}"

-            if 'pyradiomics' in self.configs[0]['General']['FeatureCalculators'] and self.configs[0]['General']['Fingerprint'] != 'True':
+            if 'pyradiomics' in temp_conf['General']['FeatureCalculators'] and temp_conf['General']['Fingerprint'] != 'True':
                self.source_data['config_pyradiomics_' + label] = self.pyradiomics_configs[num]

        # Add train data sources
@@ -2076,6 +2517,7 @@
Source code for WORC.WORC
            self.sink_data['images_out_elastix_test_' + label] = ("vfs://output/{}/Images/im_{}_elastix_{{sample_id}}_{{cardinality}}{{ext}}").format(self.name, label)

            if hasattr(self, 'featurecalculators'):
                for f in self.featurecalculators[label]:
+                    f = f.replace(':', '_').replace('.', '_').replace('/', '_')
                    self.sink_data['features_test_' + label + '_' + f] = ("vfs://output/{}/Features/features_{}_{}_{{sample_id}}_{{cardinality}}{{ext}}").format(self.name, f, label)

        # Add elastix sinks if used
@@ -2104,21 +2546,22 @@
Source code for WORC.WORC
        except graphviz.backend.ExecutableNotFound:
            print('[WORC WARNING] Graphviz executable not found: not drawing network diagram. Make sure the Graphviz executables are on your system\'s PATH.')
        except graphviz.backend.CalledProcessError as e:
            print(f'[WORC WARNING] Graphviz executable gave an error: not drawing network diagram. Original error: {e}')

-        # Export hyperparameter search space to LaTeX table
-        for config in self.fastrconfigs:
-            config_path = Path(url2pathname(urlparse(config).path))
-            tex_path = f'{config_path.parent.absolute() / config_path.stem}_hyperparams_space.tex'
-            export_hyper_params_to_latex(config_path, tex_path)
+        # Export hyperparameter search space to LaTeX table. Only for training models.
+        if not self.OnlyTest:
+            for config in self.fastrconfigs:
+                config_path = Path(url2pathname(urlparse(config).path))
+                tex_path = f'{config_path.parent.absolute() / config_path.stem}_hyperparams_space.tex'
+                export_hyper_params_to_latex(config_path, tex_path)

        if DebugDetector().do_detection():
            print("Source Data:")
            for k in self.source_data.keys():
                print(f"\t{k}: {self.source_data[k]}.")

            print("\n Sink Data:")
            for k in self.sink_data.keys():
                print(f"\t{k}: {self.sink_data[k]}.")

        # When debugging, set the tempdir to the default of fastr + name
        self.fastr_tmpdir = os.path.join(fastr.config.mounts['tmp'],
@@ -2157,7 +2600,7 @@
        # If PyRadiomics is used and there is no fingerprinting, also write a config for PyRadiomics
        if 'pyradiomics' in c['General']['FeatureCalculators'] and self.configs[0]['General']['Fingerprint'] != 'True':
            cfile_pyradiomics = os.path.join(self.fastr_tmpdir, f"config_pyradiomics_{self.name}_{num}.yaml")
            config_pyradiomics = io.convert_config_pyradiomics(c)
            with open(cfile_pyradiomics, 'w') as file:
                yaml.safe_dump(config_pyradiomics, file)
            cfile_pyradiomics = Path(self.fastr_tmpdir) / f"config_pyradiomics_{self.name}_{num}.yaml"
            self.pyradiomics_configs.append(cfile_pyradiomics.as_uri().replace('%20', ' '))

        # BUG: Make path with pathlib to create windows double slashes
        cfile = Path(self.fastr_tmpdir) / f"config_{self.name}_{num}.ini"
        self.fastrconfigs.append(cfile.as_uri().replace('%20', ' '))
@@ -2188,7 +2631,7 @@
Source code for WORC.WORC
        3. Slicer pipeline, to create pngs of middle slice of images.

        """

[docs]    def __init__(self, estimators):
        """Initialize object with list of estimators."""
        if not estimators:
            message = 'You supplied an empty list of estimators: No ensemble creation possible.'
@@ -523,7 +530,7 @@
Source code for WORC.classification.SearchCV
"""Base class for hyper parameter search with cross-validation."""
def _check_is_fitted(self,method_name):ifnotself.refit:
- raiseNotFittedError(('This GridSearchCV instance was initialized '
+ raiseNotFittedError(('This SearchCV instance was initialized ''with refit=False. %s is ''available only after refitting on the best ''parameters. ')%method_name)
@@ -789,12 +796,15 @@
        if self.best_modelsel is not None:
            X = self.best_modelsel.transform(X)

-        if self.best_pca is not None:
-            X = self.best_pca.transform(X)
-
        if self.best_statisticalsel is not None:
            X = self.best_statisticalsel.transform(X)

+        if self.best_rfesel is not None:
+            X = self.best_rfesel.transform(X)
+
+        if self.best_pca is not None:
+            X = self.best_pca.transform(X)
+
        # Only resampling in training phase, i.e. if we have the labels
        if y is not None:
            if self.best_Sampler is not None:
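The transform order here matters: statistical selection (and RFE) must run before PCA, because after PCA the columns are components, so selecting on them no longer picks original features. A small scikit-learn sketch of the correct order (synthetic data, illustrative names only):

```python
# Sketch: univariate statistical selection on original features, then PCA.
# Running PCA first would make SelectKBest choose among PCA components instead.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
y = (X[:, 0] > 0).astype(int)  # only feature 0 carries signal

# Correct order: select original features first, then compress with PCA
sel = SelectKBest(f_classif, k=3).fit(X, y)
X_sel = sel.transform(X)                      # still original features
X_red = PCA(n_components=2).fit_transform(X_sel)

print(sel.get_support().nonzero()[0])  # indices of selected original features
print(X_red.shape)                     # (40, 2)
```

With this order, `get_support()` remains interpretable as a mask over the original feature names, which is what the statistical feature testing reports rely on.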
@@ -853,7 +863,7 @@
            self.ensemble_validation_score = best_performance

            if verbose:
                print(f"Ensembling best {scoring}: {best_performance}.")
                print(f"Single estimator best {scoring}: {single_estimator_performance}.")
                print(f'Ensemble consists of {len(ensemble)} estimators {ensemble}.')

        elif method == 'ForwardSelection':
@@ -1393,7 +1405,7 @@ Source code for WORC.classification.SearchCV
         while new_performance > best_performance:
             Y_valid_score = copy.deepcopy(base_Y_valid_score)
             if verbose:
-                print(f"Iteration: {iteration}, best {scoring}: {new_performance}.")
+                print(f"Iteration: {iteration}, best {scoring}: {new_performance}.")

             best_performance = new_performance
@@ -1435,9 +1447,9 @@ Source code for WORC.classification.SearchCV
         if verbose:
             # Print the performance gain
-            print(f"Ensembling best {scoring}: {best_performance}.")
-            print(f"Single estimator best {scoring}: {single_estimator_performance}.")
-            print(f'Ensemble consists of {len(ensemble)} estimators {ensemble}.')
+            print(f"Ensembling best {scoring}: {best_performance}.")
+            print(f"Single estimator best {scoring}: {single_estimator_performance}.")
+            print(f'Ensemble consists of {len(ensemble)} estimators {ensemble}.')

     elif method == 'Caruana':
         if verbose:
@@ -1448,7 +1460,7 @@ Source code for WORC.classification.SearchCV
         while iteration < 20:
             Y_valid_score = copy.deepcopy(base_Y_valid_score)
             if verbose:
-                print(f"Iteration: {iteration}, best {scoring}: {new_performance}.")
+                print(f"Iteration: {iteration}, best {scoring}: {new_performance}.")

             if iteration > 1:
                 # Stack scores: not needed for first iteration
@@ -1494,9 +1506,9 @@ Source code for WORC.classification.SearchCV
         if verbose:
             # Print the performance gain
-            print(f"Ensembling best {scoring}: {best_performance}.")
-            print(f"Single estimator best {scoring}: {single_estimator_performance}.")
-            print(f'Ensemble consists of {len(ensemble)} estimators {ensemble}.')
+            print(f"Ensembling best {scoring}: {best_performance}.")
+            print(f"Single estimator best {scoring}: {single_estimator_performance}.")
+            print(f'Ensemble consists of {len(ensemble)} estimators {ensemble}.')

     elif method == 'Bagging':
         if verbose:
@@ -1567,12 +1579,12 @@ Source code for WORC.classification.SearchCV
         if verbose:
             # Print the performance gain
-            print(f"Ensembling best {scoring}: {best_performance}.")
-            print(f"Single estimator best {scoring}: {single_estimator_performance}.")
-            print(f'Ensemble consists of {len(ensemble)} estimators {ensemble}.')
+            print(f"Ensembling best {scoring}: {best_performance}.")
+            print(f"Single estimator best {scoring}: {single_estimator_performance}.")
+            print(f'Ensemble consists of {len(ensemble)} estimators {ensemble}.')

     else:
-        print(f'[WORC WARNING] No valid ensemble method given: {method}. Not ensembling')
+        print(f'[WORC WARNING] No valid ensemble method given: {method}. Not ensembling')
         return self

     # Create the ensemble --------------------------------------------------
@@ -1600,7 +1612,7 @@ Source code for WORC.classification.SearchCV
                 estimator.predict(np.asarray([X_train[0][0], X_train[1][0]]))
                 estimators.append(estimator)
             except (NotFittedError, ValueError):
-                print(f'\t\t - Estimator {enum} could not be fitted (correctly), do not include in ensemble.')
+                print(f'\t\t - Estimator {enum} could not be fitted (correctly), do not include in ensemble.')
     else:
         # Create the ensemble trained on the full training set
@@ -1625,7 +1637,7 @@ Source code for WORC.classification.SearchCV
         for enum, p_all in enumerate(parameters_all):
             # Refit a SearchCV object with the provided parameters
             if verbose:
-                print(f"Refitting estimator {enum + 1} / {nest}.")
+                print(f"Refitting estimator {enum + 1} / {nest}.")

             base_estimator = clone(base_estimator)

             # Check if we need to create a multiclass estimator
@@ -1641,10 +1653,10 @@ Source code for WORC.classification.SearchCV
                 base_estimator.predict(np.asarray([X_train[0][0], X_train[1][0]]))
                 estimators.append(base_estimator)
             except (NotFittedError, ValueError):
-                print(f'\t\t - Estimator {enum} could not be fitted (correctly), do not include in ensemble.')
+                print(f'\t\t - Estimator {enum} could not be fitted (correctly), do not include in ensemble.')

         if not estimators:
-            print('\t\t - Ensemble is empty, thus go on until we find an estimator that works and that is the final ensemble.')
+            print('\t\t - Ensemble is empty, thus go on until we find an estimator that works and that is the final ensemble.')
             while not estimators:
                 # We cannot have an empty ensemble, thus go on until we find an estimator that works
                 enum += 1
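The refit loop above drops candidates that raise `NotFittedError` or `ValueError` and keeps searching until the ensemble is non-empty. A toy sketch of that fallback pattern (`build_ensemble` and `toy_fit` are hypothetical stand-ins, not WORC functions):

```python
from sklearn.exceptions import NotFittedError


def build_ensemble(candidates, fit_one):
    """Skip candidates that fail to fit; raise only if none fit at all."""
    estimators = []
    for enum, params in enumerate(candidates):
        try:
            estimators.append(fit_one(params))
        except (NotFittedError, ValueError):
            print(f'\t\t - Estimator {enum} could not be fitted (correctly), '
                  'do not include in ensemble.')
    if not estimators:
        # Mirrors the "ensemble may not be empty" guard in the hunk above.
        raise ValueError('No estimator could be fitted at all.')
    return estimators


# Toy fitter: fails on negative "parameter sets", succeeds otherwise.
def toy_fit(p):
    if p < 0:
        raise ValueError('bad parameter set')
    return p * 2


print(build_ensemble([-1, 3, -2, 5], toy_fit))  # [6, 10]
```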
@@ -1667,7 +1679,7 @@ Source code for WORC.classification.SearchCV
         n_splits = cv.get_n_splits(X, y, groups)
         if self.verbose > 0 and isinstance(parameter_iterable, Sized):
             n_candidates = len(parameter_iterable)
-            print(f"Fitting {n_splits} folds for each of {n_candidates} candidates, totalling {n_candidates * n_splits} fits.")
+            print(f"Fitting {n_splits} folds for each of {n_candidates} candidates, totalling {n_candidates * n_splits} fits.")

         cv_iter = list(cv.split(X, y, groups))
@@ -1759,7 +1771,7 @@ Source code for WORC.classification.SearchCV
             message = 'One or more of the values in your parameter sampler ' + \
                       'is either not iterable, or the distribution cannot ' + \
                       'generate valid samples. Please check your ' + \
-                      f'parameters. At least {k} gives an error.'
+                      f'parameters. At least {k} gives an error.'
             raise WORCexceptions.WORCValueError(message)

         # Split the parameters files in equal parts
@@ -1771,7 +1783,7 @@ Source code for WORC.classification.SearchCV
             for number in k:
                 temp_dict[number] = parameters_temp[number]

-            fname = f'settings_{num}.json'
+            fname = f'settings_{num}.json'
             sourcename = os.path.join(tempfolder, 'parameters', fname)
             if not os.path.exists(os.path.dirname(sourcename)):
                 os.makedirs(os.path.dirname(sourcename))
@@ -1779,7 +1791,7 @@ Source code for WORC.classification.SearchCV
             difference = expected_no_files - len(sink_files)
             fname = os.path.join(tempfolder, 'tmp')
             message = ('Fitting classifiers has failed for ' +
-                       f'{difference} / {expected_no_files} files. The temporary ' +
-                       f'results were not deleted and can be found in {tempfolder}. ' +
+                       f'{difference} / {expected_no_files} files. The temporary ' +
+                       f'results were not deleted and can be found in {tempfolder}. ' +
                       'Probably your fitting and scoring failed: check out ' +
                       'the tmp/fitandscore folder within the tempfolder for ' +
                       'the fastr job temporary results or run: fastr trace ' +
-                       f'"{fname}{os.path.sep}__sink_data__.json" --samples.')
+                       f'"{fname}{os.path.sep}__sink_data__.json" --samples.')
             raise WORCexceptions.WORCValueError(message)

         # Read in the output data once finished
@@ -2166,7 +2178,7 @@ Source code for WORC.classification.SearchCV
         n_splits = cv.get_n_splits(X, y, groups)
         if self.verbose > 0 and isinstance(parameter_iterable, Sized):
             n_candidates = len(parameter_iterable)
-            print(f"Fitting {n_splits} folds for each of {n_candidates}" +
+            print(f"Fitting {n_splits} folds for each of {n_candidates}" +
                   " candidates, totalling" +
                   f" {n_candidates * n_splits} fits")
@@ -2563,7 +2575,7 @@ Source code for WORC.classification.SearchCV
             difference = expected_no_files - len(sink_files)
             fname = os.path.join(tempfolder, 'tmp')
             message = ('Fitting classifiers has failed for ' +
-                       f'{difference} / {expected_no_files} files. The temporary ' +
-                       f'results were not deleted and can be found in {tempfolder}. ' +
+                       f'{difference} / {expected_no_files} files. The temporary ' +
+                       f'results were not deleted and can be found in {tempfolder}. ' +
                       'Probably your fitting and scoring failed: check out ' +
                       'the tmp/smac folder within the tempfolder for ' +
                       'the fastr job temporary results or run: fastr trace ' +
-                       f'"{fname}{os.path.sep}__sink_data__.json" --samples.')
+                       f'"{fname}{os.path.sep}__sink_data__.json" --samples.')
             raise WORCexceptions.WORCValueError(message)

         # Read in the output data once finished
@@ -3520,7 +3532,7 @@ Source code for WORC.classification.crossval
     # If we are using fixed splits, set the n_iterations to the number of splits
     if fixedsplits is not None:
         n_iterations = int(fixedsplits.columns.shape[0] / 2)
-        print(f'Fixedsplits detected, adjusting n_iterations to {n_iterations}')
-        logging.debug(f'Fixedsplits detected, adjusting n_iterations to {n_iterations}')
+        print(f'Fixedsplits detected, adjusting n_iterations to {n_iterations}')
+        logging.debug(f'Fixedsplits detected, adjusting n_iterations to {n_iterations}')

     for i in range(start, n_iterations):
         print(('Cross-validation iteration {} / {} .').format(str(i + 1), str(n_iterations)))
         logging.debug(('Cross-validation iteration {} / {} .').format(str(i + 1), str(n_iterations)))
         timestamp = strftime("%Y-%m-%d %H:%M:%S", gmtime())
-        print(f'\t Time: {timestamp}.')
-        logging.debug(f'\t Time: {timestamp}.')
+        print(f'\t Time: {timestamp}.')
+        logging.debug(f'\t Time: {timestamp}.')

         if fixed_seed:
             random_seed = i ** 2
         else:
@@ -405,7 +412,7 @@ Source code for WORC.classification.crossval
         # Test performance for various RS and ensemble sizes
         if config['General']['DoTestNRSNEns']:
-            output_json = os.path.join(tempfolder, f'performance_RS_Ens_crossval_{i}.json')
+            output_json = os.path.join(tempfolder, f'performance_RS_Ens_crossval_{i}.json')
             test_RS_Ensemble(estimator_input=trained_classifier,
                              X_train=X_train, Y_train=Y_train,
                              X_test=X_test, Y_test=Y_test,
@@ -445,8 +452,8 @@ Source code for WORC.classification.crossval
     # Print elapsed time
     elapsed = int((time.time() - t) / 60.0)
-    print(f'\t Fitting took {elapsed} minutes.')
-    logging.debug(f'\t Fitting took {elapsed} minutes.')
+    print(f'\t Fitting took {elapsed} minutes.')
+    logging.debug(f'\t Fitting took {elapsed} minutes.')

     return save_data

     # Print elapsed time
     elapsed = int((time.time() - t) / 60.0)
-    print(f'\t Fitting took {elapsed} minutes.')
-    logging.debug(f'\t Fitting took {elapsed} minutes.')
+    print(f'\t Fitting took {elapsed} minutes.')
+    logging.debug(f'\t Fitting took {elapsed} minutes.')

     return save_data
@@ -781,7 +788,7 @@ Source code for WORC.classification.crossval
                           use_SMAC=use_SMAC, smac_result_file=smac_result_file)
     else:
-        raise ae.WORCKeyError(f'{crossval_type} is not a recognized cross-validation type.')
+        raise ae.WORCKeyError(f'{crossval_type} is not a recognized cross-validation type.')

     [classifiers, X_train_set, X_test_set, Y_train_set, Y_test_set,
      patient_ID_train_set, patient_ID_test_set, seed_set] = \
@@ -959,7 +966,7 @@ Source code for WORC.classification.crossval
     # FIXME: Use home folder, as this function does not know
     # where final or temporary output is located
     output_json = os.path.join(os.path.expanduser("~"),
-                               'performance_RS_Ens.json')
+                               'performance_RS_Ens.json')
     test_RS_Ensemble(estimator_input=trained_classifier,
                      X_train=X_train, Y_train=Y_train,
@@ -1009,14 +1016,14 @@ Source code for WORC.classification.crossval
         if RS <= n_workflows:
             # Make a key for saving the score
             num = 0
-            key = f'RS {RS} try {str(num).zfill(2)}'
+            key = f'RS {RS} try {str(num).zfill(2)}'
             while key in keys:
                 num += 1
-                key = f'RS {RS} try {str(num).zfill(2)}'
+                key = f'RS {RS} try {str(num).zfill(2)}'
             keys.append(key)

             # Make a local copy of the estimator and select only subset of workflows
-            print(f'\t Using RS {RS}.')
+            print(f'\t Using RS {RS}.')
             estimator = copy.deepcopy(estimator_original)
             # estimator.maxlen = RS  # Why is this needed? This will only lead to a lot of extra workflows on top of the top 100 being fitted
             estimator.maxlen = min(RS, maxlen)
@@ -1081,40 +1088,40 @@ Source code for WORC.classification.crossval
             F1_training = [F1_training[i] for i in selected_workflows]
             F1_training = [F1_training[i] for i in workflow_ranking]

-            performances[f'Mean training F1-score {key} top {maxlen}'] = F1_validation
-            performances[f'Mean validation F1-score {key} top {maxlen}'] = F1_training
+            performances[f'Mean training F1-score {key} top {maxlen}'] = F1_validation
+            performances[f'Mean validation F1-score {key} top {maxlen}'] = F1_training

             for ensemble in ensembles:
                 if isinstance(ensemble, int):
                     if ensemble > RS:
                         continue
                     else:
-                        print(f'\t Using ensemble {ensemble}.')
+                        print(f'\t Using ensemble {ensemble}.')
                         # Create the ensemble
                         estimator.create_ensemble(X_train_temp, Y_train, method='top_N',
                                                   size=ensemble, verbose=verbose)
                 else:
-                    print(f'\t Using ensemble {ensemble}.')
+                    print(f'\t Using ensemble {ensemble}.')
                     # Create the ensemble
                     estimator.create_ensemble(X_train_temp, Y_train, method=ensemble,
                                               verbose=verbose)

-                performances[f'Validation F1-score Ensemble {ensemble} {key}'] = estimator.ensemble_validation_score
+                performances[f'Validation F1-score Ensemble {ensemble} {key}'] = estimator.ensemble_validation_score

                 # Compute performance
                 y_prediction = estimator.predict(X_test)
                 y_score = estimator.predict_proba(X_test)[:, 1]
                 auc = roc_auc_score(Y_test, y_score)
                 f1_score_out = f1_score(Y_test, y_prediction, average='weighted')
-                performances[f'Test F1-score Ensemble {ensemble} {key}'] = f1_score_out
-                performances[f'Test AUC Ensemble {ensemble} {key}'] = auc
+                performances[f'Test F1-score Ensemble {ensemble} {key}'] = f1_score_out
+                performances[f'Test AUC Ensemble {ensemble} {key}'] = auc

                 y_prediction = estimator.predict(X_train)
                 y_score = estimator.predict_proba(X_train)[:, 1]
                 auc = roc_auc_score(Y_train, y_score)
                 f1_score_out = f1_score(Y_train, y_prediction, average='weighted')
-                performances[f'Train F1-score Ensemble {ensemble} {key}'] = f1_score_out
-                performances[f'Train AUC Ensemble {ensemble} {key}'] = auc
+                performances[f'Train F1-score Ensemble {ensemble} {key}'] = f1_score_out
+                performances[f'Train AUC Ensemble {ensemble} {key}'] = auc

             # Write output
             with open(output_json, 'w') as fp:
@@ -1125,20 +1132,25 @@ Source code for WORC.classification.fitandscore
 #!/usr/bin/env python
-# Copyright 2016-2022 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2023 Biomedical Imaging Group Rotterdam, Departments of
 # Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -179,38 +186,39 @@ Source code for WORC.classification.fitandscore
 # See the License for the specific language governing permissions and
 # limitations under the License.

-from sklearn.model_selection._validation import _fit_and_score
+from sklearn.model_selection._validation import _fit_and_score
 import numpy as np
-from sklearn.linear_model import Lasso, LogisticRegression
-from sklearn.feature_selection import SelectFromModel
-from sklearn.decomposition import PCA
-from sklearn.multiclass import OneVsRestClassifier
-from sklearn.ensemble import RandomForestClassifier
-from WORC.classification.ObjectSampler import ObjectSampler
-from sklearn.utils.metaestimators import _safe_split
-from sklearn.utils.validation import _num_samples
-from WORC.classification import construct_classifier as cc
-from WORC.classification.metrics import check_multimetric_scoring
-from WORC.featureprocessing.Relief import SelectMulticlassRelief
-from WORC.featureprocessing.Imputer import Imputer
-from WORC.featureprocessing.Scalers import WORCScaler
-from WORC.featureprocessing.VarianceThreshold import selfeat_variance
-from WORC.featureprocessing.StatisticalTestThreshold import StatisticalTestThreshold
-from WORC.featureprocessing.SelectGroups import SelectGroups
-from WORC.featureprocessing.OneHotEncoderWrapper import OneHotEncoderWrapper
+from sklearn.linear_model import Lasso, LogisticRegression
+from sklearn.feature_selection import SelectFromModel, RFE
+from sklearn.decomposition import PCA
+from sklearn.multiclass import OneVsRestClassifier
+from sklearn.ensemble import RandomForestClassifier
+from WORC.classification.ObjectSampler import ObjectSampler
+from sklearn.utils.metaestimators import _safe_split
+from sklearn.utils.validation import _num_samples
+from WORC.classification import construct_classifier as cc
+from WORC.classification.metrics import check_multimetric_scoring
+from WORC.featureprocessing.Relief import SelectMulticlassRelief
+from WORC.featureprocessing.Imputer import Imputer
+from WORC.featureprocessing.Scalers import WORCScaler
+from WORC.featureprocessing.VarianceThreshold import selfeat_variance
+from WORC.featureprocessing.StatisticalTestThreshold import StatisticalTestThreshold
+from WORC.featureprocessing.SelectGroups import SelectGroups
+from WORC.featureprocessing.OneHotEncoderWrapper import OneHotEncoderWrapper
 import WORC
 import WORC.addexceptions as ae
 import time
+from xgboost.sklearn import XGBRegressor

 # Specific imports for error management
-from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
-from numpy.linalg import LinAlgError
+from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
+from numpy.linalg import LinAlgError

 # Suppress some sklearn warnings. These occur when unused hyperparameters are
 # supplied, when estimators that are refitted do not converge, or parts
 # are deprecated
 import warnings
-from sklearn.exceptions import ConvergenceWarning
+from sklearn.exceptions import ConvergenceWarning
 warnings.filterwarnings("ignore", category=DeprecationWarning)
 warnings.filterwarnings("ignore", category=UserWarning)
 warnings.filterwarnings("ignore", category=ConvergenceWarning)
@@ -355,6 +363,10 @@ Source code for WORC.classification.fitandscore
         Either None if the statistical test feature selection is not used, or the fitted object.

+    RFESel: WORC RFESel Object
+        Either None if the recursive feature elimination feature selection is not used, or
+        the fitted object.
+
     ReliefSel: WORC ReliefSel Object
         Either None if the RELIEF feature selection is not used, or the fitted object.
@@ -403,6 +415,7 @@ Source code for WORC.classification.fitandscore
     if 'OneHotEncoding' in para_estimator.keys():
         if para_estimator['OneHotEncoding'] == 'True':
             if verbose:
-                print('Applying OneHotEncoding, will ignore unknowns.')
+                print('Applying OneHotEncoding, will ignore unknowns.')
             feature_labels_tofit = \
                 para_estimator['OneHotEncoding_feature_labels_tofit']
             encoder = \
@@ -471,7 +484,7 @@ Source code for WORC.classification.fitandscore
     if para_estimator['Imputation'] == 'True':
         imp_type = para_estimator['ImputationMethod']
         if verbose:
-            print(f'Imputing NaN with {imp_type}.')
+            print(f'Imputing NaN with {imp_type}.')

         # Only used with KNN in SMAC, otherwise assign default
         if 'ImputationNeighbours' in para_estimator.keys():
@@ -489,7 +502,7 @@ Source code for WORC.classification.fitandscore
         if original_shape != imputed_shape:
             removed_features = original_shape[1] - imputed_shape[1]
             if para_estimator['ImputationSkipAllNaN'] == 'True':
-                print(f"[WARNING]: Several features ({removed_features}) were np.NaN for all objects. config['Imputation']['skipallNaN'] set to True, so simply eliminate these features.")
+                print(f"[WARNING]: Several features ({removed_features}) were np.NaN for all objects. config['Imputation']['skipallNaN'] set to True, so simply eliminate these features.")
                 if hasattr(imputer.Imputer, 'statistics_'):
                     X_train = imputer.transform(X_train)
                     X_test = imputer.transform(X_test)
@@ -504,7 +517,7 @@ Source code for WORC.classification.fitandscore
                     feature_labels_zero = [fl for fnum, fl in enumerate(feature_labels[0]) if not np.isnan(temp_imputer.Imputer.statistics_[fnum])]
                     feature_labels = [feature_labels_zero for i in X_train]
             else:
-                raise ae.WORCValueError(f'Several features ({removed_features}) were np.NaN for all objects. Hence, imputation was not possible. Either make sure this is correct and turn off imputation, or correct the feature.')
+                raise ae.WORCValueError(f'Several features ({removed_features}) were np.NaN for all objects. Hence, imputation was not possible. Either make sure this is correct and turn off imputation, or correct the feature.')
     else:
         X_train = imputer.transform(X_train)
         X_test = imputer.transform(X_test)
@@ -613,7 +626,7 @@ Source code for WORC.classification.fitandscore
     if para_estimator['SelectFromModel'] == 'True':
         model = para_estimator['SelectFromModel_estimator']
         if verbose:
-            print(f"Selecting features using model {model}.")
+            print(f"Selecting features using model {model}.")

         if model == 'Lasso':
             # Use lasso model for feature selection
@@ -796,12 +809,12 @@ Source code for WORC.classification.fitandscore
             selectestimator = RandomForestClassifier(n_estimators=n_estimators,
                                                      random_state=random_seed)
         else:
-            raise ae.WORCKeyError(f'Model {model} is not known for SelectFromModel. Use Lasso, LR, or RF.')
+            raise ae.WORCKeyError(f'Model {model} is not known for SelectFromModel. Use Lasso, LR, or RF.')

         if len(y_train.shape) >= 2:
             # Multilabel or regression. Regression: second dimension has length 1
             if y_train.shape[1] > 1 and model != 'RF':
-                raise ae.WORCValueError(f'Model {model} is not suitable for multiclass classification. Please use RF or do not use SelectFromModel.')
+                raise ae.WORCValueError(f'Model {model} is not suitable for multiclass classification. Please use RF or do not use SelectFromModel.')

         # Prefit model
         selectestimator.fit(X_train, y_train)
@@ -838,7 +851,7 @@ Source code for WORC.classification.fitandscore
         neg = int(len(y_train_temp) - pos)
         if pos < 10 or neg < 10:
             if verbose:
-                print(f'[WARNING] Skipping resampling: too few objects returned in one or both classes (pos: {pos}, neg: {neg}).')
+                print(f'[WARNING] Skipping resampling: too few objects returned in one or both classes (pos: {pos}, neg: {neg}).')
             Sampler = None
             parameters['Resampling_Use'] = 'False'
@@ -1155,8 +1297,8 @@ Source code for WORC.classification.fitandscore
             pos = int(np.sum(y_train))
             neg = int(len(y_train) - pos)
             if verbose:
-                message = f"Resampling from {len_in} ({pos_initial} pos," + \
-                          f" {neg_initial} neg) to {len(y_train)} ({pos} pos, {neg} neg) patients."
+                message = f"Resampling from {len_in} ({pos_initial} pos," + \
+                          f" {neg_initial} neg) to {len(y_train)} ({pos} pos, {neg} neg) patients."
                 print(message)

             # Also reset train and test indices
@@ -1212,13 +1354,13 @@ Source code for WORC.classification.fitandscore
         estimator = OneVsRestClassifier(estimator)

     if verbose:
-        print(f"Fitting ML method: {parameters['classifiers']}.")
+        print(f"Fitting ML method: {parameters['classifiers']}.")

     # Recombine feature values and label for train and test set
     feature_values = np.concatenate((X_train, X_test), axis=0)
     y_all = np.concatenate((y_train, y_test), axis=0)
     para_estimator = None

     try:
         ret = _fit_and_score(estimator, feature_values, y_all,
                              scorers, new_train,
@@ -1233,7 +1375,7 @@ Source code for WORC.classification.fitandscore
     except (ValueError, LinAlgError) as e:
         if type(estimator) == LDA:
             if verbose:
-                print(f'[WARNING]: skipping this setting due to LDA Error: {e}.')
+                print(f'[WARNING]: skipping this setting due to LDA Error: {e}.')

             # Update the runtime
             end_time = time.time()
@@ -1244,7 +1386,7 @@ Source code for WORC.classification.fitandscore
             if np.isnan(value):
                 if verbose:
                     if feature_labels is not None:
-                        print(f"[WARNING] NaN found, patient {pnum}, label {feature_labels[fnum]}. Replacing with zero.")
+                        print(f"[WARNING] NaN found, patient {pnum}, label {feature_labels[fnum]}. Replacing with zero.")
                     else:
-                        print(f"[WARNING] NaN found, patient {pnum}, label {fnum}. Replacing with zero.")
+                        print(f"[WARNING] NaN found, patient {pnum}, label {fnum}. Replacing with zero.")

                 # Note: X is a list of lists, hence we cannot index the element directly
                 image_features_temp[pnum, fnum] = 0
@@ -1399,20 +1547,25 @@ Source code for WORC.classification.parameter_optimization
         random_search: sklearn randomsearch object containing the results.

    """
    if random_seed is None:
-        # random_seed = np.random.randint(1, 5000)
-        # Fix the random seed for testing
-        random_seed = 42
+        random_seed = np.random.randint(1, 5000)

    random_state = check_random_state(random_seed)

    regressors = ['SVR', 'RFR', 'SGDR', 'Lasso', 'ElasticNet']
@@ -263,8 +269,8 @@ Source code for WORC.classification.parameter_optimization
    random_search.fit(features, labels)
    print("Best found parameters:")
    for i in random_search.best_params_:
-        print(f'{i}: {random_search.best_params_[i]}.')
-    print(f"\n Best score using best parameters: {scoring_method} = {random_search.best_score_}")
+        print(f'{i}: {random_search.best_params_[i]}.')
+    print(f"\n Best score using best parameters: {scoring_method} = {random_search.best_score_}")

    return random_search
@@ -336,7 +342,7 @@ Source code for WORC.classification.parameter_optimization
    guided_search.fit(features, labels)
    print("Best found parameters:")
    for i in guided_search.best_params_:
-        print(f'{i}: {guided_search.best_params_[i]}.')
+        print(f'{i}: {guided_search.best_params_[i]}.')
    print("\n Best score using best parameters:")
    print(guided_search.best_score_)
@@ -347,20 +353,25 @@ Source code for WORC.classification.trainclassifier
 #!/usr/bin/env python
-# Copyright 2016-2022 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2023 Biomedical Imaging Group Rotterdam, Departments of
 # Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -181,12 +188,12 @@ Source code for WORC.classification.trainclassifier
     # Add non-classifier parameters
     param_grid = add_parameters_to_grid(param_grid, config)

+    # Delete parameters for hyperoptimization which already have been used
+    del config['HyperOptimization']['fix_random_seed']

     # For N_iter, perform k-fold crossvalidation
     outputfolder = os.path.dirname(output_hdf)
@@ -409,6 +419,28 @@ Source code for WORC.classification.trainclassifier
         scale=config['Featsel']['ReliefNumFeatures'][1])

     # Add a random seed, which is required for many methods
-    param_grid['random_seed'] = \
-        discrete_uniform(loc=0, scale=2**32 - 1)
+    if config['HyperOptimization']['fix_random_seed']:
+        # Fix the random seed
+        param_grid['random_seed'] = [22]
+    else:
+        param_grid['random_seed'] = \
+            discrete_uniform(loc=0, scale=2**32 - 1)

     return param_grid
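The branch above is what makes the new `fix_random_seed` option reproducible: with the option on, every sampled workflow draws its seed from a one-element list; with it off, the seed is drawn from a discrete uniform distribution. A small sketch of the two cases, using `scipy.stats.randint` as a stand-in for WORC's `discrete_uniform` (the helper name `seed_entry` is hypothetical):

```python
from scipy.stats import randint


def seed_entry(fix_random_seed):
    """Return the param-grid entry for 'random_seed': a fixed one-element
    list when reproducibility is requested, else a discrete uniform
    distribution over [0, 2**32 - 1)."""
    if fix_random_seed:
        return [22]  # every sampled workflow gets the same seed
    return randint(low=0, high=2**32 - 1)


fixed = seed_entry(True)
sampled = seed_entry(False)
print(fixed)            # [22]
print(sampled.rvs(random_state=0) >= 0)
```

Both a list and a frozen distribution are valid values in a scikit-learn-style parameter grid: samplers call `.rvs()` on distributions and pick uniformly from lists.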
@@ -452,20 +488,25 @@ Source code for WORC.featureprocessing.StatisticalTestFeatures
 #!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2023 Biomedical Imaging Group Rotterdam, Departments of
 # Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -186,11 +193,11 @@ Source code for WORC.featureprocessing.StatisticalTestFeatures
                             Bonferonni=True,
                             fontsize='small', yspacing=1,
                             threshold=0.05, verbose=True, label_type=None):
-    """Perform several statistical tests on features, such as a Student t-test.
+    """Perform several statistical tests on features, such as a Student t-test.

     Parameters
     ----------
@@ -244,7 +251,6 @@ Source code for WORC.featureprocessing.StatisticalTestFeatures
     if type(output_tex) is list:
         output_tex = ''.join(output_tex)

-    print(output_png, output_tex)

     # Create output folder if required
     if not os.path.exists(os.path.dirname(output_csv)):
         os.makedirs(os.path.dirname(output_csv))
@@ -317,7 +323,12 @@ Source code for WORC.featureprocessing.StatisticalTestFeatures
             pvalueswelch.append(ttest_ind(class1, class2, equal_var=False)[1])
             pvalueswil.append(ranksums(class1, class2)[1])
             try:
-                pvaluesmw.append(mannwhitneyu(class1, class2)[1])
+                pmwu = mannwhitneyu(class1, class2)[1]
+                if pmwu == 0.0:
+                    print("[WORC Warning] Mann-Whitney U test resulted in a p-value of exactly 0.0, which is not valid. Replacing metric value by NaN.")
+                    pvaluesmw.append(np.nan)
+                else:
+                    pvaluesmw.append(pmwu)
             except ValueError as e:
                 print("[WORC Warning] " + str(e) + '. Replacing metric value by NaN.')
                 pvaluesmw.append(np.nan)
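The guard above maps degenerate Mann-Whitney U outcomes (an exact-zero p-value, or a `ValueError` from the test) to NaN instead of storing an invalid value. A standalone reproduction of that pattern with scipy (the function name `safe_mannwhitney_p` is illustrative, not WORC's API):

```python
import numpy as np
from scipy.stats import mannwhitneyu


def safe_mannwhitney_p(class1, class2):
    """Return the Mann-Whitney U p-value, mapping degenerate cases
    (an exact 0.0, or a ValueError for invalid input) to np.nan."""
    try:
        p = mannwhitneyu(class1, class2)[1]
    except ValueError:
        return np.nan
    return np.nan if p == 0.0 else p


p = safe_mannwhitney_p([1.0, 2.0, 3.0, 4.0], [2.5, 3.5, 4.5, 5.5])
print(0.0 < p <= 1.0)
```

Downstream code can then rely on NaN to mean "test not applicable" rather than having to special-case impossible p-values.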
@@ -325,16 +336,23 @@ Source code for WORC.featureprocessing.StatisticalTestFeatures
             # Optional: perform chi2 test. Only do this when categorical, which we define as less than 20 options.