diff --git a/.gitignore b/.gitignore
index 5b21d833..8bfd8033 100755
--- a/.gitignore
+++ b/.gitignore
@@ -132,3 +132,4 @@ WORC/external/*
WORC/exampledata/ICCvalues.csv
WORC/tests/*.png
WORC/tests/*.mat
+WORC/tests/WORC_Example_STWStrategyHN_Regression
diff --git a/.travis.yml b/.travis.yml
index f08ed107..5ec250b8 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -52,7 +52,13 @@ matrix:
- fastr trace /tmp/WORC_Example_STWStrategyHN/__sink_data__.json --sinks classification --samples all
- fastr trace /tmp/WORC_Example_STWStrategyHN/__sink_data__.json --sinks performance --samples all
- fastr trace /tmp/GS/DEBUG_0/tmp/__sink_data__.json --sinks output --samples id_0__0000__0000
-
+ # Change the tutorial script to also run a regression experiment,
+ # using the previously calculated features
+ - rm -r /tmp/GS/DEBUG_0
+ - python WORC/tests/WORCTutorialSimple_travis_regression.py
+ - fastr trace /tmp/WORC_Example_STWStrategyHN_Regression/__sink_data__.json --sinks classification --samples all
+ - fastr trace /tmp/WORC_Example_STWStrategyHN_Regression/__sink_data__.json --sinks performance --samples all
+ - fastr trace /tmp/GS/DEBUG_0/tmp/__sink_data__.json --sinks output --samples id_0__0000__0000
notifications:
slack:
secure: ytP+qd6Rx1m1uXYMaN7dFHnFNu+bCIcyugSnAY7BtbumJwCuEt8hbWvQ/sDoAKqxj5VYcnBlTRDn1gjg2t2shs7pBGgjdeZQpQglXyAtN4bz3suSUbQ9/RIwt+RPmbiTXkWQtoZ4q0DotydozKMnq8Cvhdy+d5pMqToER6kMq/WCC+Y/99mmnqO2VrWpvAvP6bBOWDvrk/C4u3y5m3Rp5iE7uAYR3TDTprIW9UNEntDoEYT2T+bidkDRl7DMsi8R4q4s/A6EhZpB4Tnhwz7ama155z77ywdZLhdmk5HJvngXcunVwH4v/l8DbBZU0PqMEJzaRMn/tQCCqjx1/unpyFCv+QuhmP5K4wo17R77jHlcn7SBkdzYr/CKHrilWuShmvOMCckBShpQw3H9PivcI6/G5mVA23tH+gJSQUbzZmBR683x7oQHmnK3g977yD/ufEvV6qME9HFXt3+jIzVEwsUjtJsTV/NsbHlErJfhBp8HJTpq6IRhtKcX9QS1i/APXcYcCSCFJe8tOTLN6xmAKBgONG3XOAvJwfwXbF+rmfjX0x6KMUuD5WmHLjMLhQp0dS00LV7C9s18UkFBgKydqvF2AMPUsbgIGyZ/Vz3v5nz7JiNLDfp0HxQpqAABpdwDHR3/CfuhCDcqzIXAgRgXaFrqCxqoH6OrsgRH6UxUXnM=
diff --git a/CHANGELOG b/CHANGELOG
index 0060644f..fd0d62dc 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -6,6 +6,37 @@ All notable changes to this project will be documented in this file.
The format is based on `Keep a Changelog <http://keepachangelog.com/>`_
and this project adheres to `Semantic Versioning <http://semver.org/>`_
+
+3.4.1 - 2021-05-18
+------------------
+
+Fixed
+~~~~~
+- Bugfix when PCA cannot be fitted.
+- Bugfix when using LOO cross-validation in performance evaluation.
+- Fix XGBoost version, as the newest version automatically uses multithreading,
+  which is unsuitable for clusters.
+- Bug in the decomposition step of the Evaluation pipeline.
+- RankedPosteriors image naming rounded the posterior probability to an integer; now unrounded.
+- Several fixes for regression.
+- Regression experiment added to the unit test.
+- Several fixes for using 2D images.
+
+Changed
+~~~~~~~
+- Reverted to the weighted f1-score without predictproba for optimization,
+  as it is more stable.
+- Updated regressors in SimpleWORC.
+
+Added
+~~~~~
+- Option to combine features from a varying number of objects per patient,
+ e.g. by averaging or taking the maximum.
+- Logarithmic z-score scaler to be more robust to non-normal distributions
+ and outliers.
+- Linear and Ridge regression.
+- Precision-recall curves.
+
3.4.0 - 2021-02-02
------------------
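The new logarithmic z-score scaler can be illustrated with a minimal sketch, assuming a log transform followed by per-feature z-scoring; WORC's actual implementation may differ in details such as robust statistics:

```python
import numpy as np

def log_z_score(X):
    # Minimal sketch (assumed behaviour, not necessarily WORC's exact scaler):
    # shift features so the log is defined, log-transform to dampen outliers
    # and skewness, then apply a standard z-score per feature.
    X = np.asarray(X, dtype=float)
    X = np.log(X - X.min(axis=0) + 1.0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return (X - X.mean(axis=0)) / std
```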
diff --git a/README.md b/README.md
index 27122efe..01262ef9 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# WORC v3.4.0
+# WORC v3.4.1
## Workflow for Optimal Radiomics Classification
## Information
@@ -33,6 +33,15 @@ and support of different software languages (python, MATLAB, ruby, java etc.), w
collaboration, standardisation and comparison of different radiomics approaches. By combining this in a single framework,
we hope to find a universal radiomics strategy that can address various problems.
+## License
+This package is covered by the open source [APACHE 2.0 License](APACHE-LICENSE-2.0).
+
+When using WORC, please cite this repository as follows:
+
+``Martijn P.A. Starmans, Sebastian R. van der Voort, Thomas Phil and Stefan Klein. Workflow for Optimal Radiomics Classification (WORC). Zenodo (2018). Available from: https://github.com/MStarmans91/WORC. DOI: http://doi.org/10.5281/zenodo.3840534.``
+
+For the DOI, visit [![][DOI]][DOI-lnk].
+
## Disclaimer
This package is still under development. We try to thoroughly test and evaluate every new build and function, but
bugs can of course still occur. Please contact us through the channels below if you find any and we will try to fix
@@ -86,15 +95,6 @@ Besides a Jupyter notebook with instructions, we provide there also an example s
- We are writing the paper on WORC.
- We are expanding the example experiments of WORC with open source datasets.
-## License
-This package is covered by the open source [APACHE 2.0 License](APACHE-LICENSE-2.0).
-
-When using WORC, please cite this repository as following:
-
-``Martijn P.A. Starmans, Sebastian R. van der Voort, Thomas Phil and Stefan Klein. Workflow for Optimal Radiomics Classification (WORC). Zenodo (2018). Available from: https://github.com/MStarmans91/WORC. DOI: http://doi.org/10.5281/zenodo.3840534.``
-
-For the DOI, visit [![][DOI]][DOI-lnk].
-
## Contact
We are happy to help you with any questions. Please send us a mail or open an issue on GitHub.
diff --git a/README.rst b/README.rst
index c850c77e..851cb773 100644
--- a/README.rst
+++ b/README.rst
@@ -1,4 +1,4 @@
-WORC v3.4.0
+WORC v3.4.1
===========
Workflow for Optimal Radiomics Classification
@@ -28,6 +28,18 @@ comparison of different radiomics approaches. By combining this in a
single framework, we hope to find a universal radiomics strategy that
can address various problems.
+License
+-------
+
+This package is covered by the open source `APACHE 2.0
+License <APACHE-LICENSE-2.0>`__.
+
+When using WORC, please cite this repository as follows:
+
+``Martijn P.A. Starmans, Sebastian R. van der Voort, Thomas Phil and Stefan Klein. Workflow for Optimal Radiomics Classification (WORC). Zenodo (2018). Available from: https://github.com/MStarmans91/WORC. DOI: http://doi.org/10.5281/zenodo.3840534.``
+
+For the DOI, visit |image5|.
+
Disclaimer
----------
@@ -111,18 +123,6 @@ WIP
- We are expanding the example experiments of WORC with open source
datasets.
-License
--------
-
-This package is covered by the open source `APACHE 2.0
-License <APACHE-LICENSE-2.0>`__.
-
-When using WORC, please cite this repository as following:
-
-``Martijn P.A. Starmans, Sebastian R. van der Voort, Thomas Phil and Stefan Klein. Workflow for Optimal Radiomics Classification (WORC). Zenodo (2018). Available from: https://github.com/MStarmans91/WORC. DOI: http://doi.org/10.5281/zenodo.3840534.``
-
-For the DOI, visit |image5|.
-
Contact
-------
diff --git a/WORC/IOparser/config_io_classifier.py b/WORC/IOparser/config_io_classifier.py
index 35f750a8..f3582efb 100644
--- a/WORC/IOparser/config_io_classifier.py
+++ b/WORC/IOparser/config_io_classifier.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -131,9 +131,16 @@ def load_config(config_file_path):
[int(str(item).strip()) for item in
settings['Featsel']['ReliefNumFeatures'].split(',')]
+ # Feature preprocessing before the whole HyperOptimization
settings_dict['FeatPreProcess']['Use'] =\
[str(settings['FeatPreProcess']['Use'])]
+ settings_dict['FeatPreProcess']['Combine'] =\
+ settings['FeatPreProcess'].getboolean('Combine')
+
+ settings_dict['FeatPreProcess']['Combine_method'] =\
+ str(settings['FeatPreProcess']['Combine_method'])
+
# Imputation
settings_dict['Imputation']['use'] =\
[str(item).strip() for item in
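For reference, the new `Combine` key is parsed with configparser's boolean handling while `Combine_method` stays a plain string; a small self-contained sketch of that parsing (the inline config snippet mirrors the defaults added in WORC.py below):

```python
import configparser

# Inline config mirroring the new FeatPreProcess defaults
parser = configparser.ConfigParser()
parser.read_string("""
[FeatPreProcess]
Use = False
Combine = False
Combine_method = mean
""")

section = parser['FeatPreProcess']
combine = section.getboolean('Combine')          # 'False' -> False
combine_method = str(section['Combine_method'])  # 'mean'
print(combine, combine_method)
```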
diff --git a/WORC/IOparser/file_io.py b/WORC/IOparser/file_io.py
index 051d9641..d45eaf2d 100644
--- a/WORC/IOparser/file_io.py
+++ b/WORC/IOparser/file_io.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -23,8 +23,10 @@
import os
-def load_data(featurefiles, patientinfo=None, label_names=None, modnames=[]):
- ''' Read feature files and stack the features per patient in an array.
+def load_data(featurefiles, patientinfo=None, label_names=None, modnames=[],
+ combine_features=False, combine_method='mean'):
+ """Read feature files and stack the features per patient in an array.
+
Additionally, if a patient label file is supplied, the features from
a patient will be matched to the labels.
@@ -44,8 +46,13 @@ def load_data(featurefiles, patientinfo=None, label_names=None, modnames=[]):
List containing all the labels that should be extracted from
the patientinfo file.
- '''
+ combine_features: boolean, default False
+ Determines whether to combine the features from all samples
+ of the same patient or not.
+    combine_method: string, mean or max
+ If features per patient should be combined, determine how.
+ """
# Read out all feature values and labels
image_features_temp = list()
feature_labels_all = list()
@@ -138,11 +145,64 @@ def load_data(featurefiles, patientinfo=None, label_names=None, modnames=[]):
label_data = dict()
label_data['patient_IDs'] = patient_IDs
+ # Optionally, combine features of same patient
+ if combine_features:
+ print('Combining features of the same patient.')
+ feature_labels = image_features[0][1]
+ label_name = label_data['label_name']
+ new_label_data = list()
+ new_pids = list()
+ new_features = list()
+ pid_length = len(label_data['patient_IDs'])
+ print(f'\tOriginal number of samples / patients: {pid_length}.')
+
+ already_processed = list()
+ for pnum, pid in enumerate(label_data['patient_IDs']):
+ if pid not in already_processed:
+            # Count how often this patient occurs among all samples
+ occurrences = list(label_data['patient_IDs']).count(pid)
+
+                # NOTE: assume all objects from one patient have the same label
+ label = label_data['label'][0][pnum]
+ new_label_data.append(label)
+ new_pids.append(pid)
+
+ # Only process patients which occur multiple times
+ if occurrences > 1:
+ print(f'\tFound {occurrences} occurrences for {pid}.')
+ indices = [i for i, x in enumerate(label_data['patient_IDs']) if x == pid]
+ feature_values_thispatient = np.asarray([image_features[i][0] for i in indices])
+                    if combine_method == 'mean':
+                        feature_values_thispatient = np.nanmean(feature_values_thispatient, axis=0).tolist()
+                    elif combine_method == 'max':
+                        feature_values_thispatient = np.nanmax(feature_values_thispatient, axis=0).tolist()
+                    else:
+                        raise WORCexceptions.WORCKeyError(f'{combine_method} is not a valid combination method, should be mean or max.')
+ features = (feature_values_thispatient, feature_labels)
+
+ # And add the new one
+ new_features.append(features)
+ else:
+ new_features.append(image_features[pnum])
+
+ already_processed.append(pid)
+
+ # Adjust the labels and features for further processing
+ label_data = dict()
+ label_data['patient_IDs'] = np.asarray(new_pids)
+ label_data['label'] = np.asarray([new_label_data])
+ label_data['label_name'] = label_name
+
+ image_features = new_features
+
+ pid_length = len(label_data['patient_IDs'])
+ print(f'\tNumber of samples / patients after combining: {pid_length}.')
+
return label_data, image_features
-def load_features(feat, patientinfo, label_type):
- ''' Read feature files and stack the features per patient in an array.
+def load_features(feat, patientinfo, label_type, combine_features=False,
+ combine_method='mean'):
+ """Read feature files and stack the features per patient in an array.
+
Additionally, if a patient label file is supplied, the features from
a patient will be matched to the labels.
@@ -162,7 +222,14 @@ def load_features(feat, patientinfo, label_type):
List containing all the labels that should be extracted from
the patientinfo file.
- '''
+ combine_features: boolean, default False
+ Determines whether to combine the features from all samples
+ of the same patient or not.
+
+    combine_method: string, mean or max
+ If features per patient should be combined, determine how.
+
+ """
# Check if features is a simple list, or just one string
if '=' not in feat[0]:
feat = ['Mod0=' + ','.join(feat)]
@@ -186,7 +253,9 @@ def load_features(feat, patientinfo, label_type):
# Read the features and classification data
label_data, image_features =\
load_data(feat, patientinfo,
- label_type, modnames)
+ label_type, modnames,
+ combine_features,
+ combine_method)
return label_data, image_features
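The per-patient combination added to `load_data` can be summarized in a standalone sketch; the names are illustrative, and it mirrors the documented mean/max behaviour:

```python
import numpy as np

def combine_per_patient(pids, features, method='mean'):
    # Combine feature vectors of samples sharing a patient ID into one
    # vector per patient, by mean or max (mirrors load_data's behaviour).
    reducers = {'mean': np.nanmean, 'max': np.nanmax}
    if method not in reducers:
        raise KeyError(f'{method} is not a valid combination method, '
                       'should be mean or max.')
    new_pids, new_features = [], []
    for pid in dict.fromkeys(pids):  # unique IDs, original order
        rows = np.asarray([f for p, f in zip(pids, features) if p == pid])
        new_pids.append(pid)
        new_features.append(reducers[method](rows, axis=0))
    return new_pids, new_features

# Example: the two samples of 'pat1' are averaged into one
pids = ['pat1', 'pat1', 'pat2']
feats = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
print(combine_per_patient(pids, feats))  # pat1 -> [2.0, 3.0]
```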
diff --git a/WORC/WORC.py b/WORC/WORC.py
index 801bb2eb..2a8b0aca 100644
--- a/WORC/WORC.py
+++ b/WORC/WORC.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -142,6 +142,8 @@ def __init__(self, name='test'):
self.Elastix_Para = list()
self.label_names = 'Label1, Label2'
+ self.fixedsplits = list()
+
# Set some defaults, name
self.fastr_plugin = 'LinearExecution'
if name == '':
@@ -343,9 +345,11 @@ def defaultconfig(self):
config['FeatureScaling']['scaling_method'] = 'robust_z_score'
config['FeatureScaling']['skip_features'] = 'semf_, pf_'
- # Feature preprocessing before all below takes place
+ # Feature preprocessing before the whole HyperOptimization
config['FeatPreProcess'] = dict()
config['FeatPreProcess']['Use'] = 'False'
+ config['FeatPreProcess']['Combine'] = 'False'
+ config['FeatPreProcess']['Combine_method'] = 'mean'
# Feature selection
config['Featsel'] = dict()
@@ -405,7 +409,7 @@ def defaultconfig(self):
'RandomUnderSampling, RandomOverSampling, NearMiss, ' +\
'NeighbourhoodCleaningRule, ADASYN, BorderlineSMOTE, SMOTE, ' +\
'SMOTEENN, SMOTETomek'
- config['Resampling']['sampling_strategy'] = 'majority, not minority, not majority, all'
+ config['Resampling']['sampling_strategy'] = 'majority, minority, not minority, not majority, all'
config['Resampling']['n_neighbors'] = '3, 12'
config['Resampling']['k_neighbors'] = '5, 15'
config['Resampling']['threshold_cleaning'] = '0.25, 0.5'
@@ -443,7 +447,7 @@ def defaultconfig(self):
config['Classification']['ElasticNet_l1_ratio'] = '0, 1'
config['Classification']['SGD_alpha'] = '-5, 5'
config['Classification']['SGD_l1_ratio'] = '0, 1'
- config['Classification']['SGD_loss'] = 'hinge, squared_hinge, modified_huber'
+ config['Classification']['SGD_loss'] = 'squared_loss, huber, epsilon_insensitive, squared_epsilon_insensitive'
config['Classification']['SGD_penalty'] = 'none, l2, l1'
config['Classification']['CNB_alpha'] = '0, 1'
config['Classification']['AdaBoost_n_estimators'] = config['Classification']['RFn_estimators']
@@ -467,7 +471,7 @@ def defaultconfig(self):
# Hyperparameter optimization options
config['HyperOptimization'] = dict()
- config['HyperOptimization']['scoring_method'] = 'f1_weighted_predictproba'
+ config['HyperOptimization']['scoring_method'] = 'f1_weighted'
config['HyperOptimization']['test_size'] = '0.15'
config['HyperOptimization']['n_splits'] = '5'
config['HyperOptimization']['N_iterations'] = '1000'
@@ -556,6 +560,10 @@ def build_training(self):
resources=ResourceLimit(memory=memory),
step_id='WorkflowOptimization')
+ if self.fixedsplits:
+ self.fixedsplits_node = self.network.create_source('CSVFile', id='fixedsplits_source', node_group='conf', step_id='general_sources')
+ self.classify.inputs['fixedsplits'] = self.fixedsplits_node.output
+
self.source_Ensemble =\
self.network.create_constant('String', [self.configs[0]['Ensemble']['Use']],
id='Ensemble',
@@ -1632,6 +1640,10 @@ def set(self):
# Save the configurations as files
self.save_config()
+ # fixed splits
+ if self.fixedsplits:
+ self.source_data['fixedsplits_source'] = self.fixedsplits
+
# Generate gridsearch parameter files if required
self.source_data['config_classification_source'] = self.fastrconfigs[0]
@@ -1645,6 +1657,8 @@ def set(self):
self.sink_data['features_train_ComBat'] = ("vfs://output/{}/ComBat/features_ComBat_{{sample_id}}_{{cardinality}}{{ext}}").format(self.name)
self.sink_data['features_test_ComBat'] = ("vfs://output/{}/ComBat/features_ComBat_{{sample_id}}_{{cardinality}}{{ext}}").format(self.name)
+
+
# Set the source data from the WORC objects you created
for num, label in enumerate(self.modlabels):
self.source_data['config_' + label] = self.fastrconfigs[num]
@@ -1755,7 +1769,7 @@ def execute(self):
self.network.execute(self.source_data, self.sink_data, execution_plugin=self.fastr_plugin, tmpdir=self.fastr_tmpdir)
- def add_evaluation(self, label_type):
+ def add_evaluation(self, label_type, modus='classification'):
"""Add branch for evaluation of performance to network.
Note: should be done after build, before set:
@@ -1765,7 +1779,8 @@ def add_evaluation(self, label_type):
WORC.execute()
"""
- self.Evaluate = Evaluate(label_type=label_type, parent=self)
+ self.Evaluate =\
+ Evaluate(label_type=label_type, parent=self, modus=modus)
self._add_evaluation = True
def save_config(self):
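Taken together, the new `fixedsplits` attribute and the `modus` argument of `add_evaluation` could be used roughly as follows; this is a hedged usage sketch, the CSV path is a placeholder, and the surrounding experiment setup is elided:

```python
import WORC

experiment = WORC.WORC('example')
# ... configure images, segmentations, labels and configs as usual ...

# List of CSV files fed to the 'fixedsplits_source' created in build_training()
experiment.fixedsplits = ['/path/to/fixedsplits.csv']

experiment.build()
experiment.add_evaluation(label_type='Label1', modus='regression')
experiment.set()      # registers self.fixedsplits under 'fixedsplits_source'
experiment.execute()
```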
diff --git a/WORC/addexceptions.py b/WORC/addexceptions.py
index 39822712..df6660f3 100644
--- a/WORC/addexceptions.py
+++ b/WORC/addexceptions.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2019 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -19,10 +19,6 @@
This module contains all WORC-related Exceptions
"""
-# import inspect
-# import os
-# import textwrap
-
# pylint: disable=too-many-ancestors
# Because fo inheriting from FastrError and a common exception causes this
# exception, even though this behaviour is desired
@@ -33,43 +29,6 @@ class WORCError(Exception):
This is the base class for all WORC related exceptions. Catching this
class of exceptions should ensure a proper execution of WORC.
"""
- # def __init__(self, *args, **kwargs):
- # """
- # Constructor for all exceptions. Saves the caller object fullid (if
- # found) and the file, function and line number where the object was
- # created.
- # """
- # super(WORCError, self).__init__(*args, **kwargs)
- #
- # frame = inspect.stack()[1][0]
- # call_object = frame.f_locals.get('self', None)
- # if call_object is not None and hasattr(call_object, 'fullid'):
- # self.WORC_object = call_object.fullid
- # else:
- # self.WORC_object = None
- #
- # info = inspect.getframeinfo(frame)
- # self.filename = info.filename
- # self.function = info.function
- # self.linenumber = info.lineno
- #
- # def __str__(self):
- # """
- # String representation of the error
- #
- # :return: error string
- # :rtype: str
- # """
- # if self.WORC_object is not None:
- # return '[{}] {}'.format(self.WORC_object, super(WORCError, self).__str__())
- # else:
- # return super(WORCError, self).__str__()
- #
- # def excerpt(self):
- # """
- # Return a excerpt of the Error as a tuple.
- # """
- # return type(self).__name__, self.message, self.filename, self.linenumber
pass
diff --git a/WORC/classification/ObjectSampler.py b/WORC/classification/ObjectSampler.py
index 6bb59e74..5491270d 100644
--- a/WORC/classification/ObjectSampler.py
+++ b/WORC/classification/ObjectSampler.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -32,7 +32,7 @@ class ObjectSampler(object):
"""
- def __init__(self, method,
+ def __init__(self, method, random_seed,
sampling_strategy='auto',
n_jobs=1,
n_neighbors=3,
@@ -41,8 +41,8 @@ def __init__(self, method,
verbose=True):
"""Initialize object."""
# Initialize a random state
- self.random_seed = np.random.randint(5000)
- self.random_state = check_random_state(self.random_seed)
+ self.random_seed = random_seed
+ self.random_state = check_random_state(random_seed)
# Initialize all objects as Nones: overriden when required by functions
self.object = None
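The sampler now receives its seed from the caller instead of drawing one itself, which makes resampling reproducible across runs; a minimal illustration (the class here is illustrative, not WORC's ObjectSampler):

```python
from sklearn.utils import check_random_state

class SeededSampler:
    # Illustrative: accept the seed from the caller so that repeated runs
    # with the same seed resample identically.
    def __init__(self, random_seed):
        self.random_seed = random_seed
        self.random_state = check_random_state(random_seed)

a = SeededSampler(42).random_state.randint(5000)
b = SeededSampler(42).random_state.randint(5000)
assert a == b  # same seed -> same draws, unlike a fresh np.random.randint(5000)
```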
diff --git a/WORC/classification/SearchCV.py b/WORC/classification/SearchCV.py
index 4dc5e140..47c8c500 100644
--- a/WORC/classification/SearchCV.py
+++ b/WORC/classification/SearchCV.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
diff --git a/WORC/classification/construct_classifier.py b/WORC/classification/construct_classifier.py
index ae1cfdd5..ab3ddf49 100644
--- a/WORC/classification/construct_classifier.py
+++ b/WORC/classification/construct_classifier.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -20,7 +20,8 @@
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor
from sklearn.linear_model import SGDClassifier, ElasticNet, SGDRegressor
-from sklearn.linear_model import LogisticRegression, Lasso
+from sklearn.linear_model import LogisticRegression, LinearRegression, Lasso
+from sklearn.linear_model import Ridge
from sklearn.naive_bayes import GaussianNB, ComplementNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA
@@ -145,7 +146,7 @@ def construct_classifier(config):
elif config['classifiers'] == 'SGDR':
# Stochastic Gradient Descent regressor
- classifier = SGDRegressor(n_iter=config['max_iter'],
+ classifier = SGDRegressor(max_iter=config['max_iter'],
alpha=config['SGD_alpha'],
l1_ratio=config['SGD_l1_ratio'],
loss=config['SGD_loss'],
@@ -168,6 +169,17 @@ def construct_classifier(config):
l1_ratio=config['LR_l1_ratio'],
C=config['LRC'],
random_state=config['random_seed'])
+
+ elif config['classifiers'] == 'LinR':
+ # Linear Regression
+ classifier = LinearRegression()
+
+ elif config['classifiers'] == 'Ridge':
+ # Ridge Regression
+ classifier = Ridge(alpha=config['ElasticNet_alpha'],
+ max_iter=max_iter,
+ random_state=config['random_seed'])
+
elif config['classifiers'] == 'GaussianNB':
# Naive Bayes classifier using Gaussian distributions
classifier = GaussianNB()
@@ -214,7 +226,7 @@ def construct_SVM(config, regression=False):
clf = SVC(class_weight='balanced', probability=True, max_iter=max_iter,
random_state=config['random_seed'])
else:
- clf = SVMR(max_iter=max_iter, random_state=config['random_seed'])
+ clf = SVMR(max_iter=max_iter)
clf.kernel = str(config['SVMKernel'])
clf.C = config['SVMC']
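The two new regressor branches map directly onto scikit-learn estimators; a standalone sketch with an illustrative config dict whose keys mirror the ones used above:

```python
from sklearn.linear_model import LinearRegression, Ridge

# Illustrative config; in WORC these values come from the sampled
# hyperparameter configuration.
config = {'classifiers': 'Ridge', 'ElasticNet_alpha': 0.5,
          'max_iter': 10000, 'random_seed': 42}

if config['classifiers'] == 'LinR':
    regressor = LinearRegression()
elif config['classifiers'] == 'Ridge':
    regressor = Ridge(alpha=config['ElasticNet_alpha'],
                      max_iter=config['max_iter'],
                      random_state=config['random_seed'])
```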
diff --git a/WORC/classification/crossval.py b/WORC/classification/crossval.py
index 8a00bf62..25061d1a 100644
--- a/WORC/classification/crossval.py
+++ b/WORC/classification/crossval.py
@@ -60,6 +60,11 @@ def random_split_cross_validation(image_features, feature_labels, classes,
# Start from zero, thus empty list of previos data
save_data = list()
+ # If we are using fixed splits, set the n_iterations to the number of splits
+ if fixedsplits is not None:
+ n_iterations = int(fixedsplits.columns.shape[0] / 2)
+ print(f'Fixedsplits detected, adjusting n_iterations to {n_iterations}')
+
for i in range(start, n_iterations):
print(('Cross-validation iteration {} / {} .').format(str(i + 1), str(n_iterations)))
logging.debug(('Cross-validation iteration {} / {} .').format(str(i + 1), str(n_iterations)))
@@ -77,6 +82,7 @@ def random_split_cross_validation(image_features, feature_labels, classes,
# label is maintained
if any(clf in regressors for clf in param_grid['classifiers']):
# We cannot do a stratified shuffle split with regression
+ classes_temp = classes
stratify = None
else:
if modus == 'singlelabel':
@@ -161,8 +167,8 @@ def random_split_cross_validation(image_features, feature_labels, classes,
else:
# Use pre defined splits
- train = fixedsplits[str(i) + '_train'].values
- test = fixedsplits[str(i) + '_test'].values
+ train = fixedsplits[str(i) + '_train'].dropna().values
+ test = fixedsplits[str(i) + '_test'].dropna().values
# Convert the numbers to the correct indices
ind_train = list()
@@ -517,6 +523,11 @@ def crossval(config, label_data, image_features,
if fixedsplits is not None and '.csv' in fixedsplits:
fixedsplits = pd.read_csv(fixedsplits, header=0)
+    # Fixed splits must be performed in random-split fashion; they make no sense for LOO
+    if crossval_type == 'LOO':
+        print('[WORC WARNING] Fixed splits must be performed in random-split fashion and make no sense for LOO: switching to random_split.')
+ crossval_type = 'random_split'
+
if modus == 'singlelabel':
print('Performing single-class classification.')
logging.debug('Performing single-class classification.')
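From the code above, a fixed-splits CSV holds two columns per cross-validation iteration, named `<i>_train` and `<i>_test`, padded with empty cells where splits differ in length; a sketch of writing and reading such a file with pandas (the patient IDs are illustrative):

```python
import pandas as pd

# Two cross-validation iterations, two columns each
splits = pd.DataFrame({
    '0_train': ['pat1', 'pat2', 'pat3'],
    '0_test':  ['pat4', None, None],      # padded, becomes NaN
    '1_train': ['pat2', 'pat3', 'pat4'],
    '1_test':  ['pat1', None, None],
})
splits.to_csv('fixedsplits.csv', index=False)

# Reading back mirrors crossval.py: n_iterations is half the column count,
# and .dropna() strips the NaN padding per column.
df = pd.read_csv('fixedsplits.csv', header=0)
n_iterations = int(df.columns.shape[0] / 2)
train_0 = df['0_train'].dropna().values
test_0 = df['0_test'].dropna().values
```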
diff --git a/WORC/classification/fitandscore.py b/WORC/classification/fitandscore.py
index cab5e1e2..d71ebad4 100644
--- a/WORC/classification/fitandscore.py
+++ b/WORC/classification/fitandscore.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -540,16 +540,17 @@ def fit_and_score(X, y, scoring,
if model == 'Lasso':
# Use lasso model for feature selection
alpha = para_estimator['SelectFromModel_lasso_alpha']
- selectestimator = Lasso(alpha=alpha)
+ selectestimator = Lasso(alpha=alpha, random_state=random_seed)
elif model == 'LR':
# Use logistic regression model for feature selection
- selectestimator = LogisticRegression()
+ selectestimator = LogisticRegression(random_state=random_seed)
elif model == 'RF':
# Use random forest model for feature selection
n_estimators = para_estimator['SelectFromModel_n_trees']
- selectestimator = RandomForestClassifier(n_estimators=n_estimators)
+ selectestimator = RandomForestClassifier(n_estimators=n_estimators,
+ random_state=random_seed)
else:
raise ae.WORCKeyError(f'Model {model} is not known for SelectFromModel. Use Lasso, LR, or RF.')
@@ -614,6 +615,7 @@ def fit_and_score(X, y, scoring,
if verbose:
print(f'[WARNING]: skipping this setting due to PCA Error: {e}.')
+ pca = None
if return_all:
return ret, GroupSel, VarSel, SelectModel, feature_labels[0], scaler, encoder, imputer, pca, StatisticalSel, ReliefSel, Sampler
else:
@@ -634,6 +636,7 @@ def fit_and_score(X, y, scoring,
if verbose:
print(f'[WARNING]: skipping this setting due to PCA Error: {e}.')
+ pca = None
if return_all:
return ret, GroupSel, VarSel, SelectModel, feature_labels[0], scaler, encoder, imputer, pca, StatisticalSel, ReliefSel, Sampler
else:
@@ -652,7 +655,18 @@ def fit_and_score(X, y, scoring,
print(f"[WORC WARNING] PCA n_components ({n_components})> n_features ({len(X_train[0])}): skipping PCA.")
else:
pca = PCA(n_components=n_components, random_state=random_seed)
- pca.fit(X_train)
+ try:
+ pca.fit(X_train)
+ except (ValueError, LinAlgError) as e:
+ if verbose:
+ print(f'[WARNING]: skipping this setting due to PCA Error: {e}.')
+
+ pca = None
+ if return_all:
+ return ret, GroupSel, VarSel, SelectModel, feature_labels[0], scaler, encoder, imputer, pca, StatisticalSel, ReliefSel, Sampler
+ else:
+ return ret
+
X_train = pca.transform(X_train)
X_test = pca.transform(X_test)
@@ -722,7 +736,8 @@ def fit_and_score(X, y, scoring,
n_neighbors=para_estimator['Resampling_n_neighbors'],
k_neighbors=para_estimator['Resampling_k_neighbors'],
threshold_cleaning=para_estimator['Resampling_threshold_cleaning'],
- verbose=verbose)
+ verbose=verbose,
+ random_seed=random_seed)
try:
Sampler.fit(X_train, y_train)
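The PCA guard added above catches fit failures (for instance SVD non-convergence) and skips the hyperparameter setting instead of crashing the worker; the pattern in isolation, as a sketch rather than the full fit_and_score return path:

```python
import numpy as np
from numpy.linalg import LinAlgError
from sklearn.decomposition import PCA

X_train = np.random.rand(20, 10)

pca = PCA(n_components=5, random_state=42)
try:
    pca.fit(X_train)
except (ValueError, LinAlgError) as e:
    # Skip this setting instead of aborting the whole random search
    print(f'[WARNING]: skipping this setting due to PCA Error: {e}.')
    pca = None

if pca is not None:
    X_train = pca.transform(X_train)
```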
diff --git a/WORC/classification/metrics.py b/WORC/classification/metrics.py
index d064fe95..bad17608 100644
--- a/WORC/classification/metrics.py
+++ b/WORC/classification/metrics.py
@@ -34,6 +34,7 @@ def performance_singlelabel(y_truth, y_prediction, y_score, regression=False):
Singleclass performance metrics
'''
if regression:
+ y_truth = np.array(y_truth).flatten()
r2score = metrics.r2_score(y_truth, y_prediction)
MSE = metrics.mean_squared_error(y_truth, y_prediction)
coefICC = ICC(np.column_stack((y_prediction, y_truth)))
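The flatten fix matters because in regression the ground truth can arrive wrapped in an extra dimension (compare `label_data['label'] = np.asarray([new_label_data])` in file_io.py above); a small numpy illustration of the shape mismatch it avoids:

```python
import numpy as np
from sklearn import metrics

y_truth = np.array([[1.0, 2.0, 3.0]])      # wrapped labels: shape (1, 3)
y_prediction = np.array([1.1, 1.9, 3.2])   # shape (3,)

# metrics.r2_score(y_truth, y_prediction) would fail: 1 sample vs 3.
y_truth = np.array(y_truth).flatten()      # shape (3,)
print(metrics.r2_score(y_truth, y_prediction))
```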
diff --git a/WORC/classification/regressors.py b/WORC/classification/regressors.py
index cf2aa2eb..4bb9d247 100644
--- a/WORC/classification/regressors.py
+++ b/WORC/classification/regressors.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2019 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -16,4 +16,4 @@
# limitations under the License.
# Define all possible regressors
-regressors = ['SVR', 'RFR', 'SGDR', 'Lasso', 'ElasticNet']
+regressors = ['SVR', 'RFR', 'ElasticNet', 'Lasso', 'SGDR', 'XGBRegressor', 'AdaBoostRegressor', 'LinR', 'Ridge']
diff --git a/WORC/classification/trainclassifier.py b/WORC/classification/trainclassifier.py
index 7773bfb8..5f53b2bf 100644
--- a/WORC/classification/trainclassifier.py
+++ b/WORC/classification/trainclassifier.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -109,14 +109,18 @@ def trainclassifier(feat_train, patientinfo_train, config,
config = config_io.load_config(config)
label_type = config['Labels']['label_names']
modus = config['Labels']['modus']
+ combine_features = config['FeatPreProcess']['Combine']
+ combine_method = config['FeatPreProcess']['Combine_method']
# Load the feature files and match to label data
label_data_train, image_features_train =\
- load_features(feat_train, patientinfo_train, label_type)
+ load_features(feat_train, patientinfo_train, label_type,
+ combine_features, combine_method)
if feat_test:
label_data_test, image_features_test =\
- load_features(feat_test, patientinfo_test, label_type)
+ load_features(feat_test, patientinfo_test, label_type,
+ combine_features, combine_method)
# Create tempdir name from patientinfo file name
basename = os.path.basename(patientinfo_train)
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.IOparser.doctree b/WORC/doc/_build/doctrees/autogen/WORC.IOparser.doctree
index d36efc2d..1a65a597 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.IOparser.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.IOparser.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.classification.doctree b/WORC/doc/_build/doctrees/autogen/WORC.classification.doctree
index f48cdb84..5eeea9d5 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.classification.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.classification.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.config.doctree b/WORC/doc/_build/doctrees/autogen/WORC.config.doctree
index fe5503d3..9a89809b 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.config.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.config.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.detectors.doctree b/WORC/doc/_build/doctrees/autogen/WORC.detectors.doctree
index df1f0090..b9d0aed1 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.detectors.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.detectors.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.doctree b/WORC/doc/_build/doctrees/autogen/WORC.doctree
index 7e84230c..36f3f783 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.exampledata.doctree b/WORC/doc/_build/doctrees/autogen/WORC.exampledata.doctree
index 436aafc7..cd080ad5 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.exampledata.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.exampledata.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.facade.doctree b/WORC/doc/_build/doctrees/autogen/WORC.facade.doctree
index 087c35ed..14329baf 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.facade.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.facade.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.featureprocessing.doctree b/WORC/doc/_build/doctrees/autogen/WORC.featureprocessing.doctree
index 72507df9..b25a75e2 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.featureprocessing.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.featureprocessing.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.plotting.doctree b/WORC/doc/_build/doctrees/autogen/WORC.plotting.doctree
index 82797c97..7092eb14 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.plotting.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.plotting.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.processing.doctree b/WORC/doc/_build/doctrees/autogen/WORC.processing.doctree
index acc934f7..2e27f347 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.processing.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.processing.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.resources.doctree b/WORC/doc/_build/doctrees/autogen/WORC.resources.doctree
index 1580f4ae..9cb804a5 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.resources.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.resources.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.resources.fastr_tests.doctree b/WORC/doc/_build/doctrees/autogen/WORC.resources.fastr_tests.doctree
index 17237717..3b15532b 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.resources.fastr_tests.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.resources.fastr_tests.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.resources.fastr_tools.doctree b/WORC/doc/_build/doctrees/autogen/WORC.resources.fastr_tools.doctree
index 03e58961..5f50414d 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.resources.fastr_tools.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.resources.fastr_tools.doctree differ
diff --git a/WORC/doc/_build/doctrees/autogen/WORC.tools.doctree b/WORC/doc/_build/doctrees/autogen/WORC.tools.doctree
index 919f1cfc..061b3403 100644
Binary files a/WORC/doc/_build/doctrees/autogen/WORC.tools.doctree and b/WORC/doc/_build/doctrees/autogen/WORC.tools.doctree differ
diff --git a/WORC/doc/_build/doctrees/environment.pickle b/WORC/doc/_build/doctrees/environment.pickle
index dae49052..cbab534a 100644
Binary files a/WORC/doc/_build/doctrees/environment.pickle and b/WORC/doc/_build/doctrees/environment.pickle differ
diff --git a/WORC/doc/_build/doctrees/index.doctree b/WORC/doc/_build/doctrees/index.doctree
index be382f9e..92459e7c 100644
Binary files a/WORC/doc/_build/doctrees/index.doctree and b/WORC/doc/_build/doctrees/index.doctree differ
diff --git a/WORC/doc/_build/doctrees/static/changelog.doctree b/WORC/doc/_build/doctrees/static/changelog.doctree
index b1698c29..141fa9d1 100644
Binary files a/WORC/doc/_build/doctrees/static/changelog.doctree and b/WORC/doc/_build/doctrees/static/changelog.doctree differ
diff --git a/WORC/doc/_build/doctrees/static/configuration.doctree b/WORC/doc/_build/doctrees/static/configuration.doctree
index f3c64033..7a41317b 100644
Binary files a/WORC/doc/_build/doctrees/static/configuration.doctree and b/WORC/doc/_build/doctrees/static/configuration.doctree differ
diff --git a/WORC/doc/_build/doctrees/static/file_description.doctree b/WORC/doc/_build/doctrees/static/file_description.doctree
index 859b7b09..b4bdbd41 100644
Binary files a/WORC/doc/_build/doctrees/static/file_description.doctree and b/WORC/doc/_build/doctrees/static/file_description.doctree differ
diff --git a/WORC/doc/_build/doctrees/static/introduction.doctree b/WORC/doc/_build/doctrees/static/introduction.doctree
index 48deb457..36565909 100644
Binary files a/WORC/doc/_build/doctrees/static/introduction.doctree and b/WORC/doc/_build/doctrees/static/introduction.doctree differ
diff --git a/WORC/doc/_build/doctrees/static/quick_start.doctree b/WORC/doc/_build/doctrees/static/quick_start.doctree
index 382eb9e5..3495ef58 100644
Binary files a/WORC/doc/_build/doctrees/static/quick_start.doctree and b/WORC/doc/_build/doctrees/static/quick_start.doctree differ
diff --git a/WORC/doc/_build/doctrees/static/user_manual.doctree b/WORC/doc/_build/doctrees/static/user_manual.doctree
index fe60689b..901ae5eb 100644
Binary files a/WORC/doc/_build/doctrees/static/user_manual.doctree and b/WORC/doc/_build/doctrees/static/user_manual.doctree differ
diff --git a/WORC/doc/_build/html/.buildinfo b/WORC/doc/_build/html/.buildinfo
index 9ffab4d9..7902be9b 100644
--- a/WORC/doc/_build/html/.buildinfo
+++ b/WORC/doc/_build/html/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 257c3dc15c2579567adee9a27eea0949
+config: 73f1da5a99fea16f90c552d633d66cea
tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/WORC/doc/_build/html/_modules/WORC/IOparser/config_WORC.html b/WORC/doc/_build/html/_modules/WORC/IOparser/config_WORC.html
index 60b65bf7..dc79c9bc 100644
--- a/WORC/doc/_build/html/_modules/WORC/IOparser/config_WORC.html
+++ b/WORC/doc/_build/html/_modules/WORC/IOparser/config_WORC.html
@@ -1,40 +1,39 @@
-
+
+
- WORC.IOparser.config_WORC — WORC 3.4.0 documentation
+ WORC.IOparser.config_WORC — WORC 3.4.1 documentation
-
-
-
-
-
-
+
-
-
-
-
+
+
+
+
+
+
+
+
+
@@ -50,7 +49,7 @@
- WORC
+ WORC
@@ -63,7 +62,7 @@
settings_dict: dictionary containing all parsed settings. """ifnotos.path.exists(config_file_path):
- e=f'File {config_file_path} does not exist!'
+ e=f'File {config_file_path} does not exist!'raiseae.WORCKeyError(e)settings=configparser.ConfigParser()
@@ -237,19 +234,11 @@
Source code for WORC.IOparser.config_io_classifier
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands## Licensed under the Apache License, Version 2.0 (the "License");
@@ -200,7 +197,7 @@
Source code for WORC.IOparser.config_io_classifier
"""
ifnotos.path.exists(config_file_path):
- e=f'File {config_file_path} does not exist!'
+ e=f'File {config_file_path} does not exist!'raiseae.WORCKeyError(e)settings=configparser.ConfigParser()
@@ -299,9 +296,16 @@
Source code for WORC.IOparser.config_io_classifier
Source code for WORC.IOparser.config_preprocessing
settings_dict: dictionary containing all parsed settings.
"""ifnotos.path.exists(config_file_path):
- e=f'File {config_file_path} does not exist!'
+ e=f'File {config_file_path} does not exist!'raiseae.WORCKeyError(e)settings=configparser.ConfigParser()
@@ -229,7 +226,7 @@
Source code for WORC.IOparser.config_preprocessing
settings['Preprocessing']['Clipping_Range'].split(',')]iflen(settings_dict['Preprocessing']['Clipping_Range'])!=2:
- raiseae.WORCValueError(f"Clipping range should be two floats split by a comma, got {settings['Preprocessing']['Clipping_Range']}.")
+ raiseae.WORCValueError(f"Clipping range should be two floats split by a comma, got {settings['Preprocessing']['Clipping_Range']}.")# Normalizationsettings_dict['Preprocessing']['Normalize']=\
@@ -274,7 +271,7 @@
Source code for WORC.IOparser.config_preprocessing
if len(settings_dict['Preprocessing']['Resampling_spacing'])!=3:s=settings_dict['Preprocessing']['Resampling_spacing']
- raiseae.WORCValueError(f'Resampling spacing should be three elements, got {s}')
+ raiseae.WORCValueError(f'Resampling spacing should be three elements, got {s}')returnsettings_dict
@@ -289,19 +286,11 @@
Source code for WORC.IOparser.config_preprocessing
settings_dict: dictionary containing all parsed settings.
"""ifnotos.path.exists(config_file_path):
- e=f'File {config_file_path} does not exist!'
+ e=f'File {config_file_path} does not exist!'raiseae.WORCKeyError(e)settings=configparser.ConfigParser()
@@ -254,7 +251,7 @@
Source code for WORC.IOparser.config_segmentix
if len(settings_dict['Preprocessing']['Resampling_spacing'])!=3:s=settings_dict['Preprocessing']['Resampling_spacing']
- raiseae.WORCValueError(f'Resampling spacing should be three elements, got {s}')
+ raiseae.WORCValueError(f'Resampling spacing should be three elements, got {s}')returnsettings_dict
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands## Licensed under the Apache License, Version 2.0 (the "License");
@@ -191,8 +188,10 @@
Source code for WORC.IOparser.file_io
importos
-
[docs]defload_data(featurefiles,patientinfo=None,label_names=None,modnames=[]):
- ''' Read feature files and stack the features per patient in an array.
+
[docs]defload_data(featurefiles,patientinfo=None,label_names=None,modnames=[],
+ combine_features=False,combine_method='mean'):
+ """Read feature files and stack the features per patient in an array.
+
Additionally, if a patient label file is supplied, the features from a patient will be matched to the labels.
@@ -212,8 +211,13 @@
Source code for WORC.IOparser.file_io
List containing all the labels that should be extracted from the patientinfo file.
- '''
+ combine_features: boolean, default False
+ Determines whether to combine the features from all samples
+ of the same patient or not.
+ combine_methods: string, mean or max
+ If features per patient should be combined, determine how.
+ """# Read out all feature values and labelsimage_features_temp=list()feature_labels_all=list()
@@ -243,11 +247,11 @@
Source code for WORC.IOparser.file_io
# Check when we found patient ID's, if we did for all objectsifpids:iflen(pids)!=len(image_features_temp):
- raiseWORCexceptions.WORCValueError(f'Length of pids {len(pids)}'+
+ raiseWORCexceptions.WORCValueError(f'Length of pids {len(pids)}'+'does not match'+'number of objects '+str(len(image_features_temp))+
- f'Found {pids}.')
+ f'Found {pids}.')# If some objects miss certain features, we will identify these with NaN valuesfeature_labels_all.sort()
@@ -306,11 +310,64 @@
Source code for WORC.IOparser.file_io
label_data=dict()label_data['patient_IDs']=patient_IDs
+ # Optionally, combine features of same patient
+ ifcombine_features:
+ print('Combining features of the same patient.')
+ feature_labels=image_features[0][1]
+ label_name=label_data['label_name']
+ new_label_data=list()
+ new_pids=list()
+ new_features=list()
+ pid_length=len(label_data['patient_IDs'])
+ print(f'\tOriginal number of samples / patients: {pid_length}.')
+
+ already_processed=list()
+ forpnum,pidinenumerate(label_data['patient_IDs']):
+ ifpidnotinalready_processed:
+ # NOTE: should check whether we have already processed this patient
+ occurrences=list(label_data['patient_IDs']).count(pid)
+
+ # NOTE: Assume all object from one patient have the same label
+ label=label_data['label'][0][pnum]
+ new_label_data.append(label)
+ new_pids.append(pid)
+
+ # Only process patients which occur multiple times
+ ifoccurrences>1:
+ print(f'\tFound {occurrences} occurrences for {pid}.')
+ indices=[ifori,xinenumerate(label_data['patient_IDs'])ifx==pid]
+ feature_values_thispatient=np.asarray([image_features[i][0]foriinindices])
+ ifcombine_method=='mean':
+ feature_values_thispatient=np.nanmean(feature_values_thispatient,axis=0).tolist()
+ else:
+ raiseWORCexceptions.KeyError(f'{combine_method} is not a valid combination method, should be mean or max.')
+ features=(feature_values_thispatient,feature_labels)
+
+ # And add the new one
+ new_features.append(features)
+ else:
+ new_features.append(image_features[pnum])
+
+ already_processed.append(pid)
+
+ # Adjust the labels and features for further processing
+ label_data=dict()
+ label_data['patient_IDs']=np.asarray(new_pids)
+ label_data['label']=np.asarray([new_label_data])
+ label_data['label_name']=label_name
+
+ image_features=new_features
+
+ pid_length=len(label_data['patient_IDs'])
+ print(f'\tNumber of samples / patients after combining: {pid_length}.')
+
returnlabel_data,image_features
-
[docs]defload_features(feat,patientinfo,label_type):
- ''' Read feature files and stack the features per patient in an array.
+
[docs]defload_features(feat,patientinfo,label_type,combine_features=False,
+ combine_method='mean'):
+ """Read feature files and stack the features per patient in an array.
+
Additionally, if a patient label file is supplied, the features from a patient will be matched to the labels.
@@ -330,7 +387,14 @@
Source code for WORC.IOparser.file_io
List containing all the labels that should be extracted from the patientinfo file.
- '''
+ combine_features: boolean, default False
+ Determines whether to combine the features from all samples
+ of the same patient or not.
+
+ combine_methods: string, mean or max
+ If features per patient should be combined, determine how.
+
+ """# Check if features is a simple list, or just one stringif'='notinfeat[0]:feat=['Mod0='+','.join(feat)]
@@ -354,7 +418,9 @@
Source code for WORC.IOparser.file_io
# Read the features and classification datalabel_data,image_features=\
load_data(feat,patientinfo,
- label_type,modnames)
+ label_type,modnames,
+ combine_features,
+ combine_method)returnlabel_data,image_features
@@ -424,7 +490,7 @@
Source code for WORC.IOparser.file_io
[float(i)foriinconfig['PyRadiomics']['resampledPixelSpacing'].split(',')]iflen(outputconfig['setting']['resampledPixelSpacing'])!=3:length=len(outputconfig['setting']['resampledPixelSpacing'])
- raiseWORCexceptions.WORCValueError(f'Length PyRadiomics resampledPixelSpacing should be 3, got {length}.')
+ raiseWORCexceptions.WORCValueError(f'Length PyRadiomics resampledPixelSpacing should be 3, got {length}.')# Extract several general values as well# Convert strings with values to list of ints
@@ -476,19 +542,11 @@
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands## Licensed under the Apache License, Version 2.0 (the "License");
@@ -188,16 +185,16 @@
else:nseg=len(self.segmentations_train)nim=len(image_types)
- m=f'Length of segmentations for training is '+\
- f'{nseg}: should be equal to number of images'+\
- f' ({nim}) or 1 when using registration.'
+ m=f'Length of segmentations for training is '+\
+ f'{nseg}: should be equal to number of images'+\
+ f' ({nim}) or 1 when using registration.'raiseWORCexceptions.WORCValueError(m)# BUG: We assume that first type defines if we use segmentix
@@ -969,7 +974,7 @@
# Link features to ComBatself.links_Combat1_train[label]=list()fori_node,fnameinenumerate(self.featurecalculators[label]):
- self.links_Combat1_train[label].append(self.ComBat.inputs['features_train'][f'{label}_{self.featurecalculators[label][i_node]}']<<self.featureconverter_train[label][i_node].outputs['feat_out'])
+ self.links_Combat1_train[label].append(self.ComBat.inputs['features_train'][f'{label}_{self.featurecalculators[label][i_node]}']<<self.featureconverter_train[label][i_node].outputs['feat_out'])self.links_Combat1_train[label][i_node].collapse='train'ifself.TrainTest:self.links_Combat1_test[label]=list()fori_node,fnameinenumerate(self.featurecalculators[label]):
- self.links_Combat1_test[label].append(self.ComBat.inputs['features_test'][f'{label}_{self.featurecalculators[label][i_node]}']<<self.featureconverter_test[label][i_node].outputs['feat_out'])
+ self.links_Combat1_test[label].append(self.ComBat.inputs['features_test'][f'{label}_{self.featurecalculators[label][i_node]}']<<self.featureconverter_test[label][i_node].outputs['feat_out'])self.links_Combat1_test[label][i_node].collapse='test'# -----------------------------------------------------
@@ -1088,7 +1093,7 @@
Source code for WORC.WORC
# Append features to the classificationifnotself.configs[0]['General']['ComBat']=='True':
- self.links_C1_train[label].append(self.classify.inputs['features_train'][f'{label}_{self.featurecalculators[label][i_node]}']<<self.featureconverter_train[label][i_node].outputs['feat_out'])
+ self.links_C1_train[label].append(self.classify.inputs['features_train'][f'{label}_{self.featurecalculators[label][i_node]}']<<self.featureconverter_train[label][i_node].outputs['feat_out'])self.links_C1_train[label][i_node].collapse='train'# Save output
@@ -1101,7 +1106,7 @@
Source code for WORC.WORC
# Append features to the classificationifnotself.configs[0]['General']['ComBat']=='True':
- self.links_C1_test[label].append(self.classify.inputs['features_test'][f'{label}_{self.featurecalculators[label][i_node]}']<<self.featureconverter_test[label][i_node].outputs['feat_out'])
+ self.links_C1_test[label].append(self.classify.inputs['features_test'][f'{label}_{self.featurecalculators[label][i_node]}']<<self.featureconverter_test[label][i_node].outputs['feat_out'])self.links_C1_test[label][i_node].collapse='test'# Save output
@@ -1332,12 +1337,12 @@
Source code for WORC.WORC
elif'predict'incalcfeat_node.lower():toolbox='PREDICT'else:
- message=f'Toolbox {calcfeat_node} not recognized!'
+ message=f'Toolbox {calcfeat_node} not recognized!'raiseWORCexceptions.WORCKeyError(message)self.source_toolbox_name[label]=\
self.network.create_constant('String',toolbox,
- id=f'toolbox_name_{toolbox}_{label}',
+ id=f'toolbox_name_{toolbox}_{label}',step_id='Feature_Extraction')conv_train.inputs['toolbox']=self.source_toolbox_name[label].output
@@ -1800,6 +1805,10 @@
Source code for WORC.WORC
# Save the configurations as filesself.save_config()
+ # fixed splits
+ ifself.fixedsplits:
+ self.source_data['fixedsplits_source']=self.fixedsplits
+
# Generate gridsearch parameter files if requiredself.source_data['config_classification_source']=self.fastrconfigs[0]
@@ -1813,6 +1822,8 @@
Source code for WORC.WORC
self.sink_data['features_train_ComBat']=("vfs://output/{}/ComBat/features_ComBat_{{sample_id}}_{{cardinality}}{{ext}}").format(self.name)self.sink_data['features_test_ComBat']=("vfs://output/{}/ComBat/features_ComBat_{{sample_id}}_{{cardinality}}{{ext}}").format(self.name)
+
+
# Set the source data from the WORC objects you createdfornum,labelinenumerate(self.modlabels):self.source_data['config_'+label]=self.fastrconfigs[num]
@@ -1907,15 +1918,15 @@
Source code for WORC.WORC
exceptgraphviz.backend.ExecutableNotFound:print('[WORC WARNING] Graphviz executable not found: not drawing network diagram. Make sure the Graphviz executables are on your systems PATH.')exceptgraphviz.backend.CalledProcessErrorase:
- print(f'[WORC WARNING] Graphviz executable gave an error: not drawing network diagram. Original error: {e}')
+ print(f'[WORC WARNING] Graphviz executable gave an error: not drawing network diagram. Original error: {e}')ifDebugDetector().do_detection():print("Source Data:")forkinself.source_data.keys():
- print(f"\t{k}: {self.source_data[k]}.")
+ print(f"\t{k}: {self.source_data[k]}.")print("\n Sink Data:")forkinself.sink_data.keys():
- print(f"\t{k}: {self.sink_data[k]}.")
+ print(f"\t{k}: {self.sink_data[k]}.")# When debugging, set the tempdir to the default of fastr + nameself.fastr_tmpdir=os.path.join(fastr.config.mounts['tmp'],
@@ -1923,7 +1934,7 @@
[docs]defadd_evaluation(self,label_type,modus='classification'):"""Add branch for evaluation of performance to network. Note: should be done after build, before set:
@@ -1933,7 +1944,8 @@
# If PyRadiomics is used, also write a config for PyRadiomicsif'pyradiomics'inc['General']['FeatureCalculators']:
- cfile_pyradiomics=os.path.join(self.fastr_tmpdir,f"config_pyradiomics_{self.name}_{num}.yaml")
+ cfile_pyradiomics=os.path.join(self.fastr_tmpdir,f"config_pyradiomics_{self.name}_{num}.yaml")config_pyradiomics=io.convert_config_pyradiomics(c)withopen(cfile_pyradiomics,'w')asfile:yaml.safe_dump(config_pyradiomics,file)
- cfile_pyradiomics=Path(self.fastr_tmpdir)/f"config_pyradiomics_{self.name}_{num}.yaml"
+ cfile_pyradiomics=Path(self.fastr_tmpdir)/f"config_pyradiomics_{self.name}_{num}.yaml"self.pyradiomics_configs.append(cfile_pyradiomics.as_uri().replace('%20',' '))# BUG: Make path with pathlib to create windows double slashes
- cfile=Path(self.fastr_tmpdir)/f"config_{self.name}_{num}.ini"
+ cfile=Path(self.fastr_tmpdir)/f"config_{self.name}_{num}.ini"self.fastrconfigs.append(cfile.as_uri().replace('%20',' '))
@@ -1982,7 +1994,7 @@
Source code for WORC.WORC
3. Slicer pipeline, to create pngs of middle slice of images. """
-
#!/usr/bin/env python
-# Copyright 2016-2019 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands## Licensed under the Apache License, Version 2.0 (the "License");
@@ -187,10 +184,6 @@
Source code for WORC.addexceptions
This module contains all WORC-related Exceptions"""
-# import inspect
-# import os
-# import textwrap
-
# pylint: disable=too-many-ancestors# Because fo inheriting from FastrError and a common exception causes this# exception, even though this behaviour is desired
@@ -201,43 +194,6 @@
Source code for WORC.addexceptions
This is the base class for all WORC related exceptions. Catching this class of exceptions should ensure a proper execution of WORC. """
- # def __init__(self, *args, **kwargs):
- # """
- # Constructor for all exceptions. Saves the caller object fullid (if
- # found) and the file, function and line number where the object was
- # created.
- # """
- # super(WORCError, self).__init__(*args, **kwargs)
- #
- # frame = inspect.stack()[1][0]
- # call_object = frame.f_locals.get('self', None)
- # if call_object is not None and hasattr(call_object, 'fullid'):
- # self.WORC_object = call_object.fullid
- # else:
- # self.WORC_object = None
- #
- # info = inspect.getframeinfo(frame)
- # self.filename = info.filename
- # self.function = info.function
- # self.linenumber = info.lineno
- #
- # def __str__(self):
- # """
- # String representation of the error
- #
- # :return: error string
- # :rtype: str
- # """
- # if self.WORC_object is not None:
- # return '[{}] {}'.format(self.WORC_object, super(WORCError, self).__str__())
- # else:
- # return super(WORCError, self).__str__()
- #
- # def excerpt(self):
- # """
- # Return a excerpt of the Error as a tuple.
- # """
- # return type(self).__name__, self.message, self.filename, self.linenumberpass
Source code for WORC.classification.AdvancedSampler
# See the License for the specific language governing permissions and
# limitations under the License.
-fromsklearn.utilsimportcheck_random_state
+fromsklearn.utilsimportcheck_random_stateimportnumpyasnpimportsix
-fromghaltonimportHalton
+fromghaltonimportHalton# from sobol_seq import i4_sobol_generate as Sobolimportscipy
-fromscipy.statsimportuniform
+fromscipy.statsimportuniformimportmath
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands## Licensed under the Apache License, Version 2.0 (the "License");
@@ -183,9 +180,9 @@
Source code for WORC.classification.ObjectSampler
# See the License for the specific language governing permissions and
# limitations under the License.
-fromimblearnimportover_sampling,under_sampling,combine
+fromimblearnimportover_sampling,under_sampling,combineimportnumpyasnp
-fromsklearn.utilsimportcheck_random_state
+fromsklearn.utilsimportcheck_random_stateimportWORC.addexceptionsasae
@@ -200,7 +197,7 @@
verbose=True):"""Initialize object."""# Initialize a random state
- self.random_seed=np.random.randint(5000)
- self.random_state=check_random_state(self.random_seed)
+ self.random_seed=random_seed
+ self.random_state=check_random_state(random_seed)# Initialize all objects as Nones: overriden when required by functionsself.object=None
@@ -241,7 +238,7 @@
        elif method == 'SMOTETomek':
            self.init_SMOTETomek(sampling_strategy)
        else:
-            raise ae.WORCKeyError(f'{method} is not a valid sampling method!')
+            raise ae.WORCKeyError(f'{method} is not a valid sampling method!')

[docs]    def init_RandomUnderSampling(self, sampling_strategy):
        """Create a random under-sampling object."""
@@ -363,19 +360,11 @@
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -184,44 +181,44 @@
[docs]    def __init__(self, estimators):
        """Initialize object with list of estimators."""
        if not estimators:
            message = 'You supplied an empty list of estimators: No ensemble creation possible.'
@@ -522,7 +519,7 @@
Source code for WORC.classification.SearchCV
"""Base class for hyper parameter search with cross-validation."""
        if type(method) is int:
            # Simply take the top N best hyperparameters
            if verbose:
-                print(f'Creating ensemble using top {str(method)} individual classifiers.')
+                print(f'Creating ensemble using top {str(method)} individual classifiers.')
            if method == 1:
                # Next functions expect list
                ensemble = [0]
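When method is an integer N, the branch above simply takes the N top-ranked workflows (index 0 being the best, hence ensemble = [0] for N = 1). A standalone sketch of that selection, using hypothetical validation scores:

import numpy as np

val_scores = np.array([0.81, 0.76, 0.88, 0.70, 0.84])  # hypothetical scores
ranking = np.argsort(-val_scores)  # workflow indices, best first
method = 3
ensemble = ranking[:method].tolist() if method > 1 else [int(ranking[0])]
print(ensemble)  # [2, 4, 0]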
@@ -1185,7 +1182,7 @@
            performances = np.zeros((n_iter, n_classifiers))
            for it, (train, valid) in enumerate(self.cv_iter):
                if verbose:
-                    print(f' - iteration {it+1} / {n_iter}.')
+                    print(f' - iteration {it + 1} / {n_iter}.')
                Y_valid_score_it = np.zeros((n_classifiers, len(valid)))

                # Loop over the 100 best estimators
@@ -1293,9 +1290,9 @@
            best_performance = new_performance

            # Print the performance gain
-            print(f"Ensembling best {scoring}: {best_performance}.")
-            print(f"Single estimator best {scoring}: {single_estimator_performance}.")
-            print(f'Ensemble consists of {len(ensemble)} estimators {ensemble}.')
+            print(f"Ensembling best {scoring}: {best_performance}.")
+            print(f"Single estimator best {scoring}: {single_estimator_performance}.")
+            print(f'Ensemble consists of {len(ensemble)} estimators {ensemble}.')

        elif method == 'Caruana':
            # Use the method from Caruana
@@ -1313,7 +1310,7 @@
            performances = np.zeros((n_iter, n_classifiers))
            for it, (train, valid) in enumerate(self.cv_iter):
                if verbose:
-                    print(f' - iteration {it+1} / {n_iter}.')
+                    print(f' - iteration {it + 1} / {n_iter}.')
                Y_valid_score_it = np.zeros((n_classifiers, len(valid)))

                # Loop over the 100 best estimators
@@ -1421,9 +1418,9 @@
            best_performance = new_performance

            # Print the performance gain
-            print(f"Ensembling best {scoring}: {best_performance}.")
-            print(f"Single estimator best {scoring}: {single_estimator_performance}.")
-            print(f'Ensemble consists of {len(ensemble)} estimators {ensemble}.')
+            print(f"Ensembling best {scoring}: {best_performance}.")
+            print(f"Single estimator best {scoring}: {single_estimator_performance}.")
+            print(f'Ensemble consists of {len(ensemble)} estimators {ensemble}.')

        # Greedy selection -----------------------------------------------
        # Initialize variables
@@ -1437,7 +1434,7 @@
            while new_performance > best_performance:
                # Score is better, so expand ensemble and replace new best score
                if verbose:
-                    print(f"Iteration: {iteration}, best {scoring}: {new_performance}.")
+                    print(f"Iteration: {iteration}, best {scoring}: {new_performance}.")
                best_performance = new_performance

                if iteration > 1:
@@ -1483,11 +1480,11 @@
                iteration += 1

            # Print the performance gain
-            print(f"Ensembling best {scoring}: {best_performance}.")
-            print(f"Single estimator best {scoring}: {single_estimator_performance}.")
-            print(f'Ensemble consists of {len(ensemble)} estimators {ensemble}.')
+            print(f"Ensembling best {scoring}: {best_performance}.")
+            print(f"Single estimator best {scoring}: {single_estimator_performance}.")
+            print(f'Ensemble consists of {len(ensemble)} estimators {ensemble}.')
        else:
-            print(f'[WORC WARNING] No valid ensemble method given: {method}. Not ensembling')
+            print(f'[WORC WARNING] No valid ensemble method given: {method}. Not ensembling')
            return self

        # Create the ensemble --------------------------------------------------
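Both the Caruana and greedy-selection branches follow the same recipe: repeatedly add the estimator that most improves the averaged validation score, and stop once no candidate helps. A self-contained sketch of that loop, with simplified scoring rather than the SearchCV internals:

import numpy as np


def greedy_ensemble(scores, y_valid, metric, max_rounds=50):
    """Caruana-style forward selection (with replacement).

    scores: shape (n_estimators, n_samples), one row of predictions per estimator.
    """
    ensemble, best = [], -np.inf
    for _ in range(max_rounds):
        # Score the averaged ensemble for every candidate addition
        gains = [metric(y_valid, scores[ensemble + [c]].mean(axis=0))
                 for c in range(scores.shape[0])]
        cand = int(np.argmax(gains))
        if gains[cand] <= best:
            break  # no candidate improves the ensemble: stop
        best = gains[cand]
        ensemble.append(cand)
    return ensemble, best


rng = np.random.default_rng(0)
y_valid = rng.integers(0, 2, size=50)
scores = rng.random((10, 50))  # hypothetical per-workflow predictions
ens, perf = greedy_ensemble(scores, y_valid, lambda y, s: -np.mean((y - s) ** 2))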
@@ -1502,7 +1499,7 @@
        nest = len(ensemble)
        for enum, p_all in enumerate(parameters_all):
            # Refit a SearchCV object with the provided parameters
-            print(f"Refitting estimator {enum+1} / {nest}.")
+            print(f"Refitting estimator {enum + 1} / {nest}.")
            base_estimator = clone(base_estimator)

            # # Check if we need to create a multiclass estimator
@@ -1554,7 +1551,7 @@
        n_splits = cv.get_n_splits(X, y, groups)
        if self.verbose > 0 and isinstance(parameter_iterable, Sized):
            n_candidates = len(parameter_iterable)
-            print(f"Fitting {n_splits} folds for each of {n_candidates} candidates, totalling {n_candidates*n_splits} fits.")
+            print(f"Fitting {n_splits} folds for each of {n_candidates} candidates, totalling {n_candidates * n_splits} fits.")

        cv_iter = list(cv.split(X, y, groups))
@@ -1622,7 +1619,7 @@
            message = 'One or more of the values in your parameter sampler ' + \
                      'is either not iterable, or the distribution cannot ' + \
                      'generate valid samples. Please check your ' + \
-                      f'parameters. At least {k} gives an error.'
+                      f'parameters. At least {k} gives an error.'
            raise WORCexceptions.WORCValueError(message)

        # Split the parameters files in equal parts
@@ -1634,7 +1631,7 @@
            for number in k:
                temp_dict[number] = parameters_temp[number]

-            fname = f'settings_{num}.json'
+            fname = f'settings_{num}.json'
            sourcename = os.path.join(tempfolder, 'parameters', fname)
            if not os.path.exists(os.path.dirname(sourcename)):
                os.makedirs(os.path.dirname(sourcename))
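Each fastr job receives one such settings_{num}.json chunk. Splitting the sampled parameter settings into roughly equal parts and dumping each part could look like this sketch (parameters and folder are hypothetical):

import json
import os

import numpy as np

parameters_temp = {str(i): {'alpha': i * 0.1} for i in range(10)}  # hypothetical
tempfolder = '/tmp/worc_example'  # hypothetical
n_chunks = 3

for num, chunk in enumerate(np.array_split(list(parameters_temp.keys()), n_chunks)):
    temp_dict = {number: parameters_temp[number] for number in chunk}
    sourcename = os.path.join(tempfolder, 'parameters', f'settings_{num}.json')
    os.makedirs(os.path.dirname(sourcename), exist_ok=True)
    with open(sourcename, 'w') as f:
        json.dump(temp_dict, f)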
@@ -1642,7 +1639,7 @@
        difference = expected_no_files - len(sink_files)
        fname = os.path.join(tempfolder, 'tmp')
        message = ('Fitting classifiers has failed for ' +
-                   f'{difference} / {expected_no_files} files. The temporary ' +
-                   f'results where not deleted and can be found in {tempfolder}. ' +
+                   f'{difference} / {expected_no_files} files. The temporary ' +
+                   f'results were not deleted and can be found in {tempfolder}. ' +
                   'Probably your fitting and scoring failed: check out ' +
                   'the tmp/fitandscore folder within the tempfolder for ' +
                   'the fastr job temporary results or run: fastr trace ' +
-                   f'"{fname}{os.path.sep}__sink_data__.json" --samples.')
+                   f'"{fname}{os.path.sep}__sink_data__.json" --samples.')
        raise WORCexceptions.WORCValueError(message)

        # Read in the output data once finished
@@ -1998,7 +1995,7 @@
        n_splits = cv.get_n_splits(X, y, groups)
        if self.verbose > 0 and isinstance(parameter_iterable, Sized):
            n_candidates = len(parameter_iterable)
-            print(f"Fitting {n_splits} folds for each of {n_candidates}" + \
+            print(f"Fitting {n_splits} folds for each of {n_candidates}" + \
                  " candidates, totalling" + \
                  f" {n_candidates * n_splits} fits")
@@ -2351,7 +2348,7 @@
Source code for WORC.classification.construct_classifier
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -183,20 +180,21 @@
# See the License for the specific language governing permissions and
# limitations under the License.
-from sklearn.svm import SVC
-from sklearn.svm import SVR as SVMR
-from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
-from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor
-from sklearn.linear_model import SGDClassifier, ElasticNet, SGDRegressor
-from sklearn.linear_model import LogisticRegression, Lasso
-from sklearn.naive_bayes import GaussianNB, ComplementNB
-from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
-from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA
+from sklearn.svm import SVC
+from sklearn.svm import SVR as SVMR
+from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
+from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor
+from sklearn.linear_model import SGDClassifier, ElasticNet, SGDRegressor
+from sklearn.linear_model import LogisticRegression, LinearRegression, Lasso
+from sklearn.linear_model import Ridge
+from sklearn.naive_bayes import GaussianNB, ComplementNB
+from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
+from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA
import scipy
-from WORC.classification.estimators import RankedSVM
-from WORC.classification.AdvancedSampler import log_uniform, discrete_uniform
+from WORC.classification.estimators import RankedSVM
+from WORC.classification.AdvancedSampler import log_uniform, discrete_uniform
import WORC.addexceptions as ae
-from xgboost import XGBClassifier, XGBRegressor
+from xgboost import XGBClassifier, XGBRegressor
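LinearRegression and Ridge back the regression options added in 3.4.1. A hedged sketch of how the string-to-estimator dispatch could look; construct_regressor is a hypothetical helper mirroring the style of construct_classifier above, which actually reads these settings from the WORC config:

from sklearn.linear_model import LinearRegression, Ridge


def construct_regressor(name, alpha=1.0, random_state=None):
    # Hypothetical dispatch on the configured estimator name
    if name == 'LinR':
        return LinearRegression()
    elif name == 'Ridge':
        return Ridge(alpha=alpha, random_state=random_state)
    raise KeyError(f'{name} is not a valid regressor.')


model = construct_regressor('Ridge', alpha=0.5)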
    # Start from zero, thus empty list of previous data
    save_data = list()

+    # If we are using fixed splits, set n_iterations to the number of splits
+    if fixedsplits is not None:
+        n_iterations = int(fixedsplits.columns.shape[0] / 2)
+        print(f'Fixedsplits detected, adjusting n_iterations to {n_iterations}')
+
    for i in range(start, n_iterations):
        print(('Cross-validation iteration {} / {} .').format(str(i + 1), str(n_iterations)))
        logging.debug(('Cross-validation iteration {} / {} .').format(str(i + 1), str(n_iterations)))
        timestamp = strftime("%Y-%m-%d %H:%M:%S", gmtime())
-        print(f'\t Time: {timestamp}.')
-        logging.debug(f'\t Time: {timestamp}.')
+        print(f'\t Time: {timestamp}.')
+        logging.debug(f'\t Time: {timestamp}.')

        if fixed_seed:
            random_seed = i ** 2
        else:
@@ -245,6 +247,7 @@
Source code for WORC.classification.crossval
# label is maintained
        if any(clf in regressors for clf in param_grid['classifiers']):
            # We cannot do a stratified shuffle split with regression
+            classes_temp = classes
            stratify = None
        else:
            if modus == 'singlelabel':
@@ -329,8 +332,8 @@
        else:
            # Use predefined splits
-            train = fixedsplits[str(i) + '_train'].values
-            test = fixedsplits[str(i) + '_test'].values
+            train = fixedsplits[str(i) + '_train'].dropna().values
+            test = fixedsplits[str(i) + '_test'].dropna().values

            # Convert the numbers to the correct indices
            ind_train = list()
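The dropna() calls matter because, in a fixed-splits CSV, the '{i}_train' and '{i}_test' columns generally differ in length, so the shorter column is padded with NaNs; n_iterations is then half the number of columns, as computed earlier. A sketch of the assumed layout:

import pandas as pd

# Assumed layout: one '{i}_train' and one '{i}_test' column of patient IDs
# per iteration, NaN-padded where the lengths differ
fixedsplits = pd.DataFrame({
    '0_train': ['pt1', 'pt2', 'pt3'],
    '0_test': ['pt4', None, None],
    '1_train': ['pt2', 'pt4', 'pt1'],
    '1_test': ['pt3', None, None],
})

n_iterations = int(fixedsplits.columns.shape[0] / 2)  # 2
train = fixedsplits['0_train'].dropna().values  # ['pt1' 'pt2' 'pt3']
test = fixedsplits['0_test'].dropna().values    # ['pt4']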
@@ -390,7 +393,7 @@
# Test performance for various RS and ensemble sizes
    if do_test_RS_Ensemble:
-        output_json = os.path.join(tempfolder, f'performance_RS_Ens_crossval_{i}.json')
+        output_json = os.path.join(tempfolder, f'performance_RS_Ens_crossval_{i}.json')
        test_RS_Ensemble(estimator_input=trained_classifier,
                         X_train=X_train, Y_train=Y_train,
                         X_test=X_test, Y_test=Y_test,
@@ -427,8 +430,8 @@
        # Print elapsed time
        elapsed = int((time.time() - t) / 60.0)
-        print(f'\t Fitting took {elapsed} minutes.')
-        logging.debug(f'\t Fitting took {elapsed} minutes.')
+        print(f'\t Fitting took {elapsed} minutes.')
+        logging.debug(f'\t Fitting took {elapsed} minutes.')

    return save_data

        # Print elapsed time
        elapsed = int((time.time() - t) / 60.0)
-        print(f'\t Fitting took {elapsed} minutes.')
-        logging.debug(f'\t Fitting took {elapsed} minutes.')
+        print(f'\t Fitting took {elapsed} minutes.')
+        logging.debug(f'\t Fitting took {elapsed} minutes.')

    return save_data
@@ -685,6 +688,11 @@
    if fixedsplits is not None and '.csv' in fixedsplits:
        fixedsplits = pd.read_csv(fixedsplits, header=0)

+        # Fixed splits must be used in random-split fashion: they make no sense for LOO
+        if crossval_type == 'LOO':
+            print('[WORC WARNING] Fixedsplits need to be performed in random-split fashion, makes no sense for LOO.')
+            crossval_type = 'random_split'
+
    if modus == 'singlelabel':
        print('Performing single-class classification.')
        logging.debug('Performing single-class classification.')
@@ -744,7 +752,7 @@
                                 use_fastr=use_fastr,
                                 fastr_plugin=fastr_plugin)
    else:
-        raise ae.WORCKeyError(f'{crossval_type} is not a recognized cross-validation type.')
+        raise ae.WORCKeyError(f'{crossval_type} is not a recognized cross-validation type.')

    [classifiers, X_train_set, X_test_set, Y_train_set, Y_test_set,
     patient_ID_train_set, patient_ID_test_set, seed_set] = \
@@ -922,7 +930,7 @@
        # FIXME: Use home folder, as this function does not know
        # where final or temporary output is located
        output_json = os.path.join(os.path.expanduser("~"),
-                                   f'performance_RS_Ens.json')
+                                   'performance_RS_Ens.json')
        test_RS_Ensemble(estimator_input=trained_classifier,
                         X_train=X_train, Y_train=Y_train,
@@ -965,14 +973,14 @@
        if RS <= n_workflows:
            # Make a key for saving the score
            num = 0
-            key = f'RS {RS} try {str(num).zfill(2)}'
+            key = f'RS {RS} try {str(num).zfill(2)}'
            while key in keys:
                num += 1
-                key = f'RS {RS} try {str(num).zfill(2)}'
+                key = f'RS {RS} try {str(num).zfill(2)}'
            keys.append(key)

            # Make a local copy of the estimator and select only subset of workflows
-            print(f'\t Using RS {RS}.')
+            print(f'\t Using RS {RS}.')
            estimator = copy(estimator_original)
            workflow_num = np.arange(n_workflows).tolist()
@@ -999,12 +1007,12 @@
            F1_training = [F1_training[i] for i in workflow_ranking]
            mean_train_F1 = F1_training[0:maxlen]

-            performances[f'Mean training F1-score {key} top {maxlen}'] = mean_train_F1
-            performances[f'Mean validation F1-score {key} top {maxlen}'] = mean_val_F1
+            performances[f'Mean training F1-score {key} top {maxlen}'] = mean_train_F1
+            performances[f'Mean validation F1-score {key} top {maxlen}'] = mean_val_F1

            for ensemble in ensembles:
                if ensemble <= RS:
-                    print(f'\t Using ensemble {ensemble}.')
+                    print(f'\t Using ensemble {ensemble}.')
                    # Create the ensemble
                    estimator.create_ensemble(X_train_temp, Y_train, method=ensemble)
@@ -1013,15 +1021,15 @@
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -183,32 +180,32 @@
Source code for WORC.classification.fitandscore
# See the License for the specific language governing permissions and
# limitations under the License.
-from sklearn.model_selection._validation import _fit_and_score
+from sklearn.model_selection._validation import _fit_and_score
import numpy as np
-from sklearn.linear_model import Lasso, LogisticRegression
-from sklearn.feature_selection import SelectFromModel
-from sklearn.decomposition import PCA
-from sklearn.multiclass import OneVsRestClassifier
-from sklearn.ensemble import RandomForestClassifier
-from WORC.classification.ObjectSampler import ObjectSampler
-from sklearn.utils.metaestimators import _safe_split
-from sklearn.utils.validation import _num_samples
-from WORC.classification.estimators import RankedSVM
-from WORC.classification import construct_classifier as cc
-from WORC.classification.metrics import check_multimetric_scoring
-from WORC.featureprocessing.Relief import SelectMulticlassRelief
-from WORC.featureprocessing.Imputer import Imputer
-from WORC.featureprocessing.Scalers import WORCScaler
-from WORC.featureprocessing.VarianceThreshold import selfeat_variance
-from WORC.featureprocessing.StatisticalTestThreshold import StatisticalTestThreshold
-from WORC.featureprocessing.SelectGroups import SelectGroups
-from WORC.featureprocessing.OneHotEncoderWrapper import OneHotEncoderWrapper
+from sklearn.linear_model import Lasso, LogisticRegression
+from sklearn.feature_selection import SelectFromModel
+from sklearn.decomposition import PCA
+from sklearn.multiclass import OneVsRestClassifier
+from sklearn.ensemble import RandomForestClassifier
+from WORC.classification.ObjectSampler import ObjectSampler
+from sklearn.utils.metaestimators import _safe_split
+from sklearn.utils.validation import _num_samples
+from WORC.classification.estimators import RankedSVM
+from WORC.classification import construct_classifier as cc
+from WORC.classification.metrics import check_multimetric_scoring
+from WORC.featureprocessing.Relief import SelectMulticlassRelief
+from WORC.featureprocessing.Imputer import Imputer
+from WORC.featureprocessing.Scalers import WORCScaler
+from WORC.featureprocessing.VarianceThreshold import selfeat_variance
+from WORC.featureprocessing.StatisticalTestThreshold import StatisticalTestThreshold
+from WORC.featureprocessing.SelectGroups import SelectGroups
+from WORC.featureprocessing.OneHotEncoderWrapper import OneHotEncoderWrapper
import WORC
import WORC.addexceptions as ae

# Specific imports for error management
-from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
-from numpy.linalg import LinAlgError
+from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
+from numpy.linalg import LinAlgError

# Suppress sklearn warnings
import warnings
@@ -434,7 +431,7 @@
    if 'OneHotEncoding' in para_estimator.keys():
        if para_estimator['OneHotEncoding'] == 'True':
            if verbose:
-                print(f'Applying OneHotEncoding, will ignore unknowns.')
+                print('Applying OneHotEncoding, will ignore unknowns.')
            feature_labels_tofit = \
                para_estimator['OneHotEncoding_feature_labels_tofit']
            encoder = \
@@ -462,7 +459,7 @@
            if para_estimator['Imputation'] == 'True':
                imp_type = para_estimator['ImputationMethod']
                if verbose:
-                    print(f'Imputing NaN with {imp_type}.')
+                    print(f'Imputing NaN with {imp_type}.')
                imp_nn = para_estimator['ImputationNeighbours']

                imputer = Imputer(missing_values=np.nan, strategy=imp_type,
@@ -476,7 +473,7 @@
            if original_shape != imputed_shape:
                removed_features = original_shape[1] - imputed_shape[1]
-                raise ae.WORCValueError(f'Several features ({removed_features}) were np.NaN for all objects. Hence, imputation was not possible. Either make sure this is correct and turn of imputation, or correct the feature.')
+                raise ae.WORCValueError(f'Several features ({removed_features}) were np.NaN for all objects. Hence, imputation was not possible. Either make sure this is correct and turn off imputation, or correct the features.')

        del para_estimator['Imputation']
        del para_estimator['ImputationMethod']
@@ -577,8 +574,8 @@
    if 'SelectFromModel' in para_estimator.keys() and para_estimator['SelectFromModel'] == 'True':
        model = para_estimator['SelectFromModel_estimator']
        if verbose:
-            print(f"Selecting features using model {model}.")
+            print(f"Selecting features using model {model}.")

        if model == 'Lasso':
            # Use lasso model for feature selection
            alpha = para_estimator['SelectFromModel_lasso_alpha']
-            selectestimator = Lasso(alpha=alpha)
+            selectestimator = Lasso(alpha=alpha, random_state=random_seed)
        elif model == 'LR':
            # Use logistic regression model for feature selection
-            selectestimator = LogisticRegression()
+            selectestimator = LogisticRegression(random_state=random_seed)
        elif model == 'RF':
            # Use random forest model for feature selection
            n_estimators = para_estimator['SelectFromModel_n_trees']
-            selectestimator = RandomForestClassifier(n_estimators=n_estimators)
+            selectestimator = RandomForestClassifier(n_estimators=n_estimators,
+                                                     random_state=random_seed)
        else:
-            raise ae.WORCKeyError(f'Model {model} is not known for SelectFromModel. Use Lasso, LR, or RF.')
+            raise ae.WORCKeyError(f'Model {model} is not known for SelectFromModel. Use Lasso, LR, or RF.')

        # Prefit model
        selectestimator.fit(X_train, y_train)
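Seeding the selection estimators, as the fix above does, keeps model-based feature selection reproducible across workflow refits. A compact sketch of the pattern on synthetic data:

import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 8))
y_train = X_train[:, 0] + 0.1 * rng.normal(size=30)

# random_state only affects estimators with stochastic fitting, but passing
# it uniformly keeps repeated runs aligned
selectestimator = Lasso(alpha=0.05, random_state=0)
selectestimator.fit(X_train, y_train)
selector = SelectFromModel(selectestimator, prefit=True)
# Only features with non-negligible coefficients remain
print(selector.transform(X_train).shape)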
@@ -780,8 +778,9 @@
                pca.fit(X_train)
            except (ValueError, LinAlgError) as e:
                if verbose:
-                    print(f'[WARNING]: skipping this setting due to PCA Error: {e}.')
+                    print(f'[WARNING]: skipping this setting due to PCA Error: {e}.')
+                pca = None

                if return_all:
                    return ret, GroupSel, VarSel, SelectModel, feature_labels[0], scaler, encoder, imputer, pca, StatisticalSel, ReliefSel, Sampler
                else:
@@ -800,8 +799,9 @@
                pca.fit(X_train)
            except (ValueError, LinAlgError) as e:
                if verbose:
-                    print(f'[WARNING]: skipping this setting due to PCA Error: {e}.')
+                    print(f'[WARNING]: skipping this setting due to PCA Error: {e}.')
+                pca = None

                if return_all:
                    return ret, GroupSel, VarSel, SelectModel, feature_labels[0], scaler, encoder, imputer, pca, StatisticalSel, ReliefSel, Sampler
                else:
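Setting pca = None after a failed fit, the bugfix above, ensures downstream code sees "no PCA" instead of a half-fitted object. The guard pattern in isolation:

import numpy as np
from numpy.linalg import LinAlgError
from sklearn.decomposition import PCA

X_train = np.random.default_rng(0).normal(size=(10, 5))
pca = PCA(n_components=3)
try:
    pca.fit(X_train)
except (ValueError, LinAlgError) as e:
    print(f'[WARNING]: skipping this setting due to PCA Error: {e}.')
    pca = None  # downstream transforms check for None and pass through

if pca is not None:
    X_train = pca.transform(X_train)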
@@ -817,10 +817,21 @@
            metric = para_estimator['StatisticalTestMetric']
            threshold = para_estimator['StatisticalTestThreshold']
            if verbose:
-                print(f"Selecting features based on statistical test. Method {metric}, threshold {round(threshold,5)}.")
+                print(f"Selecting features based on statistical test. Method {metric}, threshold {round(threshold, 5)}.")
                print("\t Original Length: " + str(len(X_train[0])))

            StatisticalSel = StatisticalTestThreshold(metric=metric,
@@ -890,7 +901,8 @@
            if 'ADASYN is not suited for this specific dataset. Use SMOTE instead.' in str(e):
                # Seldom occurs, therefore return performance dummy
                if verbose:
-                    print(f'[WARNING]: {e}. Returning dummies. Parameters: ')
+                    print(f'[WARNING]: {e}. Returning dummies. Parameters: ')
                    print(parameters)
                para_estimator = delete_nonestimator_parameters(para_estimator)
@@ -922,7 +934,7 @@
                neg = int(len(y_train_temp) - pos)
                if pos < 10 or neg < 10:
                    if verbose:
-                        print(f'[WORC WARNING] Skipping resampling: to few objects returned in one or both classes (pos: {pos}, neg: {neg}).')
+                        print(f'[WORC WARNING] Skipping resampling: too few objects returned in one or both classes (pos: {pos}, neg: {neg}).')
                    Sampler = None
                    parameters['Resampling_Use'] = 'False'
                else:
@@ -933,8 +945,8 @@
                pos = int(np.sum(y_train))
                neg = int(len(y_train) - pos)
                if verbose:
-                    message = f"Resampling from {len_in} ({pos_initial} pos," + \
-                              f" {neg_initial} neg) to {len(y_train)} ({pos} pos, {neg} neg) patients."
+                    message = f"Resampling from {len_in} ({pos_initial} pos," + \
+                              f" {neg_initial} neg) to {len(y_train)} ({pos} pos, {neg} neg) patients."
                    print(message)

                # Also reset train and test indices
@@ -979,7 +991,7 @@
            estimator = OneVsRestClassifier(estimator)

        if verbose:
-            print(f"Fitting ML method: {parameters['classifiers']}.")
+            print(f"Fitting ML method: {parameters['classifiers']}.")

        # Recombine feature values and label for train and test set
        feature_values = np.concatenate((X_train, X_test), axis=0)
@@ -1000,7 +1012,7 @@
        except (ValueError, LinAlgError) as e:
            if type(estimator) == LDA:
                if verbose:
-                    print(f'[WARNING]: skipping this setting due to LDA Error: {e}.')
+                    print(f'[WARNING]: skipping this setting due to LDA Error: {e}.')

                if return_all:
                    return ret, GroupSel, VarSel, SelectModel, feature_labels[0], scaler, encoder, imputer, pca, StatisticalSel, ReliefSel, Sampler
@@ -1099,9 +1111,9 @@
            if np.isnan(value):
                if verbose:
                    if feature_labels is not None:
-                        print(f"[WORC WARNING] NaN found, patient {pnum}, label {feature_labels[fnum]}. Replacing with zero.")
+                        print(f"[WORC WARNING] NaN found, patient {pnum}, label {feature_labels[fnum]}. Replacing with zero.")
                    else:
-                        print(f"[WORC WARNING] NaN found, patient {pnum}, label {fnum}. Replacing with zero.")
+                        print(f"[WORC WARNING] NaN found, patient {pnum}, label {fnum}. Replacing with zero.")

                # Note: X is a list of lists, hence we cannot index the element directly
                image_features_temp[pnum, fnum] = 0
@@ -1158,19 +1170,11 @@
Source code for WORC.classification.metrics
# See the License for the specific language governing permissions and
# limitations under the License.
-from __future__ import division
-from sklearn.metrics import accuracy_score, balanced_accuracy_score
-from sklearn.metrics import roc_auc_score
-from sklearn.metrics import confusion_matrix
-from sklearn.metrics import f1_score
+from __future__ import division
+from sklearn.metrics import accuracy_score, balanced_accuracy_score
+from sklearn.metrics import roc_auc_score
+from sklearn.metrics import confusion_matrix
+from sklearn.metrics import f1_score
import numpy as np
-from sklearn import metrics
-from scipy.stats import pearsonr, spearmanr
-from sklearn.metrics import make_scorer, average_precision_score
-from sklearn.metrics import check_scoring as check_scoring_sklearn
-from scipy.linalg import pinv
-from imblearn.metrics import geometric_mean_score
+from sklearn import metrics
+from scipy.stats import pearsonr, spearmanr
+from sklearn.metrics import make_scorer, average_precision_score
+from sklearn.metrics import check_scoring as check_scoring_sklearn
+from scipy.linalg import pinv
+from imblearn.metrics import geometric_mean_score
Source code for WORC.classification.parameter_optimization
    random_search.fit(features, labels)
    print("Best found parameters:")
    for i in random_search.best_params_:
-        print(f'{i}: {random_search.best_params_[i]}.')
-    print(f"\n Best score using best parameters: {scoring_method} = {random_search.best_score_}")
+        print(f'{i}: {random_search.best_params_[i]}.')
+    print(f"\n Best score using best parameters: {scoring_method} = {random_search.best_score_}")

    return random_search
@@ -278,19 +275,11 @@
Source code for WORC.classification.trainclassifier
#!/usr/bin/env python
-# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
+# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -184,12 +181,12 @@
        # Specific for STW Strategy BMIA XNAT projects
        if experiment.session_type is None:  # some files in project don't have _CT postfix
-            print(f"\tSkipping patient {subject.label}, experiment {experiment.label}: type is not CT but {experiment.session_type}.")
+            print(f"\tSkipping patient {subject.label}, experiment {experiment.label}: type is not CT but {experiment.session_type}.")
            continue

        if '_CT' not in experiment.session_type:
-            print(f"\tSkipping patient {subject.label}, experiment {experiment.label}: type is not CT but {experiment.session_type}.")
+            print(f"\tSkipping patient {subject.label}, experiment {experiment.label}: type is not CT but {experiment.session_type}.")
            continue

        for s in experiment.scans:
@@ -225,7 +222,7 @@
Source code for WORC.exampledata.datadownloader
                os.makedirs(outdir)
            resmap[resource_label] = scan
-            print(f'resource is {resource_label}')
+            print(f'resource is {resource_label}')
            scan.resources[res].download_dir(outdir)
            resource_labels.append(resource_label)
            download_counter += 1
@@ -233,19 +230,19 @@
    # Parse resources and throw warnings if they do not meet the requirements
    subject_name = subject.label
    if download_counter == 0:
-        print(f'[WARNING] Skipping subject {subject_name}: no (suitable) resources found.')
+        print(f'[WARNING] Skipping subject {subject_name}: no (suitable) resources found.')
        return False

    if 'NIFTI' not in resource_labels:
-        print(f'[WARNING] Skipping subject {subject_name}: no NIFTI resources found.')
+        print(f'[WARNING] Skipping subject {subject_name}: no NIFTI resources found.')
        return False

    if resource_labels.count('NIFTI') < 2:
-        print(f'[WARNING] Skipping subject {subject_name}: only one NIFTI resource found, need two (mask and image).')
+        print(f'[WARNING] Skipping subject {subject_name}: only one NIFTI resource found, need two (mask and image).')
        return False
    elif resource_labels.count('NIFTI') > 2:
        count = resource_labels.count('NIFTI')
-        print(f'[WARNING] Skipping subject {subject_name}: {str(count)} NIFTI resources found, need two (mask and image).')
+        print(f'[WARNING] Skipping subject {subject_name}: {str(count)} NIFTI resources found, need two (mask and image).')
        return False

    # Check what the mask and image folders are
@@ -259,13 +256,13 @@
                     verbose=True):
    # Connect to XNAT and retrieve project
-    session = xnat.connect(xnat_url)
-    project = session.projects[project_name]
+    with xnat.connect(xnat_url) as session:
+        project = session.projects[project_name]

-    # Create the data folder if it does not exist yet
-    datafolder = os.path.join(datafolder, project_name)
-    if not os.path.exists(datafolder):
-        os.makedirs(datafolder)
+        # Create the data folder if it does not exist yet
+        datafolder = os.path.join(datafolder, project_name)
+        if not os.path.exists(datafolder):
+            os.makedirs(datafolder)

-    subjects_len = len(project.subjects)
-    if nsubjects == 'all':
-        nsubjects = subjects_len
-    else:
-        nsubjects = min(nsubjects, subjects_len)
+        subjects_len = len(project.subjects)
+        if nsubjects == 'all':
+            nsubjects = subjects_len
+        else:
+            nsubjects = min(nsubjects, subjects_len)

-    subjects_counter = 1
-    downloaded_subjects_counter = 0
-    for s in range(0, subjects_len):
-        s = project.subjects[s]
-        print(f'Working on subject {subjects_counter}/{subjects_len}')
-        subjects_counter += 1
+        subjects_counter = 1
+        downloaded_subjects_counter = 0
+        for s in range(0, subjects_len):
+            s = project.subjects[s]
+            print(f'Working on subject {subjects_counter}/{subjects_len}')
+            subjects_counter += 1

-        success = download_subject(project_name, s, datafolder, session, verbose)
-        if success:
-            downloaded_subjects_counter += 1
+            success = download_subject(project_name, s, datafolder, session, verbose)
+            if success:
+                downloaded_subjects_counter += 1

-        # Stop downloading if we have reached the required number of subjects
-        if downloaded_subjects_counter == nsubjects:
-            break
+            # Stop downloading if we have reached the required number of subjects
+            if downloaded_subjects_counter == nsubjects:
+                break

-    # Disconnect the session
-    session.disconnect()
-    if downloaded_subjects_counter < nsubjects:
-        raise ValueError(f'Number of subjects downloaded {downloaded_subjects_counter} is smaller than the number required {nsubjects}.')
+        # Disconnect the session
+        session.disconnect()
+    if downloaded_subjects_counter < nsubjects:
+        raise ValueError(f'Number of subjects downloaded {downloaded_subjects_counter} is smaller than the number required {nsubjects}.')

-    print('Done downloading!')
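Using xnat.connect as a context manager guarantees the session is closed even when a download raises, which the old explicit disconnect could miss. The shape of the refactor, sketched with a placeholder URL and project name:

import xnat

# 'https://xnat.example.com' and 'example_project' are placeholders
with xnat.connect('https://xnat.example.com') as session:
    project = session.projects['example_project']
    for subject in project.subjects.values():
        print(subject.label)
# The session is closed on exiting the block, whether a raise occurred or not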
Source code for WORC.featureprocessing.Imputer
# See the License for the specific language governing permissions and
# limitations under the License.
-from sklearn.impute import SimpleImputer, KNNImputer
+from sklearn.impute import SimpleImputer, KNNImputer

[docs]class Imputer(object):
    """Module for feature imputation."""
-
[docs]    def __init__(self, missing_values='nan', strategy='mean', n_neighbors=5):
        '''
        Imputation of feature values using either sklearn, missingpy or
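With both sklearn imputers imported, the strategy argument in the signature above can select between them. A minimal sketch, under the assumption that a 'knn' strategy maps to KNNImputer and the remaining strategies to SimpleImputer:

import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer


def build_imputer(strategy='mean', n_neighbors=5):
    # Hypothetical helper: 'mean'/'median'/'most_frequent' -> SimpleImputer,
    # 'knn' -> KNNImputer
    if strategy == 'knn':
        return KNNImputer(n_neighbors=n_neighbors)
    return SimpleImputer(missing_values=np.nan, strategy=strategy)


X = np.array([[1.0, np.nan], [3.0, 4.0], [5.0, 6.0]])
print(build_imputer('mean').fit_transform(X))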
@@ -261,19 +258,11 @@
# See the License for the specific language governing permissions and
# limitations under the License.
-from sklearn.base import BaseEstimator
-from sklearn.feature_selection.base import SelectorMixin
+from sklearn.base import BaseEstimator
+from sklearn.feature_selection.base import SelectorMixin
import numpy as np
import sklearn.neighbors as nn
# from skrebate import ReliefF
@@ -195,7 +192,7 @@
Source code for WORC.featureprocessing.Relief
    Object to fit feature selection based on the type of group the feature
    belongs to. The label of the feature is used for this procedure. '''
-
Source code for WORC.featureprocessing.SelectGroups
# See the License for the specific language governing permissions and
# limitations under the License.
-from sklearn.base import BaseEstimator
-from sklearn.feature_selection.base import SelectorMixin
+from sklearn.base import BaseEstimator
+from sklearn.feature_selection.base import SelectorMixin
import numpy as np
@@ -193,7 +190,7 @@
    Object to fit feature selection based on the type of group the feature
    belongs to. The label of the feature is used for this procedure. '''
-