Merge pull request #58 from MStarmans91/development
Release version 3.4.1
MStarmans91 committed May 18, 2021
2 parents 47d354c + fd656ba commit f9349c0
Showing 159 changed files with 5,515 additions and 5,433 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -132,3 +132,4 @@ WORC/external/*
WORC/exampledata/ICCvalues.csv
WORC/tests/*.png
WORC/tests/*.mat
WORC/tests/WORC_Example_STWStrategyHN_Regression
8 changes: 7 additions & 1 deletion .travis.yml
@@ -52,7 +52,13 @@ matrix:
- fastr trace /tmp/WORC_Example_STWStrategyHN/__sink_data__.json --sinks classification --samples all
- fastr trace /tmp/WORC_Example_STWStrategyHN/__sink_data__.json --sinks performance --samples all
- fastr trace /tmp/GS/DEBUG_0/tmp/__sink_data__.json --sinks output --samples id_0__0000__0000

# Change the tutorial script to also run a regression experiment,
# using the previously calculated features
- rm -r /tmp/GS/DEBUG_0
- python WORC/tests/WORCTutorialSimple_travis_regression.py
- fastr trace /tmp/WORC_Example_STWStrategyHN_Regression/__sink_data__.json --sinks classification --samples all
- fastr trace /tmp/WORC_Example_STWStrategyHN_Regression/__sink_data__.json --sinks performance --samples all
- fastr trace /tmp/GS/DEBUG_0/tmp/__sink_data__.json --sinks output --samples id_0__0000__0000
notifications:
slack:
secure: ytP+qd6Rx1m1uXYMaN7dFHnFNu+bCIcyugSnAY7BtbumJwCuEt8hbWvQ/sDoAKqxj5VYcnBlTRDn1gjg2t2shs7pBGgjdeZQpQglXyAtN4bz3suSUbQ9/RIwt+RPmbiTXkWQtoZ4q0DotydozKMnq8Cvhdy+d5pMqToER6kMq/WCC+Y/99mmnqO2VrWpvAvP6bBOWDvrk/C4u3y5m3Rp5iE7uAYR3TDTprIW9UNEntDoEYT2T+bidkDRl7DMsi8R4q4s/A6EhZpB4Tnhwz7ama155z77ywdZLhdmk5HJvngXcunVwH4v/l8DbBZU0PqMEJzaRMn/tQCCqjx1/unpyFCv+QuhmP5K4wo17R77jHlcn7SBkdzYr/CKHrilWuShmvOMCckBShpQw3H9PivcI6/G5mVA23tH+gJSQUbzZmBR683x7oQHmnK3g977yD/ufEvV6qME9HFXt3+jIzVEwsUjtJsTV/NsbHlErJfhBp8HJTpq6IRhtKcX9QS1i/APXcYcCSCFJe8tOTLN6xmAKBgONG3XOAvJwfwXbF+rmfjX0x6KMUuD5WmHLjMLhQp0dS00LV7C9s18UkFBgKydqvF2AMPUsbgIGyZ/Vz3v5nz7JiNLDfp0HxQpqAABpdwDHR3/CfuhCDcqzIXAgRgXaFrqCxqoH6OrsgRH6UxUXnM=
31 changes: 31 additions & 0 deletions CHANGELOG
@@ -6,6 +6,37 @@ All notable changes to this project will be documented in this file.
The format is based on `Keep a Changelog <http://keepachangelog.com/>`_
and this project adheres to `Semantic Versioning <http://semver.org/>`_


3.4.1 - 2021-05-18
------------------

Fixed
~~~~~
- Bugfix when PCA cannot be fitted.
- Bugfix when using LOO cross-validation in performance evaluation.
- Fixed the XGBoost version, as the newest version automatically uses multithreading,
  which is unsuitable for clusters.
- Bug in decomposition for Evaluation.
- RankedPosteriors image naming was rounded to an integer; it is now unrounded.
- Several fixes for regression.
- Regression in unit test.
- Several fixes for using 2D images.

Changed
~~~~~~~
- Reverted to the weighted f1-score without predict_proba for optimization,
  as it is more stable.
- Updated regressors in SimpleWORC.

Added
~~~~~
- Option to combine features from a varying number of objects per patient,
e.g. by averaging or taking the maximum.
- Logarithmic z-score scaler to be more robust to non-normal distributions
and outliers.
- Linear and Ridge regression.
- Precision-recall curves.

3.4.0 - 2021-02-02
------------------

20 changes: 10 additions & 10 deletions README.md
@@ -1,4 +1,4 @@
# WORC v3.4.0
# WORC v3.4.1
## Workflow for Optimal Radiomics Classification

## Information
@@ -33,6 +33,15 @@ and support of different software languages (python, MATLAB, ruby, java etc.), w
collaboration, standardisation and comparison of different radiomics approaches. By combining this in a single framework,
we hope to find a universal radiomics strategy that can address various problems.

## License
This package is covered by the open source [APACHE 2.0 License](APACHE-LICENSE-2.0).

When using WORC, please cite this repository as follows:

``Martijn P.A. Starmans, Sebastian R. van der Voort, Thomas Phil and Stefan Klein. Workflow for Optimal Radiomics Classification (WORC). Zenodo (2018). Available from: https://github.com/MStarmans91/WORC. DOI: http://doi.org/10.5281/zenodo.3840534.``

For the DOI, visit [![][DOI]][DOI-lnk].

## Disclaimer
This package is still under development. We try to thoroughly test and evaluate every new build and function, but
bugs can of course still occur. Please contact us through the channels below if you find any, and we will try to fix
@@ -86,15 +95,6 @@ Besides a Jupyter notebook with instructions, we provide there also an example s
- We are writing the paper on WORC.
- We are expanding the example experiments of WORC with open source datasets.

## License
This package is covered by the open source [APACHE 2.0 License](APACHE-LICENSE-2.0).

When using WORC, please cite this repository as follows:

``Martijn P.A. Starmans, Sebastian R. van der Voort, Thomas Phil and Stefan Klein. Workflow for Optimal Radiomics Classification (WORC). Zenodo (2018). Available from: https://github.com/MStarmans91/WORC. DOI: http://doi.org/10.5281/zenodo.3840534.``

For the DOI, visit [![][DOI]][DOI-lnk].

## Contact
We are happy to help you with any questions. Please send us an email or open an issue on GitHub.

26 changes: 13 additions & 13 deletions README.rst
@@ -1,4 +1,4 @@
WORC v3.4.0
WORC v3.4.1
===========

Workflow for Optimal Radiomics Classification
@@ -28,6 +28,18 @@ comparison of different radiomics approaches. By combining this in a
single framework, we hope to find a universal radiomics strategy that
can address various problems.

License
-------

This package is covered by the open source `APACHE 2.0
License <APACHE-LICENSE-2.0>`__.

When using WORC, please cite this repository as follows:

``Martijn P.A. Starmans, Sebastian R. van der Voort, Thomas Phil and Stefan Klein. Workflow for Optimal Radiomics Classification (WORC). Zenodo (2018). Available from: https://github.com/MStarmans91/WORC. DOI: http://doi.org/10.5281/zenodo.3840534.``

For the DOI, visit |image5|.

Disclaimer
----------

@@ -111,18 +123,6 @@ WIP
- We are expanding the example experiments of WORC with open source
datasets.

License
-------

This package is covered by the open source `APACHE 2.0
License <APACHE-LICENSE-2.0>`__.

When using WORC, please cite this repository as follows:

``Martijn P.A. Starmans, Sebastian R. van der Voort, Thomas Phil and Stefan Klein. Workflow for Optimal Radiomics Classification (WORC). Zenodo (2018). Available from: https://github.com/MStarmans91/WORC. DOI: http://doi.org/10.5281/zenodo.3840534.``

For the DOI, visit |image5|.

Contact
-------

9 changes: 8 additions & 1 deletion WORC/IOparser/config_io_classifier.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python

# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -131,9 +131,16 @@ def load_config(config_file_path):
[int(str(item).strip()) for item in
settings['Featsel']['ReliefNumFeatures'].split(',')]

# Feature preprocessing before the whole HyperOptimization
settings_dict['FeatPreProcess']['Use'] =\
[str(settings['FeatPreProcess']['Use'])]

settings_dict['FeatPreProcess']['Combine'] =\
settings['FeatPreProcess'].getboolean('Combine')

settings_dict['FeatPreProcess']['Combine_method'] =\
str(settings['FeatPreProcess']['Combine_method'])

# Imputation
settings_dict['Imputation']['use'] =\
[str(item).strip() for item in
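
For illustration, here is a minimal standalone sketch (not part of this commit) of how the `FeatPreProcess` keys parsed above behave with Python's built-in `configparser`. The section and key names come from the diff; the example values are assumptions:

```python
import configparser

# Hypothetical config excerpt; the [FeatPreProcess] keys mirror those read in
# load_config above, the values are made up for illustration.
example = """
[FeatPreProcess]
Use = False
Combine = False
Combine_method = mean
"""

settings = configparser.ConfigParser()
settings.read_string(example)

# Same conversions as in the diff: a one-element string list, a boolean, a string.
use = [str(settings['FeatPreProcess']['Use'])]
combine = settings['FeatPreProcess'].getboolean('Combine')
combine_method = str(settings['FeatPreProcess']['Combine_method'])

print(use, combine, combine_method)  # ['False'] False mean
```
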
85 changes: 77 additions & 8 deletions WORC/IOparser/file_io.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python

# Copyright 2016-2020 Biomedical Imaging Group Rotterdam, Departments of
# Copyright 2016-2021 Biomedical Imaging Group Rotterdam, Departments of
# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -23,8 +23,10 @@
import os


def load_data(featurefiles, patientinfo=None, label_names=None, modnames=[]):
''' Read feature files and stack the features per patient in an array.
def load_data(featurefiles, patientinfo=None, label_names=None, modnames=[],
combine_features=False, combine_method='mean'):
"""Read feature files and stack the features per patient in an array.
Additionally, if a patient label file is supplied, the features from
a patient will be matched to the labels.
@@ -44,8 +46,13 @@ def load_data(featurefiles, patientinfo=None, label_names=None, modnames=[]):
List containing all the labels that should be extracted from
the patientinfo file.
'''
combine_features: boolean, default False
Determines whether to combine the features from all samples
of the same patient or not.
combine_method: string, mean or max
If features per patient should be combined, determine how.
"""
# Read out all feature values and labels
image_features_temp = list()
feature_labels_all = list()
@@ -138,11 +145,64 @@ def load_data(featurefiles, patientinfo=None, label_names=None, modnames=[]):
label_data = dict()
label_data['patient_IDs'] = patient_IDs

# Optionally, combine features of same patient
if combine_features:
print('Combining features of the same patient.')
feature_labels = image_features[0][1]
label_name = label_data['label_name']
new_label_data = list()
new_pids = list()
new_features = list()
pid_length = len(label_data['patient_IDs'])
print(f'\tOriginal number of samples / patients: {pid_length}.')

already_processed = list()
for pnum, pid in enumerate(label_data['patient_IDs']):
if pid not in already_processed:
# Count how many samples / objects this patient has
occurrences = list(label_data['patient_IDs']).count(pid)

# NOTE: Assume all objects from one patient have the same label
label = label_data['label'][0][pnum]
new_label_data.append(label)
new_pids.append(pid)

# Only process patients which occur multiple times
if occurrences > 1:
print(f'\tFound {occurrences} occurrences for {pid}.')
indices = [i for i, x in enumerate(label_data['patient_IDs']) if x == pid]
feature_values_thispatient = np.asarray([image_features[i][0] for i in indices])
if combine_method == 'mean':
feature_values_thispatient = np.nanmean(feature_values_thispatient, axis=0).tolist()
elif combine_method == 'max':
feature_values_thispatient = np.nanmax(feature_values_thispatient, axis=0).tolist()
else:
raise WORCexceptions.KeyError(f'{combine_method} is not a valid combination method, should be mean or max.')
features = (feature_values_thispatient, feature_labels)

# And add the new one
new_features.append(features)
else:
new_features.append(image_features[pnum])

already_processed.append(pid)

# Adjust the labels and features for further processing
label_data = dict()
label_data['patient_IDs'] = np.asarray(new_pids)
label_data['label'] = np.asarray([new_label_data])
label_data['label_name'] = label_name

image_features = new_features

pid_length = len(label_data['patient_IDs'])
print(f'\tNumber of samples / patients after combining: {pid_length}.')

return label_data, image_features


def load_features(feat, patientinfo, label_type):
''' Read feature files and stack the features per patient in an array.
def load_features(feat, patientinfo, label_type, combine_features=False,
combine_method='mean'):
"""Read feature files and stack the features per patient in an array.
Additionally, if a patient label file is supplied, the features from
a patient will be matched to the labels.
@@ -162,7 +222,14 @@ def load_features(feat, patientinfo, label_type):
List containing all the labels that should be extracted from
the patientinfo file.
'''
combine_features: boolean, default False
Determines whether to combine the features from all samples
of the same patient or not.
combine_method: string, mean or max
If features per patient should be combined, determine how.
"""
# Check if features is a simple list, or just one string
if '=' not in feat[0]:
feat = ['Mod0=' + ','.join(feat)]
@@ -186,7 +253,9 @@
# Read the features and classification data
label_data, image_features =\
load_data(feat, patientinfo,
label_type, modnames)
label_type, modnames,
combine_features,
combine_method)

return label_data, image_features
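
As a standalone illustration of the per-patient feature combination added to `load_data` above, here is a toy sketch of the aggregation idea (the patient IDs and feature values are made up; this is not WORC's API, just the same nanmean/nanmax logic on small arrays):

```python
import numpy as np

# Toy data: 'pat1' has two feature vectors (e.g. two lesions), 'pat2' has one.
patient_ids = np.array(['pat1', 'pat1', 'pat2'])
features = np.array([[1.0, 2.0],
                     [3.0, np.nan],
                     [5.0, 6.0]])

combine_method = 'mean'  # or 'max', as in load_data
agg = np.nanmean if combine_method == 'mean' else np.nanmax

new_ids, new_features = [], []
for pid in dict.fromkeys(patient_ids.tolist()):   # keeps first-seen order, drops duplicates
    rows = features[patient_ids == pid]           # all samples of this patient
    new_ids.append(pid)
    new_features.append(agg(rows, axis=0))        # NaNs are ignored per feature

print(new_ids)                  # ['pat1', 'pat2']
print(np.vstack(new_features))  # [[2. 2.] [5. 6.]]
```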
