Determine whether a cross validation will be performed or not. Obsolete, will be removed.
+Determine whether to use Segmentix tool for segmentation preprocessing.
+Specifies which feature calculation tool should be used.
+Specifies which tool will be used for image preprocessing.
+Specifies which tool will be used for image registration.
+Specifies which tool will be used for applying image transformations.
+Number of cores to be used by joblib for multicore processing.
+Type of backend to be used by joblib for multicore processing.
+Determines whether after every cross validation iteration the result will be saved, in addition to the result after all iterations. Especially useful for debugging.
True, False
+True, False
+predict/CalcFeatures:1.0, pyradiomics/CF_pyradiomics:1.0, your own tool reference
+worc/PreProcess:1.0, your own tool reference
+elastix4.8/Elastix:4.8, your own tool reference
+elastix4.8/Transformix:4.8, your own tool reference
+Integer > 0
+multiprocessing, threading
+True, False
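+The joblib settings above only control how multicore processing is dispatched; as an illustration (not WORC's own code), the backend and core count map onto a joblib call roughly as follows::
+
+    from joblib import Parallel, delayed
+
+    # Illustrative sketch: 'multiprocessing' runs jobs in separate processes,
+    # 'threading' runs them in threads of a single process.
+    results = Parallel(n_jobs=4, backend='multiprocessing')(
+        delayed(pow)(i, 2) for i in range(10))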
+
+
Segmentix
+
mask
+segtype
+segradius
+N_blobs
+fillholes
+
If a mask is supplied, determines whether the mask is subtracted from the contour or multiplied with it.
+If Ring, then a ring around the segmentation will be used as contour.
+Define the radius of the ring used if segtype is Ring.
+How many of the largest blobs are extracted from the segmentation. If None, no blob extraction is used.
+Determines whether hole filling will be used.
If a mask is supplied and this is set to True, normalize image based on supplied ROI. Otherwise, the full image is used for normalization using the SimpleITK Normalize function. Lastly, setting this to False will result in no normalization being applied.
+Method used for normalization if ROI is supplied. Currently, z-scoring or using the minimum and median of the ROI can be used.
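+As a minimal sketch (not the actual WORC defaults), the Segmentix options above can be set in the configparser-style configuration as strings; the key names follow the table above, the values are examples only::
+
+    import configparser
+
+    config = configparser.ConfigParser()
+    config['Segmentix'] = {
+        'segtype': 'Ring',    # use a ring around the segmentation as contour
+        'segradius': '5',     # radius of that ring (example value)
+        'N_blobs': '1',       # keep only the largest blob
+        'fillholes': 'True',  # fill holes in the segmentation
+    }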
Determine whether shape features are computed or not.
+Determine whether histogram features are computed or not.
+Determine whether orientation features are computed or not.
+Determine whether Gabor texture features are computed or not.
+Determine whether LBP texture features are computed or not.
+Determine whether GLCM texture features are computed or not.
+Determine whether GLCM Multislice texture features are computed or not.
+Determine whether GLRLM texture features are computed or not.
+Determine whether GLSZM texture features are computed or not.
+Determine whether NGTDM texture features are computed or not.
+Determine whether coliage features are computed or not.
+Determine whether vessel features are computed or not.
+Determine whether LoG features are computed or not.
+Determine whether local phase features are computed or not.
+Modality of images supplied. Determines how the image is loaded.
+Frequencies of Gabor filters used: can be a single float or a list.
+Angles of Gabor filters in degrees: can be a single integer or a list.
+Angles used in GLCM computation in radians: can be a single float or a list.
+Number of grayscale levels used in discretization before GLCM computation.
+Distance(s) used in GLCM computation in pixels: can be a single integer or a list.
+Radii used for LBP computation: can be a single integer or a list.
+Number(s) of points used in LBP computation: can be a single integer or a list.
+Minimal wavelength in pixels used for phase features.
+Number of scales used in phase feature computation.
+Standard deviation(s) in pixels used in log feature computation: can be a single integer or a list.
+Scale in pixels used for Frangi vessel filter. Given as a minimum and a maximum.
+Step size used to go from minimum to maximum scale on Frangi vessel filter.
+Radius used to determine the boundary between the inner part and the edge in the Frangi vessel filter.
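+Several of the options above accept either a single value or a list; a hypothetical helper (not part of WORC) showing how such a comma-separated setting could be parsed::
+
+    def parse_float_list(value):
+        """Parse '0.05, 0.2, 0.5' into [0.05, 0.2, 0.5]; a single value yields a one-element list."""
+        return [float(v) for v in str(value).split(',')]
+
+    print(parse_float_list('0.05, 0.2, 0.5'))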
If True, exclude features which have a variance < 0.01. Based on `sklearn <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold.html>`_.
+Randomly select which feature groups to use. Parameters determined by the SelectFeatGroup config part, see below.
+Select features by first training a LASSO model. The alpha for the LASSO model is randomly generated. See also sklearn.
+If True, use Principal Component Analysis (PCA) to select features.
+Method to select the number of PCA components: either the number of components that explains 95% of the variance, or a fixed number of components.
+If True, use statistical test to select features.
+Define the type of statistical test to be used.
+Specify the p-value threshold used in the statistical test to select features. The first element defines the lower boundary, the other the upper boundary. Random sampling will occur between the boundaries.
+If True, use Relief to select features.
+Min and max of the search range for the number of nearest neighbors in Relief.
+Min and max of the search range for the sample size in Relief.
+Min and max of the search range for the positive distance in Relief.
+Min and max of the search range for the number of features selected in Relief.
Boolean(s)
+Boolean(s)
+Boolean(s)
+Boolean(s)
+Integer(s), 95variance
+Boolean(s)
+ttest, Welch, Wilcoxon, MannWhitneyU
+Two Integers: loc and scale
+Boolean(s)
+Two Integers: loc and scale
+Two Integers: loc and scale
+Two Integers: loc and scale
+Two Integers: loc and scale
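+Two of the selection steps above can be illustrated directly with sklearn (a sketch of the underlying idea, not WORC's internal code)::
+
+    import numpy as np
+    from sklearn.decomposition import PCA
+    from sklearn.feature_selection import VarianceThreshold
+
+    X = np.random.rand(20, 50)
+    X = VarianceThreshold(threshold=0.01).fit_transform(X)  # drop near-constant features
+    X = PCA(n_components=0.95).fit_transform(X)             # keep 95% explained variance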
If True, use shape features in model.
+If True, use histogram features in model.
+If True, use orientation features in model.
+If True, use Gabor texture features in model.
+If True, use GLCM texture features in model.
+If True, use GLCM Multislice texture features in model.
+If True, use GLRLM texture features in model.
+If True, use GLSZM texture features in model.
+If True, use NGTDM texture features in model.
+If True, use LBP texture features in model.
+If True, use patient features in model.
+If True, use semantic features in model.
+If True, use coliage features in model.
+If True, use log features in model.
+If True, use vessel features in model.
+If True, use phase features in model.
If True, use feature imputation methods to replace NaN values. If False, all NaN features will be set to zero.
+Method to be used for imputation.
+When using k-Nearest Neighbors (kNN) for feature imputation, determines the number of neighbors used for imputation. Can be a single integer or a list.
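+A sketch of the kNN imputation strategy mentioned above, using sklearn's KNNImputer directly (WORC wraps its own imputer, so this is illustrative only)::
+
+    import numpy as np
+    from sklearn.impute import KNNImputer
+
+    X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0]])
+    X_imputed = KNNImputer(n_neighbors=2).fit_transform(X)  # NaN replaced by neighbor average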
If True, use fastr for the optimization grid search (recommended on clusters, default); if False, use joblib (recommended on PCs, but not on Windows).
+Name of the execution plugin to be used. By default, the same plugin as self.fastr_plugin of the WORC object is used.
+Select the estimator(s) to use. Most are implemented using sklearn. For abbreviations, see above.
+Maximum number of iterations to use in training an estimator. Only for specific estimators, see sklearn.
+When using a SVM, specify the kernel type.
+Range of the SVM slack parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b).
+Range of the SVM polynomial degree when using a polynomial kernel. We sample on a uniform scale: the parameters specify the range (a, a + b).
+Range of SVM homogeneity parameter. We sample on a uniform scale: the parameters specify the range (a, a + b).
+Range of the SVM gamma parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b)
+Range of number of trees in a RF. We sample on a uniform scale: the parameters specify the range (a, a + b).
+Range of minimum number of samples required to split a branch in a RF. We sample on a uniform scale: the parameters specify the range (a, a + b).
+Range of maximum depth of a RF. We sample on a uniform scale: the parameters specify the range (a, a + b).
+Penalty term used in LR.
+Range of regularization strength in LR. We sample on a uniform scale: the parameters specify the range (a, a + b).
+Solver used in LDA.
+Range of the LDA shrinkage parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b).
+Range of the QDA regularization parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b).
+Range of the ElasticNet penalty parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b).
+Range of the l1 ratio in ElasticNet. We sample on a uniform scale: the parameters specify the range (a, a + b).
+Range of the SGD penalty parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b).
+Range of l1 ratio in SGD. We sample on a uniform scale: the parameters specify the range (a, a + b).
+Loss function of SGD.
+Penalty term in SGD.
+Regularization strength in ComplementNB. We sample on a uniform scale: the parameters specify the range (a, a + b).
True, False
+Any fastr execution plugin.
+SVM, SVR, SGD, SGDR, RF, LDA, QDA, ComplementNB, GaussianNB, LR, RFR, Lasso, ElasticNet. All are estimators from sklearn.
+Integer
+poly, linear, rbf
+Two Integers: loc and scale
+Two Integers: loc and scale
+Two Integers: loc and scale
+Two Integers: loc and scale
+Two Integers: loc and scale
+Two Integers: loc and scale
+Two Integers: loc and scale
+none, l2, l1
+Two Integers: loc and scale
+svd, lsqr, eigen
+Two Integers: loc and scale
+Two Integers: loc and scale
+Two Integers: loc and scale
+Two Integers: loc and scale
+Two Integers: loc and scale
+Two Integers: loc and scale
+hinge, squared_hinge, modified_huber
+none, l2, l1
+Two Integers: loc and scale
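+The "Two Integers: loc and scale" entries above define uniform distributions to sample from; a sketch of how such a pair translates into parameter values (the loc/scale numbers below are made up for illustration)::
+
+    from scipy.stats import uniform
+
+    # Linear scale: loc=1, scale=6 samples a value in [1, 7), e.g. a polynomial degree.
+    degree = uniform(loc=1, scale=6).rvs()
+
+    # Log scale: the pair gives the range of the exponent, so loc=-5, scale=10
+    # yields values between 10**-5 and 10**5, e.g. for the SVM slack parameter.
+    C = 10 ** uniform(loc=-5, scale=10).rvs()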
+
+
CrossValidation
+
N_iterations
+test_size
+
Number of times the data is split in training and test in the outer cross-validation.
+The fraction of the data to be used for testing.
+
100
+0.2
+
Integer
+Float
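+A single iteration of this cross-validation corresponds roughly to a random, stratified train/test split; a sketch with sklearn and dummy data::
+
+    import numpy as np
+    from sklearn.model_selection import train_test_split
+
+    X, y = np.random.rand(100, 10), np.random.randint(0, 2, 100)
+    X_train, X_test, y_train, y_test = train_test_split(
+        X, y, test_size=0.2, stratify=y)  # test_size matches the 0.2 default above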
+
+
Labels
+
label_names
+modus
+url
+projectID
+
The labels used from your label file for classification.
+Determine whether multilabel or singlelabel classification or regression will be performed.
+WIP
+WIP
Specify the optimization metric for your hyperparameter search.
+Size of the test set in the hyperoptimization cross-validation, given as a fraction of the whole dataset.
+
Number of iterations used in the hyperparameter optimization. This corresponds to the number of samples drawn from the parameter grid.
+Number of jobs assigned to a single core. Only used if fastr is set to True in the Classification section.
Any sklearn metric
+Float
+5
+Integer
+Integer
+100
+test_score
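+The hyperparameter optimization above is a random search: N_iterations parameter settings are drawn from the distributions and ranked on the chosen metric. A rough sklearn analogue with dummy data (not WORC's own SearchCV implementation)::
+
+    import numpy as np
+    from scipy.stats import uniform
+    from sklearn.model_selection import RandomizedSearchCV
+    from sklearn.svm import SVC
+
+    X, y = np.random.rand(50, 5), np.random.randint(0, 2, 50)
+    search = RandomizedSearchCV(SVC(), {'C': uniform(loc=0.1, scale=10)},
+                                n_iter=10, scoring='f1_weighted', cv=5)
+    search.fit(X, y)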
+
+
FeatureScaling
+
scale_features
+scaling_method
+
Determine whether to use feature scaling or not.
+Determine the scaling method.
+
True
+z_score
+
Boolean(s)
+z_score, minmax
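+The two scaling methods above correspond to sklearn's StandardScaler and MinMaxScaler; a small sketch::
+
+    import numpy as np
+    from sklearn.preprocessing import MinMaxScaler, StandardScaler
+
+    X = np.random.rand(10, 4)
+    X_z = StandardScaler().fit_transform(X)   # z_score: zero mean, unit variance per feature
+    X_mm = MinMaxScaler().fit_transform(X)    # minmax: rescale each feature to [0, 1]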
+
+
SampleProcessing
+
SMOTE
+SMOTE_ratio
+SMOTE_neighbors
+Oversampling
+
Determine whether to use SMOTE oversampling, see also `imbalanced-learn <https://imbalanced-learn.readthedocs.io/en/stable/generated/imblearn.over_sampling.SMOTE.html>`_.
+Determine the ratio of oversampling. If 1, the minority class will be oversampled to the same size as the majority class. We sample on a uniform scale: the parameters specify the range (a, a + b).
+Number of neighbors used in SMOTE. This should be much smaller than the number of objects/patients you supply. We sample on a uniform scale: the parameters specify the range (a, a + b).
+Determine whether to use random oversampling.
+
True
+1, 0
+5, 15
+False
+
Boolean(s)
+Two Integers: loc and scale
+Two Integers: loc and scale
+Boolean(s)
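+A sketch of the SMOTE oversampling described above, calling imbalanced-learn directly on dummy data (recent imblearn versions; the SMOTE_neighbors option maps onto k_neighbors here)::
+
+    import numpy as np
+    from imblearn.over_sampling import SMOTE
+
+    X = np.random.rand(30, 5)
+    y = np.array([0] * 25 + [1] * 5)                         # imbalanced labels
+    X_res, y_res = SMOTE(k_neighbors=3).fit_resample(X, y)   # minority class oversampled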
+
+
Ensemble
+
Use
+
Determine whether to use ensembling or not. Either provide an integer to state how many estimators to include, or True, which will use the default ensembling method.
+
1
+
Boolean or Integer
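+Conceptually, an ensemble of size N here averages the predictions of the N best workflows found during optimization; a minimal sketch with made-up predictions::
+
+    import numpy as np
+
+    predictions = np.random.rand(3, 10)              # 3 estimators, 10 test objects
+    ensemble_prediction = predictions.mean(axis=0)   # average over the estimators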
+
+
Bootstrap
+
Use
+N_iterations
+
+Determine whether to use bootstrapping or not.
+Number of iterations used in bootstrapping.
+
False
+1000
+
False
+1000
\ No newline at end of file
diff --git a/WORC/doc/_build/html/_modules/WORC/IOparser/config_WORC.html b/WORC/doc/_build/html/_modules/WORC/IOparser/config_WORC.html
index da54aa30..8072c28e 100644
--- a/WORC/doc/_build/html/_modules/WORC/IOparser/config_WORC.html
+++ b/WORC/doc/_build/html/_modules/WORC/IOparser/config_WORC.html
@@ -8,7 +8,7 @@
- WORC.IOparser.config_WORC — WORC 3.0.0 documentation
+ WORC.IOparser.config_WORC — WORC 3.1.0 documentation
@@ -59,7 +59,7 @@
pfiles,image_features)exceptValueErrorase:
- message=e.message+'. Please take a look at your labels'+\
+ message=str(e)+'. Please take a look at your labels'+\
' file and make sure it is formatted correctly. '+\
'See also https://github.com/MStarmans91/WORC/wiki/The-WORC-configuration#genetics.'raiseWORCexceptions.WORCValueError(message)
diff --git a/WORC/doc/_build/html/_modules/WORC/WORC.html b/WORC/doc/_build/html/_modules/WORC/WORC.html
index b8539902..e34e1eca 100644
--- a/WORC/doc/_build/html/_modules/WORC/WORC.html
+++ b/WORC/doc/_build/html/_modules/WORC/WORC.html
@@ -8,7 +8,7 @@
- WORC.WORC — WORC 3.0.0 documentation
+ WORC.WORC — WORC 3.1.0 documentation
@@ -59,7 +59,7 @@
[docs]    def __init__(self, name='test'):
        """Initialize WORC object. Set the initial variables all to None, except for some defaults.
@@ -266,7 +268,7 @@
Source code for WORC.WORC
        name: name of the network (string, optional)
        """
-        self.name = name
+        self.name = 'WORC_' + name

        # Initialize several objects
        self.configs = list()
@@ -277,6 +279,7 @@
                self.configs = [self.defaultconfig()] * len(self.images_train)
            else:
                self.configs = [self.defaultconfig()] * len(self.features_train)
-        self.network = fastr.create_network('WORC_' + self.name)
+        self.network = fastr.create_network(self.name)

        # BUG: We currently use the first configuration as general config
        image_types = list()
@@ -566,6 +580,12 @@
                config = configparser.ConfigParser()
                config.read(c)
                c = config
-            cfile = os.path.join(fastr.config.mounts['tmp'], 'WORC_' + self.name, ("config_{}_{}.ini").format(self.name, num))
+            cfile = os.path.join(self.fastr_tmpdir, f"config_{self.name}_{num}.ini")
+            if not os.path.exists(os.path.dirname(cfile)):
+                os.makedirs(os.path.dirname(cfile))
+            with open(cfile, 'w') as configfile:
+                c.write(configfile)
-            self.fastrconfigs.append(("vfs://tmp/{}/config_{}_{}.ini").format('WORC_' + self.name, self.name, num))
+            self.fastrconfigs.append(cfile)

            # Generate gridsearch parameter files if required
            # TODO: We now use the first configuration for the classifier, but this needs to be separated from the rest per modality
@@ -1047,7 +1079,7 @@
[docs]    def execute(self):
        """ Execute the network through the fastr.network.execute command. """
        # Draw and execute network
-        self.network.draw(file_path=self.network.id + '.svg', draw_dimensions=True)
-        self.network.execute(self.source_data, self.sink_data, execution_plugin=self.fastr_plugin, tmpdir=self.fastr_tmpdir)
-        # self.network.execute(self.source_data, self.sink_data)
+        try:
+            self.network.draw(file_path=self.network.id + '.svg', draw_dimensions=True)
+        except graphviz.backend.ExecutableNotFound:
+            print('[WORC WARNING] Graphviz executable not found: not drawing network diagram. Make sure the Graphviz executables are on your system PATH.')
+        self.network.execute(self.source_data, self.sink_data, execution_plugin=self.fastr_plugin, tmpdir=self.fastr_tmpdir)
+#!/usr/bin/env python
+
+# Copyright 2016-2019 Biomedical Imaging Group Rotterdam, Departments of
+# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from imblearn import over_sampling, under_sampling, combine
+import numpy as np
+from sklearn.utils import check_random_state
+import WORC.addexceptions as ae
+
+
+
[docs]class ObjectSampler(object):
+    """
+    Samples objects for learning based on various under-, over- and combined sampling methods.
+
+    The choice of included methods is largely based on:
+
+    He, Haibo, and Edwardo A. Garcia. "Learning from imbalanced data."
+    IEEE Transactions on Knowledge & Data Engineering 9 (2008): 1263-1284.
+
+
+    """
+
[docs]    def __init__(self, method,
+                 sampling_strategy='auto',
+                 SMOTE_ratio=1,
+                 SMOTE_neighbors=5,
+                 n_jobs=1,
+                 n_neighbors=3,
+                 ):
+
+        # Initialize a random state
+        self.random_seed = np.random.randint(5000)
+        self.random_state = check_random_state(self.random_seed)
+
+        # Initialize all objects as Nones: overridden when required by functions
+        self.sampling_strategy = None
+        self.object = None
+        self.n_neighbors = None
+        self.n_jobs = None
+
+        if method == 'RandomUnderSampling':
+            self.init_RandomUnderSampling(sampling_strategy)
+        elif method == 'NearMiss':
+            self.init_NearMiss(sampling_strategy, n_neighbors, n_jobs)
+        elif method == 'NeigbourhoodCleaningRule':
+            self.init_NeigbourhoodCleaningRule()
+        elif method == 'RandomOverSampling':
+            self.init_RandomOverSampling(sampling_strategy)
+        elif method == 'ADASYN':
+            self.init_ADASYN()
+        elif method == 'BorderlineSMOTE':
+            self.init_BorderlineSMOTE()
+        elif method == 'SMOTEENN':
+            self.init_SMOTEENN()
+        elif method == 'SMOTETomek':
+            self.init_SMOTETomek()
+        else:
+            raise ae.WORCKeyError(f'{method} is not a valid sampling method!')
"""Ensemble of BaseSearchCV Estimators."""
# @abstractmethod
[docs]    def __init__(self, estimators):
+        if not estimators:
+            message = 'You supplied an empty list of estimators: No ensemble creation possible.'
+            raise WORCexceptions.WORCValueError(message)
        self.estimators = estimators
        self.n_estimators = len(estimators)
@@ -273,20 +276,36 @@
Source code for WORC.classification.SearchCV
"""
self.estimators[0]._check_is_fitted('predict')
- # NOTE: Check if we are dealing with multilabel
+ # Check if we are dealing with multilabel
+ iflen(self.estimators[0].predict(X).shape)==1:
+ nlabels=1
+ else:
+ nlabels=self.estimators[0].predict(X).shape[1]
+
iftype(self.estimators[0].best_estimator_)==OneVsRestClassifier:
+ multilabel=True
+ elifnlabels>1:
+ multilabel=True
+ else:
+ multilabel=False
+
+ ifmultilabel:# Multilabel
- nlabels=self.estimators[0].predict(X).shape[1]outcome=np.zeros((self.n_estimators,len(X),nlabels))fornum,estinenumerate(self.estimators):ifhasattr(est,'predict_proba'):# BUG: SVM kernel can be wrong typeifhasattr(est.best_estimator_,'kernel'):est.best_estimator_.kernel=str(est.best_estimator_.kernel)
- outcome[num,:,:]=est.predict_proba(X)[:,1]
+ outcome[num,:,:]=est.predict_proba(X)else:outcome[num,:,:]=est.predict(X)
+ # Replace NAN if they are there
+ ifnp.isnan(outcome).any():
+ print('[WARNING] Predictions contain NaN, removing those rows.')
+ outcome=outcome[~np.isnan(outcome).any(axis=1)]
+
outcome=np.squeeze(np.mean(outcome,axis=0))# NOTE: Binarize specifically for multiclass
@@ -307,6 +326,9 @@
Source code for WORC.classification.SearchCV
else:outcome[num,:]=est.predict(X)
+ # Replace NAN if they are there
+ outcome=outcome[~np.isnan(outcome).any(axis=1)]
+
outcome=np.squeeze(np.mean(outcome,axis=0))# Binarize
@@ -333,19 +355,53 @@
Source code for WORC.classification.SearchCV
"""
self.estimators[0]._check_is_fitted('predict_proba')
- # For probabilities, we get both a class0 and a class1 score
- outcome=np.zeros((len(X),2))
- outcome_class1=np.zeros((self.n_estimators,len(X)))
- outcome_class2=np.zeros((self.n_estimators,len(X)))
- fornum,estinenumerate(self.estimators):
- # BUG: SVM kernel can be wrong type
- ifhasattr(est.best_estimator_,'kernel'):
- est.best_estimator_.kernel=str(est.best_estimator_.kernel)
- outcome_class1[num,:]=est.predict_proba(X)[:,0]
- outcome_class2[num,:]=est.predict_proba(X)[:,1]
-
- outcome[:,0]=np.squeeze(np.mean(outcome_class1,axis=0))
- outcome[:,1]=np.squeeze(np.mean(outcome_class2,axis=0))
+ # Check if we are dealing with multilabel
+ iflen(self.estimators[0].predict(X).shape)==1:
+ nlabels=1
+ else:
+ nlabels=self.estimators[0].predict(X).shape[1]
+
+ iftype(self.estimators[0].best_estimator_)==OneVsRestClassifier:
+ multilabel=True
+ elifnlabels>1:
+ multilabel=True
+ else:
+ multilabel=False
+
+ ifmultilabel:
+ # Multilabel
+ outcome=np.zeros((self.n_estimators,len(X),nlabels))
+ fornum,estinenumerate(self.estimators):
+ ifhasattr(est,'predict_proba'):
+ # BUG: SVM kernel can be wrong type
+ ifhasattr(est.best_estimator_,'kernel'):
+ est.best_estimator_.kernel=str(est.best_estimator_.kernel)
+ outcome[num,:,:]=est.predict_proba(X)
+ else:
+ outcome[num,:,:]=est.predict(X)
+
+ # Replace NAN if they are there
+ ifnp.isnan(outcome).any():
+ print('[WARNING] Predictions contain NaN, removing those rows.')
+ outcome=outcome[~np.isnan(outcome).any(axis=1)]
+
+ outcome=np.squeeze(np.mean(outcome,axis=0))
+ else:
+ # Single label
+ # For probabilities, we get both a class0 and a class1 score
+ outcome=np.zeros((len(X),2))
+ outcome_class1=np.zeros((self.n_estimators,len(X)))
+ outcome_class2=np.zeros((self.n_estimators,len(X)))
+ fornum,estinenumerate(self.estimators):
+ # BUG: SVM kernel can be wrong type
+ ifhasattr(est.best_estimator_,'kernel'):
+ est.best_estimator_.kernel=str(est.best_estimator_.kernel)
+ outcome_class1[num,:]=est.predict_proba(X)[:,0]
+ outcome_class2[num,:]=est.predict_proba(X)[:,1]
+
+ outcome[:,0]=np.squeeze(np.mean(outcome_class1,axis=0))
+ outcome[:,1]=np.squeeze(np.mean(outcome_class2,axis=0))
+
returnoutcome
[docs]    def preprocess(self, X, y=None):
        '''Apply the available preprocessing methods to the features'''
+        if self.best_scaler is not None:
+            X = self.best_scaler.transform(X)
+
        if self.best_imputer is not None:
            X = self.best_imputer.transform(X)

+        # Replace nan if still left
+        X = replacenan(np.asarray(X)).tolist()
+
        # Only oversample in training phase, i.e. if we have the labels
        if y is not None:
            if self.best_SMOTE is not None:
@@ -685,9 +749,6 @@
check_is_fitted(self,'cv_results_')returnself.cv_results_['mean_test_score'][self.best_index_]
- @property
- defgrid_scores_(self):
- warnings.warn(
- "The grid_scores_ attribute was deprecated in version 0.18"
- " in favor of the more elaborate cv_results_ attribute."
- " The grid_scores_ attribute will not be available from 0.20",
- DeprecationWarning)
-
- check_is_fitted(self,'cv_results_')
- grid_scores=list()
-
- fori,(params,mean,std)inenumerate(zip(
- self.cv_results_['params'],
- self.cv_results_['mean_test_score'],
- self.cv_results_['std_test_score'])):
- scores=np.array(list(self.cv_results_['split%d_test_score'
- %s][i]
- forsinrange(self.n_splits_)),
- dtype=np.float64)
- grid_scores.append(_CVScoreTuple(params,mean,scores))
-
- returngrid_scores
-
try:array_means=np.average(array,axis=1,weights=weights)exceptZeroDivisionErrorase:
- e=('[PREDICT Warning] {}. Setting {} to unweighted.').format(e,key_name)
+ e=f'[WORC Warning] {e}. Setting {key_name} to unweighted.'print(e)array_means=np.average(array,axis=1)results['mean_%s'%key_name]=array_means
+
+ array_mins=np.min(array,axis=1)
+ results['min_%s'%key_name]=array_mins
+
# Weighted std is not directly available in numpytry:array_stds=np.sqrt(np.average((array-array_means[:,np.newaxis])**2,axis=1,weights=weights))exceptZeroDivisionErrorase:
- e=('[PREDICT Warning] {}. Setting {} to unweighted.').format(e,key_name)
+ e=f'[WORC Warning] {e}. Setting {key_name} to unweighted.'print(e)array_stds=np.sqrt(np.average((array-array_means[:,np.newaxis])**2,
@@ -797,8 +839,15 @@
Source code for WORC.classification.SearchCV
_store('fit_time',fit_time)_store('score_time',score_time)
+ # Compute the "Generalization" score
+ difference_score=abs(results['mean_train_score']-results['mean_test_score'])
+ generalization_score=results['mean_test_score']-difference_score
+ results['generalization_score']=generalization_score
+ results['rank_generalization_score']=np.asarray(
+ rankdata(-results['generalization_score'],method='min'),dtype=np.int32)
+
# Rank the indices of scores from all parameter settings
- ranked_test_scores=results["rank_test_score"]
+ ranked_test_scores=results["rank_"+self.ranking_score]indices=range(0,len(ranked_test_scores))sortedindices=[xfor_,xinsorted(zip(ranked_test_scores,indices))]
@@ -814,7 +863,7 @@
Source code for WORC.classification.SearchCV
n_candidates =len(candidate_params_est)# Store the atributes of the best performing estimator
- best_index=np.flatnonzero(results["rank_test_score"]==1)[0]
+ best_index=np.flatnonzero(results["rank_"+self.ranking_score]==1)[0]best_parameters_est=candidate_params_est[best_index]best_parameters_all=candidate_params_all[best_index]
@@ -938,7 +987,6 @@
Source code for WORC.classification.SearchCV
[docs]defcreate_ensemble(self,X_train,Y_train,verbose=None,initialize=True,scoring=None,method=50):
- # NOTE: Function is still WIP, do not actually use this.''' Create an (optimal) ensemble of a combination of hyperparameter settings
@@ -979,7 +1027,7 @@
Source code for WORC.classification.SearchCV
elif scoring=='sar':perf=sar_score(Y_valid_truth,Y_valid_score)else:
- raiseKeyError('[PREDICT Warning] No valid score method given in ensembling: '+str(scoring))
+ raiseKeyError('[WORC Warning] No valid score method given in ensembling: '+str(scoring))returnperf
@@ -1005,7 +1053,11 @@
Source code for WORC.classification.SearchCV
# Simply take the top50 best hyperparameters
ifverbose:print(f'Creating ensemble using top {str(method)} individual classifiers.')
- ensemble=range(0,method)
+ ifmethod==1:
+ # Next functions expect list
+ ensemble=[0]
+ else:
+ ensemble=range(0,method)elifmethod=='FitNumber':# Use optimum number of models
@@ -1331,7 +1383,7 @@
Source code for WORC.classification.SearchCV
print(f"Single estimator best {scoring}: {single_estimator_performance}.")print(f'Ensemble consists of {len(ensemble)} estimators {ensemble}.')else:
- print('[PREDICT WARNING] No valid ensemble method given: {}. Not ensembling').format(str(method))
+ print(f'[WORC WARNING] No valid ensemble method given: {method}. Not ensembling')returnself# Create the ensemble --------------------------------------------------
@@ -1358,9 +1410,8 @@
message ='One or more of the values in your parameter sampler '+\
'is either not iterable, or the distribution cannot '+\
'generate valid samples. Please check your '+\
- (' parameters. At least {} gives an error.').format(k)
- raisePREDICTexceptions.PREDICTValueError(message)
+ f' parameters. At least {k} gives an error.'
+ raiseWORCexceptions.WORCValueError(message)# Split the parameters files in equal partskeys=list(parameters_temp.keys())
@@ -1421,7 +1472,7 @@
Source code for WORC.classification.SearchCV
for numberink:temp_dict[number]=parameters_temp[number]
- fname=('settings_{}.json').format(str(num))
+ fname=f'settings_{num}.json'sourcename=os.path.join(tempfolder,'parameters',fname)ifnotos.path.exists(os.path.dirname(sourcename)):os.makedirs(os.path.dirname(sourcename))
@@ -1429,10 +1480,7 @@
except ValueErrorase:print(e)message=('Fitting classifiers has failed. The temporary '+
- 'results where not deleted and can be found in {}. '+
+ f'results where not deleted and can be found in {tempfolder}. '+'Probably your fitting and scoring failed: check out '+'the tmp/fitandscore folder within the tempfolder for '+
- 'the fastr job temporary results.').format(tempfolder)
- raisePREDICTexceptions.PREDICTValueError(message)
+ 'the fastr job temporary results.')
+ raiseWORCexceptions.WORCValueError(message)# Remove the temporary folder usedshutil.rmtree(tempfolder)
@@ -1761,13 +1806,15 @@
Source code for WORC.classification.createfixedsplits
+#!/usr/bin/env python
+
+# Copyright 2016-2019 Biomedical Imaging Group Rotterdam, Departments of
+# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import numpy as np
+from sklearn.model_selection import train_test_split
+import WORC.addexceptions as ae
+from WORC.processing.label_processing import load_labels
+import pandas as pd
+
+
+
[docs]defcreatefixedsplits(label_file=None,label_type=None,patient_IDs=None,
+ test_size=0.2,N_iterations=1,regression=False,
+ stratify=None,modus='singlelabel',output=None):
+ '''
+ Create fixed splits for a cross validation.
+ '''
+ # Check whether input is valid
+ ifpatient_IDsisNone:
+ iflabel_fileisnotNoneandlabel_typeisnotNone:
+ # Read the label file
+ label_data=load_labels(label_file,label_type)
+ patient_IDs=label_data['patient_IDs']
+
+ # Create the stratification object
+ ifmodus=='singlelabel':
+ stratify=label_data['label']
+ elifmodus=='multilabel':
+ # Create a stratification object from the labels
+ # Label = 0 means no label equals one
+ # Other label numbers refer to the label name that is 1
+ stratify=list()
+ labels=label_data['label']
+ forpnuminrange(0,len(labels[0])):
+ plabel=0
+ forlnum,slabelinenumerate(labels):
+ ifslabel[pnum]==1:
+ plabel=lnum+1
+ stratify.append(plabel)
+
+ else:
+ raiseae.WORCKeyError('{} is not a valid modus!').format(modus)
+ else:
+ raiseae.WORCIOError('Either a label file and label type or patient_IDs need to be provided!')
+
+ pd_dict=dict()
+ foriinrange(N_iterations):
+ print(f'Splitting iteration {i + 1} / {N_iterations}')
+ # Create a random seed for the splitting
+ random_seed=np.random.randint(5000)
+
+ # Define stratification
+ unique_patient_IDs,unique_indices=\
+ np.unique(np.asarray(patient_IDs),return_index=True)
+ ifregression:
+ unique_stratify=None
+ else:
+ unique_stratify=[stratify[i]foriinunique_indices]
+
+ # Split, throw error when dataset is too small for split ratio's
+ try:
+ unique_PID_train,indices_PID_test\
+ =train_test_split(unique_patient_IDs,
+ test_size=test_size,
+ random_state=random_seed,
+ stratify=unique_stratify)
+ exceptValueErrorase:
+ e=str(e)+' Increase the size of your test set.'
+ raiseae.WORCValueError(e)
+
+ # Check for all IDs if they are in test or training
+ indices_train=list()
+ indices_test=list()
+ patient_ID_train=list()
+ patient_ID_test=list()
+ fornum,pidinenumerate(patient_IDs):
+ ifpidinunique_PID_train:
+ indices_train.append(num)
+
+ # Make sure we get a unique ID
+ ifpidinpatient_ID_train:
+ n=1
+ whilestr(pid+'_'+str(n))inpatient_ID_train:
+ n+=1
+ pid=str(pid+'_'+str(n))
+ patient_ID_train.append(pid)
+ else:
+ indices_test.append(num)
+
+ # Make sure we get a unique ID
+ ifpidinpatient_ID_test:
+ n=1
+ whilestr(pid+'_'+str(n))inpatient_ID_test:
+ n+=1
+ pid=str(pid+'_'+str(n))
+ patient_ID_test.append(pid)
+
+ # Add to train object
+ pd_dict[str(i)+'_train']=patient_ID_train
+
+ # Test object has to be same length as training object
+ extras=[""]*(len(patient_ID_train)-len(patient_ID_test))
+ patient_ID_test.extend(extras)
+ pd_dict[str(i)+'_test']=patient_ID_test
+
+ # Convert into pandas dataframe for easy use and conversion
+ df=pd.DataFrame(pd_dict)
+
+ # Write output if required
+ ifoutputisnotNone:
+ print("Writing Output.")
+ df.to_csv(output)
+
+ returndf
\ No newline at end of file
diff --git a/WORC/doc/_build/html/_modules/WORC/classification/crossval.html b/WORC/doc/_build/html/_modules/WORC/classification/crossval.html
index d0efbfd2..395379eb 100644
--- a/WORC/doc/_build/html/_modules/WORC/classification/crossval.html
+++ b/WORC/doc/_build/html/_modules/WORC/classification/crossval.html
@@ -8,7 +8,7 @@
- WORC.classification.crossval — WORC 3.0.0 documentation
+ WORC.classification.crossval — WORC 3.1.0 documentation
@@ -59,7 +59,7 @@
if tempsave:importfastr
-
- # Define all possible regressors
- regressors=['SVR','RFR','SGDR','Lasso','ElasticNet']
-
# Process input datapatient_IDs=label_data['patient_IDs']label_value=label_data['label']
@@ -290,10 +286,8 @@
Source code for WORC.classification.crossval
feature_labels =image_features[0][1]# Check if we need to use fixedsplits:
- iffixedsplitsisnotNoneand'.xlsx'infixedsplits:
- # fixedsplits = '/home/mstarmans/Settings/RandomSufflingOfData.xlsx'
- wb=xlrd.open_workbook(fixedsplits)
- wb=wb.sheet_by_index(1)
+ iffixedsplitsisnotNoneand'.csv'infixedsplits:
+ fixedsplits=pd.read_csv(fixedsplits,header=0)ifmodus=='singlelabel':print('Performing Single class classification.')
@@ -409,17 +403,15 @@
Source code for WORC.classification.crossval
else:# Use pre defined splits
- indices=wb.col_values(i)
- indices=[int(j)forjinindices[1:]]# First element is "Iteration x"
- train=indices[0:121]
- test=indices[121:]
+ train=fixedsplits[str(i)+'_train'].values
+ test=fixedsplits[str(i)+'_test'].values# Convert the numbers to the correct indicesind_train=list()forjintrain:success=Falsefornum,pinenumerate(patient_IDs):
- ifstr(j).zfill(3)==p[0:3]:
+ ifj==p:ind_train.append(num)success=Trueifnotsuccess:
@@ -429,19 +421,27 @@
Source code for WORC.classification.crossval
for jintest:success=Falsefornum,pinenumerate(patient_IDs):
- ifstr(j).zfill(3)==p[0:3]:
+ ifj==p:ind_test.append(num)success=Trueifnotsuccess:raiseae.WORCIOError("Patient "+str(j).zfill(3)+" is not included!")
- X_train=np.asarray(image_features)[ind_train].tolist()
- Y_train=np.asarray(i_class_temp)[ind_train].tolist()
+ X_train=[image_features[i]foriinind_train]
+ X_test=[image_features[i]foriinind_test]
+
patient_ID_train=patient_IDs[ind_train]
- X_test=np.asarray(image_features)[ind_test].tolist()
- Y_test=np.asarray(i_class_temp)[ind_test].tolist()patient_ID_test=patient_IDs[ind_test]
+ ifmodus=='singlelabel':
+ Y_train=i_class_temp[ind_train]
+ Y_test=i_class_temp[ind_test]
+ elifmodus=='multilabel':
+ Y_train=i_class_temp[ind_train,:]
+ Y_test=i_class_temp[ind_test,:]
+ else:
+ raiseae.WORCKeyError('{} is not a valid modus!').format(modus)
+
# Find best hyperparameters and construct classifierconfig['HyperOptimization']['use_fastr']=use_fastrconfig['HyperOptimization']['fastr_plugin']=fastr_plugin
@@ -453,8 +453,7 @@
Source code for WORC.classification.crossval
**config['HyperOptimization'])# Create an ensemble if required
- ifensemble['Use']:
- trained_classifier.create_ensemble(X_train,Y_train)
+ trained_classifier.create_ensemble(X_train,Y_train,method=ensemble['Use'])# We only want to save the feature values and one label arrayX_train=[x[0]forxinX_train]
@@ -469,12 +468,12 @@
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_is_fitted
from sklearn.utils.multiclass import unique_labels
-import WORC.classification.RankedSVM as RSVM
+from WORC.classification.RankedSVM import RankSVM_train, RankSVM_test
<
the fitted object. '''
+ # Set some defaults for if a part fails and we return a dummy
+ test_sample_counts=len(test)
+ fit_time=np.inf
+ score_time=np.inf
+ train_score=np.nan
+ test_score=np.nan
+
# We copy the parameter object so we can alter it and keep the original
+ ifverbose:
+ print("\n")
+ print('#######################################')
+ print('Starting fit and score of new workflow.')para_estimator=para.copy()estimator=cc.construct_classifier(para_estimator)ifscoring!='average_precision_weighted':
@@ -350,14 +366,36 @@
Source code for WORC.classification.fitandscore
<
feature_values=np.asarray([x[0]forxinX])feature_labels=np.asarray([x[1]forxinX])
+ # ------------------------------------------------------------------------
+ # Feature scaling
+ if'FeatureScaling'inpara_estimator:
+ ifverbose:
+ print("Fitting scaler and transforming features.")
+
+ ifpara_estimator['FeatureScaling']=='z_score':
+ scaler=StandardScaler().fit(feature_values)
+ elifpara_estimator['FeatureScaling']=='minmax':
+ scaler=MinMaxScaler().fit(feature_values)
+ else:
+ scaler=None
+
+ ifscalerisnotNone:
+ feature_values=scaler.transform(feature_values)
+ delpara_estimator['FeatureScaling']
+ else:
+ scaler=None
+
+ # Delete the object if we do not need to return it
+ ifnotreturn_all:
+ delscaler
+
# ------------------------------------------------------------------------# Feature imputationif'Imputation'inpara_estimator.keys():ifpara_estimator['Imputation']=='True':imp_type=para_estimator['ImputationMethod']ifverbose:
- message=('Imputing NaN with {}.').format(imp_type)
- print(message)
+ print(f'Imputing NaN with {imp_type}.')imp_nn=para_estimator['ImputationNeighbours']imputer=Imputer(missing_values=np.nan,strategy=imp_type,
@@ -378,6 +416,9 @@
Source code for WORC.classification.fitandscore
<
ifnotreturn_all:delimputer
+ # Remove any NaN feature values if these are still left after imputation
+ feature_values=replacenan(feature_values,verbose=verbose,feature_labels=feature_labels[0])
+
# ------------------------------------------------------------------------# Use SMOTE oversamplingif'SampleProcessing_SMOTE'inpara_estimator.keys():
@@ -406,12 +447,8 @@
Source code for WORC.classification.fitandscore
<
pos=int(np.sum(y))neg=int(len(y)-pos)ifverbose:
- message=("Sampling with SMOTE from {} ({} pos, {} neg) to {} ({} pos, {} neg) patients.").format(str(len_in),
- str(pos_initial),
- str(neg_initial),
- str(len(y)),
- str(pos),
- str(neg))
+ message=f"Sampling with SMOTE from {len_in} ({pos_initial} pos,"+\
+ f" {neg_initial} neg) to {len(y)} ({pos} pos, {neg} neg) patients."print(message)else:sm=None
@@ -543,7 +580,9 @@
<
delpara_estimator['UsePCA']delpara_estimator['PCAType']
+ # --------------------------------------------------------------------
+ # Feature selection based on a statistical test
+ if'StatisticalTestUse'inpara_estimator.keys():
+ ifpara_estimator['StatisticalTestUse']=='True':
+ metric=para_estimator['StatisticalTestMetric']
+ threshold=para_estimator['StatisticalTestThreshold']
+ ifverbose:
+ print(f"Selecting features based on statistical test. Method {metric}, threshold {round(threshold, 2)}.")
+ ifverbose:
+ print("Original Length: "+str(len(feature_values[0])))
+
+ StatisticalSel=StatisticalTestThreshold(metric=metric,
+ threshold=threshold)
+
+ StatisticalSel.fit(feature_values,y)
+ feature_values=StatisticalSel.transform(feature_values)
+ feature_labels=StatisticalSel.transform(feature_labels)
+ ifverbose:
+ print("New Length: "+str(len(feature_values[0])))
+ else:
+ StatisticalSel=None
+ delpara_estimator['StatisticalTestUse']
+ delpara_estimator['StatisticalTestMetric']
+ delpara_estimator['StatisticalTestThreshold']
+ else:
+ StatisticalSel=None
+
+ # Delete the object if we do not need to return it
+ ifnotreturn_all:
+ delStatisticalSel
+
# ----------------------------------------------------------------# Fitting and scoring# Only when using fastr this is an entry
@@ -811,21 +836,36 @@
Source code for WORC.classification.fitandscore
<
exceptIndexError:labellength=1
- iflabellength>1andtype(estimator)!=RankedSVM:
- # Multiclass, hence employ a multiclass classifier for e.g. SVM, RF
+ iflabellength>1andtype(estimator)notin[RankedSVM,
+ RandomForestClassifier]:
+ # Multiclass, hence employ a multiclass classifier for e.g. SVM, LRestimator.set_params(**para_estimator)estimator=OneVsRestClassifier(estimator)para_estimator={}ifverbose:print("Fitting ML.")
- ret=_fit_and_score(estimator,feature_values,y,
- scorer,train,
- test,verbose,
- para_estimator,fit_params,return_train_score,
- return_parameters,
- return_n_test_samples,
- return_times,error_score)
+
+ try:
+ ret=_fit_and_score(estimator,feature_values,y,
+ scorer,train,
+ test,verbose,
+ para_estimator,fit_params,return_train_score,
+ return_parameters,
+ return_n_test_samples,
+ return_times,error_score)
+ except(ValueError,LinAlgError)ase:
+ iftype(estimator)==LDA:
+ print('[WARNING]: skipping this setting due to LDA Error: '+e.message)
+ ret=[train_score,test_score,test_sample_counts,
+ fit_time,score_time,para_estimator,para]
+
+ ifreturn_all:
+ returnret,GroupSel,VarSel,SelectModel,feature_labels[0],scaler,imputer,pca,StatisticalSel,ReliefSel,sm,ros
+ else:
+ returnret
+ else:
+ raisee# Remove 'estimator object', it's the causes of a bug.# Somewhere between scikit-learn 0.18.2 and 0.20.2
@@ -897,9 +937,9 @@
Source code for WORC.classification.fitandscore
<
ifnp.isnan(value):ifverbose:iffeature_labelsisnotNone:
- print("[WORC WARNING] NaN found, patient {}, label {}. Replacing with zero.").format(pnum,feature_labels[fnum])
+ print(f"[WORC WARNING] NaN found, patient {pnum}, label {feature_labels[fnum]}. Replacing with zero.")else:
- print("[WORC WARNING] NaN found, patient {}, label {}. Replacing with zero.").format(pnum,fnum)
+ print(f"[WORC WARNING] NaN found, patient {pnum}, label {fnum}. Replacing with zero.")# Note: X is a list of lists, hence we cannot index the element directlyimage_features_temp[pnum,fnum]=0
diff --git a/WORC/doc/_build/html/_modules/WORC/classification/metrics.html b/WORC/doc/_build/html/_modules/WORC/classification/metrics.html
index 22daa37e..f0b908c0 100644
--- a/WORC/doc/_build/html/_modules/WORC/classification/metrics.html
+++ b/WORC/doc/_build/html/_modules/WORC/classification/metrics.html
@@ -8,7 +8,7 @@
- WORC.classification.metrics — WORC 3.0.0 documentation
+ WORC.classification.metrics — WORC 3.1.0 documentation
@@ -59,7 +59,7 @@
- 3.0.0
+ 3.1.0
@@ -174,7 +174,7 @@
Source code for WORC.classification.metrics
# limitations under the License.from__future__importdivision
-from sklearn.metrics import accuracy_score
+from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.metrics import roc_auc_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
@@ -184,6 +184,7 @@
[docs]defmulti_class_auc(y_truth,y_score):classes=np.unique(y_truth)
- ifany(t==0.0fortinnp.sum(y_score,axis=1)):
- raiseValueError('No AUC is calculated, output probabilities are missing')
+ # if any(t == 0.0 for t in np.sum(y_score, axis=1)):
+ # raise ValueError('No AUC is calculated, output probabilities are missing')pairwise_auc_list=[0.5*(pairwise_auc(y_truth,y_score,i,j)+pairwise_auc(y_truth,y_score,j,i))foriinclassesforjinclassesifi<j]
@@ -349,7 +402,7 @@
Source code for WORC.classification.trainclassifier
print('[WORC Warning] You provided multiple output json files: only the first one will be used!')output_json=output_json[0]
+ iftype(fixedsplits)islist:
+ fixedsplits=''.join(fixedsplits)
+
# Load variables from the config fileconfig=config_io.load_config(config)label_type=config['Labels']['label_names']
@@ -347,8 +348,8 @@
Source code for WORC.classification.trainclassifier
+
+ def_is_detected(self,*args,**kwargs):
+ try:
+ withopen(self._csv_file_path,newline='')ascsvfile:
+ start=csvfile.read(4096)
+
+ # isprintable does not allow newlines, printable does not allow umlauts...
+ ifnotall([cinstring.printableorc.isprintable()forcinstart]):
+ returnFalse
+ dialect=csv.Sniffer().sniff(start)# this triggers csv.Error if it can't sniff the csv dialect
+ returnTrue
+ exceptcsv.Error:
+ # Could not get a csv dialect -> probably not a csv.
+ returnFalse
[docs]defdownload_subject(project,subject,datafolder,session,verbose=False):
+ # Download all data and keep track of resources
+ download_counter=0
+ resource_labels=list()
+ foreinsubject.experiments:
+ resmap={}
+ experiment=subject.experiments[e]
+
+ # FIXME: Need a way to smartly check whether we have a matching RT struct and image
+ # Current solution: We only download the CT sessions, no PET / MRI / Other scans
+ # Specific for STW Strategy BMIA XNAT projects
+
+ ifexperiment.session_typeisNone:# some files in project don't have _CT postfix
+ print(f"\tSkipping patient {subject.label}, experiment {experiment.label}: type is not CT but {experiment.session_type}.")
+ continue
+
+ if'_CT'notinexperiment.session_type:
+ print(f"\tSkipping patient {subject.label}, experiment {experiment.label}: type is not CT but {experiment.session_type}.")
+ continue
+
+ forsinexperiment.scans:
+ scan=experiment.scans[s]
+ print(("\tDownloading patient {}, experiment {}, scan {}.").format(subject.label,experiment.label,
+ scan.id))
+ forresinscan.resources:
+ resource_label=scan.resources[res].label
+ ifresource_label=='NIFTI':
+ # Create output directory
+ outdir=datafolder+'/{}'.format(subject.label)
+ ifnotos.path.exists(outdir):
+ os.makedirs(outdir)
+
+ resmap[resource_label]=scan
+ print(f'resource is {resource_label}')
+ scan.resources[res].download_dir(outdir)
+ resource_labels.append(resource_label)
+ download_counter+=1
+
+ # Parse resources and throw warnings if they not meet the requirements
+ subject_name=subject.label
+ ifdownload_counter==0:
+ print(f'[WARNING] Skipping subject {subject_name}: no (suitable) resources found.')
+ returnFalse
+
+ if'NIFTI'notinresource_labels:
+ print(f'[WARNING] Skipping subject {subject_name}: no NIFTI resources found.')
+ returnFalse
+
+ ifresource_labels.count('NIFTI')<2:
+ print(f'[WARNING] Skipping subject {subject_name}: only one NIFTI resource found, need two (mask and image).')
+ returnFalse
+ elifresource_labels.count('NIFTI')>2:
+ count=resource_labels.count('NIFTI')
+ print(f'[WARNING] Skipping subject {subject_name}: {str(count)} NIFTI resources found, need two (mask and image).')
+ returnFalse
+
+ # Check what the mask and image folders are
+ NIFTI_folders=glob(os.path.join(outdir,'*','scans','*','resources','NIFTI','files'))
+ if'mask'inglob(os.path.join(NIFTI_folders[0],'*.nii.gz'))[0]:
+ NIFTI_image_folder=NIFTI_folders[1]
+ NIFTI_mask_folder=NIFTI_folders[0]
+ else:
+ NIFTI_image_folder=NIFTI_folders[0]
+ NIFTI_mask_folder=NIFTI_folders[1]
+
+ NIFTI_files=glob(os.path.join(NIFTI_image_folder,'*'))
+ iflen(NIFTI_files)==0:
+ print(f'[WARNING] Skipping subject {subject_name}: image NIFTI resources is empty.')
+ shutil.rmtree(outdir)
+ returnFalse
+
+ NIFTI_files=glob(os.path.join(NIFTI_mask_folder,'*'))
+ iflen(NIFTI_files)==0:
+ print(f'[WARNING] Skipping subject {subject_name}: mask NIFTI resources is empty.')
+ shutil.rmtree(outdir)
+ returnFalse
+
+ # Patient is included, so cleanup folder structure
+ os.rename(os.path.join(NIFTI_image_folder,'image.nii.gz'),
+ os.path.join(outdir,'image.nii.gz'))
+ os.rename(os.path.join(NIFTI_mask_folder,'mask_GTV-1.nii.gz'),
+ os.path.join(outdir,'mask.nii.gz'))
+
+ forfolderinglob(os.path.join(outdir,'*','scans')):
+ folder=os.path.dirname(folder)
+ shutil.rmtree(folder)
+
+ returnTrue
+
+
+
[docs]defdownload_project(project_name,xnat_url,datafolder,nsubjects=10,
+ verbose=True):
+
+ # Connect to XNAT and retrieve project
+ session=xnat.connect(xnat_url)
+ project=session.projects[project_name]
+
+ # Create the data folder if it does not exist yet
+ datafolder=os.path.join(datafolder,project_name)
+ ifnotos.path.exists(datafolder):
+ os.makedirs(datafolder)
+
+ subjects_len=len(project.subjects)
+ ifnsubjects=='all':
+ nsubjects=subjects_len
+ else:
+ nsubjects=min(nsubjects,subjects_len)
+
+ subjects_counter=1
+ downloaded_subjects_counter=0
+ forsinrange(0,subjects_len):
+ s=project.subjects[s]
+ print(f'Working on subject {subjects_counter}/{subjects_len}')
+ subjects_counter+=1
+
+ success=download_subject(project_name,s,datafolder,session,verbose)
+ ifsuccess:
+ downloaded_subjects_counter+=1
+
+ # Stop downloading if we have reached the required number of subjects
+ ifdownloaded_subjects_counter==nsubjects:
+ break
+
+ # Disconnect the session
+ session.disconnect()
+ ifdownloaded_subjects_counter<nsubjects:
+ raiseValueError(f'Number of subjects downloaded {downloaded_subjects_counter} is smaller than the number required {nsubjects}.')
+
+ print('Done downloading!')
+
+
+
[docs]defdownload_HeadAndNeck(datafolder=None,nsubjects=10):
+ ifdatafolderisNone:
+ # Download data to path in which this script is located + Data
+ cwd=os.getcwd()
+ datafolder=os.path.join(cwd,'Data')
+ ifnotos.path.exists(datafolder):
+ os.makedirs(datafolder)
+
+ xnat_url='https://xnat.bmia.nl/'
+ project_name='stwstrategyhn1'
+ download_project(project_name,xnat_url,datafolder,nsubjects=nsubjects,
+ verbose=True)
[docs]    def __init__(self):
+        # Initialize the main config object and the custom overrides
+        self._config = configparser.ConfigParser()
+        self._custom_overrides = {}
+
+        # Detect when using a cluster and override relevant config fields
+        self._cluster_config_overrides()
[docs]    def fullprint(self):
+        '''
+        Print the full contents of the config to the console.
+        '''
+        for k, v in self._config.items():
+            print(f"{k}:")
+            for k2, v2 in v.items():
+                print(f"\t{k2}: {v2}")
+            print("\n")
[docs]    def __init__(self, function, execute_first):
+        super(InvalidOrderException, self).__init__(f'Invalid order for function {function}: call {execute_first} before calling this function')
Source code for WORC.facade.intermediatefacade.intermediatefacade
+from WORC import WORC
+
+from pathlib import Path
+
+from WORC.detectors.detectors import CsvDetector
+from WORC.facade.intermediatefacade.configbuilder import ConfigBuilder
+from .exceptions import PathNotFoundException, NoImagesFoundException, NoSegmentationsFoundException, \
+    InvalidCsvFileException
+
+
+def_for_all_methods(decorator):
+ defdecorate(cls):
+        for attr in cls.__dict__:  # there's probably a better way to do this
+ ifcallable(getattr(cls,attr)):
+ setattr(cls,attr,decorator(getattr(cls,attr)))
+ returncls
+
+ returndecorate
+
+
+def_error_buldozer(func):
+ _valid_exceptions=[
+ PathNotFoundException,NoImagesFoundException,
+ NoSegmentationsFoundException,InvalidCsvFileException,
+ TypeError,ValueError,NotImplementedError
+ ]
+
+ unexpected_exception_exception=Exception('A blackhole to another dimenstion has opened. This exception should never be thrown. Double check your code or make an issue on the WORC github so that we can fix this issue.')
+
+ defdec(*args,**kwargs):
+ try:
+ func(*args,**kwargs)
+ exceptExceptionase:
+ ife.__class__notin_valid_exceptions:
+ raiseunexpected_exception_exception
+ raisee
+ returndec
+
+
+
[docs]deflabels_from_this_file(self,file_path,is_training=True):
+ labels_file=Path(file_path).expanduser()
+
+ ifnotlabels_file.is_file():
+ raisePathNotFoundException(file_path)
+
+ ifnotCsvDetector(labels_file.absolute()):
+ raiseInvalidCsvFileException(labels_file.absolute())
+
+ # TODO: implement sanity check labels file e.g. is it a labels file and are there labels available
+ ifis_training:
+ self._labels_file_train=labels_file.absolute()
+ else:
+ self._labels_file_test=labels_file.absolute()
+
+
[docs]defsemantics_from_this_file(self,file_path,is_training=True):
+ semantics_file=Path(file_path).expanduser()
+
+ ifnotsemantics_file.is_file():
+ raisePathNotFoundException(file_path)
+
+ ifnotCsvDetector(semantics_file.absolute()):
+ raiseInvalidCsvFileException(semantics_file.absolute())
+
+ # TODO: implement sanity check semantics file e.g. is it a semantics file and are there semantics available
+ ifis_training:
+ self._semantics_file_train=semantics_file.absolute()
+ else:
+ self._semantics_file_test=semantics_file.absolute()
+
+
[docs]defpredict_labels(self,label_names:list):
+ ifnotself._labels_file_train:
+ raiseValueError('No labels file set trough labels_from_this_file')
+
+ ifnotisinstance(label_names,list):
+ raiseTypeError(f'label_names is of type {type(label_names)} while list is expected')
+
+ forlabelinlabel_names:
+ iflen(label.strip())==0:
+ raiseValueError('Invalid label, length = 0')
+
+ # TODO: check if labels is in labels file
+
+ # self._worc.label_names = ', '.join(label_names)
+ self._label_names=label_names
+
+ def_set_and_validate_estimators(self,estimators,scoring_method,method,coarse):
+ # validate
+ ifmethod=='classification':
+ valid_estimators=['SVM','RF','SGD','LR','GaussianNB','ComplementNB','LDA','QDA','RankedSVM']
+ elifmethod=='regression':
+ valid_estimators=['SVR','RFR','ElasticNet','Lasso','SGDR']
+ else:
+ valid_estimators=[]
+
+ forestimatorinestimators:
+ ifestimatornotinvalid_estimators:
+ raiseValueError(
+ f'Invalid estimator {estimator} for {method}; must be one of {", ".join(valid_estimators)}')
+
+ # TODO: sanity check scoring method per estimator
+
+ # set
+ self._config_builder.estimator_scoring_overrides(estimators,scoring_method)
+
+ ifcoarse:
+ self._config_builder.coarse_overrides()
+ else:
+ self._config_builder.full_overrides()
+
+ self._method=method
+
+ def_validate(self):
+ ifnotself._images_train:
+ pass# TODO: throw exception
+
+ ifnotself._segmentations_train:
+ pass# TODO: throw exception
+
+ ifnotself._labels_file_train:
+ pass# TODO: throw an exception
+
+ ifnotself._label_names:
+ pass# TODO: throw exception
+
+ ifnotself._method:
+ pass# TODO: throw exception
+
+ iflen(self._images_train)==len(self._segmentations_train):
+ forindex,subjects_dictinenumerate(self._images_train):
+ try:
+ ifsubjects_dict.keys()!=self._segmentations_train[index].keys():
+ raiseValueError('Subjects in images_train and segmentations_train are not the same')
+
+ # TODO: verify subjects in labels files as well
+ # TODO: peform same checks on images_test and segmentations_test if those are not None
+ exceptIndexError:
+ # this should never be thrown, but i put it here just in case
+ raiseValueError(
+ 'A blackhole to another dimenstion has opened. This exception should never be thrown. Double check your code or make an issue on the WORC github so that we can fix this issue.')
+
+
[docs]defexecute(self):
+        # this function is kind of like the build()-function in a builder, except it performs execute on the object being built as well
+ self._validate()# do some final sanity checking before we execute the thing
+
+ self._worc.images_train=self._images_train
+ self._worc.segmentations_train=self._segmentations_train
+ self._worc.labels_train=self._labels_file_train
+ self._worc.semantics_train=self._semantics_file_train
+
+ ifself._images_test:
+ self._worc.images_test=self._images_test
+
+ ifself._segmentations_test:
+ self._worc.segmentations_test=self._segmentations_test
+
+ ifself._labels_file_test:
+ self._worc.labels_test=self._labels_file_test
+
+ self._worc.label_names=', '.join(self._label_names)
+ self._config_builder._custom_overrides['Labels']=dict()
+ self._config_builder._custom_overrides['Labels']['label_names']=self._worc.label_names
+
+ self._worc.configs=[self._config_builder.build_config(self._worc.defaultconfig())]
+ self._worc.build()
+ ifself._add_evaluation:
+ self._worc.add_evaluation(label_type=self._label_names[self._selected_label])
+
+ self._worc.set()
+ self._worc.execute()
Source code for WORC.featureprocessing.StatisticalTestFeatures
+#!/usr/bin/env python
+
+# Copyright 2016-2019 Biomedical Imaging Group Rotterdam, Departments of
+# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import csv
+import numpy as np
+from scipy.stats import ttest_ind, ranksums, mannwhitneyu
+import WORC.IOparser.config_io_classifier as config_io
+from WORC.classification.trainclassifier import load_features
+
+
+
[docs]defStatisticalTestFeatures(features,patientinfo,config,output=None,
+ verbose=True,label_type=None):
+ '''
+    Perform several statistical tests on features, such as a Student's t-test.
+    Usage is similar to trainclassifier.
+
+ Parameters
+ ----------
+ features: string, mandatory
+ contains the paths to all .hdf5 feature files used.
+ modalityname1=file1,file2,file3,... modalityname2=file1,...
+ Thus, modalities names are always between a space and a equal
+ sign, files are split by commas. We assume that the lists of
+ files for each modality has the same length. Files on the
+ same position on each list should belong to the same patient.
+
+ patientinfo: string, mandatory
+ Contains the path referring to a .txt file containing the
+ patient label(s) and value(s) to be used for learning. See
+ the Github Wiki for the format.
+
+ config: string, mandatory
+ path referring to a .ini file containing the parameters
+ used for feature extraction. See the Github Wiki for the possible
+ fields and their description.
+
+ # TODO: outputs
+
+ verbose: boolean, default True
+ print final feature values and labels to command line or not.
+
+ '''
+    # Load variables from the config file
+    config = config_io.load_config(config)
+
+    if type(patientinfo) is list:
+        patientinfo = ''.join(patientinfo)
+
+    if type(config) is list:
+        config = ''.join(config)
+
+    if type(output) is list:
+        output = ''.join(output)
+
+    # Create output folder if required
+    if not os.path.exists(os.path.dirname(output)):
+        os.makedirs(os.path.dirname(output))
+
+    if label_type is None:
+        label_type = config['Labels']['label_names']
+
+    # Read the features and classification data
+    print("Reading features and label data.")
+    label_data, image_features =\
+        load_features(features, patientinfo, label_type)
+
+    # Extract feature labels and put values in an array
+    feature_labels = image_features[0][1]
+    feature_values = np.zeros([len(image_features), len(feature_labels)])
+    for num, x in enumerate(image_features):
+        feature_values[num, :] = x[0]
+
+    # -----------------------------------------------------------------------
+    # Perform statistical tests
+    print("Performing statistical tests.")
+    label_value = label_data['label']
+    label_name = label_data['label_name']
+
+    header = list()
+    subheader = list()
+    for i_name in label_name:
+        header.append(str(i_name[0]))
+        header.append('')
+        header.append('')
+        header.append('')
+        header.append('')
+        header.append('')
+
+        subheader.append('Label')
+        subheader.append('Ttest')
+        subheader.append('Welch')
+        subheader.append('Wilcoxon')
+        subheader.append('Mann-Whitney')
+        subheader.append('')
+
+    # Open the output file
+    if output is not None:
+        myfile = open(output, 'w')
+        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
+        wr.writerow(header)
+        wr.writerow(subheader)
+
+    savedict = dict()
+    for i_class, i_name in zip(label_value, label_name):
+        savedict[i_name[0]] = dict()
+        pvalues = list()
+        pvalueswelch = list()
+        pvalueswil = list()
+        pvaluesmw = list()
+
+        for num, fl in enumerate(feature_labels):
+            fv = feature_values[:, num]
+            classlabels = i_class.ravel()
+
+            class1 = [i for j, i in enumerate(fv) if classlabels[j] == 1]
+            class2 = [i for j, i in enumerate(fv) if classlabels[j] == 0]
+
+            pvalues.append(ttest_ind(class1, class2)[1])
+            pvalueswelch.append(ttest_ind(class1, class2, equal_var=False)[1])
+            pvalueswil.append(ranksums(class1, class2)[1])
+            try:
+                pvaluesmw.append(mannwhitneyu(class1, class2)[1])
+            except ValueError as e:
+                print("[PREDICT Warning] " + str(e) + '. Replacing metric value by 1.')
+                pvaluesmw.append(1)
+
+        # Sort based on p-values:
+        indices = np.argsort(np.asarray(pvaluesmw))
+        feature_labels_o = np.asarray(feature_labels)[indices].tolist()
+
+        pvalues = np.asarray(pvalues)[indices].tolist()
+        pvalueswelch = np.asarray(pvalueswelch)[indices].tolist()
+        pvalueswil = np.asarray(pvalueswil)[indices].tolist()
+        pvaluesmw = np.asarray(pvaluesmw)[indices].tolist()
+
+        savedict[i_name[0]]['ttest'] = pvalues
+        savedict[i_name[0]]['welch'] = pvalueswelch
+        savedict[i_name[0]]['wil'] = pvalueswil
+        savedict[i_name[0]]['mw'] = pvaluesmw
+        savedict[i_name[0]]['labels'] = feature_labels_o
+
+    if output is not None:
+        for num in range(0, len(savedict[i_name[0]]['ttest'])):
+            writelist = list()
+            for i_name in savedict.keys():
+                labeldict = savedict[i_name]
+                writelist.append(labeldict['labels'][num])
+                writelist.append(labeldict['ttest'][num])
+                writelist.append(labeldict['welch'][num])
+                writelist.append(labeldict['wil'][num])
+                writelist.append(labeldict['mw'][num])
+                writelist.append('')
+
+            wr.writerow(writelist)
+
+        print("Saved data!")
+
+    return savedict
+
+
\ No newline at end of file
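To illustrate what is computed per feature above, here is a small self-contained example running the same four tests on one synthetic feature split by a binary label; the data are illustrative only.

    # Illustrative example of the four tests applied per feature above.
    import numpy as np
    from scipy.stats import ttest_ind, ranksums, mannwhitneyu

    rng = np.random.default_rng(0)
    feature_class1 = rng.normal(1.0, 1.0, 30)   # feature values for label == 1
    feature_class2 = rng.normal(0.5, 1.0, 30)   # feature values for label == 0

    print('t-test      p =', ttest_ind(feature_class1, feature_class2)[1])
    print('Welch       p =', ttest_ind(feature_class1, feature_class2, equal_var=False)[1])
    print('Wilcoxon    p =', ranksums(feature_class1, feature_class2)[1])
    print('MannWhitney p =', mannwhitneyu(feature_class1, feature_class2)[1])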
diff --git a/WORC/doc/_build/html/_modules/WORC/featureprocessing/StatisticalTestThreshold.html b/WORC/doc/_build/html/_modules/WORC/featureprocessing/StatisticalTestThreshold.html
index eceeb535..81be8cfa 100644
--- a/WORC/doc/_build/html/_modules/WORC/featureprocessing/StatisticalTestThreshold.html
+++ b/WORC/doc/_build/html/_modules/WORC/featureprocessing/StatisticalTestThreshold.html
@@ -8,7 +8,7 @@
- WORC.featureprocessing.StatisticalTestThreshold — WORC 3.0.0 documentation
+ WORC.featureprocessing.StatisticalTestThreshold — WORC 3.1.0 documentation
@@ -59,7 +59,7 @@
- 3.0.0
+ 3.1.0
@@ -229,17 +229,40 @@
Source code for WORC.featureprocessing.StatisticalTestThreshold
        self.parameters = {}

        # Perform the statistical test for each feature
+        multilabel = type(Y_train[0]) is np.ndarray
        for n_feat in range(0, X_train.shape[1]):
-            fv = X_train[:, n_feat]
-
-            class1 = [i for j, i in enumerate(fv) if Y_train[j] == 1]
-            class2 = [i for j, i in enumerate(fv) if Y_train[j] == 0]
+            # Select only this specific feature for all objects
-            try:
-                metric_value = self.metric_function(class1, class2, **self.parameters)[1]
-            except ValueError as e:
-                print("[PREDICT Warning] " + str(e) + '. Replacing metric value by 1.')
-                metric_value = 1
+            fv = X_train[:, n_feat]
+            if multilabel:
+                # We do a statistical test per label and take the minimum p-value
+                n_label = Y_train[0].shape[0]
+                metric_values = list()
+                for i_label in range(n_label):
+                    class1 = [i for j, i in enumerate(fv) if np.argmax(Y_train[j]) == i_label]
+                    class2 = [i for j, i in enumerate(fv) if np.argmax(Y_train[j]) != i_label]
+
+                    try:
+                        metric_value_temp = self.metric_function(class1, class2, **self.parameters)[1]
+                    except ValueError as e:
+                        print("[PREDICT Warning] " + str(e) + '. Replacing metric value by 1.')
+                        metric_value_temp = 1
+
+                    metric_values.append(metric_value_temp)
+
+                metric_value = np.min(metric_values)
+
+            else:
+                # Singlelabel
+                class1 = [i for j, i in enumerate(fv) if Y_train[j] == 1]
+                class2 = [i for j, i in enumerate(fv) if Y_train[j] == 0]
+
+                try:
+                    metric_value = self.metric_function(class1, class2, **self.parameters)[1]
+                except ValueError as e:
+                    print("[PREDICT Warning] " + str(e) + '. Replacing metric value by 1.')
+                    metric_value = 1

            self.metric_values.append(metric_value)

            if metric_value < self.threshold:
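To make the selection logic above concrete, here is a small self-contained sketch of the same idea outside the class, using a plain t-test as the metric function; the names are illustrative only and this is not the WORC class itself.

    # Minimal sketch of p-value based feature selection, mirroring the loop above.
    import numpy as np
    from scipy.stats import ttest_ind

    def select_features_by_pvalue(X_train, y_train, threshold=0.05):
        """Return indices of features whose t-test p-value is below threshold."""
        selected = []
        for n_feat in range(X_train.shape[1]):
            fv = X_train[:, n_feat]
            class1 = fv[y_train == 1]
            class2 = fv[y_train == 0]
            try:
                p_value = ttest_ind(class1, class2)[1]
            except ValueError:
                p_value = 1.0  # degenerate split: leave the feature out
            if p_value < threshold:
                selected.append(n_feat)
        return selected

    # Example: X = np.random.rand(30, 10); y = np.random.randint(0, 2, 30)
    # select_features_by_pvalue(X, y)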
diff --git a/WORC/doc/_build/html/_modules/WORC/featureprocessing/VarianceThreshold.html b/WORC/doc/_build/html/_modules/WORC/featureprocessing/VarianceThreshold.html
index 739c8860..b450b760 100644
--- a/WORC/doc/_build/html/_modules/WORC/featureprocessing/VarianceThreshold.html
+++ b/WORC/doc/_build/html/_modules/WORC/featureprocessing/VarianceThreshold.html
@@ -8,7 +8,7 @@
- WORC.featureprocessing.VarianceThreshold — WORC 3.0.0 documentation
+ WORC.featureprocessing.VarianceThreshold — WORC 3.1.0 documentation
@@ -59,7 +59,7 @@
[docs]def compute_confidence_bootstrap(bootstrap_metric, test_metric, N_1, alpha=0.95):
+    """
+    Function to calculate the confidence interval for bootstrapped samples.
+
+    metric: numpy array containing the result for a metric for the different bootstrap iterations
+    test_metric: the value of the metric evaluated on the true, full test set
+    alpha: float ranging from 0 to 1 to calculate the alpha*100% CI, default 0.95
+    """
+    metric_std = np.std(bootstrap_metric)
+    CI = st.norm.interval(alpha, loc=test_metric, scale=metric_std)
+    return CI
+
+
[docs]def compute_confidence(metric, N_train, N_test, alpha=0.95):
    """
-    Function to calculate the adjusted confidence interval
+    Function to calculate the adjusted confidence interval for cross-validation.
+
+    metric: numpy array containing the result for a metric for the different cross-validations
+        (e.g. if 20 cross-validations are performed, it is a list of length 20 with the
+        calculated accuracy for each cross-validation)
+    N_train: Integer, number of training samples
+    N_test: Integer, number of test samples
-    alpha: float ranging from 0 to 1 to calculate the alpha*100% CI, default 95%
+    alpha: float ranging from 0 to 1 to calculate the alpha*100% CI, default 0.95
    """
+    # Remove NaN values if they are there
+    if np.isnan(metric).any():
+        print('[WORC Warning] Array contains nan: removing.')
+        metric = np.asarray(metric)
+        metric = metric[np.logical_not(np.isnan(metric))]
+
    # Convert to floats, as python 2 rounds the divisions if we have integers
    N_train = float(N_train)
    N_test = float(N_test)
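As a quick numerical illustration of the bootstrap interval above (a normal interval centred on the full test-set value, scaled by the bootstrap standard deviation), a self-contained sketch with fabricated metric values:

    # Illustrative sketch of the bootstrap confidence interval used above;
    # not the WORC implementation itself.
    import numpy as np
    import scipy.stats as st

    rng = np.random.default_rng(42)
    bootstrap_auc = rng.normal(loc=0.78, scale=0.03, size=1000)  # fake bootstrap AUCs
    test_auc = 0.80                                              # AUC on the full test set

    ci = st.norm.interval(0.95, loc=test_auc, scale=np.std(bootstrap_auc))
    print(f"95% CI: [{ci[0]:.3f}, {ci[1]:.3f}]")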
diff --git a/WORC/doc/_build/html/_modules/WORC/plotting/linstretch.html b/WORC/doc/_build/html/_modules/WORC/plotting/linstretch.html
index bd7a7833..5acffb1b 100644
--- a/WORC/doc/_build/html/_modules/WORC/plotting/linstretch.html
+++ b/WORC/doc/_build/html/_modules/WORC/plotting/linstretch.html
@@ -8,7 +8,7 @@
- WORC.plotting.linstretch — WORC 3.0.0 documentation
+ WORC.plotting.linstretch — WORC 3.1.0 documentation
@@ -59,7 +59,7 @@
[docs]def fit_thresholds(thresholds, estimator, X_train, Y_train, ensemble, ensemble_scoring):
+    print('Fitting thresholds on validation set')
+    if not hasattr(estimator, 'cv_iter'):
+        cv_iter = list(estimator.cv.split(X_train, Y_train))
+        estimator.cv_iter = cv_iter
+
+    p_est = estimator.cv_results_['params'][0]
+    p_all = estimator.cv_results_['params_all'][0]
+    n_iter = len(estimator.cv_iter)
+
+    thresholds_low = list()
+    thresholds_high = list()
+    for it, (train, valid) in enumerate(estimator.cv_iter):
+        print(f' - iteration {it + 1} / {n_iter}.')
+        # NOTE: Explicitly exclude the validation set, else refit and score
+        # somehow still seem to use it.
+        X_train_temp = [X_train[i] for i in train]
+        Y_train_temp = [Y_train[i] for i in train]
+        train_temp = range(0, len(train))
+
+        # Refit a SearchCV object with the provided parameters
+        if ensemble:
+            estimator.create_ensemble(X_train_temp, Y_train_temp,
+                                      method=ensemble, verbose=False,
+                                      scoring=ensemble_scoring)
+        else:
+            estimator.refit_and_score(X_train_temp, Y_train_temp, p_all,
+                                      p_est, train_temp, train_temp,
+                                      verbose=False)
+
+        # Predict and save scores
+        X_train_values = [x[0] for x in X_train]  # Throw away labels
+        X_train_values_valid = [X_train_values[i] for i in valid]
+        Y_valid_score_temp = estimator.predict_proba(X_train_values_valid)
+
+        # Only take the probabilities for the second class
+        Y_valid_score_temp = Y_valid_score_temp[:, 1]
+
+        # Select thresholds
+        thresholds_low.append(np.percentile(Y_valid_score_temp, thresholds[0] * 100.0))
+        thresholds_high.append(np.percentile(Y_valid_score_temp, thresholds[1] * 100.0))
+
+    thresholds_val = [np.mean(thresholds_low), np.mean(thresholds_high)]
+    print(f'Thresholds {thresholds} converted to {thresholds_val}.')
+    return thresholds_val

[docs]def plot_SVM(prediction, label_data, label_type, show_plots=False,
             alpha=0.95, ensemble=False, verbose=True,
             ensemble_scoring=None, output='stats',
-             modus='singlelabel'):
+             modus='singlelabel',
+             thresholds=None, survival=False,
+             generalization=False, shuffle_estimators=False,
+             bootstrap=False, bootstrap_N=1000):
    '''
    Plot the output of a single binary estimator, e.g. a SVM.
@@ -231,6 +284,11 @@
Source code for WORC.plotting.plot_SVM
Determine which results are put out. If stats, the statistics of the estimator will be returned. If scores, the scores will be returned.
+    thresholds: list of integer(s), default None
+        If None, use the default sklearn threshold (0.5) on the posteriors to
+        convert them to a binary prediction. If one integer is provided, use that one.
+        If two integers are provided, posterior < thresh[0] = 0, posterior > thresh[1] = 1.
+
    Returns
    ----------
    Depending on the output parameters, the following outputs are returned:
@@ -251,7 +309,7 @@
Source code for WORC.plotting.plot_SVM
    y_predictions: list
        Contains the predicted label for each object.
-    PIDs: list
+    pids: list
        Contains the patient ID/name for each object.
    '''
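The two-threshold behaviour documented above (patients whose posterior falls between the two fitted thresholds are excluded, the rest are binarized) can be sketched in isolation as follows; the function and variable names here are illustrative, not the WORC API.

    # Illustrative sketch of the two-threshold posterior filtering described above.
    import numpy as np

    def apply_two_thresholds(y_score, thresholds_val):
        low, high = thresholds_val
        keep = (y_score <= low) | (y_score > high)          # exclude the uncertain middle
        y_pred = (y_score[keep] > high).astype(int)         # binarize the remaining posteriors
        percentage_selected = float(keep.sum()) / float(len(y_score))
        return y_pred, keep, percentage_selected

    scores = np.array([0.1, 0.45, 0.55, 0.9])
    print(apply_two_thresholds(scores, [0.4, 0.6]))  # keeps 0.1 and 0.9 -> predictions [0, 1]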
@@ -276,12 +334,14 @@
Source code for WORC.plotting.plot_SVM
        label_type = [[label_type]]

    label_data = lp.load_labels(label_data, label_type)
+    n_labels = len(label_type)
    patient_IDs = label_data['patient_IDs']
    labels = label_data['label']

    if type(label_type) is list:
        # FIXME: Support for multiple label types not supported yet.
        print('[WORC Warning] Support for multiple label types not supported yet. Taking first label for plot_SVM.')
+        original_label_type = label_type[:]
        label_type = keys[0]

    # Extract the estimators, features and labels
@@ -294,35 +354,98 @@
Source code for WORC.plotting.plot_SVM
    feature_labels = prediction[label_type]['feature_labels']

    # Create lists for performance measures
-    sensitivity = list()
-    specificity = list()
-    precision = list()
-    accuracy = list()
-    auc = list()
-    f1_score_list = list()
+    if not regression:
+        sensitivity = list()
+        specificity = list()
+        precision = list()
+        npv = list()
+        accuracy = list()
+        bca = list()
+        auc = list()
+        f1_score_list = list()
+
+        if modus == 'multilabel':
+            acc_av = list()
+
+            # Also add scoring measures for all single label scores
+            sensitivity_single = list()
+            specificity_single = list()
+            precision_single = list()
+            npv_single = list()
+            accuracy_single = list()
+            bca_single = list()
+            auc_single = list()
+            f1_score_list_single = list()
+
+    else:
+        r2score = list()
+        MSE = list()
+        coefICC = list()
+        PearsonC = list()
+        PearsonP = list()
+        SpearmanC = list()
+        SpearmanP = list()
+        cindex = list()
+        coxcoef = list()
+        coxp = list()
+
    patient_classification_list = dict()
+    percentages_selected = list()
+
    if output in ['scores', 'decision']:
        # Keep track of all ground truths and scores
        y_truths = list()
        y_scores = list()
        y_predictions = list()
-        PIDs = list()
+        pids = list()

-    # Loop over the test sets, which probably correspond with cross validation
-    # iterations
-    for i in range(0, len(Y_test)):
+    # Loop over the test sets, which correspond to cross-validation
+    # or bootstrapping iterations
+    if bootstrap:
+        iterobject = range(0, bootstrap_N)
+    else:
+        iterobject = range(0, len(Y_test))
+
+    for i in iterobject:
        print("\n")
-        print(("Cross validation {} / {}.").format(str(i + 1), str(len(Y_test))))
-        test_patient_IDs = prediction[label_type]['patient_ID_test'][i]
-        train_patient_IDs = prediction[label_type]['patient_ID_train'][i]
-        X_test_temp = X_test[i]
-        X_train_temp = X_train[i]
-        Y_train_temp = Y_train[i]
-        Y_test_temp = Y_test[i]
+        if bootstrap:
+            print(f"Bootstrap {i + 1} / {bootstrap_N}.")
+        else:
+            print(f"Cross validation {i + 1} / {len(Y_test)}.")
+
        test_indices = list()
+        # When bootstrapping, there is only a single train/test set.
+        if bootstrap:
+            X_test_temp = X_test[0]
+            X_train_temp = X_train[0]
+            Y_train_temp = Y_train[0]
+            Y_test_temp = Y_test[0]
+            test_patient_IDs = prediction[label_type]['patient_ID_test'][0]
+            train_patient_IDs = prediction[label_type]['patient_ID_train'][0]
+            fitted_model = SVMs[0]
+        else:
+            X_test_temp = X_test[i]
+            X_train_temp = X_train[i]
+            Y_train_temp = Y_train[i]
+            Y_test_temp = Y_test[i]
+            test_patient_IDs = prediction[label_type]['patient_ID_test'][i]
+            train_patient_IDs = prediction[label_type]['patient_ID_train'][i]
+            fitted_model = SVMs[i]
+
+        # If bootstrap, generate a bootstrapped sample
+        if bootstrap:
+            X_test_temp, Y_test_temp, test_patient_IDs = resample(X_test_temp, Y_test_temp, test_patient_IDs)
+
        # Check which patients are in the test set.
        for i_ID in test_patient_IDs:
+            if i_ID not in patient_IDs:
+                print(f'[WORC WARNING] Patient {i_ID} is not found in the patient labels, removing underscore.')
+                i_ID = np.where(patient_IDs == i_ID.split("_")[0])
+                if i_ID not in patient_IDs:
+                    print(f'[WORC WARNING] Did not help, excluding patient {i_ID}.')
+                    continue
+
            test_indices.append(np.where(patient_IDs == i_ID)[0][0])

        # Initiate counting how many times a patient is classified correctly
@@ -337,26 +460,81 @@
Source code for WORC.plotting.plot_SVM
        # Extract ground truth
        y_truth = Y_test_temp

+        # If required, shuffle estimators for "Random" ensembling
+        if shuffle_estimators:
+            print('Shuffling estimators for random ensembling.')
+            shuffle(fitted_model.cv_results_['params'])
+            shuffle(fitted_model.cv_results_['params_all'])
+
+        # If required, rank according to generalization score instead of mean_validation_score
+        if generalization:
+            # Compute generalization score
+            print('Using generalization score for estimator ranking.')
+            difference_score = abs(fitted_model.cv_results_['mean_train_score'] - fitted_model.cv_results_['mean_test_score'])
+            generalization_score = fitted_model.cv_results_['mean_test_score'] - difference_score
+
+            # Rerank based on score
+            indices = np.argsort(generalization_score)
+            fitted_model.cv_results_['params'] = [fitted_model.cv_results_['params'][i] for i in indices[::-1]]
+            fitted_model.cv_results_['params_all'] = [fitted_model.cv_results_['params_all'][i] for i in indices[::-1]]
+
        # If requested, first let the SearchCV object create an ensemble
-        if ensemble:
+        if bootstrap and i > 0:
+            # For bootstrapping, only do this at the first iteration
+            pass
+        elif ensemble > 1:
            # NOTE: Added for backwards compatibility
-            if not hasattr(SVMs[i], 'cv_iter'):
-                cv_iter = list(SVMs[i].cv.split(X_train_temp, Y_train_temp))
-                SVMs[i].cv_iter = cv_iter
+            if not hasattr(fitted_model, 'cv_iter'):
+                cv_iter = list(fitted_model.cv.split(X_train_temp, Y_train_temp))
+                fitted_model.cv_iter = cv_iter

            # Create the ensemble
            X_train_temp = [(x, feature_labels) for x in X_train_temp]
-            SVMs[i].create_ensemble(X_train_temp, Y_train_temp,
+            fitted_model.create_ensemble(X_train_temp, Y_train_temp,
                                         method=ensemble, verbose=verbose,
                                         scoring=ensemble_scoring)

        # Create prediction
-        y_prediction = SVMs[i].predict(X_test_temp)
+        y_prediction = fitted_model.predict(X_test_temp)

        if regression:
            y_score = y_prediction
+        elif modus == 'multilabel':
+            y_score = fitted_model.predict_proba(X_test_temp)
        else:
-            y_score = SVMs[i].predict_proba(X_test_temp)[:, 1]
+            y_score = fitted_model.predict_proba(X_test_temp)[:, 1]
+
+        # Create a new binary score based on the thresholds if given
+        if thresholds is not None:
+            if len(thresholds) == 1:
+                y_prediction = y_score >= thresholds[0]
+            elif len(thresholds) == 2:
+                # X_train_temp = [x[0] for x in X_train_temp]
+
+                y_score_temp = list()
+                y_prediction_temp = list()
+                y_truth_temp = list()
+                test_patient_IDs_temp = list()
+
+                thresholds_val = fit_thresholds(thresholds, fitted_model, X_train_temp, Y_train_temp,
+                                                ensemble, ensemble_scoring)
+                for pnum in range(len(y_score)):
+                    if y_score[pnum] <= thresholds_val[0] or y_score[pnum] > thresholds_val[1]:
+                        y_score_temp.append(y_score[pnum])
+                        y_prediction_temp.append(y_prediction[pnum])
+                        y_truth_temp.append(y_truth[pnum])
+                        test_patient_IDs_temp.append(test_patient_IDs[pnum])
+
+                perc = float(len(y_prediction_temp)) / float(len(y_prediction))
+                percentages_selected.append(perc)
+                print(f"Selected {len(y_prediction_temp)} from {len(y_prediction)} ({perc}%) patients using two thresholds.")
+                y_score = y_score_temp
+                y_prediction = y_prediction_temp
+                y_truth = y_truth_temp
+                test_patient_IDs = test_patient_IDs_temp
+            else:
+                raise ae.WORCValueError(f"Need None, one or two thresholds on the posterior; got {len(thresholds)}.")

        print("Truth: " + str(y_truth))
        print("Prediction: " + str(y_prediction))
@@ -373,21 +551,19 @@
[docs]def combine_multiple_estimators(predictions, label_data, multilabel_type, label_types,
+                                ensemble=1, strategy='argmax', alpha=0.95):
+    '''
+    Combine multiple estimators in a single model.
-    print("F1-score 95%: " + str(compute_CI.compute_confidence(f1_score_list, N_1, N_2, alpha)))
+    Note: the multilabel_type labels should correspond to the ordering in label_types.
+    Hence, if multilabel_type = 0, the prediction is label_type[0], etc.
+    '''
-    print("Precision 95%: " + str(compute_CI.compute_confidence(precision, N_1, N_2, alpha)))
+    # Load the multilabel label data
+    label_data = lp.load_labels(label_data, multilabel_type)
+    patient_IDs = label_data['patient_IDs']
+    labels = label_data['label']
-    print("Sensitivity 95%: " + str(compute_CI.compute_confidence(sensitivity, N_1, N_2, alpha)))
+    # Initialize some objects
+    y_truths = list()
+    y_scores = list()
+    y_predictions = list()
+    pids = list()
-    print("Specificity 95%: " + str(compute_CI.compute_confidence(specificity, N_1, N_2, alpha)))
+    y_truths_train = list()
+    y_scores_train = list()
+    y_predictions_train = list()
+    pids_train = list()
-    # Extract statistics on how often patients got classified correctly
-    alwaysright = dict()
-    alwayswrong = dict()
-    percentages = dict()
-    for i_ID in patient_classification_list:
-        percentage_right = patient_classification_list[i_ID]['N_correct'] / float(patient_classification_list[i_ID]['N_test'])
+    accuracy = list()
+    sensitivity = list()
+    specificity = list()
+    auc = list()
+    f1_score_list = list()
+    precision = list()
+    npv = list()
+    acc_av = list()
+
+    # Extract all the predictions from the estimators
+    for prediction, label_type in zip(predictions, label_types):
+        y_truth, y_score, y_prediction, pid,\
+            y_truth_train, y_score_train, y_prediction_train, pid_train =\
+            plot_SVM(prediction, label_data, label_type,
+                     ensemble=ensemble, output='allscores')
+        y_truths.append(y_truth)
+        y_scores.append(y_score)
+        y_predictions.append(y_prediction)
+        pids.append(pid)
+
+        y_truths_train.append(y_truth_train)
+        y_scores_train.append(y_score_train)
+        y_predictions_train.append(y_prediction_train)
+        pids_train.append(pid_train)
+
+    # Combine the predictions
+    for i_crossval in range(0, len(y_truths[0])):
+        # Extract all values for this cross-validation iteration from all objects
+        y_truth = [t[i_crossval] for t in y_truths]
+        y_score = [t[i_crossval] for t in y_scores]
+        pid = [t[i_crossval] for t in pids]
+
+        if strategy == 'argmax':
+            # For each patient, take the maximum posterior
+            y_prediction = np.argmax(y_score, axis=0)
+            y_score = np.max(y_score, axis=0)
+        elif strategy == 'decisiontree':
+            # Fit a decision tree on the training set
+            a = 1
+        else:
+            raise ae.WORCValueError(f"{strategy} is not a valid estimation combining strategy! Should be one of [argmax].")
-        if i_ID in patient_IDs:
-            label = labels[0][np.where(i_ID == patient_IDs)]
-        else:
-            # Multiple instances of one patient
-            label = labels[0][np.where(i_ID.split('_')[0] == patient_IDs)]
-
-        label = label[0][0]
-        percentages[i_ID] = str(label) + ': ' + str(round(percentage_right, 2) * 100) + '%'
-        if percentage_right == 1.0:
-            alwaysright[i_ID] = label
-            print(("Always Right: {}, label {}").format(i_ID, label))
-
-        elif percentage_right == 0:
-            alwayswrong[i_ID] = label
-            print(("Always Wrong: {}, label {}").format(i_ID, label))
-
-    stats["Always right"] = alwaysright
-    stats["Always wrong"] = alwayswrong
-    stats['Percentages'] = percentages
-
-    if show_plots:
-        # Plot some characteristics in boxplots
-        import matplotlib.pyplot as plt
-
- plt.figure()
- plt.boxplot(accuracy)
- plt.ylim([-0.05,1.05])
- plt.ylabel('Accuracy')
- plt.tick_params(
- axis='x',# changes apply to the x-axis
- which='both',# both major and minor ticks are affected
- bottom='off',# ticks along the bottom edge are off
- top='off',# ticks along the top edge are off
- labelbottom='off')# labels along the bottom edge are off
- plt.tight_layout()
- plt.show()
-
- plt.figure()
- plt.boxplot(auc)
- plt.ylim([-0.05,1.05])
- plt.ylabel('AUC')
- plt.tick_params(
- axis='x',# changes apply to the x-axis
- which='both',# both major and minor ticks are affected
- bottom='off',# ticks along the bottom edge are off
- top='off',# ticks along the top edge are off
- labelbottom='off')# labels along the bottom edge are off
- plt.tight_layout()
- plt.show()
-
- plt.figure()
- plt.boxplot(precision)
- plt.ylim([-0.05,1.05])
- plt.ylabel('Precision')
- plt.tick_params(
- axis='x',# changes apply to the x-axis
- which='both',# both major and minor ticks are affected
- bottom='off',# ticks along the bottom edge are off
- top='off',# ticks along the top edge are off
- labelbottom='off')# labels along the bottom edge are off
- plt.tight_layout()
- plt.show()
-
- plt.figure()
- plt.boxplot(sensitivity)
- plt.ylim([-0.05,1.05])
- plt.ylabel('Sensitivity')
- plt.tick_params(
- axis='x',# changes apply to the x-axis
- which='both',# both major and minor ticks are affected
- bottom='off',# ticks along the bottom edge are off
- top='off',# ticks along the top edge are off
- labelbottom='off')# labels along the bottom edge are off
- plt.tight_layout()
- plt.show()
-
- plt.figure()
- plt.boxplot(specificity)
- plt.ylim([-0.05,1.05])
- plt.ylabel('Specificity')
- plt.tick_params(
- axis='x',# changes apply to the x-axis
- which='both',# both major and minor ticks are affected
- bottom='off',# ticks along the bottom edge are off
- top='off',# ticks along the top edge are off
- labelbottom='off')# labels along the bottom edge are off
- plt.tight_layout()
- plt.show()
+        # Compute multilabel performance metrics
+        y_truth = np.argmax(y_truth, axis=0)
+        accuracy_temp, sensitivity_temp, specificity_temp, \
+            precision_temp, npv_temp, f1_score_temp, auc_temp, accav_temp = \
+            metrics.performance_multilabel(y_truth,
+                                           y_prediction,
+                                           y_score)
-    return stats
    # Save the output
    if output_tex is not None:
-        print('Saving barchart to {}.').format(output_tex)
+        print(f'Saving barchart to {output_tex}.')
        tikz_save(output_tex)

    if output_png is not None:
-        print('Saving barchart to {}.').format(output_png)
+        print(f'Saving barchart to {output_png}.')
        fig.savefig(output_png, bbox_inches='tight', pad_inches=0, dpi=50)
@@ -358,7 +358,7 @@
Source code for WORC.plotting.plot_barchart
[docs]def count_parameters(parameters):
    # Count for every parameter how many times a setting occurs
    output = dict()
-    for setting, values in parameters.iteritems():
+    for setting, values in parameters.items():
        output[setting] = dict()
        c = Counter(values)
        for k, v in zip(c.keys(), c.values()):
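The counting above boils down to a Counter per hyperparameter; a stand-alone illustration with made-up parameter lists:

    # Illustrative stand-alone version of the counting above, using collections.Counter.
    from collections import Counter

    parameters = {'SVMKernel': ['rbf', 'poly', 'rbf'], 'classifiers': ['SVM', 'SVM', 'RF']}
    output = {setting: dict(Counter(values)) for setting, values in parameters.items()}
    print(output)  # {'SVMKernel': {'rbf': 2, 'poly': 1}, 'classifiers': {'SVM': 2, 'RF': 1}}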
diff --git a/WORC/doc/_build/html/_modules/WORC/plotting/plot_boxplot.html b/WORC/doc/_build/html/_modules/WORC/plotting/plot_boxplot.html
index 7cff8a7c..b142cf03 100644
--- a/WORC/doc/_build/html/_modules/WORC/plotting/plot_boxplot.html
+++ b/WORC/doc/_build/html/_modules/WORC/plotting/plot_boxplot.html
@@ -8,7 +8,7 @@
- WORC.plotting.plot_boxplot — WORC 3.0.0 documentation
+ WORC.plotting.plot_boxplot — WORC 3.1.0 documentation
@@ -59,7 +59,7 @@
[docs]def slicer(image, mask, output_name, output_name_zoom=None, thresholds=[-240, 160],
+           zoomfactor=4, dpi=500, normalize=False, expand=False):
    '''
    image and mask should both be arrays
    '''
@@ -193,8 +193,6 @@
    # Save some memory
    del fig

-    # Create a bounding box and save zoomed image
-    imslice, maskslice = bbox_2D(imslice, maskslice, padding=[20, 20])
-    imsize = [float(imslice.shape[0]), float(imslice.shape[1])]
+    if output_name_zoom is not None:
+        # Create a bounding box and save zoomed image
+        imslice, maskslice = bbox_2D(imslice, maskslice, padding=[20, 20])
+        imsize = [float(imslice.shape[0]), float(imslice.shape[1])]

-    # NOTE: As these zoomed images get small, we double the spacing
-    spacing = spacing * zoomfactor
-    figsize = (imsize[0] * spacing / 100.0, imsize[1] * spacing / 100.0)
-    fig = plot_im_and_overlay(imslice, maskslice, figsize=figsize)
-    fig.savefig(output_name_zoom, bbox_inches='tight', pad_inches=0, dpi=dpi)
-    plt.close('all')
+        # NOTE: As these zoomed images get small, we double the spacing
+        spacing = spacing * zoomfactor
+        figsize = (imsize[0] * spacing / 100.0, imsize[1] * spacing / 100.0)
+        fig = plot_im_and_overlay(imslice, maskslice, figsize=figsize)
+        fig.savefig(output_name_zoom, bbox_inches='tight', pad_inches=0, dpi=dpi)
+        plt.close('all')
+
+        # Save some memory
+        del fig, image, mask
-    # Save some memory
-    del fig, image, mask

    return imslice, maskslice
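The zoomed output depends on a bbox_2D helper that crops the slice to the mask. The WORC implementation is not shown here, so the following is only an illustrative sketch, under the assumption that the helper returns the image and mask cropped to the padded bounding box of the mask.

    # Illustrative sketch of a bbox_2D-style helper; an assumption for clarity,
    # not the WORC implementation.
    import numpy as np

    def bbox_2D_sketch(image, mask, padding=(20, 20)):
        rows = np.any(mask, axis=1)
        cols = np.any(mask, axis=0)
        rmin, rmax = np.where(rows)[0][[0, -1]]
        cmin, cmax = np.where(cols)[0][[0, -1]]
        rmin = max(rmin - padding[0], 0)
        rmax = min(rmax + padding[0], mask.shape[0] - 1)
        cmin = max(cmin - padding[1], 0)
        cmax = min(cmax + padding[1], mask.shape[1] - 1)
        return image[rmin:rmax + 1, cmin:cmax + 1], mask[rmin:rmax + 1, cmin:cmax + 1]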
+#!/usr/bin/env python
+
+# Copyright 2016-2019 Biomedical Imaging Group Rotterdam, Departments of
+# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+try:
+    import matplotlib.pyplot as plt
+except ImportError:
+    print("[WORC Warning] Cannot use scatterplot function, as _tkinter is not installed")
+
+import pandas as pd
+import argparse
+import WORC.processing.label_processing as lp
+import os
+import glob
+from natsort import natsorted
+
+
+
            self.link_param.collapse = 'par'

            # Create Sinks
-            self.outtrans = self.network.create_sink('ElastixTransformFile', id_='sink_trans')
-            self.outimage = self.network.create_sink('ITKImageFile', id_='sink_image')
-            self.outseg = self.network.create_sink('ITKImageFile', id_='sink_seg')
+            self.outtrans = self.network.create_sink('ElastixTransformFile', id='sink_trans')
+            self.outimage = self.network.create_sink('ITKImageFile', id='sink_image')
+            self.outseg = self.network.create_sink('ITKImageFile', id='sink_seg')
            self.outtrans.inputs['input'] = self.elastix_node.outputs['transform']

            # Transform output image
-            self.transformix_node = self.network.create_node(self.transformix_toolname, id_='transformix')
+            self.transformix_node = self.network.create_node('self.transformix_toolname', tool_version='unknown', id='transformix')
            self.transformix_node.inputs['image'] = self.MovingImageSource.output
            self.transformix_node.inputs['transform'] = self.elastix_node.outputs['transform'][-1]
            self.outimage.inputs['input'] = self.transformix_node.outputs['image']

            # First change the FinalBSplineInterpolationOrder to 0 for the segmentation
-            self.changeorder_node = self.network.create_node('EditElastixTransformFile', id_='editelpara')
+            self.changeorder_node = self.network.create_node('elastixtools/EditElastixTransformFile:0.1', tool_version='0.1', id='editelpara')
            self.link_trans = self.network.create_link(self.elastix_node.outputs['transform'][-1], self.changeorder_node.inputs['transform'])
            # self.link_trans.converge = 0
            # self.link_trans.collapse = 'FixedImage'
            # self.link_trans.expand = True

            # Copy metadata from image to segmentation as Elastix uses this
-            self.copymetadata_node = self.network.create_node('CopyMetadata', id_='copymetadata')
+            self.copymetadata_node = self.network.create_node('itktools/0.3.2/CopyMetadata:1.0', tool_version='1.0', id='copymetadata')
            self.copymetadata_node.inputs['source'] = self.MovingImageSource.output
            self.copymetadata_node.inputs['destination'] = self.ToTransformSource.output

            # Then transform the segmentation
-            self.transformix_node_seg = self.network.create_node(self.transformix_toolname, id_='transformix_seg')
+            self.transformix_node_seg = self.network.create_node('self.transformix_toolname', tool_version='unknown', id='transformix_seg')
            self.transformix_node_seg.inputs['image'] = self.copymetadata_node.outputs['output']
            self.transformix_node_seg.inputs['transform'] = self.changeorder_node.outputs['transform'][-1]
            self.outseg.inputs['input'] = self.transformix_node_seg.outputs['image']
        else:
            # Create the network
-            self.network = fastr.Network(id_="elastix_group")
+            self.network = fastr.create_network(id="elastix_group")

            # Create Sources
-            self.FixedImageSource = self.network.create_source('ITKImageFile', id_='FixedImage')
-            self.FixedMaskSource = self.network.create_source('ITKImageFile', id_='FixedMask')
-            self.ToTransformSource = self.network.create_source('ITKImageFile', id_='ToTransform')
-            self.ParameterMapSource = self.network.create_source('ElastixParameterFile', id_='ParameterMaps', nodegroup='par')
+            self.FixedImageSource = self.network.create_source('ITKImageFile', id='FixedImage')
+            self.FixedMaskSource = self.network.create_source('ITKImageFile', id='FixedMask')
+            self.ToTransformSource = self.network.create_source('ITKImageFile', id='ToTransform')
+            self.ParameterMapSource = self.network.create_source('ElastixParameterFile', id='ParameterMaps', node_group='par')

            # Elastix requires the output folder as a sink
            # self.OutputFolderSource = self.network.create_sink('Directory', id_='Out')

            # Create Elastix node and links
-            self.elastix_node = self.network.create_node(self.elastix_toolname, id_='elastix')
+            self.elastix_node = self.network.create_node('self.elastix_toolname', tool_version='unknown', id='elastix')
            self.elastix_node.inputs['fixed_image'] = self.FixedImageSource.output
            self.elastix_node.inputs['fixed_mask'] = self.FixedMaskSource.output
            self.elastix_node.inputs['moving_image'] = self.FixedImageSource.output
@@ -319,19 +320,19 @@
Source code for WORC.tools.Elastix
            self.link_param.collapse = 'par'

            # Create Sinks
-            self.outtrans = self.network.create_sink('ElastixTransformFile', id_='sink_trans')
-            self.outimage = self.network.create_sink('ITKImageFile', id_='sink_image')
-            self.outseg = self.network.create_sink('ITKImageFile', id_='sink_seg')
+            self.outtrans = self.network.create_sink('ElastixTransformFile', id='sink_trans')
+            self.outimage = self.network.create_sink('ITKImageFile', id='sink_image')
+            self.outseg = self.network.create_sink('ITKImageFile', id='sink_seg')
            self.outtrans.inputs['input'] = self.elastix_node.outputs['transform']

            # Transform output image
-            self.transformix_node = self.network.create_node(self.transformix_toolname, id_='transformix')
+            self.transformix_node = self.network.create_node('self.transformix_toolname', tool_version='unknown', id='transformix')
            self.transformix_node.inputs['image'] = self.MovingImageSource.output
            self.transformix_node.inputs['transform'] = self.elastix_node.outputs['transform'][-1]
            self.outimage.inputs['input'] = self.transformix_node.outputs['image']

            # First change the FinalBSplineInterpolationOrder to 0 for the segmentation
-            self.changeorder_node = self.network.create_node('EditElastixTransformFile', id_='editelpara')
+            self.changeorder_node = self.network.create_node('elastixtools/EditElastixTransformFile:0.1', tool_version='0.1', id='editelpara')
            self.changeorder_node.inputs['set'] = ["FinalBSplineInterpolationOrder=0"]
            self.link_trans = self.network.create_link(self.elastix_node.outputs['transform'], self.changeorder_node.inputs['transform'][-1])
            # self.link_trans.converge = 0
@@ -339,12 +340,12 @@
Source code for WORC.tools.Elastix
            # self.link_trans.expand = True

            # Copy metadata from image to segmentation as Elastix uses this
-            self.copymetadata_node = self.network.create_node('CopyMetadata', id_='copymetadata')
+            self.copymetadata_node = self.network.create_node('itktools/0.3.2/CopyMetadata:1.0', tool_version='1.0', id='copymetadata')
            self.copymetadata_node.inputs['source'] = self.MovingImageSource.output
            self.copymetadata_node.inputs['destination'] = self.ToTransformSource.output

            # Then transform the segmentation
-            self.transformix_node_seg = self.network.create_node(self.transformix_toolname, id_='transformix_seg')
+            self.transformix_node_seg = self.network.create_node('self.transformix_toolname', tool_version='unknown', id='transformix_seg')
            self.transformix_node_seg.inputs['image'] = self.copymetadata_node.outputs['output']
            self.transformix_node_seg.inputs['transform'] = self.changeorder_node.outputs['transform'][-1]
            self.outseg.inputs['input'] = self.transformix_node_seg.outputs['image']
@@ -463,7 +464,7 @@
Source code for WORC.tools.Elastix
        # print self.sink_data['Out']

        # Execute the network
-        self.network.draw_network('WORC_Elastix', img_format='svg', draw_dimension=True)
+        self.network.draw(file_path='WORC_Elastix.svg', img_format='svg')
        self.network.dumpf('{}.json'.format(self.network.id), indent=2)
        self.network.execute(self.source_data, self.sink_data, tmpdir=self.fastr_tmpdir)
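The hunks above mostly track the fastr 2 to fastr 3 API rename (id_ becomes id, fastr.Network() becomes fastr.create_network(), draw_network() becomes draw()). A condensed sketch of the new calling pattern, using only calls that appear in this diff; the tool reference and data types are placeholders, and running it requires the corresponding fastr tools to be installed.

    # Condensed sketch of the fastr 3 API pattern used in the diffs above.
    import fastr
    from fastr.api import ResourceLimit

    network = fastr.create_network(id='example_network')
    source = network.create_source('ITKImageFile', id='images')
    node = network.create_node('worc/Slicer:1.0', tool_version='1.0', id='slicer',
                               resources=ResourceLimit(memory='20G'))
    sink = network.create_sink('PNGFile', id='png_out')

    node.inputs['image'] = source.output
    sink.input = node.outputs['out']

    network.draw(file_path='example_network.svg', img_format='svg')
    # network.execute(source_data, sink_data, execution_plugin='LinearExecution')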
import WORC.addexceptions as WORCexceptions
import fastr
+from fastr.api import ResourceLimit
import os
+import graphviz

# NOTE: Very important to give images and segmentations as dict with patient names!

[docs]    def __init__(self, label_type, ensemble=50, scores='percentages',
-                 network=None, features=None,
-                 fastr_plugin='ProcessPoolExecution',
+                 parent=None, features=None,
+                 fastr_plugin='LinearExecution',
                 name='Example'):
        '''
        Build a network that evaluates the performance of an estimator.
@@ -196,23 +198,26 @@
Source code for WORC.tools.Evaluate
        to the existing network.
        '''
-        if network is not None:
-            self.network = network
+        if parent is not None:
+            self.parent = parent
+            self.network = parent.network
            self.mode = 'WORC'
+            self.name = parent.network.id
+            self.ensemble = parent.configs[0]['Ensemble']['Use']
        else:
            self.mode = 'StandAlone'
            self.fastr_plugin = fastr_plugin
            self.name = 'WORC_Evaluate_' + name
-            self.network = fastr.Network(id_=self.name)
+            self.network = fastr.create_network(id=self.name)
            self.fastr_tmpdir = os.path.join(fastr.config.mounts['tmp'], self.name)
+            self.ensemble = ensemble

        if features is None and self.mode == 'StandAlone':
-            raise WORCexceptions.IOError('Either features as input or a WORC network is required for the Evaluate network.')
+            raise WORCexceptions.WORCIOError('Either features as input or a WORC network is required for the Evaluate network.')

        self.features = features
        self.label_type = label_type
-        self.ensemble = ensemble
        self.create_network()
[docs]    def execute(self):
        """ Execute the network through the fastr.network.execute command. """
        # Draw and execute network
-        self.network.draw_network(self.network.id, draw_dimension=True)
+        try:
+            self.network.draw(file_path=self.network.id + '.svg', draw_dimensions=True)
+        except graphviz.backend.ExecutableNotFound:
+            print('[WORC WARNING] Graphviz executable not found: not drawing network diagram. Make sure the Graphviz executables are on your system\'s PATH.')
        self.network.execute(self.source_data, self.sink_data, execution_plugin=self.fastr_plugin, tmpdir=self.fastr_tmpdir)
            self.mode = 'StandAlone'
            self.fastr_plugin = fastr_plugin
            self.name = 'WORC_Slicer_' + name
-            self.network = fastr.Network(id_=self.name)
+            self.network = fastr.create_network(id=self.name)
            self.fastr_tmpdir = os.path.join(fastr.config.mounts['tmp'], self.name)

        if images is None and self.mode == 'StandAlone':
            message = 'Either images and segmentations as input or a WORC ' +\
                      'network is required for the Evaluate network.'
-            raise WORCexceptions.IOError(message)
+            raise WORCexceptions.WORCIOError(message)

-        self.image = images
+        self.images = images
        self.segmentations = segmentations
        self.create_network()
@@ -220,23 +221,23 @@
Source code for WORC.tools.Slicer
        '''
        # Create all nodes
-        self.network.node_slicer =\
-            self.network.create_node('Slicer', memory='20G', id_='Slicer')
+        self.node_slicer =\
+            self.network.create_node('worc/Slicer:1.0', tool_version='1.0', id='Slicer', resources=ResourceLimit(memory='20G'))

        # Create sinks
-        self.network.sink_PNG =\
-            self.network.create_sink('PNGFile', id_='PNG')
-        self.network.sink_PNGZoomed =\
-            self.network.create_sink('PNGFile', id_='PNGZoomed')
+        self.sink_PNG =\
+            self.network.create_sink('PNGFile', id='PNG')
+        self.sink_PNGZoomed =\
+            self.network.create_sink('PNGFile', id='PNGZoomed')

        # Create links to sinks
-        self.network.sink_PNG.input = self.network.node_slicer.outputs['out']
-        self.network.sink_PNGZoomed.input = self.network.node_slicer.outputs['outzoom']
+        self.sink_PNG.input = self.node_slicer.outputs['out']
+        self.sink_PNGZoomed.input = self.node_slicer.outputs['outzoom']

        # Create sources if not supplied by a WORC network
        if self.mode == 'StandAlone':
-            self.network.source_images = self.network.create_source('ITKImage', id_='Images')
-            self.network.source_segmentations = self.network.create_source('ITKImage', id_='Segmentations')
+            self.source_images = self.network.create_source('ITKImageFile', id='Images')
+            self.source_segmentations = self.network.create_source('ITKImageFile', id='Segmentations')

            # Create links to sources that are not supplied by a WORC network
            # Not needed in this network
@@ -244,8 +245,8 @@
Source code for WORC.tools.Slicer
        # Create links to the sources that could be in a WORC network
        if self.mode == 'StandAlone':
            # Sources from the Evaluate network are used
-            self.network.node_slicer.inputs['image'] = self.network.source_images.output
-            self.network.node_slicer.inputs['segmentation'] = self.network.source_segmentations.output
+            self.node_slicer.inputs['image'] = self.source_images.output
+            self.node_slicer.inputs['segmentation'] = self.source_segmentations.output
        else:
            # Sources from the WORC network are used
            print('WIP')

[docs]    def execute(self):
        """ Execute the network through the fastr.network.execute command. """
        # Draw and execute network
-        self.network.draw_network(self.network.id, draw_dimension=True)
+        self.network.draw(file_path=self.network.id + '.svg', draw_dimensions=True)
        self.network.execute(self.source_data, self.sink_data, execution_plugin=self.fastr_plugin, tmpdir=self.fastr_tmpdir)
+#!/usr/bin/env python
+
+# Copyright 2016-2019 Biomedical Imaging Group Rotterdam, Departments of
+# Medical Informatics and Radiology, Erasmus MC, Rotterdam, The Netherlands
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import numpy as np
+from sklearn.model_selection import train_test_split
+import WORC.addexceptions as ae
+from WORC.processing.label_processing import load_labels
+import pandas as pd
+
+
+
[docs]def createfixedsplits(label_file=None, label_type=None, patient_IDs=None,
+                      test_size=0.2, N_iterations=1, regression=False,
+                      stratify=None, modus='singlelabel', output=None):
+    '''
+    Create fixed splits for a cross-validation.
+    '''
+    # Check whether input is valid
+    if patient_IDs is None:
+        if label_file is not None and label_type is not None:
+            # Read the label file
+            label_data = load_labels(label_file, label_type)
+            patient_IDs = label_data['patient_IDs']
+
+            # Create the stratification object
+            if modus == 'singlelabel':
+                stratify = label_data['label']
+            elif modus == 'multilabel':
+                # Create a stratification object from the labels
+                # Label = 0 means no label equals one
+                # Other label numbers refer to the label name that is 1
+                stratify = list()
+                labels = label_data['label']
+                for pnum in range(0, len(labels[0])):
+                    plabel = 0
+                    for lnum, slabel in enumerate(labels):
+                        if slabel[pnum] == 1:
+                            plabel = lnum + 1
+                    stratify.append(plabel)
+
+            else:
+                raise ae.WORCKeyError('{} is not a valid modus!'.format(modus))
+        else:
+            raise ae.WORCIOError('Either a label file and label type or patient_IDs need to be provided!')
+
+    pd_dict = dict()
+    for i in range(N_iterations):
+        print(f'Splitting iteration {i + 1} / {N_iterations}')
+        # Create a random seed for the splitting
+        random_seed = np.random.randint(5000)
+
+        # Define stratification
+        unique_patient_IDs, unique_indices =\
+            np.unique(np.asarray(patient_IDs), return_index=True)
+        if regression:
+            unique_stratify = None
+        else:
+            unique_stratify = [stratify[i] for i in unique_indices]
+
+        # Split, throw error when dataset is too small for split ratios
+        try:
+            unique_PID_train, indices_PID_test\
+                = train_test_split(unique_patient_IDs,
+                                   test_size=test_size,
+                                   random_state=random_seed,
+                                   stratify=unique_stratify)
+        except ValueError as e:
+            e = str(e) + ' Increase the size of your test set.'
+            raise ae.WORCValueError(e)
+
+        # Check for all IDs if they are in test or training
+        indices_train = list()
+        indices_test = list()
+        patient_ID_train = list()
+        patient_ID_test = list()
+        for num, pid in enumerate(patient_IDs):
+            if pid in unique_PID_train:
+                indices_train.append(num)
+
+                # Make sure we get a unique ID
+                if pid in patient_ID_train:
+                    n = 1
+                    while str(pid + '_' + str(n)) in patient_ID_train:
+                        n += 1
+                    pid = str(pid + '_' + str(n))
+                patient_ID_train.append(pid)
+            else:
+                indices_test.append(num)
+
+                # Make sure we get a unique ID
+                if pid in patient_ID_test:
+                    n = 1
+                    while str(pid + '_' + str(n)) in patient_ID_test:
+                        n += 1
+                    pid = str(pid + '_' + str(n))
+                patient_ID_test.append(pid)
+
+        # Add to train object
+        pd_dict[str(i) + '_train'] = patient_ID_train
+
+        # Test object has to be same length as training object
+        extras = [""] * (len(patient_ID_train) - len(patient_ID_test))
+        patient_ID_test.extend(extras)
+        pd_dict[str(i) + '_test'] = patient_ID_test
+
+    # Convert into pandas dataframe for easy use and conversion
+    df = pd.DataFrame(pd_dict)
+
+    # Write output if required
+    if output is not None:
+        print("Writing Output.")
+        df.to_csv(output)
+
+    return df
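A hypothetical call of createfixedsplits with in-memory patient IDs and labels; the IDs, labels and output path are illustrative only.

    # Illustrative usage of createfixedsplits; values are placeholders.
    import numpy as np

    patient_IDs = np.asarray(['pat001', 'pat002', 'pat003', 'pat004',
                              'pat005', 'pat006', 'pat007', 'pat008'])
    labels = [0, 1, 0, 1, 0, 1, 0, 1]  # one binary label per patient

    df = createfixedsplits(patient_IDs=patient_IDs,
                           stratify=labels,
                           test_size=0.25,
                           N_iterations=2,
                           output='/tmp/fixedsplits.csv')
    print(df.head())  # columns: 0_train, 0_test, 1_train, 1_test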
+
+
\ No newline at end of file
diff --git a/WORC/doc/_build/html/_modules/fastr/api.html b/WORC/doc/_build/html/_modules/fastr/api.html
index b1adf136..a42f4fe9 100644
--- a/WORC/doc/_build/html/_modules/fastr/api.html
+++ b/WORC/doc/_build/html/_modules/fastr/api.html
@@ -8,7 +8,7 @@
- fastr.api — WORC 3.0.0 documentation
+ fastr.api — WORC 3.1.0 documentation
@@ -59,7 +59,7 @@
diff --git a/WORC/doc/_build/html/_sources/WORC.config.rst.txt b/WORC/doc/_build/html/_sources/WORC.config.rst.txt
new file mode 100644
index 00000000..ead619d3
--- /dev/null
+++ b/WORC/doc/_build/html/_sources/WORC.config.rst.txt
@@ -0,0 +1,126 @@
+================= ======================== ============================================================================================================================================================================================================================================================== ========================================== =================================================================================================================================================================
+Key Subkey Description Default Options
+================= ======================== ============================================================================================================================================================================================================================================================== ========================================== =================================================================================================================================================================
+General cross_validation Determine whether a cross validation will be performed or not. Obsolete, will be removed. True True, False
+ Segmentix Determine whether to use Segmentix tool for segmentation preprocessing. False True, False
+ FeatureCalculator Specifies which feature calculation tool should be used. predict/CalcFeatures:1.0 predict/CalcFeatures:1.0, pyradiomics/CF_pyradiomics:1.0, your own tool reference
+ Preprocessing Specifies which tool will be used for image preprocessing. worc/PreProcess:1.0 worc/PreProcess:1.0, your own tool reference
+ RegistrationNode Specifies which tool will be used for image registration. 'elastix4.8/Elastix:4.8' 'elastix4.8/Elastix:4.8', your own tool reference
+ TransformationNode Specifies which tool will be used for applying image transformations. 'elastix4.8/Transformix:4.8' 'elastix4.8/Transformix:4.8', your own tool reference
+ Joblib_ncores Number of cores to be used by joblib for multicore processing. 4 Integer > 0
+ Joblib_backend Type of backend to be used by joblib for multicore processing. multiprocessing multiprocessing, threading
+ tempsave Determines whether after every cross validation iteration the result will be saved, in addition to the result after all iterations. Especially useful for debugging. False True, False
+Segmentix mask If a mask is supplied, should the mask be subtracted from the contour or multiplied. subtract subtract, multiply
+ segtype If Ring, then a ring around the segmentation will be used as contour. None None, Ring
+ segradius Define the radius of the ring used if segtype is Ring. 5 Integer > 0
+ N_blobs How many of the largest blobs are extracted from the segmentation. If None, no blob extraction is used. 1 Integer > 0
+ fillholes Determines whether hole filling will be used. False True, False
+Normalize ROI If a mask is supplied and this is set to True, normalize image based on supplied ROI. Otherwise, the full image is used for normalization using the SimpleITK Normalize function. Lastly, setting this to False will result in no normalization being applied. Full True, False, Full
+ Method Method used for normalization if ROI is supplied. Currently, z-scoring or using the minimum and median of the ROI can be used. z_score z_score, minmed
+ImageFeatures shape Determine whether shape features are computed or not. True True, False
+ histogram Determine whether histogram features are computed or not. True True, False
+ orientation Determine whether orientation features are computed or not. True True, False
+ texture_Gabor Determine whether Gabor texture features are computed or not. False True, False
+ texture_LBP Determine whether LBP texture features are computed or not. True True, False
+ texture_GLCM Determine whether GLCM texture features are computed or not. True True, False
+ texture_GLCMMS Determine whether GLCM Multislice texture features are computed or not. True True, False
+ texture_GLRLM Determine whether GLRLM texture features are computed or not. True True, False
+ texture_GLSZM Determine whether GLSZM texture features are computed or not. True True, False
+ texture_NGTDM Determine whether NGTDM texture features are computed or not. True True, False
+ coliage Determine whether coliage features are computed or not. False True, False
+ vessel Determine whether vessel features are computed or not. False True, False
+ log Determine whether LoG features are computed or not. False True, False
+ phase Determine whether local phase features are computed or not. False True, False
+ image_type Modality of images supplied. Determines how the image is loaded. CT CT
+ gabor_frequencies Frequencies of Gabor filters used: can be a single float or a list. 0.05, 0.2, 0.5 Float(s)
+ gabor_angles Angles of Gabor filters in degrees: can be a single integer or a list. 0, 45, 90, 135 Integer(s)
+ GLCM_angles Angles used in GLCM computation in radians: can be a single float or a list. 0, 0.79, 1.57, 2.36 Float(s)
+ GLCM_levels Number of grayscale levels used in discretization before GLCM computation. 16 Integer > 0
+ GLCM_distances Distance(s) used in GLCM computation in pixels: can be a single integer or a list. 1, 3 Integer(s) > 0
+ LBP_radius Radii used for LBP computation: can be a single integer or a list. 3, 8, 15 Integer(s) > 0
+ LBP_npoints Number(s) of points used in LBP computation: can be a single integer or a list. 12, 24, 36 Integer(s) > 0
+ phase_minwavelength Minimal wavelength in pixels used for phase features. 3 Integer > 0
+ phase_nscale Number of scales used in phase feature computation. 5 Integer > 0
+ log_sigma Standard deviation(s) in pixels used in log feature computation: can be a single integer or a list. 1, 5, 10 Integer(s)
+ vessel_scale_range Scale in pixels used for Frangi vessel filter. Given as a minimum and a maximum. 1, 10 Two integers: min and max.
+ vessel_scale_step Step size used to go from minimum to maximum scale on Frangi vessel filter. 2 Integer > 0
+ vessel_radius Radius to determine boundary of between inner part and edge in Frangi vessel filter. 5 Integer > 0
+Featsel Variance If True, exclude features which have a variance < 0.01. Based on ` sklearn `_. True Boolean(s)
+ GroupwiseSearch Randomly select which feature groups to use. Parameters determined by the SelectFeatGroup config part, see below. True Boolean(s)
+ SelectFromModel Select features by first training a LASSO model. The alpha for the LASSO model is randomly generated. See also `sklearn `_. False Boolean(s)
+ UsePCA If True, Use Principle Component Analysis (PCA) to select features. False Boolean(s)
+ PCAType Method to select the number of components using PCA: either the number of components that explains 95% of the variance, or a fixed number of components. 95variance Integer(s), 95variance
+ StatisticalTestUse If True, use statistical test to select features. False Boolean(s)
+ StatisticalTestMetric Define the type of statistical test to be used. ttest, Welch, Wilcoxon, MannWhitneyU ttest, Welch, Wilcoxon, MannWhitneyU
+ StatisticalTestThreshold Specify a threshold for the p-value threshold used in the statistical test to select features. The first element defines the lower boundary, the other the upper boundary. Random sampling will occur between the boundaries. -2, 1.5 Two Integers: loc and scale
+ ReliefUse If True, use Relief to select features. False Boolean(s)
+ ReliefNN Min and max of number of nearest neighbors search range in Relief. 2, 4 Two Integers: loc and scale
+ ReliefSampleSize Min and max of sample size search range in Relief. 1, 1 Two Integers: loc and scale
+ ReliefDistanceP Min and max of positive distance search range in Relief. 1, 3 Two Integers: loc and scale
+ ReliefNumFeatures Min and max of number of features that is selected search range in Relief. 25, 200 Two Integers: loc and scale
+SelectFeatGroup shape_features If True, use shape features in model. True, False Boolean(s)
+ histogram_features If True, use histogram features in model. True, False Boolean(s)
+ orientation_features If True, use orientation features in model. True, False Boolean(s)
+ texture_Gabor_features If True, use Gabor texture features in model. False Boolean(s)
+ texture_GLCM_features If True, use GLCM texture features in model. True, False Boolean(s)
+ texture_GLCMMS_features If True, use GLCM Multislice texture features in model. True, False Boolean(s)
+ texture_GLRLM_features If True, use GLRLM texture features in model. True, False Boolean(s)
+ texture_GLSZM_features If True, use GLSZM texture features in model. True, False Boolean(s)
+ texture_NGTDM_features If True, use NGTDM texture features in model. True, False Boolean(s)
+ texture_LBP_features If True, use LBP texture features in model. True, False Boolean(s)
+ patient_features If True, use patient features in model. False Boolean(s)
+ semantic_features If True, use semantic features in model. False Boolean(s)
+ coliage_features If True, use coliage features in model. False Boolean(s)
+ log_features If True, use log features in model. False Boolean(s)
+ vessel_features If True, use vessel features in model. False Boolean(s)
+ phase_features If True, use phase features in model. False Boolean(s)
+Imputation use If True, use feature imputation methods to replace NaN values. If False, all NaN features will be set to zero. False Boolean(s)
+ strategy Method to be used for imputation. mean, median, most_frequent, constant, knn mean, median, most_frequent, constant, knn
+ n_neighbors When using k-Nearest Neighbors (kNN) for feature imputation, determines the number of neighbors used for imputation. Can be a single integer or a list. 5, 5 Two Integers: loc and scale
+Classification fastr Use fastr for the optimization gridsearch (recommended on clusters, default) or, if set to False, joblib (recommended for PCs but not on Windows). True True, False
+ fastr_plugin Name of execution plugin to be used. Default use the same as the self.fastr_plugin for the WORC object. LinearExecution Any `fastr execution plugin `_ .
+ classifiers Select the estimator(s) to use. Most are implemented using `sklearn `_. For abbreviations, see above. SVM SVM, SVR, SGD, SGDR, RF, LDA, QDA, ComplementNB, GaussianNB, LR, RFR, Lasso, ElasticNet. All are estimators from `sklearn `_
+ max_iter Maximum number of iterations to use in training an estimator. Only for specific estimators, see `sklearn `_. 100000 Integer
+ SVMKernel When using a SVM, specify the kernel type. poly poly, linear, rbf
+ SVMC Range of the SVM slack parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b). 0, 6 Two Integers: loc and scale
+ SVMdegree Range of the SVM polynomial degree when using a polynomial kernel. We sample on a uniform scale: the parameters specify the range (a, a + b). 1, 6 Two Integers: loc and scale
+ SVMcoef0 Range of SVM homogeneity parameter. We sample on a uniform scale: the parameters specify the range (a, a + b). 0, 1 Two Integers: loc and scale
+ SVMgamma Range of the SVM gamma parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b) -5, 5 Two Integers: loc and scale
+ RFn_estimators Range of number of trees in a RF. We sample on a uniform scale: the parameters specify the range (a, a + b). 10, 90 Two Integers: loc and scale
+ RFmin_samples_split Range of minimum number of samples required to split a branch in a RF. We sample on a uniform scale: the parameters specify the range (a, a + b). 2, 3 Two Integers: loc and scale
+ RFmax_depth Range of maximum depth of a RF. We sample on a uniform scale: the parameters specify the range (a, a + b). 5, 5 Two Integers: loc and scale
+ LRpenalty Penalty term used in LR. l2, l1 none, l2, l1
+ LRC Range of regularization strength in LR. We sample on a uniform scale: the parameters specify the range (a, a + b). 0.01, 1.0 Two Integers: loc and scale
+ LDA_solver Solver used in LDA. svd, lsqr, eigen svd, lsqr, eigen
+ LDA_shrinkage Range of the LDA shrinkage parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b). -5, 5 Two Integers: loc and scale
+ QDA_reg_param Range of the QDA regularization parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b). -5, 5 Two Integers: loc and scale
+ ElasticNet_alpha Range of the ElasticNet penalty parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b). -5, 5 Two Integers: loc and scale
+ ElasticNet_l1_ratio Range of l1 ratio in LR. We sample on a uniform scale: the parameters specify the range (a, a + b). 0, 1 Two Integers: loc and scale
+ SGD_alpha Range of the SGD penalty parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b). -5, 5 Two Integers: loc and scale
+ SGD_l1_ratio Range of l1 ratio in SGD. We sample on a uniform scale: the parameters specify the range (a, a + b). 0, 1 Two Integers: loc and scale
+ SGD_loss Loss function of SGD. hinge, squared_hinge, modified_huber hinge, squared_hinge, modified_huber
+ SGD_penalty Penalty term in SGD. none, l2, l1 none, l2, l1
+ CNB_alpha Regularization strength in ComplementNB. We sample on a uniform scale: the parameters specify the range (a, a + b) 0, 1 Two Integers: loc and scale
+CrossValidation N_iterations Number of times the data is split in training and test in the outer cross-validation. 100 Integer
+ test_size The percentage of data to be used for testing. 0.2 Float
+Labels label_names The labels used from your label file for classification. Label1, Label2 String(s)
+ modus Determine whether multilabel or singlelabel classification or regression will be performed. singlelabel singlelabel, multilabel
+ url WIP WIP Not Supported Yet
+ projectID WIP WIP Not Supported Yet
+HyperOptimization scoring_method Specify the optimization metric for your hyperparameter search. f1_weighted Any `sklearn metric `_
+ test_size Size of test set in the hyperoptimization cross validation, given as a percentage of the whole dataset. 0.15 Float
+ n_splits Number of cross-validation splits used within the hyperparameter optimization. 5 Integer
+ N_iterations Number of iterations used in the hyperparameter optimization. This corresponds to the number of samples drawn from the parameter grid. 10000 Integer
+ n_jobspercore Number of jobs assigned to a single core. Only used if fastr is set to true in the classification. 2000 Integer
+ maxlen 100 100
+ ranking_score Score used to rank the workflows in the hyperparameter optimization. test_score test_score
+FeatureScaling scale_features Determine whether to use feature scaling or not. True Boolean(s)
+ scaling_method Determine the scaling method. z_score z_score, minmax
+SampleProcessing SMOTE Determine whether to use SMOTE oversampling, see also ` imbalanced learn `_. True Boolean(s)
+ SMOTE_ratio Determine the ratio of oversampling. If 1, the minority class will be oversampled to the same size as the majority class. We sample on a uniform scale: the parameters specify the range (a, a + b). 1, 0 Two Integers: loc and scale
+ SMOTE_neighbors Number of neighbors used in SMOTE. This should be much smaller than the number of objects/patients you supply. We sample on a uniform scale: the parameters specify the range (a, a + b). 5, 15 Two Integers: loc and scale
+ Oversampling Determine whether to use random oversampling or not. False Boolean(s)
+Ensemble Use Determine whether to use ensembling or not. Either provide an integer to state how many estimators to include, or True, which will use the default ensembling method. 1 Boolean or Integer
+Bootstrap Use Determine whether to use bootstrapping or not. False Boolean(s)
+ N_iterations Number of bootstrap iterations. 1000 Integer
+================= ======================== ============================================================================================================================================================================================================================================================== ========================================== =================================================================================================================================================================
\ No newline at end of file
diff --git a/WORC/doc/_build/html/_sources/autogen/WORC.classification.rst.txt b/WORC/doc/_build/html/_sources/autogen/WORC.classification.rst.txt
index 3e0ac848..7a49a362 100644
--- a/WORC/doc/_build/html/_sources/autogen/WORC.classification.rst.txt
+++ b/WORC/doc/_build/html/_sources/autogen/WORC.classification.rst.txt
@@ -1,6 +1,15 @@
classification Package
======================
+:mod:`classification` Package
+-----------------------------
+
+.. automodule:: WORC.classification
+ :members:
+ :undoc-members:
+ :show-inheritance:
+ :special-members:
+
:mod:`AdvancedSampler` Module
-----------------------------
@@ -10,6 +19,15 @@ classification Package
:show-inheritance:
:special-members:
+:mod:`ObjectSampler` Module
+---------------------------
+
+.. automodule:: WORC.classification.ObjectSampler
+ :members:
+ :undoc-members:
+ :show-inheritance:
+ :special-members:
+
:mod:`RankedSVM` Module
-----------------------
@@ -37,6 +55,15 @@ classification Package
:show-inheritance:
:special-members:
+:mod:`createfixedsplits` Module
+-------------------------------
+
+.. automodule:: WORC.classification.createfixedsplits
+ :members:
+ :undoc-members:
+ :show-inheritance:
+ :special-members:
+
:mod:`crossval` Module
----------------------
@@ -82,6 +109,15 @@ classification Package
:show-inheritance:
:special-members:
+:mod:`regressors` Module
+------------------------
+
+.. automodule:: WORC.classification.regressors
+ :members:
+ :undoc-members:
+ :show-inheritance:
+ :special-members:
+
:mod:`trainclassifier` Module
-----------------------------
diff --git a/WORC/doc/_build/html/_sources/autogen/WORC.config.rst.txt b/WORC/doc/_build/html/_sources/autogen/WORC.config.rst.txt
new file mode 100644
index 00000000..c77d23f6
--- /dev/null
+++ b/WORC/doc/_build/html/_sources/autogen/WORC.config.rst.txt
@@ -0,0 +1,19 @@
+================= ===================================================
+Key Reference
+================= ===================================================
+Bootstrap :ref:`Bootstrap `
+Classification :ref:`Classification `
+CrossValidation :ref:`CrossValidation `
+Ensemble :ref:`Ensemble `
+Featsel :ref:`Featsel `
+FeatureScaling :ref:`FeatureScaling `
+General :ref:`General `
+HyperOptimization :ref:`HyperOptimization `
+ImageFeatures :ref:`ImageFeatures `
+Imputation :ref:`Imputation `
+Labels :ref:`Labels `
+Normalize :ref:`Normalize `
+SampleProcessing :ref:`SampleProcessing `
+Segmentix :ref:`Segmentix `
+SelectFeatGroup :ref:`SelectFeatGroup `
+================= ===================================================
\ No newline at end of file
diff --git a/WORC/doc/_build/html/_sources/autogen/WORC.detectors.rst.txt b/WORC/doc/_build/html/_sources/autogen/WORC.detectors.rst.txt
new file mode 100644
index 00000000..77c54f1b
--- /dev/null
+++ b/WORC/doc/_build/html/_sources/autogen/WORC.detectors.rst.txt
@@ -0,0 +1,12 @@
+detectors Package
+=================
+
+:mod:`detectors` Module
+-----------------------
+
+.. automodule:: WORC.detectors.detectors
+ :members:
+ :undoc-members:
+ :show-inheritance:
+ :special-members:
+
diff --git a/WORC/doc/_build/html/_sources/autogen/WORC.exampledata.rst.txt b/WORC/doc/_build/html/_sources/autogen/WORC.exampledata.rst.txt
new file mode 100644
index 00000000..edd6d5b0
--- /dev/null
+++ b/WORC/doc/_build/html/_sources/autogen/WORC.exampledata.rst.txt
@@ -0,0 +1,12 @@
+exampledata Package
+===================
+
+:mod:`datadownloader` Module
+----------------------------
+
+.. automodule:: WORC.exampledata.datadownloader
+ :members:
+ :undoc-members:
+ :show-inheritance:
+ :special-members:
+
diff --git a/WORC/doc/_build/html/_sources/autogen/WORC.facade.intermediatefacade.rst.txt b/WORC/doc/_build/html/_sources/autogen/WORC.facade.intermediatefacade.rst.txt
new file mode 100644
index 00000000..878fd941
--- /dev/null
+++ b/WORC/doc/_build/html/_sources/autogen/WORC.facade.intermediatefacade.rst.txt
@@ -0,0 +1,30 @@
+intermediatefacade Package
+==========================
+
+:mod:`configbuilder` Module
+---------------------------
+
+.. automodule:: WORC.facade.intermediatefacade.configbuilder
+ :members:
+ :undoc-members:
+ :show-inheritance:
+ :special-members:
+
+:mod:`exceptions` Module
+------------------------
+
+.. automodule:: WORC.facade.intermediatefacade.exceptions
+ :members:
+ :undoc-members:
+ :show-inheritance:
+ :special-members:
+
+:mod:`intermediatefacade` Module
+--------------------------------
+
+.. automodule:: WORC.facade.intermediatefacade.intermediatefacade
+ :members:
+ :undoc-members:
+ :show-inheritance:
+ :special-members:
+
diff --git a/WORC/doc/_build/html/_sources/autogen/WORC.facade.rst.txt b/WORC/doc/_build/html/_sources/autogen/WORC.facade.rst.txt
new file mode 100644
index 00000000..fec9f4e2
--- /dev/null
+++ b/WORC/doc/_build/html/_sources/autogen/WORC.facade.rst.txt
@@ -0,0 +1,19 @@
+facade Package
+==============
+
+:mod:`facade` Package
+---------------------
+
+.. automodule:: WORC.facade
+ :members:
+ :undoc-members:
+ :show-inheritance:
+ :special-members:
+
+Subpackages
+-----------
+
+.. toctree::
+
+ WORC.facade.simpleworc
+
diff --git a/WORC/doc/_build/html/_sources/autogen/WORC.featureprocessing.rst.txt b/WORC/doc/_build/html/_sources/autogen/WORC.featureprocessing.rst.txt
index 29ce3960..eb57f40b 100644
--- a/WORC/doc/_build/html/_sources/autogen/WORC.featureprocessing.rst.txt
+++ b/WORC/doc/_build/html/_sources/autogen/WORC.featureprocessing.rst.txt
@@ -10,6 +10,15 @@ featureprocessing Package
:show-inheritance:
:special-members:
+:mod:`Decomposition` Module
+---------------------------
+
+.. automodule:: WORC.featureprocessing.Decomposition
+ :members:
+ :undoc-members:
+ :show-inheritance:
+ :special-members:
+
:mod:`Imputer` Module
---------------------
diff --git a/WORC/doc/_build/html/_sources/autogen/WORC.plotting.rst.txt b/WORC/doc/_build/html/_sources/autogen/WORC.plotting.rst.txt
index 8cec5656..1f866572 100644
--- a/WORC/doc/_build/html/_sources/autogen/WORC.plotting.rst.txt
+++ b/WORC/doc/_build/html/_sources/autogen/WORC.plotting.rst.txt
@@ -1,6 +1,15 @@
plotting Package
================
+:mod:`plotting` Package
+-----------------------
+
+.. automodule:: WORC.plotting
+ :members:
+ :undoc-members:
+ :show-inheritance:
+ :special-members:
+
:mod:`compute_CI` Module
------------------------
@@ -37,15 +46,6 @@ plotting Package
:show-inheritance:
:special-members:
-:mod:`plot_SVR` Module
-----------------------
-
-.. automodule:: WORC.plotting.plot_SVR
- :members:
- :undoc-members:
- :show-inheritance:
- :special-members:
-
:mod:`plot_barchart` Module
---------------------------
diff --git a/WORC/doc/_build/html/_sources/autogen/WORC.resources.fastr_tools.rst.txt b/WORC/doc/_build/html/_sources/autogen/WORC.resources.fastr_tools.rst.txt
new file mode 100644
index 00000000..b42ed940
--- /dev/null
+++ b/WORC/doc/_build/html/_sources/autogen/WORC.resources.fastr_tools.rst.txt
@@ -0,0 +1,12 @@
+fastr_tools Package
+===================
+
+:mod:`fastr_tools` Package
+--------------------------
+
+.. automodule:: WORC.resources.fastr_tools
+ :members:
+ :undoc-members:
+ :show-inheritance:
+ :special-members:
+
diff --git a/WORC/doc/_build/html/_sources/autogen/WORC.resources.rst.txt b/WORC/doc/_build/html/_sources/autogen/WORC.resources.rst.txt
new file mode 100644
index 00000000..c46f113f
--- /dev/null
+++ b/WORC/doc/_build/html/_sources/autogen/WORC.resources.rst.txt
@@ -0,0 +1,11 @@
+resources Package
+=================
+
+Subpackages
+-----------
+
+.. toctree::
+
+ WORC.resources.fastr_tests
+ WORC.resources.fastr_tools
+
diff --git a/WORC/doc/_build/html/_sources/autogen/WORC.rst.txt b/WORC/doc/_build/html/_sources/autogen/WORC.rst.txt
index 7a6d477b..ae8eff11 100644
--- a/WORC/doc/_build/html/_sources/autogen/WORC.rst.txt
+++ b/WORC/doc/_build/html/_sources/autogen/WORC.rst.txt
@@ -35,8 +35,12 @@ Subpackages
WORC.IOparser
WORC.classification
+ WORC.detectors
+ WORC.exampledata
+ WORC.facade
WORC.featureprocessing
WORC.plotting
WORC.processing
+ WORC.resources
WORC.tools
diff --git a/WORC/doc/_build/html/_sources/autogen/WORC.tools.rst.txt b/WORC/doc/_build/html/_sources/autogen/WORC.tools.rst.txt
index 7649a737..e2645f5e 100644
--- a/WORC/doc/_build/html/_sources/autogen/WORC.tools.rst.txt
+++ b/WORC/doc/_build/html/_sources/autogen/WORC.tools.rst.txt
@@ -37,3 +37,12 @@ tools Package
:show-inheritance:
:special-members:
+:mod:`createfixedsplits` Module
+-------------------------------
+
+.. automodule:: WORC.tools.createfixedsplits
+ :members:
+ :undoc-members:
+ :show-inheritance:
+ :special-members:
+
diff --git a/WORC/doc/_build/html/_sources/index.rst.txt b/WORC/doc/_build/html/_sources/index.rst.txt
index 155c3206..a3569d4c 100644
--- a/WORC/doc/_build/html/_sources/index.rst.txt
+++ b/WORC/doc/_build/html/_sources/index.rst.txt
@@ -53,7 +53,7 @@ WORC has been used in the following studies:
`Jose M. Castillo T., Martijn P. A. Starmans, Ivo Schoots, Wiro J. Niessen, Stefan Klein, Jifke F. Veenland. "CLASSIFICATION OF PROSTATE CANCER: HIGH GRADE VERSUS LOW GRADE USING A RADIOMICS APPROACH." IEEE International Symposium on Biomedical Imaging (ISBI) 2019. `_
-WORC is made possible by contributions from the following people: Martijn Starmans, and Stefan Klein
+WORC is made possible by contributions from the following people: Martijn Starmans, Thomas Phil, and Stefan Klein
WORC Documentation
diff --git a/WORC/doc/_build/html/_sources/static/configuration.rst.txt b/WORC/doc/_build/html/_sources/static/configuration.rst.txt
index a46085ab..48d71936 100644
--- a/WORC/doc/_build/html/_sources/static/configuration.rst.txt
+++ b/WORC/doc/_build/html/_sources/static/configuration.rst.txt
@@ -3,7 +3,23 @@
Configuration
=============
+Introduction
+------------
+WORC has defaults for all settings so it can be run out of the box to test the examples.
+However, you may want to alter the fastr configuration to your system settings, e.g.
+to locate your input and output folders and how much you want to parallelize the execution.
+
+Fastr will search for a config file named ``config.py`` in the ``$FASTRHOME`` directory
+(which defaults to ``~/.fastr/`` if it is not set). So if ``$FASTRHOME`` is set, the ``~/.fastr/``
+directory will be ignored. Additionally, ``.py`` files from the ``$FASTRHOME/config.d`` folder will be parsed
+as well. You will see that upon installation, WORC has already put a ``WORC_config.py`` file in the
+``config.d`` folder.
+
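+To check which configuration fastr has actually picked up, you can inspect it from
+Python. A minimal sketch (it only assumes that fastr is installed and importable):
+
+.. code-block:: python
+
+ import fastr
+
+ # Print the configured output mount, i.e. where results will be stored.
+ print(fastr.config.mounts['output'])
+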
+For a sample configuration file and a complete overview of the options in ``config.py`` see
+the :ref:`configuration-chapter` section.
+
+.. The section above was originally part of the quick start guide.
+
As ``WORC`` and the default tools used are mostly Python based, we've chosen
to put our configuration in a ``configparser`` object. This has several
advantages:
@@ -12,6 +28,10 @@ advantages:
2. Second, each tool can be set to parse only specific parts of the configuration,
enabling us to supply one file to all tools instead of needing many parameter files.
+
+Creation and interaction
+-------------------------
+
The default configuration is generated through the
:py:meth:`WORC.defaultconfig() `
function. You can then change things as you would in a dictionary and
@@ -52,48 +72,87 @@ means that the SVM is 2x more likely to be tested in the model selection than LR
list can be created by using commas for separation, e.g.
:py:meth:`Network.create_source <'value1, value2, ... ')>`.
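+
+A minimal sketch of this workflow (the ``classifiers`` subkey and the final append
+are shown for illustration only; see the tables below for the exact keys of your
+WORC version):
+
+.. code-block:: python
+
+ # Generate the default configuration and edit it like a dictionary.
+ config = network.defaultconfig()
+
+ # A comma-separated string defines a list: SVM is listed twice and LR once,
+ # so the SVM is 2x more likely to be tested in the model selection.
+ config['Classification']['classifiers'] = 'SVM, SVM, LR'
+
+ # Supply the edited configuration to the WORC object.
+ network.configs.append(config)
+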
+Contents
+--------
+The config object can be indexed as ``config[key][subkey] = value``. The various keys, subkeys, and the values
+(description, defaults and options) can be found below.
-General
--------
+.. include:: ../autogen/WORC.config.rst
+Details on each section of the config can be found below.
-PREDICTGeneral
---------------
-These fields contain general settings for when using PREDICT.
+.. _config-General:
+
+General
+~~~~~~~
+These fields contain general settings for when using WORC.
For more info on the Joblib settings, which are used in the Joblib
Parallel function, see `here `__. When you run
WORC on a cluster with nodes supporting only a single core to be used
per node, e.g. the BIGR cluster, use only 1 core and threading as a
backend.
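+
+For example, for such a single-core cluster setup one could set the following
+(a sketch; the ``Joblib_ncores`` and ``Joblib_backend`` subkey names are
+assumptions, check the table below for the exact names):
+
+.. code-block:: python
+
+ # Use a single core and the threading backend for joblib.
+ config['General']['Joblib_ncores'] = '1'
+ config['General']['Joblib_backend'] = 'threading'
+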
+**Description:**
+.. include:: ../autogen/config/WORC.config_General_description.rst
+**Defaults and Options:**
+
+.. include:: ../autogen/config/WORC.config_General_defopts.rst
+
+
+.. _config-Segmentix:
Segmentix
----------
+~~~~~~~~~
These fields are only important if you specified using the segmentix
tool in the general configuration.
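+
+For example, to use a ring of 5 pixels around the original segmentation as contour
+and to fill holes (a sketch based on the fields documented below):
+
+.. code-block:: python
+
+ # Use a ring around the supplied segmentation and fill holes.
+ config['Segmentix']['segtype'] = 'Ring'
+ config['Segmentix']['segradius'] = '5'
+ config['Segmentix']['fillholes'] = 'True'
+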
+**Description:**
+
+.. include:: ../autogen/config/WORC.config_Segmentix_description.rst
-Preprocessing
--------------
+**Defaults and Options:**
+
+.. include:: ../autogen/config/WORC.config_Segmentix_defopts.rst
+
+
+.. _config-Normalize:
+Normalize
+~~~~~~~~~~~~~
The preprocessing node acts before the feature extraction on the image.
Currently, only normalization is included: hence the dictionary name is
*Normalize*. Additionally, scans with image type CT (see later in the
tutorial) provided as DICOM are scaled to Hounsfield Units.
+**Description:**
+
+.. include:: ../autogen/config/WORC.config_Normalize_description.rst
-Imagefeatures
--------------
+**Defaults and Options:**
+.. include:: ../autogen/config/WORC.config_Normalize_defopts.rst
+
+
+.. _config-ImageFeatures:
+ImageFeatures
+~~~~~~~~~~~~~
If using the PREDICT toolbox, you can specify some settings for the
feature computation here. Also, you can select if the certain features
are computed or not.
+**Description:**
-Featsel
--------
+.. include:: ../autogen/config/WORC.config_ImageFeatures_description.rst
+
+**Defaults and Options:**
+
+.. include:: ../autogen/config/WORC.config_ImageFeatures_defopts.rst
+
+.. _config-Featsel:
+Featsel
+~~~~~~~
When using the PREDICT toolbox for classification, these settings can be
used for feature selection methods. Note that these settings are
actually used in the hyperparameter optimization. Hence you can provide
@@ -102,9 +161,18 @@ which finally the best setting in combination with the other
hyperparameters is selected. Again, these should be formatted as string
containing the actual values, e.g. value1, value2.
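+
+For example, to let the optimization randomly switch a selection method on and off
+per sampled workflow (a sketch; ``Variance`` is used as an illustrative subkey, see
+the table below for the actual fields):
+
+.. code-block:: python
+
+ # Roughly half of the sampled workflows will use this selection method.
+ config['Featsel']['Variance'] = 'True, False'
+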
+**Description:**
+
+.. include:: ../autogen/config/WORC.config_Featsel_description.rst
+
+**Defaults and Options:**
+.. include:: ../autogen/config/WORC.config_Featsel_defopts.rst
+
+
+.. _config-SelectFeatGroup:
SelectFeatGroup
----------------
+~~~~~~~~~~~~~~~
If the PREDICT feature computation and classification tools are used,
then you can do a gridsearch among the various feature groups for the
optimal combination. If you do not want this, set all fields to a single
@@ -114,8 +182,18 @@ Previously, there was a single parameter for the texture features,
selecting all, none or a single group. This is still supported, but not
recommended, and looks as follows:
+**Description:**
+
+.. include:: ../autogen/config/WORC.config_SelectFeatGroup_description.rst
+
+**Defaults and Options:**
+
+.. include:: ../autogen/config/WORC.config_SelectFeatGroup_defopts.rst
+
+
+.. _config-Imputation:
Imputation
-----------------
+~~~~~~~~~~~~~~~~
When using the PREDICT toolbox for classification, these settings are
used for feature imputation. Note that these settings are actually used
in the hyperparameter optimization. Hence you can provide multiple
@@ -123,22 +201,50 @@ values per field, of which random samples will be drawn of which finally
the best setting in combination with the other hyperparameters is
selected.
+**Description:**
+
+.. include:: ../autogen/config/WORC.config_Imputation_description.rst
+**Defaults and Options:**
+
+.. include:: ../autogen/config/WORC.config_Imputation_defopts.rst
+
+
+.. _config-Classification:
Classification
---------------
+~~~~~~~~~~~~~~
When using the PREDICT toolbox for classification, you can specify the
following settings. Almost all of these are used in CASH. Most of the
classifiers are implemented using sklearn; hence descriptions of the
hyperparameters can also be found there.
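+
+For example, to let the optimization sample the SVM kernel from all supported
+kernels (a sketch based on the ``SVMKernel`` field documented below):
+
+.. code-block:: python
+
+ # The kernel is sampled per workflow from this comma-separated list.
+ config['Classification']['SVMKernel'] = 'poly, linear, rbf'
+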
+**Description:**
+
+.. include:: ../autogen/config/WORC.config_Classification_description.rst
+
+**Defaults and Options:**
+.. include:: ../autogen/config/WORC.config_Classification_defopts.rst
+
+
+.. _config-CrossValidation:
CrossValidation
----------------
+~~~~~~~~~~~~~~~
When using the PREDICT toolbox for classification and you specified
using cross validation, specify the following settings.
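+
+For example, to use 10 random train-test splits with 20% of the data reserved for
+testing in each split (a sketch based on the fields documented below):
+
+.. code-block:: python
+
+ config['CrossValidation']['N_iterations'] = '10'
+ config['CrossValidation']['test_size'] = '0.2'
+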
+**Description:**
+
+.. include:: ../autogen/config/WORC.config_CrossValidation_description.rst
+
+**Defaults and Options:**
+
+.. include:: ../autogen/config/WORC.config_CrossValidation_defopts.rst
+
+
+.. _config-Labels:
Labels
---------
+~~~~~~~~
When using the PREDICT toolbox for classification, you have to set the
label used for classification.
@@ -162,8 +268,6 @@ You can supply a single label or multiple labels split by commas, for
each of which an estimator will be fit. For example, suppose you simply
want to use Label1 for classification, then set:
-
-
.. code-block:: python
config['Labels']['label_names'] = 'Label1'
@@ -173,44 +277,86 @@ If you want to first train a classifier on Label1 and then Label2,
set: ``config[Genetics][label_names] = Label1, Label2``
+**Description:**
+
+.. include:: ../autogen/config/WORC.config_Labels_description.rst
+
+**Defaults and Options:**
+.. include:: ../autogen/config/WORC.config_Labels_defopts.rst
+
+.. _config-HyperOptimization:
Hyperoptimization
------------------
+~~~~~~~~~~~~~~~~~
When using the PREDICT toolbox for classification, you have to supply
your hyperparameter optimization procedure here.
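+
+For example, to optimize the AUC instead of the weighted F1-score and to draw fewer
+workflows from the search space (a sketch based on the fields documented below;
+``roc_auc`` is one of the standard sklearn scoring strings):
+
+.. code-block:: python
+
+ config['HyperOptimization']['scoring_method'] = 'roc_auc'
+ config['HyperOptimization']['N_iterations'] = '1000'
+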
+**Description:**
+
+.. include:: ../autogen/config/WORC.config_HyperOptimization_description.rst
+
+**Defaults and Options:**
+
+.. include:: ../autogen/config/WORC.config_HyperOptimization_defopts.rst
+
+.. _config-FeatureScaling:
FeatureScaling
---------------
+~~~~~~~~~~~~~~
Determines which method is applied to scale each feature.
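+
+For example, to scale all features to the [0, 1] range instead of z-scoring
+(a sketch based on the fields documented below):
+
+.. code-block:: python
+
+ config['FeatureScaling']['scale_features'] = 'True'
+ config['FeatureScaling']['scaling_method'] = 'minmax'
+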
+**Description:**
+
+.. include:: ../autogen/config/WORC.config_FeatureScaling_description.rst
+
+**Defaults and Options:**
+
+.. include:: ../autogen/config/WORC.config_FeatureScaling_defopts.rst
+
+.. _config-SampleProcessing:
SampleProcessing
-----------------
+~~~~~~~~~~~~~~~~
Before performing the hyperoptimization, you can use SMOTE: Synthetic
Minority Over-sampling Technique to oversample your data.
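+
+For example, to disable SMOTE and use random oversampling instead (a sketch based
+on the fields documented below):
+
+.. code-block:: python
+
+ config['SampleProcessing']['SMOTE'] = 'False'
+ config['SampleProcessing']['Oversampling'] = 'True'
+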
+**Description:**
+
+.. include:: ../autogen/config/WORC.config_SampleProcessing_description.rst
+**Defaults and Options:**
+.. include:: ../autogen/config/WORC.config_SampleProcessing_defopts.rst
+.. _config-Ensemble:
Ensemble
---------
+~~~~~~~~
WORC supports ensembling of workflows. This is not a default approach in
radiomics, hence the default is to not use it and select only the best
performing workflow.
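+
+For example, to ensemble the 50 best performing workflows instead of using only the
+single best one (a sketch based on the ``Use`` field documented below):
+
+.. code-block:: python
+
+ config['Ensemble']['Use'] = '50'
+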
+**Description:**
+
+.. include:: ../autogen/config/WORC.config_Ensemble_description.rst
+
+**Defaults and Options:**
+
+.. include:: ../autogen/config/WORC.config_Ensemble_defopts.rst
+
+
+.. _config-Bootstrap:
+Bootstrap
+~~~~~~~~~
+Besides cross validation, WORC supports bootstrapping on the test set for performance evaluation.
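+
+For example, to evaluate the final model with 1000 bootstrap iterations on the test
+set (a sketch based on the fields documented below):
+
+.. code-block:: python
+
+ config['Bootstrap']['Use'] = 'True'
+ config['Bootstrap']['N_iterations'] = '1000'
+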
+**Description:**
+.. include:: ../autogen/config/WORC.config_Bootstrap_description.rst
-FASTR_bugs
-----------
-Currently, when using XNAT as a source, FASTR can only retrieve DICOM
-directories. We made a workaround for this for the images and
-segmentations, but this only works if all your files have the same name
-and extension. These are provided in this configuration part.
+**Defaults and Options:**
+.. include:: ../autogen/config/WORC.config_Bootstrap_defopts.rst
-.. include:: ../autogen/WORC.config.rst
\ No newline at end of file
diff --git a/WORC/doc/_build/html/_sources/static/quick_start.rst.txt b/WORC/doc/_build/html/_sources/static/quick_start.rst.txt
index 47be8c5a..4a5b8106 100644
--- a/WORC/doc/_build/html/_sources/static/quick_start.rst.txt
+++ b/WORC/doc/_build/html/_sources/static/quick_start.rst.txt
@@ -10,6 +10,9 @@ Installation
You can install WORC either using pip, or from the source code.
+.. note:: The version of PyRadiomics which WORC currently uses requires numpy to be installed beforehand. Make sure you do so, e.g. ``pip install numpy``.
+
+
Installing via pip
``````````````````
@@ -47,35 +50,226 @@ library. For Ubuntu this is in the ``/usr/local/lib/python3.x/dist-packages/`` f
.. note:: If you want to develop WORC, you might want to use ``pip install -e .`` to get an editable install
-.. note:: You might want to consider installing ``WORC`` in a `virtualenv `_
-
-
-Configuration
--------------
+.. note:: You might want to consider installing ``WORC`` in a
+ `virtualenv `_
-WORC has defaults for all settings so it can be run out of the box to test the examples.
-However, you may want to alter the fastr configuration to your system settings, e.g.
-to locate your input and output folders and how much you want to parallelize the execution.
+Windows installation
+````````````````````
-Fastr will search for a config file named ``config.py`` in the ``$FASTRHOME`` directory
-(which defaults to ``~/.fastr/`` if it is not set). So if ``$FASTRHOME`` is set the ``~/.fastr/``
-will be ignored. Additionally, .py files from the ``$FASTRHOME/config.d`` folder will be parsed
-as well. You will see that upon installation, WORC has already put a ``WORC_config.py`` file in the
-``config.d`` folder.
+On Windows, we strongly recommend installing Python through the
+`Anaconda distribution `_.
-For a sample configuration file and a complete overview of the options in ``config.py`` see
-the :ref:`Config file ` section.
+Regardless of your installation, you will need `Microsoft Visual Studio `_: the Community
+edition can be downloaded and installed for free.
+If you still get an error similar to ``Microsoft Visual C++ 14.0 is required. Get it with``
+`Microsoft Visual C++ Build Tools `_,
+please follow the respective link and install the requirements.
-Tutorial
---------
-To start out using WORC, we recommend you to follow the tutorial located in the
-[WORCTutorial Github](https://github.com/MStarmans91/WORCTutorial). Besides some more advanced tutorials,
-the main tutorial can be found in the WORCTutorial.ipynb Jupyter notebook. Instructions on how
-to use the notebook can be found in the Github.
+Tutorials
+---------
+To start out using WORC, we recommend you to follow the tutorials located in the
+`WORCTutorial Github `_. This repository
+contains tutorials for an introduction to WORC, as well as more advanced workflows.
If you run into any issue, you can first debug your network using
`the fastr trace tool `_.
If you're stuck, feel free to post an issue on the `WORC Github `_.
+
+Hello World
+------------
+
+Below is the same script as in the SimpleWORC tutorial from the `WORCTutorial Github `_.
+
+Import packages
+```````````````
+
+First, import WORC and some additional python packages.
+
+.. code-block:: python
+
+ from WORC import SimpleWORC
+ import os
+
+ # These packages are only used in analysing the results
+ import pandas as pd
+ import json
+ import fastr
+ import glob
+
+ # If you don't want to use your own data, we use the following example set,
+ # see also the next code block in this example.
+ from WORC.exampledata.datadownloader import download_HeadAndNeck
+
+ # Define the folder this script is in, so we can easily find the example data
+ script_path = os.path.dirname(os.path.abspath(__file__))
+
+Input
+`````
+The minimal inputs to WORC are:
+
+ 1. Images
+ 2. Segmentations
+ 3. Labels
+
+In SimpleWORC, we assume you have a folder "datadir" with a subfolder for each
+patient, where each patient folder contains an image.nii.gz and a mask.nii.gz:
+
+ * Datadir
+
+ * Patient_001
+
+ * image.nii.gz
+ * mask.nii.gz
+
+ * Patient_002
+
+ * image.nii.gz
+ * mask.nii.gz
+
+ * ...
+
+In the example, we will use open source data from the online
+`BMIA XNAT platform `_.
+This dataset consists of CT scans of patients with Head and Neck tumors. We will download
+a subset of 20 patients into this folder. You can change this setting if you like.
+
+.. code-block:: python
+
+ nsubjects = 20 # use "all" to download all patients
+ data_path = os.path.join(script_path, 'Data')
+ download_HeadAndNeck(datafolder=data_path, nsubjects=nsubjects)
+
+.. note:: You can skip this code block if you use your own data.
+
+Identify our data structure: change the fields below accordingly if you use your own dataset.
+
+.. code-block:: python
+
+ imagedatadir = os.path.join(data_path, 'stwstrategyhn1')
+ image_file_name = 'image.nii.gz'
+ segmentation_file_name = 'mask.nii.gz'
+
+ # File in which the labels (i.e. the outcome you want to predict) are stated
+ # Again, change this accordingly if you use your own data.
+ label_file = os.path.join(data_path, 'Examplefiles', 'pinfo_HN.csv')
+
+ # Name of the label you want to predict
+ label_name = 'imaginary_label_1'
+
+ # Determine whether we want to do a coarse quick experiment, or a full lengthy
+ # one. Again, change this accordingly if you use your own data.
+ coarse = True
+
+ # Give your experiment a name
+ experiment_name = 'Example_STWStrategyHN4'
+
+ # Instead of the default tempdir, let's put the temporary output in a subfolder
+ # in the same folder as this script
+ tmpdir = os.path.join(script_path, 'WORC_' + experiment_name)
+
+The actual experiment
+`````````````````````
+
+After defining the inputs, the following code can be used to run your first experiment.
+
+.. code-block:: python
+
+ # Create a Simple WORC object
+ network = SimpleWORC(experiment_name)
+
+ # Set the input data according to the variables we defined earlier
+ network.images_from_this_directory(imagedatadir,
+ image_file_name=image_file_name)
+ network.segmentations_from_this_directory(imagedatadir,
+ segmentation_file_name=segmentation_file_name)
+ network.labels_from_this_file(label_file)
+ network.predict_labels([label_name])
+
+ # Use the standard workflow for binary classification
+ network.binary_classification(coarse=coarse)
+
+ # Set the temporary directory
+ network.set_tmpdir(tmpdir)
+
+ # Run the experiment!
+ network.execute()
+
+.. note:: Precomputed features can be used instead of images and masks by using ``network.features_from_this_directory()`` in a similar fashion.
+
+Analysis of the results
+```````````````````````
+There are two main outputs: the features for each patient/object, and the overall
+performance. These are stored as .hdf5 and .json files, respectively. By
+default, they are saved in the so-called "fastr output mount", in a subfolder
+named after your experiment name.
+
+.. code-block:: python
+
+ # Locate output folder
+ outputfolder = fastr.config.mounts['output']
+ experiment_folder = os.path.join(outputfolder, 'WORC_' + experiment_name)
+
+ print(f"Your output is stored in {experiment_folder}.")
+
+ # Read the features for the first patient
+ # NOTE: we use the glob package for scanning a folder to find specific files
+ feature_files = glob.glob(os.path.join(experiment_folder,
+ 'Features',
+ 'features_*.hdf5'))
+ featurefile_p1 = feature_files[0]
+ features_p1 = pd.read_hdf(featurefile_p1)
+
+ # Read the overall performance
+ performance_file = os.path.join(experiment_folder, 'performance_all_0.json')
+ with open(performance_file, 'r') as fp:
+ performance = json.load(fp)
+
+ # Print the feature values and names
+ print("Feature values:")
+ for v, l in zip(features_p1.feature_values, features_p1.feature_labels):
+ print(f"\t {l} : {v}.")
+
+ # Print the output performance
+ print("\n Performance:")
+ stats = performance['Statistics']
+ del stats['Percentages'] # Omitted for brevity
+ for k, v in stats.items():
+ print(f"\t {k} {v}.")
+
+.. note:: The performance is probably horrible, which is expected as we ran the experiment on coarse settings. These settings are recommended for testing only: see also below.
+
+
+Tips and Tricks
+```````````````
+
+For tips and tricks on running a full experiment instead of this simple
+example, adding more evaluation options, debugging a crashed network etcetera,
+please go to :ref:`usermanual-chapter` or follow the intermediate
+or advanced tutorials on `WORCTutorial Github `_.
+
+Some things we would advise to always do:
+
+* Run actual experiments on the full settings (coarse=False):
+
+.. code-block:: python
+
+ coarse = False
+ network.binary_classification(coarse=coarse)
+
+.. note:: This will result in more computation time. We therefore recommend
+ running this script on either a cluster or a high-performance PC. If so,
+ you may change the execution to use multiple cores to speed up computation
+ just before ``network.execute()``:
+
+ .. code-block:: python
+
+ network.set_multicore_execution()
+
+* Add extensive evaluation: ``network.add_evaluation()`` before ``network.execute()``:
+
+.. code-block:: python
+
+ network.add_evaluation()
\ No newline at end of file
diff --git a/WORC/doc/_build/html/_sources/static/user_manual.rst.txt b/WORC/doc/_build/html/_sources/static/user_manual.rst.txt
index 1a5494bc..f84191ff 100644
--- a/WORC/doc/_build/html/_sources/static/user_manual.rst.txt
+++ b/WORC/doc/_build/html/_sources/static/user_manual.rst.txt
@@ -1,3 +1,5 @@
+.. _usermanual-chapter:
+
User Manual
===========
@@ -22,14 +24,12 @@ The WORC toolbox consists of one main object, the WORC object:
It's attributes are split in a couple of categories. We will not discuss
the WORC.defaultconfig() function here, which generates the default
-configuration, as it is listed in a separate page, see the :ref:`config file section `.
+configuration, as it is listed in a separate page, see the :doc:`config file section `.
Attributes: Sources
--------------------
-
-
+~~~~~~~~~~~~~~~~~~~
There are numerous WORC attributes which serve as source nodes for the
FASTR network. These are:
@@ -97,42 +97,8 @@ appending procedure can be used.
did not supply a segmentation. **WORC will always align these sequences with no segmentations to the first sequence, i.e. the first object in the images_train list.**
Hence make sure you supply the sequence for which you have a ROI as the first object.
-
-
-Attributes: Settings
---------------------
-
-
-There are several attributes in WORC which define how your pipeline is
-executed:
-
-
-
-- fastr_plugin
-- fastr_tmpdir
-- Tools: additional workflows are stored here. Currently only includes
- a pipeline for image registration without any Radiomics.
-- CopyMetadata: Whether to automatically copy the metadata info
- (e.g. direction of cosines) from the images to the segmentations
- before applying transformix.
-
-An explanation of the FASTR settings is given below.
-
-
-
-Attributes: Functions
----------------------
-
-The WORC.configs() attribute contains the configparser files, which you
-can easily edit. The WORC.set() function saves these objects in a
-temporary folder and converts the filename into as FASTR source, which
-is then put in the WORC.fastrconfigs() objects. Hence you do not need to
-edit the fastrconfigs object manually.
-
-
-
Images and segmentations
-~~~~~~~~~~~~~~~~~~~~~~~~
+^^^^^^^^^^^^^^^^^^^^^^^^
@@ -148,7 +114,7 @@ image formats such as DICOM, NIFTI, TIFF, NRRD and MHD.
Semantics
-~~~~~~~~~
+^^^^^^^^^
Semantic features are used in the PREDICT CalcFeatures tool. You can
supply these as a .csv listing your features per patient. The first
@@ -183,7 +149,7 @@ case, your sources should look as following:
Labels
-~~~~~~
+^^^^^^
The labels are used in classification. For PREDICT, these should be
supplied as a .txt file. Similar to the semantics, the first column
@@ -193,7 +159,7 @@ semantics file.
Masks
------------
+^^^^^
WORC contains a segmentation preprocessing tool, called segmentix. This
tool is still under development. The idea is that you can manipulate
@@ -204,7 +170,7 @@ radius around your ROI and mask it.
Features
---------
+^^^^^^^^
If you already computed your features, e.g. from a previous run, you can
directly supply the features instead of the images and segmentations and
@@ -213,7 +179,7 @@ matching the PREDICT CalcFeatures format.
Metadata
---------
+^^^^^^^^
This source can be used if you want to use tags from the DICOM header as
features, e.g. patient age and sex. In this case, this source should
@@ -224,7 +190,7 @@ implemented tags.
Elastix_Para
-------------
+^^^^^^^^^^^^
If you have multiple images for each patient, e.g. T1 and T2, but only a
single segmentation, you can use image registration to align and
@@ -237,9 +203,39 @@ is made on the first WORC.images source you supply. The segmentation
will be alingned to all other image sources.**
+Attributes: Settings
+~~~~~~~~~~~~~~~~~~~~
+
+
+There are several attributes in WORC which define how your pipeline is
+executed:
+
+
+
+- fastr_plugin
+- fastr_tmpdir
+- Tools: additional workflows are stored here. Currently only includes
+ a pipeline for image registration without any Radiomics.
+- CopyMetadata: Whether to automatically copy the metadata info
+ (e.g. direction of cosines) from the images to the segmentations
+ before applying transformix.
+
+An explanation of the FASTR settings is given below.
+
+
+
+Attributes: Functions
+~~~~~~~~~~~~~~~~~~~~~
+
+The WORC.configs() attribute contains the configparser files, which you
+can easily edit. The WORC.set() function saves these objects in a
+temporary folder and converts the filename into a FASTR source, which
+is then put in the WORC.fastrconfigs() objects. Hence you do not need to
+edit the fastrconfigs object manually.
+
FASTR settings
---------------
+~~~~~~~~~~~~~~
There are two WORC attributes which contain settings on running FASTR.
In WORC.fastr_plugin, you can specify which Execution Plugin should be
@@ -250,10 +246,8 @@ The default is the ProcessPollExecution plugin. The WORC.fastr_tempdir
sets the temporary directory used in your run.
-
Construction and execution commands
------------------------------------
-
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
After supplying your sources, you need to build the FASTR network. This
@@ -273,3 +267,101 @@ WORC.source_data_data and WORC.sink objects.
Finally, after completing above steps, you can execute the network
through the WORC.execute() command.
+
+
+Evaluation of your network
+--------------------------
+
+In WORC, there are two options for testing your fitted models:
+
+1. Single dataset: cross-validation (currently only random-split)
+2. Separate train and test dataset: bootstrapping on test dataset
+
+Within these evaluation settings, the following performance evaluation methods are used:
+
+1. Confidence intervals on several metrics:
+
+ For classification:
+
+ a) Area under the curve (AUC) of the receiver operating characteristic (ROC) curve. In a multiclass setting, we use the multiclass AUC from the `TADPOLE Challenge `_.
+ b) Accuracy.
+ c) Balanced classification accuracy as defined by the `TADPOLE Challenge `_.
+ d) F1-score
+ e) Sensitivity, aka recall or true positive rate
+ f) Specificity, aka true negative rate
+ g) Negative predictive value (NPV)
+ h) Precision, aka Positive predictive value (PPV)
+
+ For regression:
+
+ a) R2-score
+ b) Mean Squared Error (MSE)
+ c) Intraclass Correlation Coefficient (ICC)
+ d) Pearson correlation coefficient and p-value
+ e) Spearman correlation coefficient and p-value
+
+ For survival, in addition to the regression scores:
+
+ a) Concordance index
+ b) Cox regression coefficient and p-value
+
+ In cross-validation, by default, 95% confidence intervals for the mean performance measures are constructed using
+ the corrected resampled t-test based on all cross-validation iterations, thereby taking into account that the samples
+ in the cross-validation splits are not statistically independent. See also
+ `Nadeau C, Bengio Y. Inference for the generalization error. In Advances in Neural Information Processing Systems, 2000; 307–313.`
+
+ In bootstrapping, 95% confidence intervals are created using the "standard" method according to a normal distribution: see Table 6, method 1 in `Efron B., Tibshirani R. Bootstrap Methods for Standard Errors,
+ Confidence Intervals, and Other Measures of Statistical Accuracy, Statistical Science Vol. 1, No. 1, 54-77, 1986`.
+
+2. ROC curve with 95% confidence intervals using the fixed-width bands method, see `Macskassy S. A., Provost F., Rosset S. ROC Confidence Bands: An Empirical Evaluation. In: Proceedings of the 22nd international conference on Machine learning. 2005.`
+
+3. Univariate statistical testing of the features using:
+
+ a) A student t-test
+ b) A Welch test
+ c) A Wilcoxon test
+ d) A Mann-Whitney U test
+
+ The uncorrected p-values for all these tests are reported in a single Excel sheet. Pick the right test and significance
+ level based on your assumptions. Normally, we use the Mann-Whitney U test, as it is nonparametric, does not require the
+ features to be normally distributed, and assumes independent samples.
+
+4. Ranking patients from typical to atypical as determined by the model, based on either:
+
+ a) The percentage of times a patient was classified correctly when occurring in the test set. Patients always correctly classified
+ can be seen as typical examples; patients always classified incorrectly as atypical.
+ b) The mean posterior of the patient when occurring in the test set.
+
+ These measures can only be used in classification. Besides an Excel with the rankings, snapshots of the middle slice
+ of the image + segmentation are saved with the ground truth label and the percentage/posterior in the filename. In
+ this way, one can scroll through the patients from typical to atypical to distinguish a pattern.
+
+5. A bar chart of how often certain feature groups were selected in the optimal methods. Only useful when using
+ groupwise feature selection.
+
+By default, only the first evaluation method, i.e. metric computation, is used. The other methods can simply be added
+to WORC by using the ``add_evaluation()`` function, either directly in WORC or through the facade:
+
+
+.. code-block:: python
+
+ import WORC
+ network = WORC.WORC('somename')
+ label_type = 'name_of_label_predicted_for_evaluation'
+ ...
+ network.add_evaluation(label_type)
+
+.. code-block:: python
+
+ import WORC
+ from WORC import IntermediateFacade
+ I = IntermediateFacade('somename')
+ ...
+ I.add_evaluation()
+
+Debugging
+---------
+
+As WORC is based on fastr, debugging is similar to debugging a fastr pipeline: therefore, see also
+`the fastr debugging guidelines `_.
+
+If you run into any issue, please create an issue on the `WORC Github `_.
\ No newline at end of file
diff --git a/WORC/doc/_build/html/_static/documentation_options.js b/WORC/doc/_build/html/_static/documentation_options.js
index ef965269..12a2ee43 100644
--- a/WORC/doc/_build/html/_static/documentation_options.js
+++ b/WORC/doc/_build/html/_static/documentation_options.js
@@ -1,6 +1,6 @@
var DOCUMENTATION_OPTIONS = {
URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
- VERSION: '3.0.0',
+ VERSION: '3.1.0',
LANGUAGE: 'None',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
diff --git a/WORC/doc/_build/html/autogen/WORC.IOparser.html b/WORC/doc/_build/html/autogen/WORC.IOparser.html
index 64ae7bc1..946a5d06 100644
--- a/WORC/doc/_build/html/autogen/WORC.IOparser.html
+++ b/WORC/doc/_build/html/autogen/WORC.IOparser.html
@@ -8,7 +8,7 @@
- IOparser Package — WORC 3.0.0 documentation
+ IOparser Package — WORC 3.1.0 documentation
@@ -61,7 +61,7 @@