CRATE v0.6.0
Description
Included additional clustering algorithms to compute the Cluster-reduced Representative Volume Element (CRVE) and two algorithms to perform the standardization of the global clustering data matrix.
New Features and Improvements
- New clustering algorithms. Several clustering algorithms from the literature have been selected and are now available in CRATE alongside the standard k-Means clustering algorithm. They were selected according to three criteria: (1) availability of an implementation in open source packages or repositories, (2) suitability to handle large datasets efficiently, and (3) potential to support the development of advanced clustering-based analysis strategies. Note that neither an appropriate characterization of the feature data space and cluster shapes nor an assessment of the actual performance of the clustering algorithms has been carried out yet. Such studies may reveal that some of the clustering algorithms included in this version are in fact not suitable or, alternatively, suggest that additional clustering algorithms are required. The clustering algorithms available in this version are:
- K-Means (source: scikit-learn)
- K-Means (source: pyclustering)
- Mini-Batch K-Means (source: scikit-learn)
- Agglomerative (source: scikit-learn)
- Agglomerative (source: scipy)
- Agglomerative (source: fastcluster)
- Birch (source: scikit-learn)
- Birch (source: pyclustering)
- Cure (source: pyclustering)
- X-Means (source: pyclustering)
Different implementations of the same clustering algorithm are included to avoid dependence on a single source and/or to benefit from additional methods offered by different sources. Moreover, most of the clustering algorithms' hyperparameters are left at the default values of the original implementation (i.e., they have not been tuned). The clustering algorithm is selected through the existing clustering scheme keyword (specification documented in CRATE's input data file in the section Clustering scheme).
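As an illustration of how a keyword can select among interchangeable clustering implementations, the following is a minimal sketch restricted to the scikit-learn algorithms listed above. It is not CRATE's actual code: the registry `CLUSTERING_ALGORITHMS`, its keys, and the helper `cluster_labels` are hypothetical names, and the only hyperparameters set explicitly are the number of clusters and the seeds needed for reproducibility.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans, MiniBatchKMeans

# Hypothetical registry: the keys are illustrative and are NOT the keyword
# values used in CRATE's input data file.
CLUSTERING_ALGORITHMS = {
    "kmeans": lambda k: KMeans(n_clusters=k, n_init=10, random_state=0),
    "mini_batch_kmeans": lambda k: MiniBatchKMeans(n_clusters=k, n_init=3, random_state=0),
    "agglomerative": lambda k: AgglomerativeClustering(n_clusters=k),
    "birch": lambda k: Birch(n_clusters=k),
}


def cluster_labels(data_matrix, n_clusters, algorithm="kmeans"):
    """Return one cluster label per row of data_matrix (hypothetical helper)."""
    try:
        factory = CLUSTERING_ALGORITHMS[algorithm]
    except KeyError:
        raise ValueError(f"Unknown clustering algorithm: {algorithm!r}") from None
    # Every estimator above exposes fit_predict, so the call is uniform.
    return factory(n_clusters).fit_predict(data_matrix)


# Two well-separated synthetic groups of 20 points each (3 features).
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 3)),
    rng.normal(10.0, 0.1, size=(20, 3)),
])

for name in CLUSTERING_ALGORITHMS:
    labels = cluster_labels(data, 2, name)
    print(name, np.bincount(labels))
```

Keeping the algorithms behind a single lookup means that adding another source (for example pyclustering or fastcluster) only requires a new registry entry with the same call pattern.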
- Clustering data standardization. The global clustering data matrix is now standardized before the cluster analysis through one of two commonly used standardization procedures: (1) Min-Max Scaler (default); (2) Standard Normal Distribution Scaler. The standardization method is prescribed through an associated keyword (specification documented in CRATE's input data file in the section Clustering data standardization).
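The two standardization procedures can be sketched with NumPy as follows. This is only an illustration under stated assumptions, not CRATE's implementation: it assumes that each column (feature) of the global clustering data matrix is scaled independently, and the function names `min_max_scale` and `standard_scale` are hypothetical. Constant columns are left unscaled to avoid division by zero.

```python
import numpy as np


def min_max_scale(data_matrix):
    """Rescale each column to the range [0, 1] (hypothetical helper)."""
    data = np.asarray(data_matrix, dtype=float)
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    col_range[col_range == 0.0] = 1.0  # constant column: avoid division by zero
    return (data - col_min) / col_range


def standard_scale(data_matrix):
    """Shift each column to zero mean and unit variance (hypothetical helper)."""
    data = np.asarray(data_matrix, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0.0] = 1.0  # constant column: avoid division by zero
    return (data - mean) / std


# Small example: two features with very different magnitudes.
example = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(min_max_scale(example))
print(standard_scale(example))
```

Without such scaling, features with larger magnitudes would dominate the distance computations used by the clustering algorithms.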
Bug Fixes
- Self-Consistent Scheme. The default self-consistent scheme has been changed to the regression-based scheme and CRATE's input data file documentation has been updated to reflect the optional nature of this parameter.