README.md
39 additions & 12 deletions
@@ -4,18 +4,21 @@
4
4
fast1dkmeans
5
5
========
6
6
7
-
A Python library which implements several variations of optimal *k*-means clustering on 1D data, based on the algorithms presented by Gronlund et al. (2017). This package is inspired by the [kmeans1d](https://github.com/dstein64/kmeans1d) package but extends it by implementing additional algorithms, in particular those with reduced memory requirements O(n) instead of O(kn).
7
+
A Python library which implements several algorithms to optimally solve *k*-means clustering on 1D data.
8
+
Unlike in higher dimensions, the optimal solutions can be found efficiently.
9
+
The selection of algorithms is based on those presented by Gronlund et al. (2017).
8
10
9
-
There are several different ways to compute the optimal k-means clustering in 1d.
10
-
Currently the package implements the following methods:
This package is inspired by the [kmeans1d](https://github.com/dstein64/kmeans1d) package but improves on it by implementing additional algorithms with memory requirements of $O(n)$ instead of $O(kn)$. This makes it easier and faster to use *fast1dkmeans* for larger values of $n$ and $k$.
15
12
13
+
Currently this package implements the following algorithms:
All the methods rely on first sorting the values to be clustered; this sorting step is omitted from the runtime analysis.
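To illustrate why sorting comes first, here is a minimal sketch of a textbook $O(kn^2)$ dynamic program for optimal 1D *k*-means. It is not the implementation used by this package, and it is far slower than the algorithms of Gronlund et al.; the function name `optimal_1d_kmeans` is hypothetical and exists only for this example. The sketch relies on the fact that, in 1D, every optimal cluster is a contiguous run of the sorted values.

```python
from itertools import accumulate


def optimal_1d_kmeans(values, k):
    """Return one optimal k-means labelling of 1D values (simple O(k n^2) DP).

    Illustrative only: hypothetical helper, not part of fast1dkmeans.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")

    # Sort once, remembering the original positions of the values.
    order = sorted(range(n), key=lambda i: values[i])
    xs = [values[i] for i in order]

    # Prefix sums of x and x^2 give the cost of any run xs[i:j] in O(1).
    s1 = [0.0] + list(accumulate(xs))
    s2 = [0.0] + list(accumulate(v * v for v in xs))

    def cost(i, j):
        """Sum of squared deviations from the mean of xs[i:j], with j > i."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting xs[:j] into m clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    split[m][j] = i

    # Walk the stored split points backwards; label clusters in sorted order.
    labels = [0] * n
    j = n
    for m in range(k, 0, -1):
        i = split[m][j]
        for pos in range(i, j):
            labels[order[pos]] = m - 1
        j = i
    return labels


x = [4.0, 4.1, 4.2, -50, 200.2, 200.4, 200.9, 201, 80, 100, 102]
print(optimal_1d_kmeans(x, 4))
```

The table `best` is what costs $O(kn)$ memory in this naive formulation; keeping only the previous row brings the cost computation down to $O(n)$ memory, although recovering the cluster boundaries then needs extra work.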
17
20
18
-
The code is written in Python and relies on the [numba](https://numba.pydata.org/) compiler for speed.
21
+
The code is written in Python, but all the number crunching is done in compiled code. To achieve this, the project relies on the [numba](https://numba.pydata.org/) compiler.
19
22
20
23
Requirements
21
24
------------
@@ -25,7 +28,8 @@ Requirements
25
28
Installation
26
29
------------
27
30
28
-
[fast1dkmeans](https://pypi.python.org/pypi/fast1dkmeans) is available on PyPI, the Python Package Index.
31
+
[fast1dkmeans](https://pypi.python.org/pypi/fast1dkmeans) is available on PyPI, the Python Package Index. It can be installed as follows:
32
+
29
33
30
34
```sh
31
35
$ pip3 install fast1dkmeans
@@ -34,18 +38,41 @@ $ pip3 install fast1dkmeans
34
38
Example Usage
35
39
-------------
36
40
41
+
A simple use of this package is shown below, where we want to cluster the list of values in `x` into four (`k = 4`) clusters.
42
+
The optimal clustering of `x` into four groups is pretty obvious as there are essentially four groups of values.
43
+
There is one group around 4.1, one around -50, one around 200, and one around 100. Let us use *fast1dkmeans* to find the optimal clustering.
Important notice: On first use, the code is compiled once, which may take about 30s. On subsequent uses this is no longer necessary and execution is much faster.
57
+
The resulting array `clusters` consists of integers indicating the cluster memberships of values in `x`.
58
+
The first three values of `clusters` (three ones) indicate that the first three values of `x` (`[4.0, 4.1, 4.2]`) form their own cluster (these are the only ones in `clusters`).
59
+
The fourth value of `clusters` is the only zero and shows that the fourth value of `x` (`-50`) should be its own cluster.
60
+
The threes (`3`) in `clusters` indicate that the values `[200.2, 200.4, 200.9, 201]` should be one cluster. Lastly, the remaining twos (`2`) form the last cluster, consisting of the values `[80, 100, 102]`.
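The membership array described above can be turned into explicit groups with a few lines of plain Python. In this sketch, `x` and `clusters` are hard-coded from the description rather than produced by *fast1dkmeans*, and the order of the values in `x` is an assumption.

```python
from collections import defaultdict

# Assumed for illustration: values and labels reconstructed from the prose,
# not output of fast1dkmeans. The order of the values is a guess.
x = [4.0, 4.1, 4.2, -50, 200.2, 200.4, 200.9, 201, 80, 100, 102]
clusters = [1, 1, 1, 0, 3, 3, 3, 3, 2, 2, 2]

# Collect the values of x under their cluster label.
groups = defaultdict(list)
for value, label in zip(x, clusters):
    groups[label].append(value)

for label in sorted(groups):
    print(label, groups[label])
```

Because the labels only say which values belong together, grouping by label like this is usually the first step before computing per-cluster statistics such as the mean.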
61
+
62
+
63
+
A different clustering method can be chosen by passing a keyword argument. For example, below we choose the space-reduced dynamic program.
All the algorithms return one optimal clustering (of potentially many), but their runtime and space requirements differ considerably.
71
+
72
+
73
+
*Important notice*: On first use, the code is compiled once, which may take about 30s. On subsequent uses this is no longer necessary and execution is much faster.