Commit 7e20f61

committed
further improved the Readme
1 parent 7c1e5a3 commit 7e20f61

1 file changed

+39
-12
lines changed

README.md

Lines changed: 39 additions & 12 deletions
@@ -4,18 +4,21 @@
44
fast1dkmeans
55
========
66

7-
A Python library which implements several variations of optimal *k*-means clustering on 1D data, based on the algorithms presented by Gronlund et al. (2017). This package is inspired by the [kmeans1d](https://github.com/dstein64/kmeans1d) package but extends it by implementing additional algorithms, in particular those with reduced memory requirements O(n) instead of O(kn).
7+
A Python library which implements several algorithms to optimally solve *k*-means clustering on 1D data.
8+
Unlike in higher dimensions, the optimal solutions can be found efficiently.
9+
The selection of algorithms is based on those presented by Gronlund et al. (2017).
810

9-
There are several different ways to compute the optimal k-means clustering in 1d.
10-
Currently the package implements the following methods:
11-
- `"binary-search-interpolation"` *default* [O(n lg(U) ), O(n) space, "wilber-interpolation"]
12-
- `"dynamic-programming-kn"` [O(kn), O(kn) space]
13-
- `"dynamic-programming-space"` [O(kn), O(n) space, "dp-linear"]
14-
- `"binary-search-normal"` [O(n lg(U) ), O(n) space, section 2.4, "wilber-binary"]
11+
This package is inspired by the [kmeans1d](https://github.com/dstein64/kmeans1d) package but improves on it by implementing additional algorithms with memory requirements of $O(n)$ instead of $O(kn)$. This makes it easier and faster to use *fast1dkmeans* for larger values of $n$ and $k$.
1512

13+
Currently this package implements the following algorithms:
14+
- `"binary-search-interpolation"` *default* [O(n lg(U)), O(n) space]
15+
- `"dynamic-programming-kn"` [O(kn), O(kn) space]
16+
- `"dynamic-programming-space"` [O(kn), O(n) space]
17+
- `"binary-search-normal"` [O(n lg(U)), O(n) space]
1618

19+
All the methods rely on first sorting the values to be clustered; the cost of this sorting step is omitted from the runtime analysis.
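
To make "optimal *k*-means on sorted 1D data" concrete, here is a minimal pure-Python sketch of the textbook dynamic program. It is not the package's implementation: it runs in O(k n^2) time rather than the bounds listed above, and the name `cluster_1d_reference` is hypothetical. It only illustrates the idea that, after sorting, every cluster is a contiguous interval whose cost can be read off prefix sums.

```python
from itertools import accumulate


def cluster_1d_reference(values, k):
    """Optimal 1D k-means via a plain O(k * n^2) dynamic program (illustrative only)."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(values)")

    # Sort once, remembering the original position of each value.
    order = sorted(range(n), key=lambda i: values[i])
    xs = [float(values[i]) for i in order]

    # Prefix sums of x and x^2 give the cost of any interval xs[i:j] in O(1).
    s1 = [0.0] + list(accumulate(xs))
    s2 = [0.0] + list(accumulate(v * v for v in xs))

    def interval_cost(i, j):
        # Sum of squared distances of xs[i:j] to the mean of xs[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[m][j]: best cost of splitting the first j sorted values into m clusters.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = cost[m - 1][i] + interval_cost(i, j)
                if c < cost[m][j]:
                    cost[m][j] = c
                    split[m][j] = i

    # Backtrack the interval boundaries; label clusters 0..k-1 in sorted order.
    labels = [0] * n
    j = n
    for m in range(k, 0, -1):
        i = split[m][j]
        for pos in range(i, j):
            labels[order[pos]] = m - 1
        j = i
    return labels
```

The algorithms in this package solve the same optimization problem with far better time and space bounds; a slow reference like this one is mainly useful for cross-checking results on small inputs.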
1720

18-
The code is written in Python and relies on the [numba](https://numba.pydata.org/) compiler for speed.
21+
The code is written in Python, but all the number crunching is done in compiled code. To achieve this, the project relies on the [numba](https://numba.pydata.org/) compiler.
1922

2023
Requirements
2124
------------
@@ -25,7 +28,8 @@ Requirements
2528
Installation
2629
------------
2730

28-
[fast1dkmeans](https://pypi.python.org/pypi/fast1dkmeans) is available on PyPI, the Python Package Index.
31+
[fast1dkmeans](https://pypi.python.org/pypi/fast1dkmeans) is available on PyPI, the Python Package Index. It can be installed as follows:
32+
2933

3034
```sh
3135
$ pip3 install fast1dkmeans
@@ -34,18 +38,41 @@ $ pip3 install fast1dkmeans
3438
Example Usage
3539
-------------
3640

41+
A simple use of this package is shown below, where we want to cluster the list of values in `x` into four (`k = 4`) clusters.
42+
The optimal clustering of `x` into four groups is fairly obvious, as the values fall into four groups:
43+
one around 4.1, one around -50, one around 200 and one around 100. Let us use *fast1dkmeans* to find the optimal clustering.
44+
3745
```python
3846
import fast1dkmeans
3947

40-
x = [4.0, 4.1, 4.2, -50, 200.2, 200.4, 200.9, 80, 100, 102]
48+
x = [4.0, 4.1, 4.2, -50, 201, 200.4, 80, 102, 100, 200.9, 200.2]
4149
k = 4
4250

4351
clusters = fast1dkmeans.cluster(x, k)
4452

45-
print(clusters) # [1, 1, 1, 0, 3, 3, 3, 2, 2, 2]
53+
print(clusters)
54+
# [1, 1, 1, 0, 3, 3, 2, 2, 2, 3, 3]
4655
```
4756

48-
Important notice: On first usage the the code is compiled once which may take about 30s. On subsequent usages this is no longer necessary and execution is much faster.
57+
The resulting array `clusters` consists of integers indicating the cluster memberships of values in `x`.
58+
The first three values of `clusters` (three ones) indicate that the first three values of `x` (`[4.0, 4.1, 4.2]`) form their own cluster (these are the only ones in `clusters`).
59+
The fourth value of `clusters` is the only zero and shows that the fourth value of `x` (-50) forms its own cluster.
60+
The threes (`3`) in `clusters` indicate that the values `[200.2, 200.4, 200.9, 201]` form one cluster. Lastly, the remaining twos (`2`) form the last cluster, containing the values `[80, 100, 102]`.
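
The reading of `clusters` described above can also be done programmatically. The following sketch uses only the standard library; `group_by_label` and `within_cluster_cost` are hypothetical helper names, not part of *fast1dkmeans*, and the label list is copied from the printed output above rather than computed.

```python
from collections import defaultdict


def group_by_label(values, labels):
    """Collect the values belonging to each cluster label."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    # Sort labels and members so the result is easy to read.
    return {label: sorted(members) for label, members in sorted(groups.items())}


def within_cluster_cost(values, labels):
    """k-means objective: sum of squared distances to each cluster mean."""
    total = 0.0
    for members in group_by_label(values, labels).values():
        mean = sum(members) / len(members)
        total += sum((v - mean) ** 2 for v in members)
    return total


x = [4.0, 4.1, 4.2, -50, 201, 200.4, 80, 102, 100, 200.9, 200.2]
clusters = [1, 1, 1, 0, 3, 3, 2, 2, 2, 3, 3]  # labels as printed above

for label, members in group_by_label(x, clusters).items():
    print(label, members)
print(within_cluster_cost(x, clusters))
```

The cost printed on the last line is the quantity that an optimal clustering minimizes, so it can be used to compare two candidate label arrays for the same `x` and `k`.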
61+
62+
63+
A different clustering method can be chosen by passing the `method` keyword argument. For example, the call below selects the space-reduced dynamic program.
64+
```python
65+
clusters = fast1dkmeans.cluster(x, k, method='dynamic-programming-space')
66+
print(clusters)
67+
# [1, 1, 1, 0, 3, 3, 2, 2, 2, 3, 3]
68+
```
69+
70+
All the algorithms return one optimal clustering (of potentially many), but their runtime and space requirements are very different.
71+
72+
73+
*Important notice*: On first usage the code is compiled once, which may take about 30 s. On subsequent usages this is no longer necessary and execution is much faster.
74+
75+
4976

5077
Tests
5178
-----
