further improved the Readme

Feelx234 · Feelx234 · commit 7e20f614f454 · 2025-01-21T17:20:06.000+01:00
diff --git a/README.md b/README.md
@@ -4,18 +4,21 @@
 fast1dkmeans
 ========
 
-A Python library which implements several variations of optimal *k*-means clustering on 1D data, based on the algorithms presented by Gronlund et al. (2017). This package is inspired by the [kmeans1d](https://github.com/dstein64/kmeans1d) package but extends it by implementing additional algorithms, in particular those with reduced memory requirements O(n) instead of O(kn).
+A Python library which implements several algorithms to optimally solve *k*-means clustering on 1D data.
+Unlike in higher dimensions, the optimal solutions can be found efficiently.
+The selection of algorithms is based on those presented by Gronlund et al. (2017).
 
-There are several different ways to compute the optimal k-means clustering in 1d.
-Currently the package implements the following methods:
-- `"binary-search-interpolation"` *default* [O(n lg(U) ), O(n) space, "wilber-interpolation"]
-- `"dynamic-programming-kn"` [O(kn), O(kn) space]
-- `"dynamic-programming-space"` [O(kn), O(n) space, "dp-linear"]
-- `"binary-search-normal"` [O(n lg(U) ), O(n) space, section 2.4, "wilber-binary"]
+This package is inspired by the [kmeans1d](https://github.com/dstein64/kmeans1d) package but improves it by implementing additional algorithms with memory requirements of $O(n)$ instead of $O(kn)$. This makes it easier and faster to use *fast1dkmeans* for larger values on $n$ and $k$.
 
+Currently this package implements the following algorithms:
+- `"binary-search-interpolation"` *default* [O(n lg(U)), O(n) space]
+- `"dynamic-programming-kn"` [O(kn), O(kn) space]
+- `"dynamic-programming-space"` [O(kn), O(n) space]
+- `"binary-search-normal"` [O(n lg(U) ), O(n) space]
 
+All the methods rely on first sorting the values to be clustered which is omitted in the runtime analysis.
 
-The code is written in Python and relies on the [numba](https://numba.pydata.org/) compiler for speed.
+The code is written in Python but all the number crunching is done in compiled code. To achieve this, this project relies on the [numba](https://numba.pydata.org/) compiler for speed.
 
 Requirements
 ------------
@@ -25,7 +28,8 @@ Requirements
 Installation
 ------------
 
-[fast1dkmeans](https://pypi.python.org/pypi/fast1dkmeans) is available on PyPI, the Python Package Index.
+[fast1dkmeans](https://pypi.python.org/pypi/fast1dkmeans) is available on PyPI, the Python Package Index. It can thus be installed by the following:
+
 
 ```sh
 $ pip3 install fast1dkmeans
@@ -34,18 +38,41 @@ $ pip3 install fast1dkmeans
 Example Usage
 -------------
 
+A simple use of this package is shown below, where we want to cluster the list of values in `x` into four (`k = 4`) clusters.
+The optimal clustering of `x` into four groups is pretty obvious as there are essentially four groups of values.
+One group around 4.1, one group around -50, one group around 200 and the last group around 100. Let us use *fast1dkmeans* to find the optimal clustering.
+
 ```python
 import fast1dkmeans
 
-x = [4.0, 4.1, 4.2, -50, 200.2, 200.4, 200.9, 80, 100, 102]
+x = [4.0, 4.1, 4.2, -50, 201, 200.4, 80, 102, 100, 200.9, 200.2]
 k = 4
 
 clusters = fast1dkmeans.cluster(x, k)
 
-print(clusters)   # [1, 1, 1, 0, 3, 3, 3, 2, 2, 2]
+print(clusters)
+ # [1, 1, 1, 0, 3, 3, 2, 2, 2, 3, 3]
 ```
 
-Important notice: On first usage the the code is compiled once which may take about 30s. On subsequent usages this is no longer necessary and execution is much faster.
+The resulting array `clusters` consists of integers indicating the cluster memberships of values in `x`.
+The first three values of `clusters` (three ones) indicate that the first three values of `x` (`[4.0, 4.1, 4.2]`) should be its own cluster (these are the only ones in `clusters`).
+The fourth value of `clusters` is the only zero and shows that the fourth value of `x` (-50) should be it's own cluster.
+The threes (`3`) in `clusters` indicate that the values `[200.2, 200.4, 200.9, 201]` should be one cluster. Lastly the remaining two's (`2`) form the last cluster of the values [80,100,102].
+
+
+A different method of clustering can be chosen by passing a keyword argument. Below we for example choose the space reduced dynamic program.
+```python
+clusters = fast1dkmeans.cluster(x, k, method='dynamic-programming-space')
+print(clusters)
+ # [1, 1, 1, 0, 3, 3, 2, 2, 2, 3, 3]
+```
+
+All the algorithms will return one optimal clustering (of the potentially many) but they runtime and space requirements are very different.
+
+
+*Important notice*: On first usage the the code is compiled once which may take about 30s. On subsequent usages this is no longer necessary and execution is much faster.
+
+
 
 Tests
 -----