Commit b32dcd3

Fixes and improves examples in README and documentation. (finos#32)
* Fixes and improves examples in README and documentation.
  Signed-off-by: rafa-be <raphael@noisycamp.com>
* Shut down any currently running backend when exiting Python.
  Signed-off-by: rafa-be <raphael@noisycamp.com>
* Adds new examples (random forest and bigram counter).
  Signed-off-by: rafa-be <raphael@noisycamp.com>
* Adds the examples to the documentation.
  Signed-off-by: rafa-be <raphael@noisycamp.com>
* Bump version number.
  Signed-off-by: rafa-be <raphael@noisycamp.com>
* Run and test the documentation examples before publishing.
  Signed-off-by: rafa-be <raphael@noisycamp.com>
* Download the example file for the bigram example instead of storing it in the repository.
  Signed-off-by: rafa-be <raphael@noisycamp.com>
* Removes the use of argparse in examples.
  Signed-off-by: rafa-be <raphael@noisycamp.com>
* Shows the package version in the documentation's sidebar.
  Signed-off-by: rafa-be <raphael@noisycamp.com>
* Uses Scaler's default address for the local managed backend.
  Signed-off-by: rafa-be <raphael@noisycamp.com>
* Rewrite and improve examples from the quickstart tutorial.
  Signed-off-by: rafa-be <raphael@noisycamp.com>

---------
Signed-off-by: rafa-be <raphael@noisycamp.com>
1 parent c577402 commit b32dcd3

28 files changed

+720
-387
lines changed

.github/workflows/linter.yml

Lines changed: 12 additions & 1 deletion
@@ -34,7 +34,7 @@ jobs:
         python -m pip install --upgrade pip
         pip install flake8 pyproject-flake8 mypy
         pip install -r requirements.txt
-        pip install pandas dask[distributed]
+        pip install pandas dask[distributed] scaler
     - name: Lint with flake8
       run: |
         pflake8 .
@@ -44,3 +44,14 @@ jobs:
     - name: Run python unittest
       run: |
         python -m unittest discover -v tests
+    - name: Run the examples
+      run: |
+        find examples -type f -name '*.py' ! -name '__init__.py' | while read -r file; do
+          echo "Running $file"
+          PYTHONPATH=. python "$file"
+        done
+    - name: Test the documentation examples
+      run: |
+        pip install -r docs/requirements.txt
+        cd docs
+        make doctest
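
For readers who prefer to reproduce the new "Run the examples" step locally without a shell, the loop can be sketched in Python. This is only an illustration of what the workflow step does, not code from the commit: the function names `find_example_scripts` and `run_examples` are hypothetical, and stopping at the first failing script is a choice made for this sketch.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import List


def find_example_scripts(root: Path) -> List[Path]:
    """Return every *.py file under `root` except __init__.py, sorted for stable output."""
    return sorted(
        path for path in root.rglob("*.py")
        if path.is_file() and path.name != "__init__.py"
    )


def run_examples(root: Path) -> None:
    """Run each example script in turn, with the current directory on PYTHONPATH."""
    # Mirrors the `PYTHONPATH=.` prefix used in the workflow step.
    env = {**os.environ, "PYTHONPATH": "."}
    for script in find_example_scripts(root):
        print(f"Running {script}")
        # check=True raises CalledProcessError when a script exits with a non-zero status.
        subprocess.run([sys.executable, str(script)], check=True, env=env)
```

Calling `run_examples(Path("examples"))` from the repository root would then visit the same files as the `find` command above.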

README.md

Lines changed: 38 additions & 17 deletions
@@ -31,12 +31,16 @@ and distributed systems**.
 The main feature of the library is its `@parfun` decorator that transparently executes standard Python functions
 following the [map-reduce](https://en.wikipedia.org/wiki/MapReduce) pattern:
 
+
 ```Python
+from typing import List
+
 from parfun import parfun
 from parfun.combine.collection import list_concat
 from parfun.partition.api import per_argument
 from parfun.partition.collection import list_by_chunk
 
+
 @parfun(
     split=per_argument(
         values=list_by_chunk
@@ -45,32 +49,42 @@ from parfun.partition.collection import list_by_chunk
 )
 def list_pow(values: List[float], factor: float) -> List[float]:
     return [v**factor for v in values]
+
+
+if __name__ == "__main__":
+    from parfun.entry_point import set_parallel_backend_context
+
+    with set_parallel_backend_context("local_multiprocessing"):  # use a local pool of processes
+        print(list_pow([1, 2, 3], 2))  # runs in parallel, prints [1, 4, 9]
 ```
 
 
 ## Features
 
-* **Provides significant speedups** to existing Python functions
-* **Does not require any deep knowledge of parallel or distributed computing systems**
-* **Automatically estimates the optimal sub-task splitting** (the *partition size*)
-* **Automatically handles data transmission, caching and synchronization**.
-* **Supports various distributed computing backends**, including Python's multiprocessing,
-  [Scaler](https://github.com/citi/scaler) or Dask.
+* **Provides significant speedups** to existing Python functions.
+* **Only requires basic understanding of parallel and distributed computing systems**.
+* **Automatically estimates the optimal sub-task splitting strategy** (the *partition size*).
+* **Automatically handles data transmission, caching, and synchronization**.
+* **Supports various distributed computing backends**:
+  - Python's built-in [multiprocessing module](https://docs.python.org/3/library/multiprocessing.html).
+  - [Scaler](https://github.com/citi/scaler).
+  - [Dask](https://www.dask.org/).
 
 
-## Benchmarks
+## Quick Start
 
-**Parfun efficiently parallelizes short-duration functions**.
 
-When running a short 0.28-second ML function on an AMD Epyc 7313 16-Cores Processor, Parfun provides an impressive
-**7.4x speedup**. Source code for this experiment [here](benchmarks/california_housing.py).
+Install Parfun directly from PyPI:
 
-![Benchmark Results](benchmarks/california_housing_results.svg)
+```bash
+pip install parfun
+pip install "parfun[pandas,scaler,dask]"  # with optional dependencies
+```
 
+The official documentation is available at [citi.github.io/parfun/](https://citi.github.io/parfun/).
 
-## Quick Start
-
-The official documentation is availaible at [citi.github.io/parfun/](https://citi.github.io/parfun/).
+Take a look at our documentation's [quickstart tutorial](https://citi.github.io/parfun/tutorials/quickstart.html) to get
+more examples and a deeper overview of the library.
 
 Alternatively, you can build the HTML documentation from the source code:
 
@@ -80,10 +94,17 @@ pip install -r requirements.txt
 make html
 ```
 
-The documentation's main page can then ben found at `docs/build/html/index.html`.
+The documentation's main page can then be found at `docs/build/html/index.html`.
 
-Take a look at our documentation's [quickstart tutorial](https://citi.github.io/parfun/tutorials/quickstart.html) to get
-more examples and a deeper overview of the library.
+
+## Benchmarks
+
+**Parfun effectively parallelizes even short-duration functions**.
+
+When running a short 0.28-second ML function on an AMD Epyc 7313 16-Cores Processor, Parfun provides an impressive
+**7.4x speedup**. Source code for this experiment [here](examples/california_housing/main.py).
+
+![Benchmark Results](images/benchmark_results.svg)
 
 
 ## Contributing
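
The README's `list_pow` example relies on the decorator to split the `values` list into chunks and to concatenate the partial results. As a rough mental model of that map-reduce flow, the same steps can be written by hand in plain Python. This sketch does not use Parfun and says nothing about how Parfun is implemented: the helper names ending in `_sketch`, the fixed `chunk_size`, and the use of a thread pool (chosen only to keep the example short) are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from typing import Iterable, Iterator, List


def list_by_chunk_sketch(values: List[float], chunk_size: int) -> Iterator[List[float]]:
    """Split step: yield consecutive chunks of at most `chunk_size` items."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def list_concat_sketch(chunks: Iterable[List[float]]) -> List[float]:
    """Combine step: concatenate the partial results in their original order."""
    return list(chain.from_iterable(chunks))


def list_pow(values: List[float], factor: float) -> List[float]:
    """The undecorated function from the README example."""
    return [v**factor for v in values]


def parallel_list_pow(values: List[float], factor: float, chunk_size: int = 2) -> List[float]:
    """Apply `list_pow` to each chunk concurrently, then merge the results."""
    with ThreadPoolExecutor() as pool:
        # Executor.map returns results in the same order as the input chunks.
        partial_results = pool.map(
            lambda chunk: list_pow(chunk, factor),
            list_by_chunk_sketch(values, chunk_size),
        )
        return list_concat_sketch(partial_results)


print(parallel_list_pow([1, 2, 3], 2))  # prints [1, 4, 9]
```

For CPU-bound work a process pool would be the more realistic choice, which matches the `"local_multiprocessing"` backend selected in the README example.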

docs/source/_static/style.css

Lines changed: 0 additions & 20 deletions
This file was deleted.

docs/source/_templates/layout.html

Lines changed: 9 additions & 2 deletions
@@ -1,4 +1,11 @@
 {% extends "!layout.html" %}
-{% block extrahead %}
-    <link href="{{ pathto("_static/style.css", True) }}" rel="stylesheet" type="text/css">
+
+{% block sidebartitle %}
+  <div class="switch-menus">
+    <div class="version">
+      <strong>Version:</strong> {{ version }}
+    </div>
+  </div>
+
+  {{ super() }}
 {% endblock %}

docs/source/conf.py

Lines changed: 0 additions & 1 deletion
@@ -21,7 +21,6 @@
 project = "Parfun"
 author = "Citi"
 
-
 version = __import__("parfun").__version__
 release = f"{version}-py3-none-any"
 

docs/source/index.rst

Lines changed: 5 additions & 5 deletions
@@ -1,13 +1,12 @@
 Welcome to Parfun's documentation!
 ================================================
 
-**Parfun** is a lightweight and user-friendly Python library to assist users in running a pure Python function i
-parallel on multiple core and distributed systems.
+**Parfun** is a lightweight and user-friendly Python library to assist users in running a Python function in
+parallel.
 
-Users, who do not have any deep knowledge about parallelism and distributed computing systems, can benefit from
-significant speedups when running Python code.
+With limited knowledge of parallelism and distributed systems, users can significantly speedup their Python code.
 
-Parfun supports multiple execution backend, including Python's multiprocessing, Dask or Scaler.
+Parfun supports multiple execution backend, including Python's built-in multiprocessing, Dask and Scaler.
 
 
 Content
@@ -17,6 +16,7 @@ Content
    :maxdepth: 2
 
    tutorials/quickstart
+   tutorials/examples
    tutorials/implementation_details
    api/index
 

docs/source/tutorials/examples.rst

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+Examples
+========
+
+Parallely count bigrams in a text
+---------------------------------
+
+.. literalinclude:: ../../../examples/count_bigrams/main.py
+    :language: python
+    :linenos:
+
+
+Parallely train a random tree regressor
+---------------------------------------
+
+.. literalinclude:: ../../../examples/california_housing/main.py
+    :language: python
+    :linenos:
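
The contents of `examples/count_bigrams/main.py` are not shown on this page. To give an idea of why bigram counting fits the split-and-combine pattern, here is a minimal stand-alone sketch that does not use Parfun and is not the example from the commit. It assumes bigrams are counted within each line, so splitting the input by lines loses no pairs at chunk boundaries. All function names are hypothetical, and a thread pool is used only to keep the sketch short.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def count_bigrams_in_lines(lines: List[str]) -> Counter:
    """Count adjacent word pairs within each line (pairs never span two lines)."""
    counts: Counter = Counter()
    for line in lines:
        words = line.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def sum_counters(counters: Iterable[Counter]) -> Counter:
    """Combine step: add the per-chunk counts together."""
    total: Counter = Counter()
    for counter in counters:
        total.update(counter)
    return total


def count_bigrams(lines: List[str], chunk_size: int = 2) -> Counter:
    """Split the lines into chunks, count each chunk concurrently, then merge."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor() as pool:
        return sum_counters(pool.map(count_bigrams_in_lines, chunks))
```

Because counter addition is associative and commutative, the merged result does not depend on how the lines are grouped into chunks, which is the property that makes the task easy to parallelize.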
