attilabalint/intro-to-distributed-computation-in-python

Distributed Computation in Python

Lyttleton Harbour, N.Z., Inside the Breakwater - John Gib - 1886
Photographed at the Christchurch Art Gallery in 2016

Two-hour tutorial at Kiwi PyCon 2022 in Ōtautahi (Christchurch), New Zealand.

Covers the foundations needed to run computation effectively across many workers.

About Me

Senior Data Scientist @ Orkestra Energy - I live in Christchurch - LinkedIn - blog.

Outcomes Of This Tutorial

  • functional programming fundamentals - map, filter, functools.reduce,
  • CPU cores, threads & processes,
  • concurrency, parallelism & asynchrony,
  • why & how to use multiprocessing and asyncio,
  • introduction to the ecosystem for distributing compute over many machines in Python,
  • demonstration of using an EC2/Dask/Coiled/Prefect stack to deploy a Dask cluster to EC2.
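To make "concurrency" concrete, here is a minimal sketch that assumes IO-bound work, simulated with time.sleep. Because time.sleep releases the GIL, four threads can overlap their waits. The task function fake_io is hypothetical, and exact timings will vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id: int) -> int:
    """Hypothetical IO-bound task: sleeping stands in for waiting on a network or disk."""
    time.sleep(0.2)
    return task_id


start = time.perf_counter()
# four threads run concurrently: while one is waiting, the others make progress
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, range(4)))
elapsed = time.perf_counter() - start

print(results)  # [0, 1, 2, 3]
print(f"{elapsed:.2f} s")  # roughly 0.2 s rather than 0.8 s
```

If fake_io were a pure-Python CPU-bound loop instead, threads would give little or no speedup in standard CPython. The GIL lets only one thread execute Python bytecode at a time, so that case calls for processes and parallelism.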

Agenda

1. Functional Programming - run in Binder

20 min theory, 20 min practical

Warm-up session introducing functional programming in Python.
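The following minimal sketch uses the three building blocks from the outcomes list (map, filter and functools.reduce) to sum the squares of the even numbers in a small list:

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# filter keeps only the items for which the predicate returns True (lazily)
evens = filter(lambda n: n % 2 == 0, numbers)

# map applies a function to every item it receives (also lazily)
squares = map(lambda n: n * n, evens)

# reduce folds the stream into one value, here a running sum starting at 0
total = reduce(lambda acc, n: acc + n, squares, 0)

print(total)  # 56
```

map and filter return lazy iterators, so nothing is computed until reduce consumes them. Side-effect-free functions like these are also easy to hand to other workers, which is one reason the functional style fits well with distributed computation.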

2. Single Machine - run in Binder

30 min theory, 20 min practical

Options in the Python standard library for distributing computation on a single machine, that is, doing many things at once on one machine.

  • threads & threading, processes & multiprocessing, asyncio,
  • why not to use threads in Python (possible but dangerous),
  • multiprocessing for CPU bound tasks,
  • asyncio for IO bound tasks,
  • three short exercises & one large exercise.
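The sketch below puts the two standard-library tools side by side using toy workloads: a sum-of-squares loop stands in for CPU-bound work, and asyncio.sleep stands in for a network call. The function names are hypothetical. The __name__ == "__main__" guard matters because multiprocessing re-imports the main module in each worker process under the spawn start method, which is the default on Windows and macOS.

```python
import asyncio
from multiprocessing import Pool


def cpu_bound(n: int) -> int:
    """Hypothetical CPU-bound task: sum of squares of 0..n-1."""
    return sum(i * i for i in range(n))


async def io_bound(delay: float, task_id: int) -> int:
    """Hypothetical IO-bound task: asyncio.sleep stands in for a network call."""
    await asyncio.sleep(delay)
    return task_id


async def run_io_tasks() -> list[int]:
    # gather schedules all coroutines on one event loop, so their waits overlap
    return await asyncio.gather(*(io_bound(0.1, i) for i in range(5)))


if __name__ == "__main__":
    # multiprocessing: each worker process has its own interpreter and GIL,
    # so CPU-bound work can run in parallel across cores
    with Pool(processes=4) as pool:
        cpu_results = pool.map(cpu_bound, [10_000, 20_000, 30_000, 40_000])
    print(cpu_results)

    # asyncio: a single thread switches between tasks whenever one of them awaits
    io_results = asyncio.run(run_io_tasks())
    print(io_results)  # [0, 1, 2, 3, 4]
```

Processes pay a start-up and pickling cost, so they are worth it when each task does substantial CPU work. asyncio avoids that overhead but only helps when tasks spend their time waiting.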

3. Many Machines - read in Binder

10 min theory, 15 min demonstration, remaining time practical

  • options for distributing computation on the cloud (many machines) in Python,
  • demonstration of distributing compute over an AWS EC2 cluster with Dask, Coiled & Prefect,
  • note that the 3-many-machines notebook will not run in Binder; you can run it locally after $ pip install -r requirements-coiled.txt.
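Dask's distributed Client follows the same futures pattern as the standard library's concurrent.futures: you submit work, get futures back, and gather the results. Code written against a generic Executor can therefore move from one machine to a cluster with few changes. The sketch below works under that assumption and runs without Dask installed. The names simulate and run_batch are hypothetical.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor, as_completed


def simulate(seed: int) -> float:
    """Hypothetical unit of work: a small deterministic computation per seed."""
    value = float(seed)
    for _ in range(1_000):
        value = (value * 1.0001) % 97.0
    return value


def run_batch(executor: Executor, seeds: list[int]) -> dict[int, float]:
    # submit returns a Future immediately; the work runs on the executor's workers
    futures = {executor.submit(simulate, seed): seed for seed in seeds}
    # collect results in completion order, keyed by the seed that produced them
    return {futures[f]: f.result() for f in as_completed(futures)}


if __name__ == "__main__":
    # the same function runs unchanged on threads or on processes
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        with executor_cls(max_workers=4) as executor:
            results = run_batch(executor, list(range(8)))
        print(executor_cls.__name__, sorted(results))
```

With Dask installed, dask.distributed.Client provides get_executor(), which returns an object implementing the concurrent.futures Executor interface. In principle, run_batch could then target a Dask cluster (for example, one deployed to EC2 with Coiled) unchanged. Treat this as a sketch of the pattern rather than the notebook's code, and check the Dask documentation for your installed version.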

Setup

As this repo targets two environments (Binder & Coiled):

  • requirements.txt = Binder (it is important not to install Dask in Binder),
  • requirements-coiled.txt = Coiled (runs on EC2 instances) and local development.

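The Python version matters for Dask/Coiled: the local client and the remote scheduler and workers should run the same Python version, and Dask flags version mismatches between them.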

To set up Python locally:

$ make setup
