From 37d88ed58bcdec5bb22b4009aa844aeeb32dc093 Mon Sep 17 00:00:00 2001
From: Madhukar Mishra <madhukar.mishra@grofers.com>
Date: Sat, 5 Jan 2019 12:05:16 +0530
Subject: [PATCH] Add readme with notes

---
 README.md                | 168 +++++++++++++++++++++++++++++++++++++++
 python_concurrency.ipynb |   6 +-
 2 files changed, 171 insertions(+), 3 deletions(-)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..95da1ab
--- /dev/null
+++ b/README.md
@@ -0,0 +1,168 @@
+[Talk at pybits](https://youtu.be/rzpP05WJilo)
+
+# Setup
+1. Install pipenv
+`pip install pipenv`
+2. Install python dependencies
+`pipenv install`
+
+# Running
+## notebook
+`pipenv run jupyter notebook python_concurrency.ipynb`
+## webserver (required for http client examples)
+`pipenv run python scripts/server.py`
+
+# Notes
+
+### The free lunch is over
+- we've gone from khz to ghz in 20 years
+- but we can't fit more transistors onto ICs anymore.
+
+- Adding resources to a system is called vertical scaling
+  If we can't vertically scale up, how do we achieve more throughput ?
+
+- we do it by adding 'more' of that resource, instead of using more powerful
+  resource
+
+- This is called scaling horizontally
+
+- some languages which offer better constructs for concurrency, and they are
+    gaining traction.
+
+### cocurrency is the way forward
+- Ahmdal's Law says that throughput gains are bounded by how much of the program
+  has to run in a sequential manner.
+  (The more you can can parallelize, the faster you can go)
+
+ - web scale applications like gmail take this to its pinnacle.
+ - Their application runs on millions of CPUs.
+ - Those don't typically fit on a sigle machine
+ - that becomes the domain distributed computing
+
+ - You can run tensorflow on your puny laptop,
+  or train your models on the cloud where
+  multitudes of processors will help get the job done much faster.
+
+ - This is explains a rise of languages that can tackle the challenges of distributed computing head on.
+
+- better resource utilization since we don't waste CPU cycles busy waiting
+- allows us to make games and UIs, applications where user shouldn't be kept
+    waiting by the program for feedback
+- The real world is concurrent !
+
+- The phenomena we're talking about is everpresent around us, though it's of particular interest in CS.
+- If all this talk of processors, and web scale confused you, let's take a more grounded example.
+- there's been way too much talk without seeing any action
+
+## Concurrency is not parallelism
+- Knowing this distinction allows you to pick the right tool for the right job
+
+- by intelligently mixing CPU-bound and I/O-bound threads, developers can get the most
+efficiency out of their code.
+
+- Apparently, Python is bad for programming concurrent programs because only one
+  python thread can run at a time.  That is unfortunately correct.
+
+- but if that were the case. python would lose out to other languages.
+  though it seems to be going from strength to strength
+
+## Threading in python
+Advantages: so what are they good for ?
+
+- scheduled preemptively - which means that the scheduler decides when to switch
+  context,
+  so you don't have to handle it in code.
+  Python threads map directly on to OS threads, and are scheduled by the OS.
+  Also Cpython avoids a lot of complexity in it's own implementation.
+
+- Multiple threads are excellent for speeding up blocking I/O bound programs,
+  because the scheduler is I/O aware. Instead of waiting on I/0
+
+- don't have to write any extra code for it. by the OS, easier python implementation
+
+- They have a smaller memory footprint than processes.
+
+- Threads share resources, and thus communication between them is easier
+
+### Disdavantages
+- GIL means no parallel threads
+
+- While communication between threads may be easier,
+- you must be very careful not to implement code that is subject to race conditions.
+  It is a comparitively quite difficult to get this right.
+
+- Hard to test, hard to spot bugs in the code, hard to reproduce bugs.
+
+- It's computationally expensive to switch context between multiple threads.
+  By adding multiple threads, you could see a degradation in your program's
+  overall performance if not used correctly. Especially with irresponsible locking
+
+- Preemptive scheduling is a double edged sword like most things in CS.
+- Easy to execute threads, but since you can't make any assumptions about
+  when the task switch might happen, you have to guard portions of your code
+  that access a shared resource(The critical section).
+
+
+### The dreaded GIL
+
+- Python threads have to acquire a mutex lock on the interpreter to execute.
+
+- What that essentially means is that only one thread can run in a python
+  process at one time, utilizing only one core at a time.
+
+- It is necessary because python interpreters internal datastructures, aren't thread safe.
+
+- people have been successful at removing the GIL before,
+but at the cost having severly degraded single CPU performance, lowering overall performance quite a bit in
+most cases.
+
+- Note that the GIL is present in only the default implementation - CPython,
+  and doesn't exist in runtimes who support parallel threads like Jython, IronPython
+
+## Multiprocessing
+
+- The model of having GIL with parallelism deferred to the OS via the
+
+- For utilizing multiple cores, multiprocessing module works well.
+
+### Advantages:
+- They are better than multiple threads at handling CPU-intensive tasks
+
+- We can sidestep the limitations of the GIL by spawning multiple processes
+
+- workers model - eg. __Gunicorn__ - If you've run any python webapp/service in production, you might have heard of it.
+- Suppose your program is leaking memory, by no fault for your own, It's a bug
+  in one of the libraries you are using. You can kill off the process when it
+  isn't serving a requests, if its memory grows beyond some threshold. You can't
+  do something like this with threads or coroutines.
+
+### Disadvantages:
+- No shared memory between processes - have to implement some form of IPC, which is much more resource heavy on the computer.
+
+- slower context switches.
+
+- more startup cost.
+
+## Async
+### Advantages:
+- very low cost, since no context switch. Incidentally they're cheaper than function calls. Cheaper switching mechanism among all techniques. People prefer this model to locking,
+- No cost of synchronization = less CPU consumption. Async servers > threaded servers. You can run many many more async tasks than threads.
+
+### disadvantages:
+- Need code that gives up control
+- Need an event loop
+- Everything has to be non-blocking.
+- Need to learn a lot of new things - new syntax, new libraries(aysnc versions), eventloops, futures.
+- Your program doesn't have to block on a slow network call, your program can be doing other things. This is the foundation of AJAX
+- highly successful model for web servers. This is how nodejs and nginx handle massive scale.
+
+## conclusion
+
+So this is what we have realized so far:
+
+- Sync: Blocking operations.
+- Async: Non blocking operations.
+- Concurrency: Making progress together.
+- Parallelism: Making progress in parallel.
+- locking: avoid
+- Parallelism implies Concurrency. But Concurrency doesn’t always mean Parallelism.
diff --git a/python_concurrency.ipynb b/python_concurrency.ipynb
index b130929..bcb43dd 100644
--- a/python_concurrency.ipynb
+++ b/python_concurrency.ipynb
@@ -109,7 +109,7 @@
     "        time.sleep(4)\n",
     "        self.order_taken = True\n",
     "        waiter.take_order()\n",
-    "        print(f\"{self.name} has given it's order to {waiter.name}\")\n",
+    "        print(f\"{self.name} has given its order to {waiter.name}\")\n",
     "\n",
     "tables = [Table(\"A\"), Table(\"B\"), Table(\"C\"), Table(\"D\"), Table(\"E\")]\n",
     "waiter = Waiter(\"John Doe\")\n",
@@ -160,7 +160,7 @@
     "        time.sleep(4)\n",
     "        self.order_taken = True\n",
     "        waiter.take_order()\n",
-    "        print(f\"{self.name} has given it's order to {waiter.name}\")\n",
+    "        print(f\"{self.name} has given its order to {waiter.name}\")\n",
     "\n",
     "tables = [Table(\"A\"), Table(\"B\"), Table(\"C\"), Table(\"D\"), Table(\"E\")]\n",
     "waiter = Waiter(\"John Doe\")\n",
@@ -456,7 +456,7 @@
     "        # Execution will resume after 4 s.\n",
     "        self.order_taken = True\n",
     "        waiter.take_order()\n",
-    "        print(f\"{self.name} has given it's order to {waiter.name}\")\n",
+    "        print(f\"{self.name} has given its order to {waiter.name}\")\n",
     "\n",
     "tables = [Table(\"A\"), Table(\"B\"), Table(\"C\"), Table(\"D\"), Table(\"E\")]\n",
     "waiter = Waiter(\"John Doe\")\n",