latex compatibility fixes #14

Open: wants to merge 4 commits into base: master
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
*.pdf
77 changes: 32 additions & 45 deletions README.markdown
@@ -9,13 +9,13 @@ lecture and discussion. Participants will gain an intuitive understanding of
key distributed systems terms, an overview of the algorithmic landscape, and
explore production concerns.

## What makes a thing distributed?
## What makes a thing distributed

Lamport, 1987:

> A distributed system is one in which the failure of a computer
> you didn't even know existed can render your own computer
> unusable.
> A distributed system is one in which the failure of a computer
> you didn't even know existed can render your own computer
> unusable.

- First glance: \*nix boxen in our colo, running processes communicating via
TCP or UDP.
@@ -162,7 +162,7 @@ Lamport, 1987:
- This causes all kinds of havoc in, say, metrics collection
- And debugging it is *hard*
- TCP gives you flow control and repacks logical messages into packets
- You'll need to re-build flow-control and backpressure
- You'll need to re-build flow-control and back-pressure
- TLS over UDP is a thing, but tough
- UDP is really useful where TCP FSM overhead is prohibitive
- Memory pressure
@@ -187,15 +187,15 @@ Lamport, 1987:
- Caveat: Hardware can drift
- Caveat: By *centuries*
- NTP might not care
- http://rachelbythebay.com/w/2017/09/27/2153/
- <http://rachelbythebay.com/w/2017/09/27/2153/>
- Caveat: NTP can still jump the clock backwards (default: delta > 128 ms)
- https://www.eecis.udel.edu/~mills/ntp/html/clock.html
- <https://www.eecis.udel.edu/~mills/ntp/html/clock.html>
- Caveat: POSIX time is not monotonic by *definition*
- Cloudflare 2017: Leap second at midnight UTC meant time flowed backwards
- At the time, Go didn't offer access to CLOCK_MONOTONIC
- Computed a negative duration, then fed it to rand.Int63n(), which panicked (see the Go sketch after this list)
- Caused DNS resolutions to fail: 1% of HTTP requests affected for several hours
- https://blog.cloudflare.com/how-and-why-the-leap-second-affected-cloudflare-dns/
- <https://blog.cloudflare.com/how-and-why-the-leap-second-affected-cloudflare-dns/>
- Caveat: The timescales you want to measure may not be attainable
- Caveat: Threads can sleep
- Caveat: Runtimes can sleep
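
A minimal Go sketch of the monotonic-clock fix (illustrative only): from Go 1.9 on, `time.Now` also records a monotonic reading, and `time.Since` prefers it, so measured durations can't go negative when NTP steps the wall clock.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Wall-clock readings can jump backwards (NTP steps, leap seconds),
	// so subtracting two wall timestamps may yield a negative duration.
	start := time.Now() // also records a monotonic reading (Go >= 1.9)

	time.Sleep(10 * time.Millisecond) // stand-in for real work

	// time.Since subtracts via the monotonic component when present, so
	// elapsed stays non-negative even if the wall clock stepped back.
	elapsed := time.Since(start)
	fmt.Println("elapsed:", elapsed)
}
```
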
@@ -252,7 +252,6 @@ Lamport, 1987:
- I don't know who's doing it yet, but I'd bet datacenters in the
future will offer dedicated HW interfaces for bounded-accuracy time.


## Review

We've covered the fundamental primitives of distributed systems. Nodes
@@ -261,9 +260,6 @@ various ways. Protocols like TCP and UDP give us primitive channels for
processes to communicate, and we can order events using clocks. Now, we'll
discuss some high-level *properties* of distributed systems.




## Availability

- Availability is basically the fraction of attempted operations which succeed.
@@ -309,7 +305,6 @@ discuss some high-level *properties* of distributed systems.
- "Apdex for the user service just dropped to 0.5; page ops!"
- Ideally: integral of happiness delivered by your service?


## Consistency

- A consistency model is the set of "safe" histories of events in the system
@@ -465,7 +460,6 @@ discuss some high-level *properties* of distributed systems.
- Monotonic Reads
- Monotonic Writes


### Harvest and Yield

- Fox & Brewer, 1999: Harvest, Yield, and Scalable Tolerant Systems
@@ -482,8 +476,8 @@ discuss some high-level *properties* of distributed systems.
- e.g. "99% of the time, you can read 90% of your prior writes"
- Strongly dependent on workload, HW, topology, etc
- Can tune harvest vs yield on a per-request basis (see the Go sketch after this section)
- "As much as possible in 10ms, please"
- "I need everything, and I understand you might not be able to answer"
- "As much as possible in 10ms, please"
- "I need everything, and I understand you might not be able to answer"

### Hybrid systems

@@ -505,7 +499,6 @@ consistency models generally come at the cost of performance and availability.
Next, we'll talk about different ways to build systems, from weak to strong
consistency.


## Avoid Consensus Wherever Possible

### CALM conjecture
@@ -534,7 +527,6 @@ consistency.
- Unordered programming with flow analysis
- Can tell you where coordination *would* be required


### Gossip

- Message broadcast system
@@ -552,7 +544,7 @@ consistency.
- Hop up to a connector node which relays to other connector nodes
- Reduces superfluous messages
- Reduces latency
- Plumtree (Leitão, Pereira, & Rodrigues, 2007: Epidemic Broadcast Trees)
- Plumtree (Leitao, Pereira, & Rodrigues, 2007: Epidemic Broadcast Trees)
- Push-Sum et al (see the sketch after this list)
- Sum inputs from everyone you've received data from
- Broadcast that to a random peer
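
A toy, synchronous Go sketch of the push-sum idea (after Kempe, Dobra & Gehrke, 2003; simplified to a single-process simulation): each node keeps half of its (sum, weight) pair each round and pushes half to a random peer, and every node's sum/weight estimate converges to the network-wide mean.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pushSum simulates synchronous push-sum rounds over n nodes. Each
// node's estimate sum/weight converges to the mean of the inputs.
func pushSum(inputs []float64, rounds int) []float64 {
	n := len(inputs)
	sum := append([]float64(nil), inputs...)
	weight := make([]float64, n)
	for i := range weight {
		weight[i] = 1
	}

	for r := 0; r < rounds; r++ {
		nextSum := make([]float64, n)
		nextWeight := make([]float64, n)
		for i := 0; i < n; i++ {
			// Keep half, push half to a uniformly random peer.
			peer := rand.Intn(n)
			nextSum[i] += sum[i] / 2
			nextWeight[i] += weight[i] / 2
			nextSum[peer] += sum[i] / 2
			nextWeight[peer] += weight[i] / 2
		}
		sum, weight = nextSum, nextWeight
	}

	est := make([]float64, n)
	for i := range est {
		est[i] = sum[i] / weight[i]
	}
	return est
}

func main() {
	// Every estimate approaches 4.5, the true mean.
	fmt.Println(pushSum([]float64{1, 2, 3, 4, 5, 6, 7, 8}, 30))
}
```
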
@@ -603,7 +595,6 @@ consistency.
- Probably best in concert with stronger transactional systems
- See also: COPS, Swift, Eiger, Calvin, etc


## Fine, We Need Consensus, What Now?

- The consensus problem:
@@ -648,7 +639,6 @@ consistency.
majority of nodes.
- More during cluster transitions.


### Paxos

- Paxos is the Gold Standard of consensus algorithms
@@ -740,8 +730,6 @@ transactions. Serializability and linearizability require *consensus*, which we
can obtain through Paxos, ZAB, VR, or Raft. Now, we'll talk about different
*scales* of distributed systems.



## Characteristic latencies

- Latency is *never* zero
@@ -791,12 +779,11 @@ can obtain through Paxos, ZAB, VR, or Raft. Now, we'll talk about different
- Network is within an order of mag compared to uncached disk seeks
- Or faster, in EC2
- EC2 disk latencies can routinely hit 20ms
- 200ms?
- *20,000* ms???
- Because EBS is actually other computers
- LMAO if you think anything in EC2 is real
- Wait, *real disks do this too*?
- What even are IO schedulers?
- 200ms? *20,000* ms???
- Because EBS is actually other computers
- LMAO if you think anything in EC2 is real
- Wait, *real disks do this too*?
- What even are IO schedulers?
- But network is waaaay slower than memory/computation
- If your aim is *throughput*, work units should probably take longer than a
  millisecond (see the batching sketch below)
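
One way to read that advice is batching: amortize each network round-trip over many work items. A hedged Go sketch (all names invented), flushing on batch size or a small delay:

```go
package main

import (
	"fmt"
	"time"
)

// flush stands in for one network round-trip, amortized over the batch.
func flush(batch []int) {
	fmt.Printf("flushed %d items\n", len(batch))
}

// batcher drains items and flushes either when the batch is full or when
// maxDelay expires, keeping per-item overhead small without letting
// latency grow unboundedly.
func batcher(items <-chan int, maxBatch int, maxDelay time.Duration) {
	batch := make([]int, 0, maxBatch)
	timer := time.NewTimer(maxDelay)
	defer timer.Stop()
	for {
		select {
		case it, ok := <-items:
			if !ok {
				if len(batch) > 0 {
					flush(batch) // drain the tail on shutdown
				}
				return
			}
			batch = append(batch, it)
			if len(batch) == maxBatch {
				flush(batch)
				batch = batch[:0]
				timer.Reset(maxDelay)
			}
		case <-timer.C:
			if len(batch) > 0 {
				flush(batch)
				batch = batch[:0]
			}
			timer.Reset(maxDelay)
		}
	}
}

func main() {
	items := make(chan int)
	done := make(chan struct{})
	go func() { batcher(items, 64, time.Millisecond); close(done) }()
	for i := 0; i < 1000; i++ {
		items <- i
	}
	close(items)
	<-done
}
```
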
@@ -849,7 +836,6 @@ latencies are short enough for many network hops before users take notice. In
geographically replicated systems, high latencies drive eventually consistent
and datacenter-pinned solutions.


## Common distributed systems

### Outsourced heaps
@@ -951,7 +937,6 @@ low-latency processing of datasets, and tend to look more like frameworks than
databases. Their dual, distributed queues, focus on the *messages* rather
than the *transformations*.


## A Pattern Language

- General recommendations for building distributed systems
@@ -1093,7 +1078,7 @@ than the *transformations*.
- Sharding for scalability
- Avoiding coordination via CRDTs
- Flake IDs: *mostly* time-ordered identifiers, zero-coordination (sketched below)
- See http://yellerapp.com/posts/2015-02-09-flake-ids.html
- See <http://yellerapp.com/posts/2015-02-09-flake-ids.html>
- Partial availability: users can still use some parts of the system
- Processing a queue: more consumers reduces the impact of expensive events
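
A hedged Go sketch of a flake-style generator (the 42/10/12 bit layout is illustrative, not any particular library's): a millisecond timestamp in the high bits makes IDs roughly time-ordered; a per-node ID plus a sequence makes them unique without coordination.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// FlakeGen mints roughly time-ordered 64-bit IDs with zero coordination:
// 42 bits of millisecond timestamp, 10 bits of node ID, 12 bits of
// per-millisecond sequence.
type FlakeGen struct {
	mu     sync.Mutex
	nodeID uint64 // unique per node (e.g. from config); must fit in 10 bits
	lastMs uint64
	seq    uint64
}

func (g *FlakeGen) Next() uint64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	now := uint64(time.Now().UnixMilli())
	if now < g.lastMs {
		now = g.lastMs // clock stepped backwards; keep IDs monotonic
	}
	if now == g.lastMs {
		g.seq++
		if g.seq >= 1<<12 { // sequence exhausted; borrow the next millisecond
			g.lastMs++
			g.seq = 0
		}
	} else {
		g.lastMs, g.seq = now, 0
	}
	return g.lastMs<<22 | g.nodeID<<12 | g.seq
}

func main() {
	g := &FlakeGen{nodeID: 7}
	for i := 0; i < 3; i++ {
		fmt.Println(g.Next())
	}
}
```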

@@ -1290,8 +1275,6 @@ scale. As software grows, different components must scale independently,
and we break out libraries into distinct services. Service structure goes
hand-in-hand with teams.



## Production Concerns

- More than design considerations
Expand Down Expand Up @@ -1362,8 +1345,8 @@ hand-in-hand with teams.
- In relation to its dependencies
- Which can, in turn, drive new tests
- In a way, good monitoring is like continuous testing
- But not a replacement: these are distinct domains
- Both provide assurance that your changes are OK
- But not a replacement: these are distinct domains
- Both provide assurance that your changes are OK
- Want high-frequency monitoring
- Production behaviors can take place on 1ms scales
- TCP incast
@@ -1381,13 +1364,13 @@ hand-in-hand with teams.
- Key metrics for most systems
- Apdex: successful response WITHIN latency SLA
- Latency profiles: 0, 0.5, 0.95, 0.99, 1 (see the sketch at the end of this section)
- Percentiles, not means
- BTW you can't take the mean of percentiles either
- ^ Percentiles, not means
- ^ BTW you can't take the mean of percentiles either
- Overall throughput
- Queue statistics
- Subjective experience of other systems latency/throughput
- The DB might think it's healthy, but clients could see it as slow
- Combinatorial explosion--best to use this when drilling into a failure
- ^ The DB might think it's healthy, but clients could see it as slow
- ^ Combinatorial explosion--best to use this when drilling into a failure
- You probably have to write this instrumentation yourself
- Invest in a metrics library
- Out-of-the-box monitoring usually doesn't measure what really matters: your
@@ -1504,7 +1487,6 @@ hand-in-hand with teams.
- Ask Jeff Hodges why it's hard: see his RICON West 2013 talk
- See Zach Tellman - Everything Will Flow
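
A small Go sketch of those two metrics (simplified: real systems usually keep histograms or streaming sketches per window rather than sorting raw samples):

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// profile reports a latency profile from raw samples. Percentiles must
// come from the raw data (or a histogram): averaging the p99s of two
// windows does not give you a p99.
func profile(samples []time.Duration, quantiles []float64) []time.Duration {
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	out := make([]time.Duration, len(quantiles))
	for i, q := range quantiles {
		out[i] = sorted[int(q*float64(len(sorted)-1))]
	}
	return out
}

// apdex, simplified here: the fraction of requests that succeeded AND
// landed within the latency SLA.
func apdex(samples []time.Duration, succeeded []bool, sla time.Duration) float64 {
	good := 0
	for i, d := range samples {
		if succeeded[i] && d <= sla {
			good++
		}
	}
	return float64(good) / float64(len(samples))
}

func main() {
	samples := []time.Duration{
		2 * time.Millisecond, 3 * time.Millisecond, 5 * time.Millisecond,
		8 * time.Millisecond, 120 * time.Millisecond,
	}
	succeeded := []bool{true, true, true, true, false}
	fmt.Println(profile(samples, []float64{0, 0.5, 0.95, 0.99, 1}))
	fmt.Println("apdex:", apdex(samples, succeeded, 10*time.Millisecond))
}
```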


## Review

Running distributed systems requires cooperation between developers, QA, and
@@ -1519,10 +1501,15 @@ special care.

### Online

- Mixu has a delightful book on distributed systems with incredible detail. http://book.mixu.net/distsys/
- Jeff Hodges has some excellent, production-focused advice. https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/
- The Fallacies of Distributed Computing is a classic text on mistaken assumptions we make designing distributed systems. http://www.rgoarchitects.com/Files/fallacies.pdf
- Christopher Meiklejohn has a list of key papers in distributed systems. http://christophermeiklejohn.com/distributed/systems/2013/07/12/readings-in-distributed-systems.html
- Mixu has a delightful book on distributed systems with incredible detail.
<http://book.mixu.net/distsys/>
- Jeff Hodges has some excellent, production-focused advice.
<https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/>
- The Fallacies of Distributed Computing is a classic text
on mistaken assumptions we make designing distributed systems.
<http://www.rgoarchitects.com/Files/fallacies.pdf>
- Christopher Meiklejohn has a list of key papers in distributed systems.
<http://christophermeiklejohn.com/distributed/systems/2013/07/12/readings-in-distributed-systems.html>

### Trees

3 changes: 3 additions & 0 deletions pandoc_pdf.sh
@@ -0,0 +1,3 @@
#!/bin/sh
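# Render README.markdown as a standalone two-column PDF (needs pandoc and a LaTeX engine).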

pandoc --verbose --from=markdown_github --output=aphyr-distsys-intro.pdf --variable classoption=twocolumn --standalone README.markdown