# Container Networking Talk Notes

* I work for Oracle, and Oracle has a managed Kubernetes service. Some time
back I was given the task of looking into updating the networking layer in
this service from using Flannel (an overlay network) to a solution which
utilises the native networking features of the Oracle cloud (secondary VNICs +
IPs). Don't worry if you don't know what Flannel is, or what an overlay
network is, as that is the point of this talk! However, once I started digging
in, I quickly found that I didn't understand how Flannel worked, and it seemed
a little wrong to replace one thing with another if you don't understand how
the original worked. So I started digging deeper, and soon realised that I
didn't understand networking in general! Long story short: big rabbit hole,
learnt some stuff, and most importantly, found that I really enjoyed this
area, so I thought I would write a talk and come and spread the networking
love!

* So, I'm Kris, and in the next 30 minutes or so, I'm going to attempt to explain
how a container on one computer on the internet can connect to a container on
another computer, somewhere else on the internet.

## Slide: The aim

* Aim to use the Kubernetes model of networking:

1. Each container (pod) has its own unique IP.
2. No NAT'ing going on.
3. Host can talk to containers, and vice versa.

## Slide: The plan

* Going to work our way toward the general case in 4 steps.

* For each step, we will explain the model via a diagram, show some code, run the code,
then test what we have created.

* Each step will be created using Vagrant-based VMs.

## Slide: Diagram of a single network namespace

* Describe the outer box (the node). Could be a physical machine, or a VM as in this case.

* Describe containers vs namespaces: Containers use a bunch of different Linux mechanisms
to isolate the processes running inside, both in terms of system calls, available resources,
what it can see, i.e. filesystems, other processes, etc. However, from a network connectivity
point of view, the only mechanism that matters here is the network namespace, so from now on,
whenever I say container, what I really mean is network namespace.

* What is a network namespace: It's another instance of the kernel's network stack, containing:
1. Its own interfaces.
2. Its own routing + route tables.
3. Its own iptables rules.

* When created, it is empty, i.e. it has no interfaces, routes or iptables rules.

* Describe a veth pair: a virtual Ethernet cable with a NIC on each end.

* Describe the relevant routing from/to the network namespace:
1. A directly connected route from the host to the network namespace.
2. A default route out of the network namespace.

* Note: The 'aha' moment, when I worked out the possible types of routing rules.
Understanding these was, for me, the key to understanding networking in general.

## Code: Single network namespace setup.sh
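
The script itself is not reproduced in these notes; the following is a minimal
sketch of the idea, assuming the namespace/interface names (con1, veth0/veth1)
and the 172.16.0.0/24 subnet used in the demos:

```
#!/bin/bash
set -e

# Create the (empty) network namespace.
sudo ip netns add con1

# Create the veth pair and move one end into the namespace.
sudo ip link add veth0 type veth peer name veth1
sudo ip link set veth1 netns con1

# Address both ends and bring them up. Adding 172.16.0.1/24 to veth0
# also gives the host its directly connected route to the namespace.
sudo ip addr add 172.16.0.1/24 dev veth0
sudo ip link set veth0 up
sudo ip netns exec con1 ip addr add 172.16.0.2/24 dev veth1
sudo ip netns exec con1 ip link set veth1 up
sudo ip netns exec con1 ip link set lo up

# Default route out of the namespace, via the host end of the veth pair.
sudo ip netns exec con1 ip route add default via 172.16.0.1
```

After which something like `sudo ip netns exec con1 ping 10.0.0.10` (from the
namespace to the node's own IP) should work.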

## Slide: Diagram of multiple network namespaces on different nodes on the same L2 network

* 2 nodes on the same subnet, each set up the same as step 2, but containing different network namespace subnets.
* Talk about the routing within the node.
* Talk about the (next hop) routing between nodes (only works if the nodes are on the same L2 network).
* Note that this is how the *host-gw* Flannel backend works, and also single-L2 *Calico*.

## Code: Multi node setup.sh
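
Again the full script is omitted here, but the key addition over the previous
step is one static next-hop route per node, pointing at the other node's
namespace subnet. A sketch, assuming node IPs 10.0.0.10/10.0.0.20 and
namespace subnets 172.16.0.0/24 and 172.16.1.0/24:

```
# On node 10.0.0.10 (hosting 172.16.0.0/24): reach the other node's
# namespaces by using that node as the next hop.
sudo ip route add 172.16.1.0/24 via 10.0.0.20

# On node 10.0.0.20 (hosting 172.16.1.0/24): the mirror-image route.
sudo ip route add 172.16.0.0/24 via 10.0.0.10

# Both nodes must also be willing to forward packets for their namespaces.
sudo sysctl -w net.ipv4.ip_forward=1
```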

## Demo: Multi node

```
sudo ip netns exec con1 ping 172.16.1.2
sudo ip netns exec con1 ping 10.0.0.20
```

* When we ping from a network namespace to another network namespace across nodes,
highlight the TTL. Explain the reported value.

* When we ping a network namespace on the other node from the node,
highlight the TTL. Explain the reported value.

## Slide: Diagram of multiple network namespaces on different nodes on different L2 networks (the overlay network)

* Now we can't use static routes, as the nodes could be on different subnets. Options:
1. Update the routes on all the routers in between (which can be done if you have control over the routers).
2. If running in the cloud, the provider might offer an option to add routes (node -> pod-subnet mappings) into your virtual network. For example, AWS and the Oracle cloud both allow this.
3. Another way is to use an overlay network, which is what we will describe here.
* Introduce *tun* devices: a network interface backed by a user-space process.
* A *tun* device accepts/outputs raw IP packets.
* How would we use it in this case?
* Now no need for the static routes.
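
For illustration, a tun device can be created and routed to with plain
iproute2 commands (the device name and container supernet here are
assumptions):

```
# Create a tun device; a user-space process attaches to it and
# receives/injects raw IP packets (tun0 is a hypothetical name).
sudo ip tuntap add dev tun0 mode tun
sudo ip link set tun0 up

# A single route for the whole container supernet can now replace the
# per-node static routes: everything cross-node goes via the tunnel.
sudo ip route add 172.16.0.0/16 dev tun0
```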

## Code: Overlay network setup.sh

* Explain that we are now using a (new but similar) 2-node Vagrant setup.
* Talk through the *setup.sh*.
* Describe (briefly) the parts common to the previous step.
* We still need IP forwarding enabled here. This allows the node to act as a router, i.e.
to accept and forward packets that it receives but that are not destined for the node's own IP.
* Now no extra routes, but the script contains the *socat* implementation of the overlay.
* Describe *socat* in general. It creates 2 bidirectional byte streams, and transfers data between them.
* Describe how *socat* is being used here.
* Note the MTU settings: what is going on here? We reduce the MTU of the *tun0*
device as this allows for the 8-byte UDP header that will be added, thus ensuring that
fragmentation does not occur.
* Reverse path filtering: what is this? It discards incoming packets arriving on interfaces where they shouldn't be.
* Its purpose: a security feature to stop spoofed IP packets from being propagated.
* Why does reverse path filtering matter in this case? Consider the case where we send
a packet from a node to a container on the other node. The outward packet will go over the
tunnel. However, the response will not (as it is destined for the node), thus the response
will arrive on a different interface to the one the request packet went out of. Therefore, the kernel
considers this suspicious, unless we tell it that all is OK.

## Demo: Overlay network

```
sudo ip netns exec con1 ping 172.16.1.2
sudo ip netns exec con1 ping 10.0.0.20
```

* When we ping from a network namespace to a network namespace across nodes,
highlight the TTL. Explain the reported value (should have decreased by 2).

* When we ping from a node to a remote network namespace,
highlight the TTL. Explain the reported value (should have decreased by 1).

To see the encapsulation process more clearly:
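
The exact capture commands are not reproduced here, but something along these
lines works, run on node 10.0.0.10 (the outbound interface name and UDP port
are assumptions):

```
# The unencapsulated ICMP entering the tunnel device...
sudo tcpdump -ni tun0 icmp

# ...and the same traffic leaving the node wrapped in UDP.
sudo tcpdump -ni enp0s8 udp port 9000
```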

Meanwhile, on node 10.0.0.20:
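
```
# The encapsulated UDP arriving from the peer, and the decapsulated
# ICMP emerging from the local tunnel device (names hypothetical).
sudo tcpdump -ni enp0s8 udp port 9000
sudo tcpdump -ni tun0 icmp
```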

## Slide: Putting it all together

* So how does this work in the real world?

* We can characterise existing Kubernetes networking solutions in terms of
2 properties: 1. how they connect, and 2. where they store their pod-subnet
to node mappings.

* Popular network solutions:
  1. *Flannel*
     * Multiple backends:
       * *host-gw*: step 3.
       * *udp*: step 4.
       * *VXLAN*: step 4, but more efficient.
       * *awsvpc*: sets routes in AWS.
     * Uses *etcd* to store the node -> pod-subnet mapping.
  2. *Calico*
     * No overlay for intra-L2 communication. Uses next-hop routing (step 3).
     * For inter-L2 node communication, uses an IPIP overlay.