Commit d4401e4

Tidied up the kubecon talk notes
1 parent b23455d commit d4401e4

1 file changed

+60
-60
lines changed

Diff for: slides/201812-kubecon/notes.md

@@ -1,38 +1,36 @@
11
# Container Networking Talk Notes
22

3-
* I work in the Oracle cloud infrastruture group, more specifically on
4-
Kubernetes related stuff, and some time back I was given the task of looking
5-
into updating the networking layer in the Oracle managed Kubernetes service
6-
from using Flannel (an overlay network) to a solution which utilises the native
7-
networking features of the Oracle cloud (secondary VNICs + IPs). Dont worry if
8-
you dont know what Flannel is, or know what an overlay network is, as that is
9-
the point of this talk! However, once I started digging in, I quickly found
10-
that I didn't understand how Flannle worked, and as it seemed a little wrong to
11-
replace one thing with another solution, if you dont understand how the
12-
original worked, I started digging deeper, and then relaised that I dont
13-
understand networking in general! Long story short, big rabitt hole, learnt
14-
some stuff, and most importantly found that I really enjoyed this, so I thought
15-
I would write a talk and come and spread the networking love!
3+
* I work for Oracle, and Oracle has a managed Kubernetes service, and some time
4+
back I was given the task of looking into updating the networking layer in
5+
this service from using Flannel (an overlay network) to a solution which
6+
utilises the native networking features of the Oracle cloud (secondary VNICs +
7+
IPs). Don't worry if you don't know what Flannel is, or know what an overlay
8+
network is, as that is the point of this talk! However, once I started digging
9+
in, I quickly found that I didn't understand how Flannel worked, and it seemed
10+
a little wrong to replace one thing with another solution, if you don't
11+
understand how the original worked. So, I started digging deeper, and then soon
12+
realised that I don't understand networking in general! Long story short, big
13+
rabbit hole, learnt some stuff, and most importantly, found that I really
14+
enjoyed this area, so I thought I would write a talk and come and spread the
15+
networking love!
1616

1717
* So, I'm Kris, and in the next 30 minutes or so, I'm going to attempt to explain
18-
how a contaniner on one computer on the internet, can talk to a container on
18+
how a container on one computer on the internet, can connect to a container on
1919
another computer, somewhere else on the internet.
2020

2121
## Slide: The aim
2222

23-
* Aim to model the Kubernetes model (LHS diagram).
24-
* Each container (pod) has its own unique IP.
25-
* No NAT'ing going on.
26-
* Host can talk to containers, and vice versa.
27-
28-
* Note: We are not covering the default docker model here, where
29-
containers on different nodes can have the same IPs.
23+
* Aim to use the Kubernetes model of networking.
24+
25+
1. Each container (pod) has its own unique IP.
26+
2. No NAT'ing going on.
27+
3. Host can talk to containers, and vice versa.
3028

3129
## Slide: The plan
3230

3331
* Going to work our way toward the general case in 4 steps.
3432

35-
* Foreach, we will explain the model via a diagram. Show some code, run the code,
33+
* For each, we will explain the model via a diagram. Show some code, run the code,
3634
then test what we have created.
3735

3836
* Each step will be created using Vagrant-based VMs.
@@ -43,26 +41,26 @@
4341

4442
* Describe the outer box (the node). Could be a physical machine, or a VM as in this case.
4543

46-
* Describe containers vs namespaces: Containers use a bunch of different linux mechanisms
44+
* Describe containers vs namespaces: Containers use a bunch of different Linux mechanisms
4745
to isolate the processes running inside, both in terms of system calls, available resources,
4846
what it can see, i.e. filesystems, other processes, etc. However, from a network connectivity
4947
point of view, the only mechanism that matters here is the network namespace, so from now on,
5048
whenever I say container, what I really mean is network namespace.
5149

5250
* What is a network namespace: It's another instance of the kernel's network stack containing:
53-
* It's own interfaces.
54-
* It's own routing + route tables.
55-
* It's own IPtables rules.
51+
1. Its own interfaces.
52+
2. Its own routing + route tables.
53+
3. Its own iptables rules.
5654

5755
* When created, it is empty, i.e. no interfaces, routing or iptables rules.
5856

5957
* Describe VETH pair: Ethernet cable with NIC on each end.
6058

6159
* Describe the relevant routing from/to the network namespace:
62-
* Directly connected route from the host to the network namespace.
63-
* Default route out of the network namespace.
60+
1. Directly connected route from the host to the network namespace.
61+
2. Default route out of the network namespace.
6462

65-
* Note The 'aha' monment, when I worked out the possible types of routing rules.
63+
* Note: the 'aha' moment, when I worked out the possible types of routing rules.
6664
Understanding these was, for me, the key to understanding networking in general.
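
The actual *setup.sh* is not part of this diff. As a rough sketch of the ideas above (an empty network namespace, a VETH pair, a directly connected route on the host and a default route inside the namespace), something like the following could be used. The interface names `veth1`/`veth2` and the 172.16.0.x addresses are assumptions for illustration. It prints the commands by default, so it can be read or tested without root.

```shell
#!/bin/sh
# Sketch only: names and addresses are hypothetical, not taken from setup.sh.
# Dry-run by default: prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        sudo "$@"
    fi
}

setup_single_netns() {
    ns="$1"; host_ip="$2"; ns_ip="$3"
    # A new namespace starts empty: no interfaces, routes or iptables rules.
    run ip netns add "$ns"
    # The VETH pair is the "cable"; one end stays on the host, one moves into the namespace.
    run ip link add veth1 type veth peer name veth2
    run ip link set veth2 netns "$ns"
    # Addressing the host end gives the host a directly connected route to the subnet.
    run ip addr add "$host_ip/24" dev veth1
    run ip link set veth1 up
    run ip netns exec "$ns" ip addr add "$ns_ip/24" dev veth2
    run ip netns exec "$ns" ip link set lo up
    run ip netns exec "$ns" ip link set veth2 up
    # Default route out of the namespace, via the host end of the pair.
    run ip netns exec "$ns" ip route add default via "$host_ip"
}

setup_single_netns con1 172.16.0.1 172.16.0.2
```

Running it with `DRY_RUN=0` would execute the commands through `sudo` rather than print them.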
6765

6866
## Code: Single network namespace setup.sh
@@ -127,7 +125,7 @@ sudo ip netns exec con1 ping 10.0.0.10
127125
* 2 nodes on the same subnet, each set up the same as in step 2, but containing different network namespace subnets.
128126
* Talk about the routing within the node.
129127
* Talk about the (next hop) routing between nodes (only works if the nodes are on the same L2 network).
130-
* Note that this is how the the *host-gw* flannel backend works, and also single L2 *Calico*.
128+
* Note that this is how the *host-gw* flannel backend works, and also single L2 *Calico*.
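
A minimal sketch of the next-hop routing described above. It assumes node 10.0.0.10 hosts the namespace subnet 172.16.0.0/24 and node 10.0.0.20 hosts 172.16.1.0/24 (the subnets are guesses based on the ping targets in these notes). The function only prints the commands each node would need.

```shell
#!/bin/sh
# Sketch only: the subnet-to-node assignment is an assumption.
# Prints the commands for one node: enable forwarding so the node can act as
# a router, then send traffic for the peer's namespace subnet to the peer node.
next_hop_setup() {
    peer_node_ip="$1"; peer_subnet="$2"
    echo "sudo sysctl -w net.ipv4.ip_forward=1"
    echo "sudo ip route add $peer_subnet via $peer_node_ip"
}

echo "# on node 10.0.0.10"
next_hop_setup 10.0.0.20 172.16.1.0/24
echo "# on node 10.0.0.20"
next_hop_setup 10.0.0.10 172.16.0.0/24
```

The `via` address must be reachable directly, which is why this approach only works when both nodes share an L2 network.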
131129

132130
## Code: Multi node setup.sh
133131

@@ -160,20 +158,20 @@ sudo ip netns exec con1 ping 172.16.1.2
160158
sudo ip netns exec con1 ping 10.0.0.20
161159
```
162160

163-
* When we ping from a network namespaces to another network namespace across nodes:
164-
* Highlight the TTL. Explain the reported value.
165-
* When we ping a network namespace on the other node from the node:
166-
* Highlight the TTL. Explain the reported value.
161+
* When we ping from a network namespace to another network namespace across nodes,
162+
highlight the TTL. Explain the reported value.
163+
164+
* When we ping a network namespace on the other node from the node,
165+
highlight the TTL. Explain the reported value.
167166

168167
## Slide: Diagram of multiple network namespaces on different nodes on different L2 networks (the overlay network)
169168

170169
* Now can't use static routes, as nodes could be on different subnets. Options:
171-
* Update routes on all routers in between (which can he done if you have control over the routers).
172-
* If running on cloud, then they might provide an option to add routes (node-\>pod-subnet mappings) into your virtual network. For example, AWS (and Oracle cloud) both allow this.
173-
* Another way us to use overlay network, which is what we will describe here.
174-
* Introduce *tun/tap* devices. A network interface backed by a user-space process.
175-
* *tun* device accepts/outputs raw IP packets.
176-
* *tap* device accepts/outputs raw ethernet packets.
170+
1. Update routes on all routers in between (which can be done if you have control over the routers).
171+
2. If running on cloud, then they might provide an option to add routes (node-\>pod-subnet mappings) into your virtual network. For example, AWS (and Oracle cloud) both allow this.
172+
3. Another way is to use an overlay network, which is what we will describe here.
173+
* Introduce *tun* devices. A network interface backed by a user-space process.
174+
* A *tun* device accepts/outputs raw IP packets.
177175
* How would we use it in this case.
178176
* Now no need for the static routes.
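
One way to see why the static routes are no longer needed: if the *tun* device is given the whole container range (assumed here to be 172.16.0.0/16) while the local namespaces sit in a more specific /24, the kernel's longest-prefix match sends local traffic to the local interface and everything else in the range into the tunnel. The sketch below imitates that selection in plain shell. The route table and device names are hypothetical.

```shell
#!/bin/sh
# Sketch only: a toy longest-prefix-match over a hypothetical route table.
ROUTES="172.16.0.0/24 veth1
172.16.0.0/16 tun0
10.0.0.0/24 enp0s8
0.0.0.0/0 enp0s3"

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs="$IFS"; IFS=.
    set -- $1
    IFS="$old_ifs"
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds if address $1 falls inside CIDR $2 (e.g. 172.16.0.0/16).
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits="${2#*/}"
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
}

# Print the device of the matching route with the longest prefix.
pick_route() {
    dest="$1"; best_dev=""; best_bits=-1
    while read -r cidr dev; do
        [ -n "$cidr" ] || continue
        if in_cidr "$dest" "$cidr" && [ "${cidr#*/}" -gt "$best_bits" ]; then
            best_bits="${cidr#*/}"; best_dev="$dev"
        fi
    done <<EOF
$ROUTES
EOF
    echo "$best_dev"
}

pick_route 172.16.0.2   # local namespace: prints veth1
pick_route 172.16.1.2   # remote namespace: prints tun0
```

On a real node, `ip route get <address>` reports the route the kernel would actually choose.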
179177

@@ -189,23 +187,22 @@ sudo ip netns exec con1 ping 10.0.0.20
189187

190188
* Explain that we are now using a (new but similar) 2 node vagrant setup.
191189
* Talk through the *setup.sh*.
192-
* Describe the parts common to the previous step.
193-
* We need packet forwarding enabled here. This allows the node to act as a router, i.e.
194-
to accept and forward packets recieved, but not destined for, the IP of the node.
195-
* Now no extra routes, but contains the socat implementation of the overlay.
190+
* Describe (briefly) the parts common to the previous step.
191+
* We still need IP forwarding enabled here. This allows the node to act as a router, i.e.
192+
to accept and forward packets received, but not destined for, the IP of the node.
193+
* Now no extra routes, but contains the *socat* implementation of the overlay.
196194
* Describe *socat* in general. It creates 2 bidirectional bytestreams, and transfers data between them.
197195
* Describe how *socat* is being used here.
198-
* Note the MTU settings, what is going on here? We reduce the MTU of the tun0
196+
* Note the MTU settings, what is going on here? We reduce the MTU of the *tun0*
199197
device to leave room for the encapsulation overhead (the 8 byte UDP header, plus the outer IP header
200198
that carries it), thus ensuring that fragmentation does not occur.
201-
* Reverse packet filtering:
202-
* What is this: Discards incoming packets from interfaces where they shouldn't be.
203-
* It's purpose: A security feature to stop IP spoofed packets from being propagated.
204-
* Why do we need the reverse packet filtering in this case? Consider the case where we send
205-
a packet from a node to a container on the other node. The outward packet will go over the
206-
tunnel. However, the response will not (as it is destined for the node), thus the response
207-
will emerge on a different interface to which the request packet went. Therefore, the kernel
208-
consider this suspicious, unless we tell it that all is ok.
199+
* Reverse path filtering: What is this: Discards incoming packets that arrive on an interface other than the one the kernel would use to route back to their source.
200+
* Its purpose: A security feature to stop IP spoofed packets from being propagated.
201+
* Why do we need to relax reverse path filtering in this case? Consider the case where we send
202+
a packet from a node to a container on the other node. The outward packet will go over the
203+
tunnel. However, the response will not (as it is destined for the node), thus the response
204+
will arrive on a different interface from the one the request packet left on. Therefore, the kernel
205+
considers this suspicious, unless we tell it that all is ok.
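
A rough sketch of how the overlay pieces above fit together on one node. The *setup.sh* from the talk is not shown in this diff, so the tunnel address, port 9000 and the device name `tun0` are assumptions. The MTU arithmetic assumes an IPv4 underlay: 20 bytes of outer IP header without options, plus 8 bytes of UDP header. If the tun device also prepends packet information bytes, those would need subtracting as well. The kernel's source-address check is set to loose mode (`rp_filter=2`) here, which is one possible way of telling the kernel that the asymmetric return path is ok. The function prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: addresses, port and device name are hypothetical.

# Tunnel MTU: underlay MTU minus outer IPv4 header (20) and UDP header (8).
overlay_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Print (not run) the overlay commands for one node.
overlay_commands() {
    node_ip="$1"; peer_ip="$2"; tun_cidr="$3"; port="$4"; underlay_mtu="$5"
    # socat joins two byte streams: packets read from the tun device are sent
    # to the peer over UDP, and UDP payloads received are written to the tun device.
    echo "sudo socat TUN:$tun_cidr,iff-up UDP:$peer_ip:$port,bind=$node_ip:$port &"
    # Assumes the device socat creates is named tun0.
    echo "sudo ip link set dev tun0 mtu $(overlay_mtu "$underlay_mtu")"
    # Loose mode: accept a packet if its source is reachable via any interface.
    echo "sudo sysctl -w net.ipv4.conf.all.rp_filter=2"
}

overlay_commands 10.0.0.10 10.0.0.20 172.16.0.100/16 9000 1500
```

With a 1500 byte underlay MTU, `overlay_mtu 1500` gives 1472.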
209206

210207
## Demo: Overlay network
211208

@@ -222,10 +219,11 @@ sudo ip netns exec con1 ping 172.16.1.2
222219
sudo ip netns exec con1 ping 10.0.0.20
223220
```
224221

225-
* When we ping from a network namespace to a network namespace across nodes:
226-
* Highlight the TTL. Explain the reported value (should have decreased by 2).
227-
* When we ping from a node to a remote network namespace:
228-
* Highlight the TTL. Explain the reported value (should have decreased by 1).
222+
* When we ping from a network namespace to a network namespace across nodes,
223+
highlight the TTL. Explain the reported value (should have decreased by 2).
224+
225+
* When we ping from a node to a remote network namespace,
226+
highlight the TTL. Explain the reported value (should have decreased by 1).
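
The expected TTL values can be worked out with a line of arithmetic. The sketch below assumes the sender starts at a TTL of 64 (the Linux default) and that every node that forwards the reply decrements it by one. The function name is made up for illustration.

```shell
#!/bin/sh
# TTL seen in a ping reply: the sender's initial TTL minus one for each
# node that forwarded (routed) the packet on its way back.
expected_ttl() {
    hops="$1"; initial="${2:-64}"
    echo $(( $initial - $hops ))
}

# Namespace to namespace across nodes: both nodes forward the reply.
expected_ttl 2   # prints 62
# Node to remote namespace: only the remote node forwards the reply.
expected_ttl 1   # prints 63
```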
229227

230228
To see the encapsulation process more clearly:
231229

@@ -250,18 +248,20 @@ Meanwhile, on node 10.0.0.20:
250248

251249
## Slide: Putting it all together
252250

253-
So how does this work in the real world?
251+
* So how does this work in the real world?
254252

255-
* Need a way to map nodes to subnets. In Kubernetes, this could be Etcd.
253+
* Can characterise existing Kubernetes networking solutions in terms of
254+
2 properties. 1. How they connect, and 2. Where they store their pod-subnet
255+
to node mappings.
256256

257257
* Popular network solutions:
258258
* 1. *Flannel*
259-
* Uses *etcd* to store the node->pod-subnet mapping.
260259
* Multiple backends:
261260
* *host-gw*: step 3
262261
* *udp*: step 4
263262
* *VXLAN*: step 4, but more efficient.
264263
* *awsvpc*: Sets routes in AWS.
264+
* Uses *etcd* to store the node->pod-subnet mapping.
265265
* 2. *Calico*
266266
* No overlay for intra L2. Uses next-hop routing (step 3).
267267
* For inter L2 node communication, uses IPIP overlay.
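
To make the "where the mappings are stored" property concrete, here is a toy sketch that turns a node to pod-subnet table into next-hop routes for one node. This is an illustration of the idea only, not how Flannel or Calico are implemented. The table is hard-coded here, whereas a real solution would read it from its datastore (for example *etcd*), and the subnet values are assumptions.

```shell
#!/bin/sh
# Sketch only: a hypothetical node -> pod-subnet table.
MAPPINGS="10.0.0.10 172.16.0.0/24
10.0.0.20 172.16.1.0/24"

# Print one next-hop route per remote node, skipping our own entry.
routes_for_node() {
    self="$1"
    echo "$MAPPINGS" | while read -r node subnet; do
        [ "$node" = "$self" ] && continue
        echo "ip route add $subnet via $node"
    done
}

routes_for_node 10.0.0.10
```

Adding a node then means adding one line to the table and re-running this on every other node.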
