# Container Networking Talk Notes

* I work for Oracle, and Oracle has a managed Kubernetes service. Some time
back I was given the task of looking into updating the networking layer in
this service from using Flannel (an overlay network) to a solution which
utilises the native networking features of the Oracle cloud (secondary VNICs +
IPs). Don't worry if you don't know what Flannel is, or what an overlay
network is, as that is the point of this talk! However, once I started digging
in, I quickly found that I didn't understand how Flannel worked, and it seemed
a little wrong to replace one thing with another if you don't understand how
the original worked. So I started digging deeper, and soon realised that I
didn't understand networking in general! Long story short: big rabbit hole,
learnt some stuff, and most importantly, found that I really enjoyed this
area, so I thought I would write a talk and come and spread the networking
love!

* So, I'm Kris, and in the next 30 minutes or so, I'm going to attempt to explain
how a container on one computer on the internet can connect to a container on
another computer, somewhere else on the internet.

## Slide: The aim

* Aim to use the Kubernetes model of networking:

1. Each container (pod) has its own unique IP.
2. No NAT'ing going on.
3. Host can talk to containers, and vice versa.

## Slide: The plan

* Going to work our way toward the general case in 4 steps.

* For each step, we will explain the model via a diagram, show some code, run the code,
then test what we have created.

* Each step will be created using Vagrant-based VMs.

## Slide: Diagram of a single network namespace

* Describe the outer box (the node). Could be a physical machine, or a VM as in this case.

* Describe containers vs namespaces: Containers use a bunch of different Linux mechanisms
to isolate the processes running inside, both in terms of system calls, available resources,
what it can see, i.e. filesystems, other processes, etc. However, from a network connectivity
point of view, the only mechanism that matters here is the network namespace, so from now on,
whenever I say container, what I really mean is network namespace.

* What is a network namespace: It's another instance of the kernel's network stack, containing:
1. Its own interfaces.
2. Its own routing + route tables.
3. Its own iptables rules.

* When created, it is empty, i.e. it has no interfaces, routes or iptables rules.

* Describe a veth pair: a virtual Ethernet cable with a NIC on each end.

* Describe the relevant routing from/to the network namespace:
1. A directly connected route from the host to the network namespace.
2. A default route out of the network namespace.

* Note: The 'aha' moment, when I worked out the possible types of routing rules.
Understanding these was, for me, the key to understanding networking in general.

## Code: Single network namespace setup.sh
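
The script itself is not reproduced in these notes; the following is a minimal
sketch of the idea, assuming the namespace/interface names (con1, veth0/veth1)
and the 172.16.0.0/24 subnet used in the demos:

```
#!/bin/bash
set -e

# Create the (empty) network namespace.
sudo ip netns add con1

# Create the veth pair and move one end into the namespace.
sudo ip link add veth0 type veth peer name veth1
sudo ip link set veth1 netns con1

# Address both ends and bring them up. Adding 172.16.0.1/24 to veth0
# also gives the host its directly connected route to the namespace.
sudo ip addr add 172.16.0.1/24 dev veth0
sudo ip link set veth0 up
sudo ip netns exec con1 ip addr add 172.16.0.2/24 dev veth1
sudo ip netns exec con1 ip link set veth1 up
sudo ip netns exec con1 ip link set lo up

# Default route out of the namespace, via the host end of the veth pair.
sudo ip netns exec con1 ip route add default via 172.16.0.1
```

After which something like `sudo ip netns exec con1 ping 10.0.0.10` (from the
namespace to the node's own IP) should work.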

## Slide: Diagram of multiple network namespaces on different nodes on the same L2 network

* 2 nodes on the same subnet, each set up the same as step 2, but containing different network namespace subnets.
* Talk about the routing within the node.
* Talk about the (next hop) routing between nodes (only works if the nodes are on the same L2 network).
* Note that this is how the *host-gw* Flannel backend works, and also single-L2 *Calico*.

## Code: Multi node setup.sh
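
Again the full script is omitted here, but the key addition over the previous
step is one static next-hop route per node, pointing at the other node's
namespace subnet. A sketch, assuming node IPs 10.0.0.10/10.0.0.20 and
namespace subnets 172.16.0.0/24 and 172.16.1.0/24:

```
# On node 10.0.0.10 (hosting 172.16.0.0/24): reach the other node's
# namespaces by using that node as the next hop.
sudo ip route add 172.16.1.0/24 via 10.0.0.20

# On node 10.0.0.20 (hosting 172.16.1.0/24): the mirror-image route.
sudo ip route add 172.16.0.0/24 via 10.0.0.10

# Both nodes must also be willing to forward packets for their namespaces.
sudo sysctl -w net.ipv4.ip_forward=1
```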

## Demo: Multi node

```
sudo ip netns exec con1 ping 172.16.1.2
sudo ip netns exec con1 ping 10.0.0.20
```

* When we ping from a network namespace to another network namespace across nodes,
highlight the TTL. Explain the reported value.

* When we ping a network namespace on the other node from the node,
highlight the TTL. Explain the reported value.

## Slide: Diagram of multiple network namespaces on different nodes on different L2 networks (the overlay network)

* Now we can't use static routes, as the nodes could be on different subnets. Options:
1. Update the routes on all the routers in between (which can be done if you have control over the routers).
2. If running in the cloud, the provider might offer an option to add routes (node -> pod-subnet mappings) into your virtual network. For example, AWS and the Oracle cloud both allow this.
3. Another way is to use an overlay network, which is what we will describe here.
* Introduce *tun* devices: a network interface backed by a user-space process.
* A *tun* device accepts/outputs raw IP packets.
* How would we use it in this case?
* Now no need for the static routes.
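
For illustration, a tun device can be created and routed to with plain
iproute2 commands (the device name and container supernet here are
assumptions):

```
# Create a tun device; a user-space process attaches to it and
# receives/injects raw IP packets (tun0 is a hypothetical name).
sudo ip tuntap add dev tun0 mode tun
sudo ip link set tun0 up

# A single route for the whole container supernet can now replace the
# per-node static routes: everything cross-node goes via the tunnel.
sudo ip route add 172.16.0.0/16 dev tun0
```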

## Code: Overlay network setup.sh

* Explain that we are now using a (new but similar) 2-node Vagrant setup.
* Talk through the *setup.sh*.
* Describe (briefly) the parts common to the previous step.
* We still need IP forwarding enabled here. This allows the node to act as a router, i.e.
to accept and forward packets that it receives but that are not destined for the node's own IP.
* Now no extra routes, but the script contains the *socat* implementation of the overlay.
* Describe *socat* in general. It creates 2 bidirectional byte streams, and transfers data between them.
* Describe how *socat* is being used here.
* Note the MTU settings: what is going on here? We reduce the MTU of the *tun0*
device as this allows for the 8-byte UDP header that will be added, thus ensuring that
fragmentation does not occur.
* Reverse path filtering: what is this? It discards incoming packets arriving on interfaces where they shouldn't be.
* Its purpose: a security feature to stop spoofed IP packets from being propagated.
* Why does reverse path filtering matter in this case? Consider the case where we send
a packet from a node to a container on the other node. The outward packet will go over the
tunnel. However, the response will not (as it is destined for the node), thus the response
will arrive on a different interface to the one the request packet went out of. Therefore, the kernel
considers this suspicious, unless we tell it that all is OK.

## Demo: Overlay network

```
sudo ip netns exec con1 ping 172.16.1.2
sudo ip netns exec con1 ping 10.0.0.20
```

* When we ping from a network namespace to a network namespace across nodes,
highlight the TTL. Explain the reported value (should have decreased by 2).

* When we ping from a node to a remote network namespace,
highlight the TTL. Explain the reported value (should have decreased by 1).

To see the encapsulation process more clearly:
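
The exact capture commands are not reproduced here, but something along these
lines works, run on node 10.0.0.10 (the outbound interface name and UDP port
are assumptions):

```
# The unencapsulated ICMP entering the tunnel device...
sudo tcpdump -ni tun0 icmp

# ...and the same traffic leaving the node wrapped in UDP.
sudo tcpdump -ni enp0s8 udp port 9000
```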

Meanwhile, on node 10.0.0.20:
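
```
# The encapsulated UDP arriving from the peer, and the decapsulated
# ICMP emerging from the local tunnel device (names hypothetical).
sudo tcpdump -ni enp0s8 udp port 9000
sudo tcpdump -ni tun0 icmp
```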

## Slide: Putting it all together

* So how does this work in the real world?

* We can characterise existing Kubernetes networking solutions in terms of
2 properties: 1. how they connect, and 2. where they store their pod-subnet
to node mappings.

* Popular network solutions:
  1. *Flannel*
     * Multiple backends:
       * *host-gw*: step 3.
       * *udp*: step 4.
       * *VXLAN*: step 4, but more efficient.
       * *awsvpc*: sets routes in AWS.
     * Uses *etcd* to store the node -> pod-subnet mapping.
  2. *Calico*
     * No overlay for intra-L2 communication. Uses next-hop routing (step 3).
     * For inter-L2 node communication, uses an IPIP overlay.