Commit 3d9cfcd

Merge pull request #1499 from zrggw/adjust_ipsec
Add IPsec document content and remove unnecessary OutputMark for egress
2 parents 932c468 + c22bf91 commit 3d9cfcd

5 files changed: +107 -14 lines changed

docs/pics/IPsec_decrypt_ESP_Packet_flow.svg

Lines changed: 4 additions & 0 deletions

docs/pics/IPsec_encrypt_Packet_flow.svg

Lines changed: 4 additions & 0 deletions

docs/proposal/kmesh_support_encrypt.md

Lines changed: 65 additions & 6 deletions
@@ -36,7 +36,7 @@ Kmesh uses only the encryption feature of IPSec; the IPSec pre-shared key is set by the user in K8
3636

3737
The user uses kmeshctl to create a secret resource named Kmesh-ipsec-keys in K8s, in the following format:
3838

39-
kmeshctl secret --key=<aead key>
39+
kmeshctl secret create --key=<aead key>
4040

4141
Currently only the rfc4106 gcm aes (aead) algorithm is supported. The resource contains the aead key used by ipsec and the ipsec icv length.
4242

@@ -180,18 +180,77 @@ The CRD data structure is defined as follows:
180180

181181
ipsec example: the local IP address is 7.6.122.84 and the peer node IP address obtained is 7.6.122.220; a preview of the resulting ipsec configuration follows
182182

183+
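## Data encryption and decryption flow
184+
185+
### 5.1 Encryption process
186+
187+
When a packet leaves a Pod and reaches the Pod's peer veth device, it is processed for encryption as follows:
188+
189+
1. **Destination check**: The TC program looks up the packet's destination IP address in the LPM prefix-tree map
190+
2. **Encryption mark**: If a matching CIDR record is found, the destination node has IPsec encryption enabled. The TC program sets a specific mark on the packet (mark value 0x000000e0), identifying it as a packet that must be sent encrypted through IPsec
191+
3. **IPsec processing**: Packets carrying the encryption mark are intercepted by the kernel IPsec subsystem and encrypted according to the configured xfrm policy and state rules
192+
4. **Tunnel encapsulation**: Because IPsec tunnel mode is used, the encrypted packet is re-encapsulated. The source and destination addresses in the new outer IP header are the NIC IP addresses of the sending and receiving nodes, and the original Pod IP addresses are carried in the inner header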
193+
194+
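### 5.2 Decryption process
195+
196+
When a packet arrives at the destination node's NIC, decryption is more involved, because different packet types must be distinguished:
197+
198+
#### 5.2.1 Handling ESP packets
199+
200+
If the received packet's protocol is ESP (Encapsulating Security Payload), it is an IPsec-encrypted packet:
201+
202+
1. **Xfrm decode**: The packet is passed to the kernel's `Xfrm decode` stage
203+
2. **State lookup**: The system looks up a matching xfrm state rule
204+
3. **Decryption**: If a matching rule is found, the packet is decrypted. The decrypted packet carries the output-mark (mark value 0x000000d0) and enters the ingress stage a second time
205+
4. **Drop**: If no matching rule is found, the packet is dropped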
206+
207+
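#### 5.2.2 Handling non-ESP packets
208+
209+
If the packet's protocol is not ESP, there are two possible cases, which are distinguished by the mark value:
210+
211+
1. **Unencrypted ordinary packets**
212+
- An ordinary packet entering the ingress stage for the first time
213+
- The TC program sets its mark to 0x0 so that it cannot later match an xfrm policy by mistake and be dropped unexpectedly
214+
215+
2. **Decrypted packets**
216+
- A packet that has been decrypted by `Xfrm decode` and carries the output-mark (mark value 0x000000d0)
217+
- The TC program leaves its mark unchanged so that it can later match the corresponding xfrm policy for further processing
218+
- After validation, the packet is forwarded to the destination Pod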
219+
220+
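### 5.3 Mark values
221+
222+
The system uses different mark values to identify a packet's state and processing stage:
223+
224+
| Mark value | Purpose | Description |
225+
|------------|---------|-------------|
226+
| 0x000000e0 | Encryption mark | Identifies outbound packets that require IPsec encryption |
227+
| 0x000000d0 | Decryption mark | Identifies inbound packets that have completed IPsec decryption |
228+
| 0x0 | Plain mark | Identifies ordinary unencrypted packets, to avoid mistaken matches |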
229+
230+
### 5.4 Flow diagrams
231+
232+
**The following diagram shows the decryption flow for an encrypted packet**
233+
234+
![Decryption flow diagram](../pics/IPsec_decrypt_ESP_Packet_flow.svg)
235+
236+
**The following diagram shows the encryption flow for an unencrypted packet**
237+
238+
![Encryption flow diagram](../pics/IPsec_encrypt_Packet_flow.svg)
239+
240+
For more detail on the diagrams, see [Nftables - Netfilter and VPN/IPsec packet flow](https://thermalcircle.de/doku.php?id=blog:linux:nftables_ipsec_packet_flow#context) and [RFC 4301](https://www.rfc-editor.org/rfc/rfc4301).
241+
183242
# state configuration
184243

185244
ip xfrm state add src 7.6.122.84 dst 7.6.122.220 proto esp spi 0x1 mode tunnel reqid 1 {\$aead-algo} {\$aead-egress-key} {\$icv-len}
186-
ip xfrm state add src 7.6.122.220 dst 7.6.122.84 proto esp spi 0x1 mode tunnel reqid 1 {\$aead-algo} {\$aead-ingress-key} {\$icv-len}
245+
ip xfrm state add src 7.6.122.220 dst 7.6.122.84 proto esp spi 0x1 mode tunnel reqid 1 {\$aead-algo} {\$aead-ingress-key} {\$icv-len} output-mark 0x000000d0 mask 0xffffffff
187246

188247
# policy configuration
189248

190-
ip xfrm policy add src 0.0.0.0/0 dst {\$peer-CIDR} dir out tmpl src 7.6.122.84 dst 7.6.122.220 proto esp spi 0x1 reqid 1 mode tunnel mark 0x000000e0 mask 0xffff
191-
ip xfrm policy add src 0.0.0.0/0 dst {\$local-CIDR} dir in tmpl src 7.6.122.220 dst 7.6.122.84 proto esp reqid 1 mode tunnel mark 0x000000d0 mask 0xfffffff
192-
ip xfrm policy add src 0.0.0.0/0 dst {\$local-CIDR} dir fwd tmpl src 7.6.122.220 dst 7.6.122.84 proto esp reqid 1 mode tunnel mark 0x000000d0 mask 0xfffffff
249+
ip xfrm policy add src 0.0.0.0/0 dst {\$peer-CIDR} dir out tmpl src 7.6.122.84 dst 7.6.122.220 proto esp spi 0x1 reqid 1 mode tunnel mark 0x000000e0 mask 0xffffffff
250+
ip xfrm policy add src 0.0.0.0/0 dst {\$local-CIDR} dir in tmpl src 7.6.122.220 dst 7.6.122.84 proto esp reqid 1 mode tunnel mark 0x000000d0 mask 0xffffffff
251+
ip xfrm policy add src 0.0.0.0/0 dst {\$local-CIDR} dir fwd tmpl src 7.6.122.220 dst 7.6.122.84 proto esp reqid 1 mode tunnel mark 0x000000d0 mask 0xffffffff
193252

194-
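- Update the lpm prefix-tree map: the key is the peer CIDR address and the value is currently always set to 1. tc looks up the destination pod ip in the prefix tree; a hit confirms that the peer pod is managed by Kmesh, and the traffic is tagged with the corresponding encryption and decryption marks
253+
- Update the lpm prefix-tree map: the key is the peer CIDR address and the value is currently always set to 1. tc looks up the destination pod ip in the prefix tree; a hit confirms that the peer pod is managed by Kmesh, and the traffic is tagged with the corresponding encryption mark
195254
- Kmesh-daemon writes the local spi, IPsec device ip, and podCIDRs to the api-server, which triggers the other nodes to update their IPsec configuration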
196255

197256
**When Kmesh-daemon detects that a node has been added:**
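
The mask change in the policy lines of this hunk (from `0xffff` and `0xfffffff` to `0xffffffff`) is easier to see with a small model of mark matching. The Go sketch below is illustrative only: `markMatches` and `ingressMark` are hypothetical helpers, not Kmesh or kernel code, and the comparison rule (only the bits selected by the mask are compared) is a simplified model of how an xfrm mark selector behaves. The two mark values come from the mark table in the proposal.

```go
package main

import "fmt"

// Mark values taken from the proposal's mark table.
const (
	encryptMark uint32 = 0x000000e0 // egress: packet must be IPsec-encrypted
	decryptMark uint32 = 0x000000d0 // ingress: packet was decrypted by xfrm
)

// markMatches is a simplified, hypothetical model of a mark selector:
// only the bits selected by mask take part in the comparison.
func markMatches(pktMark, value, mask uint32) bool {
	return pktMark&mask == value&mask
}

// ingressMark is a hypothetical model of the TC ingress decision for
// non-ESP packets: keep the decrypted mark, reset everything else to 0.
func ingressMark(pktMark uint32) uint32 {
	if pktMark == decryptMark {
		return pktMark
	}
	return 0
}

func main() {
	// A mark that differs from encryptMark only in its upper bits.
	stray := uint32(0x000100e0)

	fmt.Println(markMatches(stray, encryptMark, 0xffff))     // true: upper bits ignored
	fmt.Println(markMatches(stray, encryptMark, 0xffffffff)) // false: all 32 bits compared

	fmt.Println(ingressMark(decryptMark) == decryptMark) // true: decrypted packets keep their mark
	fmt.Println(ingressMark(0x1234))                     // 0: other packets are reset
}
```

Under this model, a narrower mask lets a packet whose mark shares only the low bits match the policy, which is one plausible reason for widening the mask to all 32 bits.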

pkg/controller/encryption/ipsec/ipsec_handler.go

Lines changed: 9 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -202,7 +202,7 @@ func (is *IpSecHandler) createXfrmRuleIngress(rawRemoteIP, rawLocalNicIP, remote
202202
// localNicIPInfo must exist, spi is local node info spi
203203
newKey := is.generateIPSecKey(rawRemoteIP, rawLocalNicIP, remoteBootID, localBootID, is.historyIpSecKey[spi].AeadKey)
204204

205-
err := is.createStateRule(src, dst, newKey, is.historyIpSecKey[spi])
205+
err := is.createStateRule(src, dst, newKey, is.historyIpSecKey[spi], true)
206206
if err != nil {
207207
return err
208208
}
@@ -244,7 +244,7 @@ func (is *IpSecHandler) createXfrmRuleEgress(rawLocalNicIP, rawRemoteIP, localBo
244244

245245
newKey := is.generateIPSecKey(rawLocalNicIP, rawRemoteIP, localBootID, remoteBootID, ipsecKey.AeadKey)
246246

247-
err := is.createStateRule(src, dst, newKey, ipsecKey)
247+
err := is.createStateRule(src, dst, newKey, ipsecKey, false)
248248
if err != nil {
249249
return err
250250
}
@@ -267,7 +267,7 @@ func (is *IpSecHandler) createXfrmRuleEgress(rawLocalNicIP, rawRemoteIP, localBo
267267
return nil
268268
}
269269

270-
func (is *IpSecHandler) createStateRule(src net.IP, dst net.IP, key []byte, ipsecKey IpSecKey) error {
270+
func (is *IpSecHandler) createStateRule(src net.IP, dst net.IP, key []byte, ipsecKey IpSecKey, ingress bool) error {
271271
state := &netlink.XfrmState{
272272
Src: src,
273273
Dst: dst,
@@ -280,11 +280,15 @@ func (is *IpSecHandler) createStateRule(src net.IP, dst net.IP, key []byte, ipse
280280
Key: key,
281281
ICVLen: ipsecKey.Length,
282282
},
283-
OutputMark: &netlink.XfrmMark{
283+
}
284+
285+
if ingress {
286+
state.OutputMark = &netlink.XfrmMark{
284287
Value: constants.XfrmDecryptedMark,
285288
Mask: constants.XfrmMarkMask,
286-
},
289+
}
287290
}
291+
288292
err := netlink.XfrmStateAdd(state)
289293
if err != nil && !os.IsExist(err) {
290294
return fmt.Errorf("failed to add xfrm state to host in inserting xfrm out rule, %v", err)
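
The effect of the new `ingress` parameter can be shown in isolation. The sketch below is a minimal, self-contained illustration, not the Kmesh implementation: `xfrmMark`, `xfrmState`, and `buildState` are hypothetical stand-ins for the `netlink` types and for `createStateRule`, and the constant values are assumptions taken from the proposal document (`0x000000d0` with mask `0xffffffff`) rather than read from `pkg/constants`.

```go
package main

import "fmt"

// xfrmMark and xfrmState are local, hypothetical stand-ins for the netlink
// types used in the diff; they carry only the fields needed here.
type xfrmMark struct {
	Value uint32
	Mask  uint32
}

type xfrmState struct {
	Src, Dst   string
	OutputMark *xfrmMark
}

// Assumed values, taken from the proposal document.
const (
	decryptedMark uint32 = 0x000000d0
	markMask      uint32 = 0xffffffff
)

// buildState sets OutputMark only for ingress states, mirroring the
// `if ingress { ... }` branch introduced by the commit.
func buildState(src, dst string, ingress bool) *xfrmState {
	state := &xfrmState{Src: src, Dst: dst}
	if ingress {
		state.OutputMark = &xfrmMark{Value: decryptedMark, Mask: markMask}
	}
	return state
}

func main() {
	ingressState := buildState("7.6.122.220", "7.6.122.84", true)
	egressState := buildState("7.6.122.84", "7.6.122.220", false)

	fmt.Printf("ingress output mark: %#x\n", ingressState.OutputMark.Value)
	fmt.Println("egress has output mark:", egressState.OutputMark != nil)
}
```

Leaving the pointer nil on the egress path means no output mark is requested for that state, which matches the commit message's "remove unnecessary OutputMark for egress".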

pkg/controller/encryption/ipsec/ipsec_handler_test.go

Lines changed: 25 additions & 3 deletions
@@ -30,6 +30,8 @@ import (
3030
"github.com/stretchr/testify/assert"
3131
"github.com/stretchr/testify/require"
3232
"github.com/vishvananda/netlink"
33+
34+
"kmesh.net/kmesh/pkg/constants"
3335
)
3436

3537
// DecodeHex is a utility function to decode a hex string into bytes.
@@ -303,7 +305,23 @@ func hasStateRule(state *netlink.XfrmState) (bool, error) {
303305
s.Proto == state.Proto && s.Mode == state.Mode &&
304306
s.Aead.Name == state.Aead.Name && s.Aead.Key != nil && bytes.Equal(s.Aead.Key, state.Aead.Key) &&
305307
s.Aead.ICVLen == state.Aead.ICVLen {
308+
if state.OutputMark != nil {
309+
// If we expect a mark, the state from system must have a matching one.
310+
if s.OutputMark == nil || s.OutputMark.Value != state.OutputMark.Value {
311+
continue
312+
}
313+
// Special handling for Mask: allow 0 or 0xffffffff as equivalent
314+
if s.OutputMark.Mask != state.OutputMark.Mask && !(s.OutputMark.Mask == 0 && state.OutputMark.Mask == 0xffffffff) {
315+
continue
316+
}
317+
} else {
318+
// If we don't expect a mark, the state from system must not have one.
319+
if s.OutputMark != nil {
320+
continue
321+
}
322+
}
306323
found = true
324+
break
307325
}
308326
}
309327
if !found {
@@ -341,7 +359,7 @@ func TestCreateStateRule(t *testing.T) {
341359
}
342360

343361
t.Run("test_create_state_rule", func(t *testing.T) {
344-
err := handler.createStateRule(state.Src, state.Dst, testKey, ipsecKey)
362+
err := handler.createStateRule(state.Src, state.Dst, testKey, ipsecKey, false)
345363

346364
require.NoError(t, err, "Failed to add XFRM state rule: %v", err)
347365
// Verify the state was added
@@ -352,7 +370,11 @@ func TestCreateStateRule(t *testing.T) {
352370
state2 := *state
353371
state2.Src = net.ParseIP("10.0.3.100")
354372
state2.Dst = net.ParseIP("10.0.4.100")
355-
handler.createStateRule(state2.Src, state2.Dst, testKey, ipsecKey)
373+
state2.OutputMark = &netlink.XfrmMark{
374+
Value: constants.XfrmDecryptedMark,
375+
Mask: constants.XfrmMarkMask,
376+
}
377+
handler.createStateRule(state2.Src, state2.Dst, testKey, ipsecKey, true)
356378
// Verify the state was added
357379
found, err = hasStateRule(&state2)
358380
require.NoError(t, err, "Failed to check XFRM state rule: %v", err)
@@ -534,7 +556,7 @@ func TestFlush(t *testing.T) {
534556
},
535557
}
536558

537-
err := handler.createStateRule(state.Src, state.Dst, testKey, ipsecKey)
559+
err := handler.createStateRule(state.Src, state.Dst, testKey, ipsecKey, false)
538560
assert.NoError(t, err, "Failed to add state rule")
539561
// Verify the state was added
540562
found, err := hasStateRule(state)
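
The output-mark comparison added to `hasStateRule` can be read as a standalone predicate. The sketch below restates those branches under stated assumptions: `xfrmMark` is a hypothetical local stand-in for `netlink.XfrmMark`, and `outputMarkMatches` is an illustrative name that does not exist in the repository. `got` is the state read back from the system and `want` is the expected state.

```go
package main

import "fmt"

// xfrmMark is a hypothetical local stand-in for netlink.XfrmMark.
type xfrmMark struct {
	Value uint32
	Mask  uint32
}

// outputMarkMatches reports whether the mark read from the system (got)
// satisfies the expectation (want), following the branches in hasStateRule.
func outputMarkMatches(got, want *xfrmMark) bool {
	if want == nil {
		// No mark expected: the system state must not have one either.
		return got == nil
	}
	if got == nil || got.Value != want.Value {
		return false
	}
	// A mask of 0 read back from the system is accepted where 0xffffffff
	// was requested, as in the test helper.
	if got.Mask != want.Mask && !(got.Mask == 0 && want.Mask == 0xffffffff) {
		return false
	}
	return true
}

func main() {
	want := &xfrmMark{Value: 0xd0, Mask: 0xffffffff}

	fmt.Println(outputMarkMatches(&xfrmMark{Value: 0xd0, Mask: 0xffffffff}, want)) // true
	fmt.Println(outputMarkMatches(&xfrmMark{Value: 0xd0, Mask: 0}, want))          // true
	fmt.Println(outputMarkMatches(&xfrmMark{Value: 0xe0, Mask: 0xffffffff}, want)) // false
	fmt.Println(outputMarkMatches(nil, want))                                      // false
	fmt.Println(outputMarkMatches(nil, nil))                                       // true
}
```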
