-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
l4 port redirect & fix loopback for ip6 packets #112
base: main
Are you sure you want to change the base?
Conversation
christophefontaine
commented
Dec 19, 2024
- Fixes the redirection of IPv6 packets
- Prepare the tunnel over UDP, so we can register for a specific UDP port
- simplifies the number of nodes by removing TCP / SCTP dedicated nodes as we don't use them.
bf4715b
to
2dcd062
Compare
We wrongly assumed that the data was still present when prepending an ip header. Rewrite the ip/ip6 header instead. Fixes: e27fab5 ("ip/ip6: post tcp/udp packets to loopback interface") Signed-off-by: Christophe Fontaine <[email protected]>
For VxLan tunnels, we need to register a specific destination udp port. Add a new node and api, and push all non-handled packets back to the management (loopback) interface, as done today. Signed-off-by: Christophe Fontaine <[email protected]>
2dcd062
to
2b780c9
Compare
goto next; | ||
} | ||
|
||
if (mbuf->packet_type & RTE_PTYPE_L3_IPV4) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relying on RTE_PTYPE_L3_IPV4 is risky (is the tun driver really setting this correctly?).
You'll need something else.
if (mbuf->packet_type & RTE_PTYPE_L3_IPV4) { | ||
struct ip_local_mbuf_data *d = ip_local_mbuf_data(mbuf); | ||
struct rte_ipv4_hdr *ip; | ||
ip = (struct rte_ipv4_hdr *)rte_pktmbuf_prepend(mbuf, sizeof(*ip)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do you make sure there is enough headroom?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, will fix it
ip->src_addr = d->src; | ||
ip->dst_addr = d->dst; | ||
ip->total_length = rte_cpu_to_be_16(d->len) + sizeof(*ip); | ||
ip->next_proto_id = d->proto; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some fields don't seem initialized, so you are keeping some stale values from memory.
struct rte_ipv4_hdr {
union {
uint8_t version_ihl; /* 0 1 */
struct {
uint8_t ihl:4; /* 0: 0 1 */
uint8_t version:4; /* 0: 4 1 */
}; /* 0 1 */
}; /* 0 1 */
uint8_t type_of_service; /* 1 1 */
rte_be16_t total_length; /* 2 2 */
rte_be16_t packet_id; /* 4 2 */
rte_be16_t fragment_offset; /* 6 2 */
uint8_t time_to_live; /* 8 1 */
uint8_t next_proto_id; /* 9 1 */
rte_be16_t hdr_checksum; /* 10 2 */
rte_be32_t src_addr; /* 12 4 */
rte_be32_t dst_addr; /* 16 4 */
/* size: 20, cachelines: 1, members: 10 */
/* last cacheline: 20 bytes */
} __attribute__((__aligned__(2)));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will also fix for ipv6 hdr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are dedicated functions to fill in ip(v6) headers based on local data.
grout/modules/ip/datapath/gr_ip4_datapath.h
Lines 40 to 52 in 7135627
static inline void ip_set_fields(struct rte_ipv4_hdr *ip, struct ip_local_mbuf_data *data) { | |
ip->version_ihl = IPV4_VERSION_IHL; | |
ip->type_of_service = 0; | |
ip->total_length = rte_cpu_to_be_16(data->len + rte_ipv4_hdr_len(ip)); | |
ip->fragment_offset = 0; | |
ip->packet_id = 0; | |
ip->time_to_live = data->ttl ?: IPV4_DEFAULT_TTL; | |
ip->next_proto_id = data->proto; | |
ip->src_addr = data->src; | |
ip->dst_addr = data->dst; | |
ip->hdr_checksum = 0; | |
ip->hdr_checksum = rte_ipv4_cksum(ip); | |
} |
grout/modules/ip6/datapath/gr_ip6_datapath.h
Lines 36 to 49 in 7135627
static inline void ip6_set_fields( | |
struct rte_ipv6_hdr *ip, | |
uint16_t len, | |
uint8_t proto, | |
const struct rte_ipv6_addr *src, | |
const struct rte_ipv6_addr *dst | |
) { | |
ip->vtc_flow = RTE_BE32(0x60000000); | |
ip->payload_len = rte_cpu_to_be_16(len); | |
ip->proto = proto; | |
ip->hop_limits = IP6_DEFAULT_HOP_LIMIT; | |
ip->src_addr = *src; | |
ip->dst_addr = *dst; | |
} |
ip->dst_addr = d->dst; | ||
ip->payload_len = rte_cpu_to_be_16(d->len); | ||
ip->hop_limits = d->hop_limit; | ||
ip->proto = d->proto; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idem, not initialized fields.
@@ -134,6 +134,7 @@ static void iface_loopback_poll(evutil_socket_t, short reason, void *ev_iface) { | |||
|
|||
// packet sent from linux tun iface, no need to compute checksum; | |||
mbuf->ol_flags = RTE_MBUF_F_RX_IP_CKSUM_GOOD; | |||
mbuf->packet_type = data[0] == 6 ? RTE_PTYPE_L3_IPV6 : RTE_PTYPE_L3_IPV4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, ok.
I guess the previous patch relies on this.
This must be documented, or a different mechanism is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wished to avoid setting unnecessary data if the packet type was already defined by the drivers.
The other option is to add a family
field in the buffer metadata for both ip_local_mbuf_data
and ip6_local_mbuf_data
to define which kind of header we should fill in.
I found it uglier as the 2 priv data types are defined in different modules.
GR_MBUF_PRIV_DATA_TYPE(ip_local_mbuf_data, {
+ uint8_t family;
ip4_addr_t src;
ip4_addr_t dst;
uint16_t len;
uint16_t vrf_id;
uint8_t proto;
uint8_t ttl;
});
GR_MBUF_PRIV_DATA_TYPE(ip6_local_mbuf_data, {
+ uint8_t family;
struct rte_ipv6_addr src;
struct rte_ipv6_addr dst;
uint16_t len;
uint8_t hop_limit;
uint8_t proto;
});
@rjarry Any opinion about that ?
Should we define instead a common header for both local_mbuf_data instead ?
GR_MBUF_PRIV_DATA_TYPE(ip46_local_mbuf_data, {
uint8_t family;
union {
struct rte_ipv6_addr v6;
ip4_addr_t v4;
} src;
union {
struct rte_ipv6_addr v6;
ip4_addr_t v4;
} dst;
uint16_t len;
union {
uint8_t hop_limit;
uint8_t ttl;
};
uint8_t proto;
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh dear. I think I prefer the packet_type
hacking, even though it is ugly.
|
||
static void l4_input_local_register(void) { | ||
ip_input_local_add_proto(IPPROTO_UDP, "l4_input_local"); | ||
ip6_input_local_add_proto(IPPROTO_TCP, "l4_input_local"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ouch .... should register for both ip4/udp ip6/udp and ip4/tcp ip4/tcp
static struct rte_node_register tcp_redirect_loopback_node = { | ||
.name = "tcp_redirect_loopback", | ||
.process = redirect_loopback_process, | ||
.nb_edges = EDGE_COUNT, | ||
.next_nodes = { | ||
[REDIRECT] = "loopback_output", | ||
[NO_IFACE] = "no_loop_iface", | ||
}, | ||
}; | ||
|
||
static struct rte_node_register udp_redirect_loopback_node = { | ||
.name = "udp_redirect_loopback", | ||
.process = redirect_loopback_process, | ||
.nb_edges = EDGE_COUNT, | ||
.next_nodes = { | ||
[REDIRECT] = "loopback_output", | ||
[NO_IFACE] = "no_loop_iface", | ||
}, | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I liked the idea of having two separate nodes for tcp and udp. We may not want to register the same local ports depending on the protocol.
Maybe these ports should be renamed to tcp_input_local and udp_input_local which would (or would not) redirect to loopback_output after reconstructing the ip(v6) headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already the case, the register function already takes a proto as a parameter, so we can select TCP/UDP/...
We only support udp port as of now, as I'm not yet aware of a TCP based tunnel.
All non-handled traffic will be redirected to the loop interface.
Even this behavior may have to be refined, in a future patch.
}; | ||
|
||
static struct rte_node_register sctp_redirect_loopback_node = { | ||
.name = "sctp_redirect_loopback", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, sctp is probably out of scope.
5c890dd
to
d9c6ad3
Compare