OVN-IC segfaulted due to a warn log #268

Open
positiveEV opened this issue Jan 6, 2025 · 2 comments

@positiveEV

Hi,

When ovn-ic is started with a log level more verbose than err (for example warn), it segfaults.

Environment:
OS: Debian bookworm
ovn-ic version: 23.03.1-1~deb12u2; I was also able to reproduce it with 24.09.0-1 from Debian testing

Here is the full gdb backtrace; I don't know what else could be useful.

#1  0x0000555555586fb8 in add_network_to_routes_ad (routes_ad=<optimized out>, network=<optimized out>, nb_lrp=0x5555559582f0, nexthop_addresses=0x7fffffffe790, nb_options=<optimized out>, nb_lr=<optimized out>) at ic/ovn-ic.c:1248
        prefix = {__in6_u = {__u6_addr8 = "\000\000\000\000\000\000\000\000\000\000\377\377\271\016\202\201", __u6_addr16 = {0, 0, 0, 0, 0, 65535, 3769, 33154}, __u6_addr32 = {0, 0, 4294901760, 2172784313}}}
        nexthop = {__in6_u = {__u6_addr8 = "\000\000\000\000\000\000\000\000\000\000\377\377\251\376o!", __u6_addr16 = {0, 0, 0, 0, 0, 65535, 65193, 8559}, __u6_addr32 = {0, 0, 4294901760, 560987817}}}
        plen = 32
        prefix = <optimized out>
        nexthop = <optimized out>
        plen = <optimized out>
        level__ = <optimized out>
        __a = <optimized out>
        msg = <optimized out>
        __a = <optimized out>
        level__ = <optimized out>
#2  build_ts_routes_to_adv (ctx=0x7fffffffe6c0, ic_lr=0x55555596d260, routes_ad=0x555555a1c3c0, ts_port_addrs=0x7fffffffe790, nb_global=<optimized out>, ts_route_table=<optimized out>) at ic/ovn-ic.c:1615
        j = 0
        lrp = 0x5555559582f0
        i = 2
        lr = <optimized out>
        lr = <optimized out>
        i = <optimized out>
        nb_route = <optimized out>
        isb_uuid = <optimized out>
        rl = {token_bucket = {rate = 5, burst = 60000, tokens = 0, last_fill = -9223372036854775808}, first_dropped = 0, last_dropped = 0, n_dropped = 0, mutex = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 2, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
              __size = '\000' <repeats 16 times>, "\002", '\000' <repeats 22 times>, __align = 0}, where = 0x55555576f355 "<unlocked>"}}
        level__ = <optimized out>
        i = <optimized out>
        lrp = <optimized out>
        j = <optimized out>
        level__ = <optimized out>
#3  collect_lr_routes (ctx=0x7fffffffe6c0, ic_lr=0x55555596d260, routes_ad_by_ts=0x7fffffffe5a0) at ic/ovn-ic.c:1673
        i = 0
        lrp_name = <optimized out>
        ts_name = <optimized out>
        key = <optimized out>
        nb_global = 0x555555956040
        isb_pb = <optimized out>
        route_table = <optimized out>
        ts_port_addrs = {ea_s = "00:16:3e:1a:b7:85", ea = {{ea = "\000\026>\032\267\205", be16 = {5632, 6718, 34231}}}, n_ipv4_addrs = 1, ipv4_addrs = 0x555555a29690, n_ipv6_addrs = 1, ipv6_addrs = 0x5555559415b0}
        routes_ad = 0x555555a1c3c0
        t_sw = <optimized out>
        nb_global = <optimized out>
        __func__ = "collect_lr_routes"
        isb_pb = <optimized out>
        lrp_name = <optimized out>
        ts_name = <optimized out>
        route_table = <optimized out>
        ts_port_addrs = <optimized out>
        key = <optimized out>
        routes_ad = <optimized out>
        t_sw = <optimized out>
        i = <optimized out>
        rl = {token_bucket = {rate = 5, burst = 60000, tokens = 0, last_fill = -9223372036854775808}, first_dropped = 0, last_dropped = 0, n_dropped = 0, mutex = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 2, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
              __size = '\000' <repeats 16 times>, "\002", '\000' <repeats 22 times>, __align = 0}, where = 0x55555576f355 "<unlocked>"}}
        level__ = <optimized out>
#4  route_run (ctx=<optimized out>, az=<optimized out>) at ic/ovn-ic.c:1779
        ic_lr__iterator__ = 0x55555596d260
        ic_lr__iterator__next__ = 0x0
        ic_lrs = {buckets = 0x7fffffffe588, one = 0x55555596d260, mask = 0, n = 1}
        ic_lr = 0x55555596d260
        node = <optimized out>
        isb_pb = <optimized out>
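
The `prefix` and `nexthop` values in frame #1 are `struct in6_addr` dumps holding IPv4-mapped IPv6 addresses (`::ffff:a.b.c.d`), which is why they are hard to read. As a reading aid, here is a minimal sketch of how such a value can be turned back into dotted-quad form. The helper name `format_mapped_addr` is hypothetical and is not part of OVN; it only uses the standard `IN6_IS_ADDR_V4MAPPED` macro and `inet_ntop`.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper, not OVN code.
 *
 * Formats 'addr' into 'buf'.  An IPv4-mapped address (::ffff:a.b.c.d) is
 * printed as plain dotted-quad IPv4; any other address is printed as IPv6.
 * Returns false if the buffer is too small or formatting fails. */
static bool
format_mapped_addr(const struct in6_addr *addr, char *buf, socklen_t size)
{
    if (IN6_IS_ADDR_V4MAPPED(addr)) {
        struct in_addr v4;

        /* The IPv4 address occupies the last four bytes, in network
         * byte order, which is what inet_ntop(AF_INET, ...) expects. */
        memcpy(&v4, &addr->s6_addr[12], sizeof v4);
        return inet_ntop(AF_INET, &v4, buf, size) != NULL;
    }
    return inet_ntop(AF_INET6, addr, buf, size) != NULL;
}
```

Fed the sixteen bytes shown for `prefix` (the octal escapes `\271\016\202\201` are 185, 14, 130, 129), this yields `185.14.130.129`, which together with `plen = 32` is a /32 host route.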

I briefly discussed it with @fnordahl on IRC.

fnordahl added a commit to fnordahl/ovn that referenced this issue Jan 7, 2025
Commit ac1dc2b introduced re-use of the add_to_routes_ad()
function for both connected and static routes, however it did
not add conditionals for handling warning level logging of
duplicate routes.

Reported-at: ovn-org#268
Fixes: ac1dc2b ("ic: prevent advertising/learning multiple same routes")
Signed-off-by: Frode Nordahl <[email protected]>
@fnordahl
Member

fnordahl commented Jan 7, 2025

Thank you for the report! This appears to be an issue on main and as such is not specific to the Debian/Ubuntu packages.

While we should definitely avoid crashing, reproducing the issue appears to require multiple LRPs to be configured with the same IP address in the same AZ. And I wonder if this is a valid configuration in the first place?

Does this match what you see in your environment, and can you please elaborate a bit on the intended use case?

In any case, this patch should fix the crash: https://mail.openvswitch.org/pipermail/ovs-dev/2025-January/419460.html
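
The commit message referenced in this thread describes a helper shared by connected and static routes whose duplicate-route warning was not made conditional on the route type. A minimal sketch of that class of bug and its guard, in plain C, might look like the following. Every name here (`add_route`, `struct static_route`, `struct port`, `struct route_set`) is a hypothetical stand-in rather than the OVN code, and the assumption that the static-route pointer is NULL for connected routes is an illustration, not something the thread confirms.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins, not the real OVN types. */
struct static_route {
    const char *uuid;
};

struct port {
    const char *name;
};

#define MAX_ROUTES 8

struct route_set {
    const char *prefixes[MAX_ROUTES];
    size_t n;
};

static bool
route_set_contains(const struct route_set *set, const char *prefix)
{
    for (size_t i = 0; i < set->n; i++) {
        if (!strcmp(set->prefixes[i], prefix)) {
            return true;
        }
    }
    return false;
}

/* Adds 'prefix' to 'set'.  'route' is assumed to be NULL for a connected
 * route and non-NULL for a static route.  On a duplicate, writes the
 * warning text into 'msg' and returns false.
 *
 * The buggy variant would format route->uuid unconditionally in the
 * duplicate branch, which dereferences NULL for connected routes. */
static bool
add_route(struct route_set *set, const char *prefix,
          const struct static_route *route, const struct port *port,
          char *msg, size_t msg_size)
{
    if (route_set_contains(set, prefix)) {
        if (route) {
            /* Static route: the route record exists, so use it. */
            snprintf(msg, msg_size, "duplicate static route %s (uuid %s)",
                     prefix, route->uuid);
        } else {
            /* Connected route: no route record, report the port instead. */
            snprintf(msg, msg_size, "duplicate connected route %s on port %s",
                     prefix, port->name);
        }
        return false;
    }
    if (set->n == MAX_ROUTES) {
        snprintf(msg, msg_size, "route set full, dropping %s", prefix);
        return false;
    }
    set->prefixes[set->n++] = prefix;
    msg[0] = '\0';
    return true;
}
```

Calling `add_route` twice with the same prefix and a NULL `route`, once per port, mirrors the "multiple LRPs with the same IP address" condition above: the second call takes the duplicate branch and, with the guard in place, reports the port name instead of crashing. In code where the warning's arguments are only evaluated when that log level is enabled, an unguarded dereference of this kind would only surface at warn or more verbose levels, which would be consistent with the original report.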

ovsrobot pushed a commit to ovsrobot/ovn that referenced this issue Jan 7, 2025
Commit ac1dc2b introduced re-use of the add_to_routes_ad()
function for both connected and static routes, however it did
not add conditionals for handling warning level logging of
duplicate routes.

Reported-at: ovn-org#268
Fixes: ac1dc2b ("ic: prevent advertising/learning multiple same routes")
Signed-off-by: Frode Nordahl <[email protected]>
Signed-off-by: 0-day Robot <[email protected]>
@positiveEV
Author

Thanks for the quick fix and reply.

I tried to reproduce the issue by reinstalling my two OVN test clusters. This led me to find small issues in my Ansible playbook, which had resulted in a misconfigured OVN IC database. I probably had two OVN IC DB leaders (one to which all the other nodes connected, and one on its own), but all the OVN IC DB nodes were configured as northbound and southbound.

With the Ansible playbook fixed and the OVN clusters reinstalled, I have not been able to reproduce the issue so far, despite using the same Incus commands. I will try some more and keep you updated.
