-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend Sockops handling to cgroups v1 #1749
Extend Sockops handling to cgroups v1 #1749
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1749 +/- ##
==========================================
+ Coverage 65.88% 67.09% +1.20%
==========================================
Files 219 219
Lines 22568 22567 -1
==========================================
+ Hits 14869 15141 +272
+ Misses 6934 6668 -266
+ Partials 765 758 -7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @markrogers95 for having a go at this again. Much appreciated.
I've left a few suggestions with regards to style and doing without the multiple filepath.Join()
operations in case you agree with them.
Thanks again for your contribution!
pkg/internal/ebpf/instrumenter.go
Outdated
|
||
enabled, err := v2.Enabled() | ||
if !enabled { | ||
if _, pathErr := os.Stat(filepath.Join(cgroupPath, "unified")); pathErr != nil { | ||
// Return the original error to the caller, pathErr is only required to set the Cgroup path. | ||
// Catch all errors here to capture permissions issues as well as existence errors. | ||
return cgroupPath, err | ||
// In a hybrid cgroup hierarchy the unified path can exist, so check before | ||
// attempting to bind to Cgroups V1. | ||
if _, pathErr := os.Stat(filepath.Join(cgroupPath, "unified")); pathErr == nil { | ||
return filepath.Join(cgroupPath, "unified"), nil | ||
} | ||
cgroupPath = filepath.Join(cgroupPath, "unified") | ||
// If pure Cgroups V1, attempt to bind optimistically to the network hierarchy. | ||
if _, pathErr := os.Stat(filepath.Join(cgroupPath, "net_cls,net_prio")); pathErr == nil { | ||
return filepath.Join(cgroupPath, "net_cls,net_prio"), nil | ||
} | ||
if _, pathErr := os.Stat(filepath.Join(cgroupPath, "net_cls")); pathErr == nil { | ||
return filepath.Join(cgroupPath, "net_cls"), nil | ||
} | ||
if _, pathErr := os.Stat(filepath.Join(cgroupPath, "net_prio")); pathErr == nil { | ||
return filepath.Join(cgroupPath, "net_prio"), nil | ||
} | ||
return "", errors.New("unable to determine a suitable controller for Cgroups V1") | ||
} | ||
// In a pure Cgroups V2 the /sys/fs/cgroup path is the unified hierarchy. | ||
return cgroupPath, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great. Now that we have an early return:
return "", errors.New("unable to determine a suitable controller for Cgroups V1")
we can actually reverse the negated logic of enabled
to further improve readability, i.e.
if enabled {
// In a pure Cgroups V2 the /sys/fs/cgroup path is the unified hierarchy.
return "/sys/fs/cgroup", err
}
Another thing that may be worth doing is to collapse all of the paths onto an array - we can do without filepath.Join
altogether (specifically, we don't have to do it twice).
Something like:
// The ordering here matters
paths := []string {
// in a hybrid cgroup hierarchy the unified path can exist, so check
// before attempting to bind to Cgroups V1.
"/sys/fs/cgroup/unified",
// if pure Cgroups V1, attempt to bind optimistically to the network
// hierarchy.
"/sys/fs/cgroup/net_cls,net_prio",
"/sys/fs/cgroup/net_cls",
"/sys/fs/cgroup/net_prio",
}
we replicate the prefix but this is easy to spot as they are aligned, and we can just do a for loop on that instead having multiple similar if
statements and paying the cost of multuple filepath.Join()
.
I modified the code according to your commit, and after compiling, the error still occurred. cgroupPath := "/sys/fs/cgroup"
enabled, err := v2.Enabled()
if enabled {
return cgroupPath, err
}
// In a hybrid cgroup hierarchy the unified path can exist, so check before
// attempting to bind to Cgroups V1.
if _, pathErr := os.Stat(filepath.Join(cgroupPath, "unified")); pathErr == nil {
return filepath.Join(cgroupPath, "unified"), nil
}
// If pure Cgroups V1, attempt to bind optimistically to the network hierarchy.
if _, pathErr := os.Stat(filepath.Join(cgroupPath, "net_cls,net_prio")); pathErr == nil {
return filepath.Join(cgroupPath, "net_cls,net_prio"), nil
}
if _, pathErr := os.Stat(filepath.Join(cgroupPath, "net_cls")); pathErr == nil {
return filepath.Join(cgroupPath, "net_cls"), nil
}
if _, pathErr := os.Stat(filepath.Join(cgroupPath, "net_prio")); pathErr == nil {
return filepath.Join(cgroupPath, "net_prio"), nil
}
return "", errors.New("unable to determine a suitable controller for Cgroups V1")
|
Thanks @marunrun - I've gone deeper into this now and can see that |
@markrogers95 indeed, I had a look at sockops programs will only work with cgroupsv2 (apparently cgroupsv1 made the complexity of attaching and dealing with BPF_PROG_TYPE_SOCK_OPS explode, among other things, for what I could gather). So thank you for spotting this (and thank you @marunrun for raising the alarm). @markrogers95 moving forward, I think we should do what you are proposing and raise a loud and clear error if cgroupv2 is not available. Ultimately, I feel like refactoring this entire function to have it derive the cgroupv2 path from the list of currently mounted filesystems rather than heuristically assuming And finally, to answer your question,whilst I can't tell you for sure, it's completely plausible that the lack of TC tracer is what is causing your woes: the TC tracer is responsible for injecting/propagating trace parents, and if that is not working, you may end up with orphan spans. Feel free to reach out to me on the community slack, happy to set up a call and sit with you to discuss it in case that helps. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the PR status based on our discussion.
thanks @rafaelroquetto for confirming - and absolutely agree, did have a look at what it might take to do this for v1 and quickly got lost in the complexity! Will join the community slack, will be super useful to hear about other people's experiences etc, and thanks for your help - might well take you up on that. In the meantime I've returned the hard error here with a debug line in the optimistic case we might have cgroup v2 network controllers in a unified hierarchy. Think looking at the mounted cgroups to ascertain whether the hybrid case works ahead of time will be an improvement too |
Awesome, thank you @markrogers95. This looks good! I think the only thing we need to do before merging this is rebasing (or merging main) this branch. |
nice, thanks @rafaelroquetto - all done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @markrogers95 !!!!
This is a revisit of an already attempted fix - see #1623.
Pulling in the latest changes I still see issues attaching the sockops on cgroups v1 - giving a bad link error, because we end up supplying
/sys/fs/cgroup
for a v1 hierarchy to the linker, which it doesn't like, instead we need to supply a valid v1 cgroup. This leads to thetctracer
being stopped early in the Beyla lifecycle (I believe it's loaded only once) and I believe this is leading to a problem tracking async operations through the system leading to lots of single span traces.The previous PR didn't work with the condition flipped - the detail I missed is because the linking library intelligently handles v2 cgroups where
/sys/fs/cgroup
(as in the current detected v2 case) is the true hierarchy, but the../unified
path can (often) exist as a symlink to the hierarchy. Using a hybrid setup, the /sys/fs/cgroup/unified file can often exist - and so if v2 cgroups is not enabled we still need to stat the path and check. In the case that it doesn't exist we target the cgroups v1 hierarchy instead, binging optimistically to each network controller.Again, not overly familiar with the actual socket operations themselves here so any and all feedback much appreciated. I believe the attachment type as in tctracer.go - namely
ebpf.AttachCGroupSockOps
is valid for cgroupsv1 but any steer appreciated.