Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lock to watcher hash map to prevent concurrent access panics #161

Merged
merged 1 commit into from
May 3, 2024

Conversation

jabdoa2
Copy link
Contributor

@jabdoa2 jabdoa2 commented May 2, 2024

I rolled out Wave 0.7 to a cluster with about 1k deployments. Migation went smooth. However, it crashed due to a concurrent access to the new watcher hashmap. Apparently, watchers are separate goroutines and run concurrently:

fatal error: concurrent map read and map write

goroutine 199 [running]:
github.com/wave-k8s/wave/pkg/core.(*enqueueRequestForWatcher).queueOwnerReconcileRequest(0xc00049c3e8, {0x1b49930?, 0xc00f2c5b80?}, {0x1b3ff10, 0xc000498160})

I fixed that by adding mutexes. Obviously, access to this hashmap can be optimized further.

However, even with this crash it performs well. Restarts are fast enough and the other controller will take over within a few seconds. Even though it crashes occasionally the overall CPU load is lower than before:

grafik

Same appears to be the case for memory usage:

grafik

Upgrade happened around 21:00 which explains the small CPU spike during that time.

@jabdoa2 jabdoa2 requested a review from a team as a code owner May 2, 2024 19:38
@toelke toelke force-pushed the fix_concurrent_map_access branch from d818fe6 to 761ba66 Compare May 3, 2024 07:12
@toelke toelke merged commit 4080453 into wave-k8s:master May 3, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants