
scx_rusty: Temporary fix of duplicate active tptr #941

Open · wants to merge 1 commit into base: main
Conversation

@vax-r vax-r commented Nov 19, 2024

Summary

Under severe load imbalance scenarios, such as a mixture of CPU-intensive and I/O-intensive workloads, the same tptr may be written into a domain's dom_active_tptrs array multiple times.

This leads to load balancer failures: when the duplicated tptr's task carries a large enough load, it tends to be selected repeatedly, so warnings about the same tptr being set in "lb_data" keep popping up.

As a workaround for now, keep a HashSet in userspace recording the tptrs currently active in a domain, and do not add the same task repeatedly.
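The workaround can be sketched roughly as follows. This is an illustrative standalone sketch, not the PR's actual code: the `Domain` struct and `push_active` helper are hypothetical, with only the `active_tptr_set` name taken from the patch.

```rust
use std::collections::HashSet;

// Hypothetical minimal sketch of the dedup workaround: record each
// active tptr in a per-domain HashSet, and skip tptrs already seen.
struct Domain {
    active_tptr_set: HashSet<u64>,
    tasks: Vec<u64>,
}

impl Domain {
    fn push_active(&mut self, tptr: u64) -> bool {
        // HashSet::insert returns false if the value was already present,
        // so a duplicate tptr is simply skipped instead of being re-added.
        if self.active_tptr_set.insert(tptr) {
            self.tasks.push(tptr);
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut dom = Domain {
        active_tptr_set: HashSet::new(),
        tasks: Vec::new(),
    };
    assert!(dom.push_active(42));
    assert!(!dom.push_active(42)); // duplicate is rejected
    println!("tasks recorded: {}", dom.tasks.len());
}
```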

Test

The test ran on an AMD Ryzen 7 5700X3D 8-Core Processor (x86_64).

Reproduce the bug

Because my machine is small and doesn't have multiple NUMA nodes, the following command creates several logical domains:

$ sudo ./build/debug/scx_rusty --cpumasks 0xf000 0x0f00 0x00f0 0x000f

After scx_rusty is loaded and running successfully, open another terminal and start compiling a kernel:

$ make defconfig
$ make -j 4

You should then see many warnings start popping up:

$ sudo ./build/debug/scx_rusty --cpumasks 0xf000 0x0f00 0x00f0 0x000f
13:48:22 [INFO] Running scx_rusty (build ID: 1.0.6-gb21a3d8-dirty x86_64-unknown-linux-gnu/debug)
13:48:22 [INFO] NODE[00] mask= ffff
13:48:22 [INFO]  DOM[00] mask= f000
13:48:22 [INFO]  DOM[01] mask= 0f00
13:48:22 [INFO]  DOM[02] mask= 00f0
13:48:22 [INFO]  DOM[03] mask= 000f
13:48:23 [WARN] libbpf: map 'rusty': BPF skeleton version is old, skipping map auto-attachment...

13:48:23 [INFO] Rusty scheduler started! Run `scx_rusty --monitor` for metrics.
13:48:35 [WARN] Failed to update lb_data map for tptr=18446617609326231552 error=Error: File exists (os error 17)
13:48:35 [WARN] Failed to update lb_data map for tptr=18446617609326231552 error=Error: File exists (os error 17)
13:48:35 [WARN] Failed to update lb_data map for tptr=18446617609326231552 error=Error: File exists (os error 17)
13:48:35 [WARN] Failed to update lb_data map for tptr=18446617606378771456 error=Error: File exists (os error 17)
13:48:37 [WARN] Failed to update lb_data map for tptr=18446617616985729536 error=Error: File exists (os error 17)
13:48:37 [WARN] Failed to update lb_data map for tptr=18446617616985729536 error=Error: File exists (os error 17)
13:48:37 [WARN] Failed to update lb_data map for tptr=18446617616985729536 error=Error: File exists (os error 17)
13:48:39 [WARN] Failed to update lb_data map for tptr=18446617609326231552 error=Error: File exists (os error 17)
13:48:39 [WARN] Failed to update lb_data map for tptr=18446617609326231552 error=Error: File exists (os error 17)
13:48:39 [WARN] Failed to update lb_data map for tptr=18446617589089850368 error=Error: File exists (os error 17)
13:48:39 [WARN] Failed to update lb_data map for tptr=18446617589089850368 error=Error: File exists (os error 17)
13:48:39 [WARN] Failed to update lb_data map for tptr=18446617589089850368 error=Error: File exists (os error 17)
13:48:39 [WARN] Failed to update lb_data map for tptr=18446617589089850368 error=Error: File exists (os error 17)
13:48:39 [WARN] Failed to update lb_data map for tptr=18446617589221118976 error=Error: File exists (os error 17)
13:48:39 [WARN] Failed to update lb_data map for tptr=18446617589221118976 error=Error: File exists (os error 17)
13:48:39 [WARN] Failed to update lb_data map for tptr=18446617608686818304 error=Error: File exists (os error 17)
13:48:39 [WARN] Failed to update lb_data map for tptr=18446617608686818304 error=Error: File exists (os error 17)
13:48:39 [WARN] Failed to update lb_data map for tptr=18446617602726777856 error=Error: File exists (os error 17)
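The `File exists (os error 17)` is EEXIST, which suggests the lb_data map is updated with a no-overwrite flag (BPF_NOEXIST), so a second insert of the same tptr key fails. A plain-HashMap analogue of that behavior (a hypothetical sketch, not the scheduler's code):

```rust
use std::collections::HashMap;

// Hypothetical analogue of updating lb_data with a "must not exist"
// flag: inserting an already-present key is rejected, mirroring EEXIST.
fn update_noexist(map: &mut HashMap<u64, f64>, tptr: u64, load: f64) -> Result<(), String> {
    if map.contains_key(&tptr) {
        return Err("File exists (os error 17)".to_string());
    }
    map.insert(tptr, load);
    Ok(())
}

fn main() {
    let mut lb_data = HashMap::new();
    assert!(update_noexist(&mut lb_data, 18446617609326231552, 0.5).is_ok());
    // A duplicate tptr coming from dom_active_tptrs hits the warning path.
    assert!(update_noexist(&mut lb_data, 18446617609326231552, 0.5).is_err());
    println!("duplicate rejected");
}
```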

If you add a debug print inside populate_tasks_by_load() like the following (from line 682):

        for idx in ridx..widx {
            let tptr = active_tptrs.tptrs[(idx % MAX_TPTRS) as usize];
            let key = unsafe { std::mem::transmute::<u64, [u8; 8]>(tptr) };

            if dom.active_tptr_set.contains(&tptr) {
                warn!("Same task has already exist in the domain task array , tptr = {}", tptr);
            }

you should see that the problem actually arises in this function:

13:50:59 [INFO] Rusty scheduler started! Run `scx_rusty --monitor` for metrics.
13:51:03 [WARN] Same task has already exist in the domain task array , tptr = 18446617608929968128
13:51:03 [WARN] Same task has already exist in the domain task array , tptr = 18446617589176401920
13:51:03 [WARN] Same task has already exist in the domain task array , tptr = 18446617589139493888
13:51:03 [WARN] Same task has already exist in the domain task array , tptr = 18446617596886592000
13:51:03 [WARN] Same task has already exist in the domain task array , tptr = 18446617608435400704
13:51:03 [WARN] Same task has already exist in the domain task array , tptr = 18446617608929978880
13:51:03 [WARN] Same task has already exist in the domain task array , tptr = 18446617589269757952
13:51:03 [WARN] Same task has already exist in the domain task array , tptr = 18446617608929978880
13:51:03 [WARN] Same task has already exist in the domain task array , tptr = 18446617589362633216
13:51:03 [WARN] Same task has already exist in the domain task array , tptr = 18446617608713851904
13:51:03 [WARN] Same task has already exist in the domain task array , tptr = 18446617602615585280
13:51:05 [WARN] Same task has already exist in the domain task array , tptr = 18446617606378771456
13:51:05 [WARN] Same task has already exist in the domain task array , tptr = 18446617606378771456
13:51:05 [WARN] Same task has already exist in the domain task array , tptr = 18446617589954379776
13:51:05 [WARN] Same task has already exist in the domain task array , tptr = 18446617616986046464
13:51:05 [WARN] Same task has already exist in the domain task array , tptr = 18446617589083996160
13:51:05 [WARN] Same task has already exist in the domain task array , tptr = 18446617606378771456
13:51:05 [WARN] Failed to update lb_data map for tptr=18446617606378771456 error=Error: File exists (os error 17)
13:51:05 [WARN] Failed to update lb_data map for tptr=18446617606378771456 error=Error: File exists (os error 17)

vax-r commented Nov 19, 2024

I think the problem actually comes from rusty_running: under more severe load imbalance, the same task has a higher chance of being moved around between domains. Maybe instead of a simple array, we should implement or adopt some LRU-like mechanism for dom_active_tptrs, so we can keep the latest task at the top of the array/queue without holding multiple duplicates of it.

@hodgesds maybe you can suggest something?
I think this workaround will work, but maybe we can have something better that fixes the problem at its root.
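The LRU-like idea could look roughly like this in userspace. This is an illustrative sketch under assumed names (`ActiveTptrs`, `touch`, `cap` are all hypothetical), not a proposal for the actual data layout:

```rust
use std::collections::{HashSet, VecDeque};

// Illustrative LRU-style sketch: keep each tptr at most once, with the
// most recently seen tptr at the front of the queue.
struct ActiveTptrs {
    order: VecDeque<u64>, // front = most recently seen
    seen: HashSet<u64>,   // membership check to avoid duplicates
    cap: usize,           // maximum number of tracked tptrs
}

impl ActiveTptrs {
    fn touch(&mut self, tptr: u64) {
        if self.seen.contains(&tptr) {
            // Move an existing entry to the front instead of duplicating it.
            self.order.retain(|&t| t != tptr);
        } else if self.order.len() == self.cap {
            // Evict the least recently seen tptr when full.
            if let Some(old) = self.order.pop_back() {
                self.seen.remove(&old);
            }
        }
        self.seen.insert(tptr);
        self.order.push_front(tptr);
    }
}

fn main() {
    let mut active = ActiveTptrs {
        order: VecDeque::new(),
        seen: HashSet::new(),
        cap: 3,
    };
    active.touch(1);
    active.touch(2);
    active.touch(1); // no duplicate: 1 just moves back to the front
    assert_eq!(active.order, VecDeque::from(vec![1, 2]));
    println!("front = {:?}", active.order.front());
}
```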

Under severe load imbalance scenarios, such as a mixture of
CPU-intensive and I/O-intensive workloads, the same tptr may be
written into a domain's dom_active_tptrs array multiple times.

This leads to load balancer failures: when the duplicated tptr's task
carries a large enough load, it tends to be selected repeatedly, so
warnings about the same tptr being set in "lb_data" keep popping up.

As a workaround for now, keep a HashSet in userspace recording the
tptrs currently active in a domain, and do not add the same task
repeatedly.

Signed-off-by: I Hsin Cheng <[email protected]>
@vax-r force-pushed the rusty_repeated_active_tptrs branch from b21a3d8 to 72ecf3c on November 19, 2024 14:03
@hodgesds
I guess with this solution, how would a task get removed from the active_tptr_set? Let me think about this a little more. Thanks for looking into this!

vax-r commented Nov 20, 2024

> I guess with this solution, how would a task get removed from the active_tptr_set? Let me think about this a little more. Thanks for looking into this!

@hodgesds -
My thought is that we do not need to remove tasks from active_tptr_set under the current mechanism: every time a load balancing operation starts, every structure within a Node/Domain is re-created. Nor do we remove tasks from domain.tasks (active_tptr_set represents the tasks that exist in domain.tasks, so removing a task from active_tptr_set would mean removing it from domain.tasks too, which the current mechanism does not implement). So I don't think we have to remove anything in userspace, at least for now. (We will have to once we stop re-creating everything every time.)

I am considering using BPF_MAP_TYPE_LRU_HASH to re-implement dom_active_tptrs after this, so we can keep the most recently running tasks within different domains without multiple duplicates of the same task. What do you think?
