-
Notifications
You must be signed in to change notification settings - Fork 116
[oneTBB] Add a function to create a set of NUMA bound task arenas #650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 10 commits
f819fc1
6caccd4
b9cc94d
43f749f
2438028
532fbb4
25159ee
94f1d5d
c08478f
d2be07c
c16ef5a
b730cab
52a15d7
e93ac74
5089b5a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -43,9 +43,9 @@ A class that represents an explicit, user-managed task scheduler arena. | |||||||||||||||||||
| }; | ||||||||||||||||||||
|
|
||||||||||||||||||||
| task_arena(int max_concurrency = automatic, unsigned reserved_for_masters = 1, | ||||||||||||||||||||
| priority a_priority = priority::normal); | ||||||||||||||||||||
| priority a_priority = priority::normal); | ||||||||||||||||||||
| task_arena(constraints a_constraints, unsigned reserved_for_masters = 1, | ||||||||||||||||||||
| priority a_priority = priority::normal); | ||||||||||||||||||||
| priority a_priority = priority::normal); | ||||||||||||||||||||
| task_arena(const task_arena &s); | ||||||||||||||||||||
| explicit task_arena(oneapi::tbb::attach); | ||||||||||||||||||||
| ~task_arena(); | ||||||||||||||||||||
|
|
@@ -71,6 +71,9 @@ A class that represents an explicit, user-managed task scheduler arena. | |||||||||||||||||||
| task_group_status task_arena::wait_for(task_group& tg); | ||||||||||||||||||||
| }; | ||||||||||||||||||||
|
|
||||||||||||||||||||
| std::vector<task_arena> create_numa_task_arenas(task_arena::constraints constraints = {}, | ||||||||||||||||||||
| unsigned reserved_slots = 0); | ||||||||||||||||||||
akukanov marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||||||||||||
|
|
||||||||||||||||||||
| } // namespace tbb | ||||||||||||||||||||
| } // namespace oneapi | ||||||||||||||||||||
|
|
||||||||||||||||||||
|
|
@@ -329,6 +332,23 @@ Member functions | |||||||||||||||||||
|
|
||||||||||||||||||||
| The behavior of this function is equivalent to ``this->execute([&tg]{ return tg.wait(); })``. | ||||||||||||||||||||
|
|
||||||||||||||||||||
| Non-member Functions | ||||||||||||||||||||
| -------------------- | ||||||||||||||||||||
|
|
||||||||||||||||||||
| .. cpp:function:: std::vector<task_arena> create_numa_task_arenas(task_arena::constraints constraints = {}, unsigned reserved_slots = 0) | ||||||||||||||||||||
|
|
||||||||||||||||||||
| Returns a ``std::vector`` of non-initialized ``task_arena`` objects, each bound to a separate NUMA node. | ||||||||||||||||||||
| The number of created ``task_arena`` is equal to the number of NUMA nodes on the system, | ||||||||||||||||||||
isaevil marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||||||||||||
| as determined by ``tbb::info::numa_nodes()``. If an error occurs during system information discovery, | ||||||||||||||||||||
| returns a ``std::vector`` containing a single ``task_arena`` object created as ``task_arena(constraints, reserved_slots)``. | ||||||||||||||||||||
akukanov marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||||||||||||
|
|
||||||||||||||||||||
| The ``constraints`` argument can be specified to further limit threads joining | ||||||||||||||||||||
isaevil marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||||||||||||
| the ``task_arena`` objects. The ``numa_id`` value in ``constraints`` is automatically | ||||||||||||||||||||
| replaced with a separate value for each created arena. | ||||||||||||||||||||
|
||||||||||||||||||||
| the ``task_arena`` objects. The ``numa_id`` value in ``constraints`` is automatically | |
| replaced with a separate value for each created arena. | |
| the ``task_arena`` objects. The ``numa_id`` value specified in ``constraints`` is disregarded, and instead, | |
| the corresponding ``numa_id`` from ``tbb::info::numa_nodes()`` is used during construction | |
| of each particular ``task_arena``. |
Also, should we specify this somehow if the error occurs during the system parsing as part of create_numa_task_arenas()? Will the numa_id be ignored in this case?
I understand that if the error occurs in create_numa_task_arenas, then it is hard to imagine that there is a valid numa_id in constrains, but may be it still make sense to specify the behavior explicitly somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using another value instead of an ignored parameter and replacing that parameter with the other value is essentially the same thing. Either wording does not mandate a specific implementation. The proposed possible implementation (https://github.com/uxlfoundation/oneTBB/blob/master/rfcs/proposed/numa_support/create-numa-arenas.md#possible-implementation) uses the replacement, I would say.
I wanted to avoid words like "ignored" - especially without saying what is used instead. But I can see how this might still be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is better to say that the numa_id parameter for constraints will be automatically set to a certain value:
| the ``task_arena`` objects. The ``numa_id`` value in ``constraints`` is automatically | |
| replaced with a separate value for each created arena. | |
| the ``task_arena`` objects. For each created arena, the ``numa_id`` value in ``constraints`` | |
| is automatically set to the corresponding NUMA node ID from ``tbb::info::numa_nodes()``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied suggestions from here and in the comment above from the different discussion.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern demonstrates the usage, but it would be better to execute in one task_arena after enqueuing in the others. Since none of the arenas reserve a slot for a master, the main thread might wait on a fully subscribed arena, and so cannot actively participate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember, it's the specification. The examples here are intended to illustrate usage. In the developer guide, we have a better example: https://uxlfoundation.github.io/oneTBB/main/tbb_userguide/Guiding_Task_Scheduler_Execution.html#set-numa-node
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's not wise to include an anti-pattern anywhere. It's easy to make this mistake and then lose some parallelism as a result. But I won't insist if others think the simplicity of the example is more important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I never viewed that as an anti-pattern.
On the other hand, the example is not only for the NUMA function but a task_arena in general, so doing execute directly into one of the arenas makes some sense here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't say that this is an anti-pattern either primarily because the description of execute (I am referring to execute due to wait_for inheriting its semantics) doesn't state specifically, whether joining thread can or cannot push some worker out of arena and execute the work on its behalf. It only specifies its behavior in cases when it is possible to join or not.
But perhaps for more task_arena API coverage, we could rewrite the example to something like this:
#include "oneapi/tbb/task_group.h"
#include "oneapi/tbb/task_arena.h"
#include <vector>
int main() {
std::vector<oneapi::tbb::task_arena> arenas = oneapi::tbb::create_numa_task_arenas();
std::vector<oneapi::tbb::task_group> task_groups(arenas.size());
for (int i = 1; i < arenas.size(); i++) {
arenas[i].enqueue([]{
/* executed by a thread pinned to the specified NUMA node */
}, task_groups[i]);
}
arenas[0].execute([&task_groups] {
task_groups[0].run_and_wait([] { /* parallel work */});
});
for (int i = 1; i < arenas.size(); i++) {
arenas[i].wait_for(task_groups[i]);
}
return 0;
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't actually need a task_group at all for the execute, since you will call run_and_wait. Just do the parallel work directly in the body of the execute for arena[0], right? And since no work is in arena[0] it can't be fully populated already since there was no work to attract workers until the call to execute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. I just didn't want to break that 1:1 match of task_groups to task_arenas. Otherwise the example becomes the following:
#include "oneapi/tbb/task_group.h"
#include "oneapi/tbb/task_arena.h"
#include <vector>
int main() {
std::vector<oneapi::tbb::task_arena> arenas = oneapi::tbb::create_numa_task_arenas();
std::vector<oneapi::tbb::task_group> task_groups(arenas.size()-1);
for (int i = 1; i < arenas.size(); i++) {
arenas[i].enqueue([]{
/* executed by a thread pinned to the specified NUMA node */
}, task_groups[i-1]);
}
arenas[0].execute([] {
/* parallel work */
});
for (int i = 1; i < arenas.size(); i++) {
arenas[i].wait_for(task_groups[i-1]);
}
return 0;
}Or simply still create equal number of task_group and task_arena but never use first one in example:
#include "oneapi/tbb/task_group.h"
#include "oneapi/tbb/task_arena.h"
#include <vector>
int main() {
std::vector<oneapi::tbb::task_arena> arenas = oneapi::tbb::create_numa_task_arenas();
std::vector<oneapi::tbb::task_group> task_groups(arenas.size());
for (int i = 1; i < arenas.size(); i++) {
arenas[i].enqueue([]{
/* executed by a thread pinned to the specified NUMA node */
}, task_groups[i]);
}
arenas[0].execute([] {
/* parallel work */
});
for (int i = 1; i < arenas.size(); i++) {
arenas[i].wait_for(task_groups[i]);
}
return 0;
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either seems good enough to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose there's no particular reason to assume that arena[0] is the one that should be executed in. So you could create 1 less arena, enqueue into the first size()-1 and then execute in the last arena. Perhaps then the first option looks slightly better not needing to mismatch the indexes when calling enqueue. But I'd also be ok with either of the suggested examples as they are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have limit on the line width? Proper alignment will help avoiding unnecessary horizontal scrollings and make the rendering neater.
Consider:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The HTML page looks like this:

Seems acceptable, and I am not sure if manual line break can do better in this case.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, you can't make indentation when using
.. cpp::functionwith line breaks, so at best it will end up like this:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can reduce the clutter by using shorter names. For example, I find shortening of
constraintsfunction parameter to justcattractive, because:task_arena::constraintstype conveys the semantics.ccould act similarly to constructs like loop counters. Meaning, that readers are unlikely to forget what thatcis about.Similarly, the
a_prioritycan be reduced to justpwithout loosing much of the readability.This should not be overused though. For example, I would not contract the
reserved_slotsbecause its type does not convey the meaning, so the name is essential here.