Merged
Changes from 10 commits

Contributor
Do we have a limit on the line width? Proper alignment will help avoid unnecessary horizontal scrolling and make the rendering neater.

Consider:

[screenshots of the suggested alignment]

Contributor

The HTML page looks like this:
[screenshot of the rendered page]
It seems acceptable, and I am not sure a manual line break can do better in this case.

Contributor Author
@isaevil isaevil Oct 28, 2025

As far as I know, you can't indent continuation lines when using ``.. cpp:function::`` with line breaks, so at best it will end up like this:

std::vector<task_arena> create_numa_task_arenas(
task_arena::constraints constraints = {}, 
int reserved_slots = 0)

Contributor

We can reduce the clutter by using shorter names. For example, I find shortening the constraints function parameter to just c attractive, because:

  1. The task_arena::constraints type conveys the semantics.
  2. The description from where the parameter is referred to mentions that these are constraints.
  3. The description of the function is short enough and scoped, so c could act similarly to constructs like loop counters, meaning that readers are unlikely to forget what that c is about.

Similarly, a_priority can be reduced to just p without losing much readability.

This should not be overused though. For example, I would not contract the reserved_slots because its type does not convey the meaning, so the name is essential here.

@@ -43,9 +43,9 @@ A class that represents an explicit, user-managed task scheduler arena.
};

task_arena(int max_concurrency = automatic, unsigned reserved_for_masters = 1,
           priority a_priority = priority::normal);
task_arena(constraints a_constraints, unsigned reserved_for_masters = 1,
           priority a_priority = priority::normal);
task_arena(const task_arena &s);
explicit task_arena(oneapi::tbb::attach);
~task_arena();
@@ -71,6 +71,9 @@ A class that represents an explicit, user-managed task scheduler arena.
task_group_status task_arena::wait_for(task_group& tg);
};

std::vector<task_arena> create_numa_task_arenas(task_arena::constraints constraints = {},
                                                unsigned reserved_slots = 0);

} // namespace tbb
} // namespace oneapi

@@ -329,6 +332,23 @@ Member functions

The behavior of this function is equivalent to ``this->execute([&tg]{ return tg.wait(); })``.

Non-member Functions
--------------------

.. cpp:function:: std::vector<task_arena> create_numa_task_arenas(task_arena::constraints constraints = {}, unsigned reserved_slots = 0)

Returns a ``std::vector`` of non-initialized ``task_arena`` objects, each bound to a separate NUMA node.
The number of created ``task_arena`` objects is equal to the number of NUMA nodes on the system,
as determined by ``tbb::info::numa_nodes()``. If an error occurs during system information discovery,
returns a ``std::vector`` containing a single ``task_arena`` object created as ``task_arena(constraints, reserved_slots)``.

The ``constraints`` argument can be specified to further limit threads joining
the ``task_arena`` objects. The ``numa_id`` value in ``constraints`` is automatically
replaced with a separate value for each created arena.
Contributor

The second sentence is not 100% clear to me.
As far as I understand, the ``numa_id`` value in ``constraints`` is ignored and the corresponding ``numa_id`` from ``tbb::info::numa_nodes()`` is used instead.

Maybe something like this would be easier to understand?

Suggested change
the ``task_arena`` objects. The ``numa_id`` value in ``constraints`` is automatically
replaced with a separate value for each created arena.
the ``task_arena`` objects. The ``numa_id`` value specified in ``constraints`` is disregarded, and instead,
the corresponding ``numa_id`` from ``tbb::info::numa_nodes()`` is used during construction
of each particular ``task_arena``.

Also, should we specify the behavior somehow if an error occurs during system information discovery in create_numa_task_arenas()? Will the numa_id be ignored in this case?

I understand that if an error occurs in create_numa_task_arenas, it is hard to imagine that there is a valid numa_id in constraints, but maybe it still makes sense to specify the behavior explicitly.

Contributor

Using another value instead of an ignored parameter and replacing that parameter with the other value is essentially the same thing. Either wording does not mandate a specific implementation. The proposed possible implementation (https://github.com/uxlfoundation/oneTBB/blob/master/rfcs/proposed/numa_support/create-numa-arenas.md#possible-implementation) uses the replacement, I would say.

I wanted to avoid words like "ignored" - especially without saying what is used instead. But I can see how this might still be confusing.

Contributor

Maybe it is better to say that the numa_id parameter for constraints will be automatically set to a certain value:

Suggested change
the ``task_arena`` objects. The ``numa_id`` value in ``constraints`` is automatically
replaced with a separate value for each created arena.
the ``task_arena`` objects. For each created arena, the ``numa_id`` value in ``constraints``
is automatically set to the corresponding NUMA node ID from ``tbb::info::numa_nodes()``.

Contributor Author

Applied the suggestions from here and from the comment above in the other discussion.


The ``reserved_slots`` argument allows reserving a specified number of slots in
each ``task_arena`` object for application threads. By default, no slots are reserved.

Example
-------

@@ -343,21 +363,16 @@ to the corresponding NUMA node.
#include <vector>

int main() {
-    std::vector<oneapi::tbb::numa_node_id> numa_nodes = oneapi::tbb::info::numa_nodes();
-    std::vector<oneapi::tbb::task_arena> arenas(numa_nodes.size());
-    std::vector<oneapi::tbb::task_group> task_groups(numa_nodes.size());
-
-    for (int i = 0; i < numa_nodes.size(); i++) {
-        arenas[i].initialize(oneapi::tbb::task_arena::constraints(numa_nodes[i]));
-    }
+    std::vector<oneapi::tbb::task_arena> arenas = oneapi::tbb::create_numa_task_arenas();
+    std::vector<oneapi::tbb::task_group> task_groups(arenas.size());

-    for (int i = 0; i < numa_nodes.size(); i++) {
+    for (int i = 0; i < arenas.size(); i++) {
        arenas[i].enqueue([]{
            /* executed by a thread pinned to the specified NUMA node */
        }, task_groups[i]);
    }

-    for (int i = 0; i < numa_nodes.size(); i++) {
+    for (int i = 0; i < arenas.size(); i++) {

This pattern demonstrates the usage, but it would be better to execute in one task_arena after enqueuing in the others. Since none of the arenas reserve a slot for a master, the main thread might wait on a fully subscribed arena, and so cannot actively participate.

Contributor

Remember, it's the specification. The examples here are intended to illustrate usage. In the developer guide, we have a better example: https://uxlfoundation.github.io/oneTBB/main/tbb_userguide/Guiding_Task_Scheduler_Execution.html#set-numa-node

@vossmjp vossmjp Oct 29, 2025

I think it's not wise to include an anti-pattern anywhere. It's easy to make this mistake and then lose some parallelism as a result. But I won't insist if others think the simplicity of the example is more important.

Contributor

Honestly, I never viewed that as an anti-pattern.

On the other hand, the example is not only for the NUMA function but a task_arena in general, so doing execute directly into one of the arenas makes some sense here.

Contributor Author
@isaevil isaevil Oct 29, 2025

I wouldn't say that this is an anti-pattern either, primarily because the description of execute (I am referring to execute since wait_for inherits its semantics) doesn't state specifically whether a joining thread can push some worker out of the arena and execute the work on its behalf. It only specifies the behavior for the cases when joining is or is not possible.

But perhaps for more task_arena API coverage, we could rewrite the example to something like this:

#include "oneapi/tbb/task_group.h"
#include "oneapi/tbb/task_arena.h"

#include <vector>

int main() {
    std::vector<oneapi::tbb::task_arena> arenas = oneapi::tbb::create_numa_task_arenas();
    std::vector<oneapi::tbb::task_group> task_groups(arenas.size());

    for (int i = 1; i < arenas.size(); i++) {
        arenas[i].enqueue([]{
            /* executed by a thread pinned to the specified NUMA node */
        }, task_groups[i]);
    }

    arenas[0].execute([&task_groups] {
        task_groups[0].run_and_wait([] { /* parallel work */});
    });

    for (int i = 1; i < arenas.size(); i++) {
        arenas[i].wait_for(task_groups[i]);
    }

    return 0;
}


You don't actually need a task_group at all for the execute, since you will call run_and_wait. Just do the parallel work directly in the body of the execute for arenas[0], right? And since no work is in arenas[0], it can't be fully populated already, as there was no work to attract workers until the call to execute.

Contributor Author

That's right. I just didn't want to break the 1:1 match of task_groups to task_arenas. Otherwise the example becomes the following:

#include "oneapi/tbb/task_group.h"
#include "oneapi/tbb/task_arena.h"

#include <vector>

int main() {
    std::vector<oneapi::tbb::task_arena> arenas = oneapi::tbb::create_numa_task_arenas();
    std::vector<oneapi::tbb::task_group> task_groups(arenas.size()-1);

    for (int i = 1; i < arenas.size(); i++) {
        arenas[i].enqueue([]{
            /* executed by a thread pinned to the specified NUMA node */
        }, task_groups[i-1]);
    }

    arenas[0].execute([] {
        /* parallel work */
    });

    for (int i = 1; i < arenas.size(); i++) {
        arenas[i].wait_for(task_groups[i-1]);
    }

    return 0;
}

Or simply still create an equal number of task_group and task_arena objects but never use the first one in the example:

#include "oneapi/tbb/task_group.h"
#include "oneapi/tbb/task_arena.h"

#include <vector>

int main() {
    std::vector<oneapi::tbb::task_arena> arenas = oneapi::tbb::create_numa_task_arenas();
    std::vector<oneapi::tbb::task_group> task_groups(arenas.size());

    for (int i = 1; i < arenas.size(); i++) {
        arenas[i].enqueue([]{
            /* executed by a thread pinned to the specified NUMA node */
        }, task_groups[i]);
    }

    arenas[0].execute([] {
        /* parallel work */
    });

    for (int i = 1; i < arenas.size(); i++) {
        arenas[i].wait_for(task_groups[i]);
    }

    return 0;
}

Contributor

Either seems good enough to me.


I suppose there's no particular reason to assume that arenas[0] is the one that should be executed in. So you could create 1 less arena, enqueue into the first size()-1, and then execute in the last arena. Perhaps the first option then looks slightly better, as it does not need to mismatch the indexes when calling enqueue. But I'd also be OK with either of the suggested examples as they are.

Contributor Author

I have updated the example.

arenas[i].wait_for(task_groups[i]);
}
