Don't spin on the main mutex while waiting for new work (#8433)
* Don't spin on the main mutex while waiting for new work

Once they run out of work to do, Halide worker threads spin for a bit checking if new work has been enqueued before calling cond_wait, which puts them to sleep until signaled. Job owners also spin waiting for their job to complete before going to sleep on a different condition variable. I hate this, but all previous attempts I have made at removing or reducing the spinning have made things slower.

One problem with this approach is that spinning is done by releasing the work queue lock, yielding, reacquiring the work queue lock, and doing the somewhat involved check to see if there's something useful for this thread to do, either because new work was enqueued, the last item on a job completed, or a semaphore was released. This hammering of the lock by idle worker threads can starve the thread that actually completed the last task, delaying its ability to tell the job owner the job is done, and can also starve the job owner, causing it to take extra time to realize the job is all done and return back into Halide code. So this adds some wasted time at the end of every parallel for loop.

This PR gets these idle threads to spin off the main mutex. I did this by combining a counter with a condition variable. Any time they are signaled, the counter is atomically incremented. Before they first release the lock, the idlers atomically capture the value of this counter. Then in cond_wait they spin for a bit doing atomic loads of the counter in between yields until it changes, in which case they grab the lock and return, or until they reach the spin count limit, in which case they go to sleep.

This improved performance quite a bit over main for the blur app, which is a fast pipeline (~100us) with fine-grained parallelism. The speed-up was 1.2x! Not much effect on the more complex apps.
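The counter-plus-condition-variable idea described above can be sketched with standard C++ primitives. This is only an illustration, not the code in this commit: the names `SpinCondVar`, `broadcast`, `wait`, `demo_handoff`, and the default `spin_limit` of 40 are all hypothetical, and it assumes that signalers call `broadcast` while holding the same mutex the waiters use, so a wakeup cannot be lost between the final counter check and going to sleep.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical sketch: a condition variable paired with a signal counter.
struct SpinCondVar {
    std::atomic<unsigned> counter{0};
    std::condition_variable cv;

    // Assumption: the caller holds the mutex that waiters pass to wait().
    void broadcast() {
        counter.fetch_add(1);  // every signal atomically bumps the counter
        cv.notify_all();
    }

    // Call with `lock` held; returns with `lock` held.
    void wait(std::unique_lock<std::mutex> &lock, int spin_limit = 40) {
        assert(lock.owns_lock());
        // Capture the counter before the lock is first released.
        const unsigned seen = counter.load();
        lock.unlock();
        // Spin on the counter only, leaving the mutex alone between yields.
        for (int i = 0; i < spin_limit; i++) {
            if (counter.load() != seen) {
                lock.lock();  // a signal arrived: grab the lock and return
                return;
            }
            std::this_thread::yield();
        }
        // Spin limit reached: sleep until the counter changes.
        lock.lock();
        cv.wait(lock, [&] { return counter.load() != seen; });
    }
};

// Small demonstration: one idle worker waits for a single work item.
int demo_handoff() {
    std::mutex mutex;
    SpinCondVar work_available;
    bool has_work = false;
    int payload = 0;
    int received = -1;

    std::thread worker([&] {
        std::unique_lock<std::mutex> lock(mutex);
        while (!has_work) {
            work_available.wait(lock);  // spins first, then sleeps
        }
        received = payload;
    });

    {
        std::unique_lock<std::mutex> lock(mutex);
        payload = 42;
        has_work = true;
        work_available.broadcast();  // signaled under the lock
    }
    worker.join();
    return received;
}
```

Calling `demo_handoff()` runs one enqueue and one wakeup. The point of the design is that while the worker is spinning it only performs atomic loads of `counter`, so the thread that wants to publish work or report completion does not have to compete with idle spinners for the mutex.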
Showing 2 changed files with 71 additions and 35 deletions.