
Extend AsyncSystem with support for throttling groups, prioritization, and cancelation #789

Open
kring opened this issue Jan 9, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

kring commented Jan 9, 2024

Motivation

AsyncSystem::runInWorkerThread (as the name implies) runs a task in a worker thread and returns a Future that resolves when it completes. If this work is CPU bound - as it usually is - it's not desirable to run too many such tasks simultaneously, because the overhead of task switching will become high and all of the tasks will complete slowly.

In practice, though, runInWorkerThread is dispatched via some sort of thread pool. In the case of Cesium for Unreal, these tasks are dispatched to Unreal Engine's task graph, which works similarly. The thread pool limits the number of tasks that run simultaneously; any extra tasks are added to a queue. As each task completes, the next task in the queue is dispatched.

This scheme is strictly first in, first out (FIFO). Once runInWorkerThread (or similarly, thenInWorkerThread) is called, the task will run eventually (process exit notwithstanding), and multiple tasks dispatched this way will start in the order in which these methods were called. There is no possibility of canceling or reprioritizing a task that hasn't started yet.

On top of this, not all tasks are CPU bound. Cesium Native also needs to do HTTP requests. Network bandwidth, like CPU time, is a limited resource; attempting to do a very large number of network requests simultaneously is inefficient. While thread pools allow AsyncSystem to use CPU time efficiently, there is no similar mechanism for HTTP requests, GPU time, or any other type of limited resource.

These are pretty big limitations when it comes to complicated asynchronous processes like loading 3D Tiles content. To load tile content, we need to:

  1. Do an HTTP GET for the tile content. But we don't want to do too many at once or performance will suffer.
  2. Parse the downloaded tile content and perform various CPU-intensive operations (image decoding, mesh decompression, creating physics meshes, generating normals, etc.) on it to prepare it for rendering. We don't want to do too many of these at once or we'll monopolize CPU cores or game engine task graph time.
  3. If the parsed content contains references to external content (such as a glTF external buffer or image), we may need to do further network requests, followed by more CPU work.
  4. On the next frame, we may learn that this tile is now more or less important than it was last frame. Or maybe this tile isn't needed at all anymore and any further work should be canceled (for now).

We have ad-hoc ways of doing some approximation of this. Currently, there is a "number of simultaneous tile loads" per tileset. A tile that is doing any kind of loading - whether network or CPU - counts against this limit. This is inefficient in terms of both network and CPU utilization, as described in #473. We also can't cancel or reprioritize tile loads once they're started, as described in #564.

Proposal

This part is a work in progress! I don't think I have all the details right yet.

AsyncSystem should make this sort of thing easy. First, we define a throttling group:

class ThrottlingGroup {
public:
  ThrottlingGroup(const AsyncSystem& asyncSystem, int32_t numberOfSimultaneousTasks);
};

We'll have a ThrottlingGroup instance for network requests, and another instance for CPU-bound background work.
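To make the intended behavior concrete, here is a minimal, single-threaded sketch of the FIFO core of a throttling group, using only the standard library. The class name and callback-based interface are illustrative, not the proposed cesium-native API; the real version would hand out Futures that resolve when a task may start:

```cpp
#include <cstdint>
#include <deque>
#include <functional>

// Illustrative sketch only: at most `limit` tasks run at once; extra
// tasks wait in a FIFO queue and start, in order, as running tasks
// complete. The real ThrottlingGroup would resolve Futures instead of
// invoking callbacks.
class ThrottlingGroupSketch {
public:
  explicit ThrottlingGroupSketch(int32_t limit) : _limit(limit) {}

  // Start the task now if a slot is free; otherwise queue it.
  void enqueue(std::function<void()> startTask) {
    if (_running < _limit) {
      ++_running;
      startTask();
    } else {
      _queue.push_back(std::move(startTask));
    }
  }

  // Called when a running task finishes; starts the next queued task.
  void taskComplete() {
    --_running;
    if (!_queue.empty()) {
      std::function<void()> next = std::move(_queue.front());
      _queue.pop_front();
      ++_running;
      next();
    }
  }

private:
  int32_t _limit;
  int32_t _running = 0;
  std::deque<std::function<void()>> _queue;
};
```

With a limit of 2, a third enqueued task does not start until one of the first two calls taskComplete.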

We also define a TaskController class that is used to cancel and prioritize an async "task", which is essentially a chain of Future continuations:

class TaskController {
public:
  TaskController(PriorityGroup initialPriorityGroup, float initialPriorityRank);

  void cancel();
  
  PriorityGroup getPriorityGroup() const;
  void setPriorityGroup(PriorityGroup value);

  float getPriorityRank() const;
  void setPriorityRank(float value);
};

The idea is that we can then write code like this:

AsyncSystem asyncSystem = ...;

IntrusivePointer<ThrottlingGroup> pNetworkRequests =
  new ThrottlingGroup(asyncSystem, 20);
IntrusivePointer<ThrottlingGroup> pCpuProcessing =
  new ThrottlingGroup(asyncSystem, 10);

IntrusivePointer<TaskController> pController =
        new TaskController(PriorityGroup::Normal, 1.0f);

AsyncSystem taskSystem = asyncSystem.withController(pController);

pAssetAccessor
    ->get(
        taskSystem,
        pNetworkRequests,
        "https://example.com/whatever.json",
        {})
    .beginThrottle(pCpuProcessing)
    .thenInWorkerThread([asyncSystem, taskSystem, pNetworkRequests, pAssetAccessor](
                            std::shared_ptr<IAssetRequest>&& pRequest) {
      if (doSomeCpuWorkOnResponse(pRequest->response()->data())) {
        return pAssetAccessor
            ->get(
                taskSystem,
                pNetworkRequests,
                "https://example.com/image.jpg",
                {})
            .thenInWorkerThread(
                [](std::shared_ptr<IAssetRequest>&& pRequest) {
                  doSomeMoreCpuWork(pRequest->response()->data());
                });
      }
      return asyncSystem.createResolvedFuture();
    })
    .endThrottle();

AsyncSystem::withController specializes the AsyncSystem for a given task. It allows the continuations created within it to be prioritized and canceled as a group.

beginThrottle returns a Future that resolves when the task should start. This may not happen right away if too many other tasks are already in progress within the throttling group. When the continuation chain reaches endThrottle, the throttled portion of the task is complete and other tasks waiting in the same throttling group may begin (beginning with the one that is now highest priority).
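As an illustration of that wake-up rule, here is a standalone sketch of a prioritized, cancelable wait queue. The PriorityGroup values, the ordering convention (lower rank wins within a group), and all names here are assumptions made for the example, not the actual cesium-native API:

```cpp
#include <algorithm>
#include <functional>
#include <memory>
#include <vector>

// Assumed ordering for the example: lower enum value = more urgent.
enum class PriorityGroup { Urgent = 0, Normal = 1, Deferred = 2 };

struct TaskControllerSketch {
  PriorityGroup group;
  float rank; // lower rank = more important within a group (assumption)
  bool canceled = false;
  void cancel() { canceled = true; }
};

struct Waiting {
  std::shared_ptr<TaskControllerSketch> pController;
  std::function<void()> start;
};

// When a slot frees up in the throttling group, pick the
// highest-priority non-canceled waiting task. Canceled tasks are
// dropped without ever starting.
std::function<void()> dequeueNext(std::vector<Waiting>& queue) {
  queue.erase(
      std::remove_if(
          queue.begin(),
          queue.end(),
          [](const Waiting& w) { return w.pController->canceled; }),
      queue.end());
  if (queue.empty())
    return {};
  auto best = std::min_element(
      queue.begin(), queue.end(), [](const Waiting& a, const Waiting& b) {
        if (a.pController->group != b.pController->group)
          return a.pController->group < b.pController->group;
        return a.pController->rank < b.pController->rank;
      });
  std::function<void()> start = std::move(best->start);
  queue.erase(best);
  return start;
}
```

Because the queue is re-examined each time a slot frees, priority changes made via the controller between frames take effect naturally.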

In this example, we do a network request. Then do throttled processing of the response in a worker thread. Depending on the result of some function call, we may need to do another network request, followed by more CPU work.

The overload of IAssetAccessor::get that takes a ThrottlingGroup looks like this:

Future<std::shared_ptr<IAssetRequest>> get(
      const CesiumAsync::AsyncSystem& asyncSystem,
      const IntrusivePointer<ThrottlingGroup>& pThrottlingGroup,
      const std::string& url,
      const std::vector<THeader>& headers) {
  // Assumes IAssetAccessor derives from std::enable_shared_from_this.
  std::shared_ptr<IAssetAccessor> pThis = this->shared_from_this();
  return asyncSystem
      .beginThrottle(pThrottlingGroup)
      .thenImmediately([pThis, asyncSystem, url, headers]() {
        return pThis->get(asyncSystem, url, headers);
      })
      .endThrottle();
}

So the network requests happen in one throttling group, while the CPU processing happens in another. When a continuation chain reaches a beginThrottle, the task exits the current throttling group (if any), and enters the new one. When the continuation chain reaches the endThrottle, the previous throttling group is re-entered.
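The group-nesting rule can be modeled with a per-task stack of groups. The following standalone sketch (all names illustrative, and with the queueing elided) shows beginThrottle exiting the current group and endThrottle re-entering it:

```cpp
#include <vector>

// Simplified model: each group just counts active tasks against a limit.
struct Group {
  int active = 0;
  int limit;
  explicit Group(int limit) : limit(limit) {}
  bool tryEnter() {
    if (active < limit) {
      ++active;
      return true;
    }
    return false;
  }
  void leave() { --active; }
};

// Per-task stack of throttling groups, modeling the nesting rule:
// beginThrottle exits the current group (if any) and enters the new
// one; endThrottle exits the new group and re-enters the previous one.
struct TaskContext {
  std::vector<Group*> stack;

  void beginThrottle(Group& g) {
    if (!stack.empty())
      stack.back()->leave(); // exit current group
    // (real code would queue here if the group is full)
    g.tryEnter();
    stack.push_back(&g);
  }

  void endThrottle() {
    stack.back()->leave();
    stack.pop_back();
    if (!stack.empty())
      stack.back()->tryEnter(); // re-enter previous group
  }
};
```

This is why a CPU task that issues a nested network request stops counting against the CPU group while it waits on the network group, rather than holding slots in both.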

@kring kring added the enhancement New feature or request label Jan 9, 2024
csciguy8 commented Feb 19, 2024

This idea seems sound, and first reaction is "Why not?".

If your code is arranged in such a way to take advantage of throttling groups, then go for it. But that "if" is really my only criticism. If this idea is built out, will it be useful?

From what I've learned in this PR, building a stage loading pipeline, the bulk of the work was refactoring. Code needs to be structured in such a way that it can be throttled. The actual "throttling" part wasn't all that sophisticated. For example, this function throttles a pending queue of content requests to the AssetAccessor::get call. Fairly small, easy to understand.

Would it be worth refactoring that to use ThrottlingGroups? I'm not sure. Would it be useful for someone else? Maybe.

kring commented May 9, 2024

@csciguy8 I'm way late getting back to you here, but I'm a little confused by your criticism.

> From what I've learned in #779, building a stage loading pipeline, the bulk of the work was refactoring. Code needs to be structured in such a way that it can be throttled. The actual "throttling" part wasn't all that sophisticated. For example, this function throttles a pending queue of content requests to the AssetAccessor::get call. Fairly small, easy to understand.

The entire point of this proposal is that it avoids all that refactoring and allows the async system - which we're already using - to handle this all itself. So in the old implementation, the loading is done by TilesetContentManager::loadTileContent, right? Well, if we had something like the system I described above, all of the "refactoring" we'd need to do would be to replace the code in Tileset::_processWorkerThreadLoadQueue that calls it with something like this:

  for (TileLoadTask& task : queue) {
    TaskController* pController = task.pTile->getLoadController();
    if (pController) {
      // Load is already started (but might be throttled). Update its priority.
      pController->setPriorityGroup(task.group);
      pController->setPriorityRank(task.priority);
    } else {
      // Start a new load process for this tile.
      task.pTile->setLoadController(new TaskController(task.group, task.priority));
      pController = task.pTile->getLoadController();
      asyncSystem
        .withController(pController)
        .beginThrottle(pCpuProcessing)
        .thenInWorkerThread([this, pTile = task.pTile]() {
          this->_pTilesetContentManager->loadTileContent(*pTile, this->_options);
        })
        .endThrottle();
    }
  }

That's not exactly right, because loadTileContent expects to be called initially in the main thread. So actually we'd want to do the little bit of main-thread checking that it does first, and only do the actual background work inside that thenInWorkerThread. But I don't think that detracts from the main point here.

We would also need to pass the pNetworkRequests throttling group to any place we call IAssetAccessor::get. But that's it! That would allow each tile load pipeline to bounce back and forth between prioritized CPU-based throttling and network-based throttling (and potentially others, like GPU, in the future) with almost no modifications.

Compared to my (still early) understanding of #779, the main differences are:

  1. The prioritization / throttling is built into AsyncSystem / ThrottlingGroup, so there's nothing tile-specific in there. Not to say prioritization / throttling is super difficult or anything, but it's nice that we can write it once and use it everywhere.
  2. CesiumAsync::Future resolution, and the invocation of the attached continuation, is used as the "go" signal for throttled tasks, and Future continuations are used to chain together dependent tasks, as compared to the TileWorkChain / RasterWorkChain with their std::function that is invoked as the "go" signal.

The downside, though, is that this system is likely to be at least somewhat difficult to implement.

@csciguy8
Contributor

> I'm way late getting back to you here, but I'm a little confused by your criticism.

No worries about any confusion. My last comment was 3 months ago, but I think we're getting sync'd up better now :)

I'm glad you've looked at #779, and now that I've been working on other things for ~2 months now, I have some thoughts about it too. Overall, the end result works, and seems to produce better performance. The biggest issue I have is the general code direction and the extensive "refactoring" that I mentioned. Did my changes make sense with how we use our async system? Maybe they are even fighting it?

Back to this issue and the proposal for throttling groups... I'm more inclined to be in favor of a solution like this. I like the simplicity of defining the groups and throttling behavior. I like how it extends and works with our existing paradigms. If all the complexities of throttling can be tucked away and no extensive code refactoring is necessary, excellent!

> The downside, though, is that this system is likely to be at least somewhat difficult to implement.

Can agree it would be difficult. And once the feature is implemented, how complex is it? Is it easy to verify correct functionality or maintain in the future? Hard to say without just building it first.
