[Proposal] Pinot Multistage Engine Express Mode #14640
Labels
multi-stage
Related to the multi-stage query engine
PEP-Request
Pinot Enhancement Proposal request to be reviewed.
Note: We will be sharing a design doc soon. We are working on testing this out in one of our clusters via a prototype to get a sense of the scalability characteristics of this approach.
Overview
Earlier today, we released an Engineering Blog on our use of Neutrino at Uber, and how that has helped us serve complex queries that can't be served by the V1 Engine at 100 and even 1000+ QPS.
We want to bring the same approach to Pinot's Multistage Engine, via a new mode which I am calling the "Express Mode" right now.
The idea is that instead of relying on shuffles, you try to run the maximal sub-plan that you can independently in the servers, and run the remaining plan in the broker.
Example: for a query plan such as follows, which can be common for window function queries that leverage an aggregation after the window function, with the express mode, Pinot will run as much of the plan as it can in the servers without any shuffles with the remaining plan being run in the broker.
So in the simple case, we would run the Leaf stage in the servers, and the rest of the plan in the broker. If we are able to support auto-colocation and the data is partitioned by the partition-key of the window function, then we may be able to run
Agg > Filter > Window > Sort Exch. > Leaf
independently in the servers and run just the final aggregation in the broker.Benefits
The current Multistage Engine enables Pinot to process a large amount of data in really complex queries. The goal of the Express Mode is to support relatively simpler queries, that process a relatively smaller amount of data, at lower latencies and higher QPS.
Challenges
There are several challenges in supporting something like this, and we outline some of them below (will be discussed in detail in the design doc):
The text was updated successfully, but these errors were encountered: