[Proposal] Pinot Multistage Engine Express Mode #14640

ankitsultana · 2024-12-11T21:48:09Z

Note: We will be sharing a design doc soon. We are working on testing this out in one of our clusters via a prototype to get a sense of the scalability characteristics of this approach.

Overview

Earlier today, we released an Engineering Blog on our use of Neutrino at Uber, and how that has helped us serve complex queries that can't be served by the V1 Engine at 100 and even 1000+ QPS.

We want to bring the same approach to Pinot's Multistage Engine, via a new mode which I am calling the "Express Mode" right now.

The idea is that instead of relying on shuffles, you try to run the maximal sub-plan that you can independently in the servers, and run the remaining plan in the broker.

Example: consider a query plan such as the following, which can be common for window function queries that apply an aggregation after the window function. With the express mode, Pinot will run as much of the plan as it can in the servers without any shuffles, and run the remaining plan in the broker.

So in the simple case, we would run the Leaf stage in the servers, and the rest of the plan in the broker. If we are able to support auto-colocation and the data is partitioned by the partition-key of the window function, then we may be able to run Agg > Filter > Window > Sort Exch. > Leaf independently in the servers and run just the final aggregation in the broker.
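The split described above can be sketched roughly as follows. This is only an illustration, not the proposed implementation: the `PlanNode` type, the `split_plan` helper, the `GATHER` sentinel, the `user_id` partition key, and the exact plan shape (including the final `Exchange` and `Final Agg` nodes) are all assumptions made for the example. The plan is walked from the leaf upward, and everything below the first exchange that cannot be satisfied locally stays in the servers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical sentinel for an exchange that gathers all data in one place.
GATHER = "<gather>"


@dataclass(frozen=True)
class PlanNode:
    name: str
    # Key the node's input must be partitioned by; None means no shuffle is needed.
    shuffle_key: Optional[str] = None


def split_plan(
    leaf_to_root: List[PlanNode], table_partition_key: Optional[str] = None
) -> Tuple[List[PlanNode], List[PlanNode]]:
    """Return (server_part, broker_part) for a linear plan listed leaf first."""
    server_part: List[PlanNode] = []
    for i, node in enumerate(leaf_to_root):
        # An exchange is a no-op if the table is already partitioned by its key.
        needs_shuffle = (
            node.shuffle_key is not None and node.shuffle_key != table_partition_key
        )
        if needs_shuffle:
            return server_part, leaf_to_root[i:]
        server_part.append(node)
    return server_part, []


# Assumed plan shape for the window function example, leaf first.
plan = [
    PlanNode("Leaf"),
    PlanNode("Sort Exch.", shuffle_key="user_id"),
    PlanNode("Window"),
    PlanNode("Filter"),
    PlanNode("Agg"),
    PlanNode("Exchange", shuffle_key=GATHER),
    PlanNode("Final Agg"),
]

simple_servers, simple_broker = split_plan(plan)
colocated_servers, colocated_broker = split_plan(plan, table_partition_key="user_id")
```

In the simple case only `Leaf` lands in the server part. When the table is assumed to be partitioned by `user_id`, the server part grows to `Leaf` through `Agg`, and only the gather and final aggregation remain for the broker.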

Benefits

The current Multistage Engine enables Pinot to process a large amount of data in really complex queries. The goal of the Express Mode is to support simpler queries that process a smaller amount of data, at lower latencies and higher QPS.

Challenges

There are several challenges in supporting something like this, and we outline some of them below (will be discussed in detail in the design doc):

- We want to rely as little as possible on Query Hints. Perhaps a single SET statement option should be all that's required to enable express mode, with a broker/server level config to set the default mode.
- Since our goal is to avoid processing excessively large data, we need to find a way to limit the amount of data processed and avoid expensive queries. There are several different approaches for this: limiting the data returned by the Leaf stage, limiting the data returned by the servers, etc. A design doc is a better medium to discuss all of these in detail, but our guiding principle would be to avoid new configs and make the semantics as intuitive as possible for users to understand. As our blog calls out, this is one of the major limitations of our Neutrino based approach.
- From a product perspective and even from a technical perspective, this mode should sit cohesively with the rest of the features in Pinot. The last thing we would like is to complicate the design and the offerings further.
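As a rough illustration of the first point, the mode could be resolved from a leading SET statement and fall back to a cluster-level default when the query does not set it. The option name `useExpressMode` and both helper functions are hypothetical; the proposal does not name the actual option or config.

```python
import re
from typing import Dict, Tuple

# Hypothetical option name; the proposal does not specify one.
EXPRESS_MODE_OPTION = "useExpressMode"

# Matches one leading "SET key = value;" statement.
_SET_RE = re.compile(r"^\s*SET\s+(\w+)\s*=\s*([^;]+);", re.IGNORECASE)


def split_set_options(sql: str) -> Tuple[Dict[str, str], str]:
    """Strip leading SET statements and return (options, remaining query)."""
    options: Dict[str, str] = {}
    while True:
        match = _SET_RE.match(sql)
        if match is None:
            return options, sql.strip()
        options[match.group(1)] = match.group(2).strip().strip("'\"")
        sql = sql[match.end():]


def express_mode_enabled(options: Dict[str, str], cluster_default: bool) -> bool:
    """Per-query option wins; otherwise use the broker/server level default."""
    raw = options.get(EXPRESS_MODE_OPTION)
    if raw is None:
        return cluster_default
    return raw.lower() == "true"


options, query = split_set_options(
    "SET useExpressMode = true; SELECT COUNT(*) FROM myTable"
)
enabled = express_mode_enabled(options, cluster_default=False)
```

With this shape, a user opts in with a single SET statement and no query hints, and an operator can flip the default for a whole cluster without touching queries.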

gortiz · 2024-12-13T08:16:36Z

AFAIK this includes two independent changes:

1. Push larger parts of the query into the leaf stage.
2. Execute the final reduce phase in the broker.

While I agree 1 would be better, I don't get why 2 is important. I think the fact that in MSQ brokers do not need to execute the final reduce phase is a feature. It means we can have smaller brokers. Although reducing in the servers may have an impact on performance, it shouldn't be that large. Before #13303, MSQ spent a lot of time and resources on serialization/deserialization, which means that the cost of adding a new stage (like reducing in servers) was not trivial. But after that PR this cost should not be noticeable.

ankitsultana · 2024-12-20T05:35:09Z

That's a fair point. My gut feel is that sending data from other servers to a single server can potentially open up complicated scheduling problems. But it's best to test it out on a multi-zone cluster and compare the results. We'll try out both the approaches and attach the results in the doc.

gortiz · 2024-12-20T15:45:50Z

Another comment here. Can we find a different name? Express is not very descriptive of what it does and sounds so good that everyone would always want to run in this mode 😆. Maybe simple mode? Two-stage mode? broker-reduce mode? something like that would be more descriptive.

ankitsultana changed the title ~~[Tracker] Pinot Multistage Engine Express Mode~~ [tracker] Pinot Multistage Engine Express Mode Dec 11, 2024

ankitsultana changed the title ~~[tracker] Pinot Multistage Engine Express Mode~~ [Proposal] Pinot Multistage Engine Express Mode Dec 11, 2024

yashmayya added feature multi-stage Related to the multi-stage query engine labels Dec 12, 2024

Jackie-Jiang added PEP-Request Pinot Enhancement Proposal request to be reviewed. and removed feature labels Dec 13, 2024
