
zstd compression support between distributors and ingesters #8522

Closed
aallawala opened this issue Jun 25, 2024 · 16 comments · Fixed by #9322 · May be fixed by grafana/dskit#582

Comments

@aallawala
Contributor

Is your feature request related to a problem? Please describe.

Are there plans for zstd support between distributors and ingesters using the grpc_compression flag?

Enabling zstd compression would help significantly reduce the amount of cross-AZ traffic our org pays for running Mimir. gzip gets some wins, at the expense of CPU.

Now that there is a pure Go zstd implementation, are there any major blockers for introducing it to Mimir?

Describe the solution you'd like

Support for zstd when using the grpc_compression option.
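For context on what such support involves: grpc-go lets you plug in any compressor via its `encoding` registry, so a pure-Go zstd compressor could look roughly like the sketch below. This is a minimal illustration built on klauspost/compress, not Mimir's actual code; the package and type names are hypothetical.

```go
package zstdgrpc

import (
	"io"

	"github.com/klauspost/compress/zstd"
	"google.golang.org/grpc/encoding"
)

// Name is the value the grpc-encoding header would carry.
const Name = "zstd"

type compressor struct{}

func (compressor) Name() string { return Name }

// Compress wraps the outgoing message writer in a zstd encoder.
func (compressor) Compress(w io.Writer) (io.WriteCloser, error) {
	return zstd.NewWriter(w)
}

// Decompress wraps the incoming message reader in a zstd decoder.
func (compressor) Decompress(r io.Reader) (io.Reader, error) {
	return zstd.NewReader(r)
}

func init() {
	// After registration, clients can opt in with
	// grpc.WithDefaultCallOptions(grpc.UseCompressor(Name)).
	encoding.RegisterCompressor(compressor{})
}
```

Note that this naive version allocates a fresh encoder and decoder per message, which is exactly the kind of overhead discussed later in this thread.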

@aknuds1
Contributor

aknuds1 commented Jun 26, 2024

Are you specifically asking for the -ingester.client.grpc-compression flag to also support zstd?

@aallawala
Contributor Author

Are you specifically asking for the -ingester.client.grpc-compression flag to also support zstd?

I'm specifically looking for compression support on the ingester-client, yes. That's our most expensive client.

Though, judging by how the config is laid out, I suspect it is a shared compressor for all the gRPC clients?
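For reference, assuming the config layout follows the same pattern as Mimir's other gRPC client options, the ingester client's compression is set like this (the listed values reflect what is supported today):

```yaml
# CLI equivalent: -ingester.client.grpc-compression
ingester_client:
  grpc_client_config:
    grpc_compression: snappy  # "gzip", "snappy", or "" (compression disabled)
```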

@aknuds1
Contributor

aknuds1 commented Jun 26, 2024

Though, judging by how the config is laid out, I suspect it is a shared compressor for all the gRPC clients?

In effect, yes. I just needed more clarification, as your description was relatively vague. I'm trying to see whether the team has any opinion on supporting zstd.

@aknuds1
Contributor

aknuds1 commented Jun 26, 2024

So, I found out that the Mimir team has already considered zstd support, but the zstd implementation we considered is apparently not very performant.

Is that the same zstd implementation you're referring to?

@aallawala
Contributor Author

aallawala commented Jun 26, 2024

So, I found out that the Mimir team has already considered zstd support, but the zstd implementation we considered is apparently not very performant.

Is that the same zstd implementation you're referring to?

Thanks for checking in. I'm looking at https://github.com/klauspost/compress/tree/master/zstd as a possible zstd implementation. It looks like what you linked is a wrapper on top of klauspost/compress.

What were the reasons it was not performant? I'm surprised to hear it wasn't, at least compared to the other offering, gzip. Surely it falls somewhere between gzip and snappy in speed, but with a much higher compression ratio.
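For anyone wanting to sanity-check the library itself, a minimal round-trip through klauspost/compress/zstd's buffer-to-buffer API looks like this (illustrative only; the payload is a stand-in):

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/compress/zstd"
)

func main() {
	payload := []byte("some remote-write payload") // stand-in data

	// Passing nil enables the buffer-to-buffer (EncodeAll/DecodeAll) mode.
	enc, _ := zstd.NewWriter(nil)
	defer enc.Close()
	compressed := enc.EncodeAll(payload, nil)

	dec, _ := zstd.NewReader(nil)
	defer dec.Close()
	out, err := dec.DecodeAll(compressed, nil)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d -> %d bytes, round-trip ok: %v\n",
		len(payload), len(compressed), bytes.Equal(out, payload))
}
```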

@aallawala
Contributor Author

@aknuds1, checking in - are you able to share more on the reasons why that implementation was not performant?

@aknuds1
Contributor

aknuds1 commented Jul 17, 2024

I just know that @bboreham considered the aforementioned zstd implementation, and found performance problems with it.

When we internally discussed trying another compression algorithm for cheaper AZ traffic, the other point made was that it wasn't worth all the testing that comes along with it.

@aknuds1
Contributor

aknuds1 commented Jul 31, 2024

I have some new context to share that hopefully sheds enough light on why the Mimir team doesn't think investment into a new compression algorithm for distributor -> ingester traffic is going to be worthwhile. I don't know if you're perhaps already familiar with this, but Mimir is moving towards a fundamentally revamped architecture, where distributors will no longer write directly to ingesters, but instead to a Kafka-compatible back-end, which ingesters in turn read from. Please see [FEATURE] Experimental Kafka-based ingest storage in the changelog for reference.

@bboreham
Contributor

bboreham commented Aug 2, 2024

I remarked on the number of memory allocations it does.
Looks like this was improved recently, with the addition of a pool in mostynb/go-grpc-compression@629c44d.

Someone else commented in mostynb/go-grpc-compression#25.
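For the curious, the pooling approach amounts to reusing encoders across messages via a sync.Pool instead of allocating one per Compress call. A rough, hypothetical sketch of that shape (not the actual commit's code):

```go
package zstdpool

import (
	"io"
	"sync"

	"github.com/klauspost/compress/zstd"
)

// Reuse zstd encoders across messages instead of allocating one per call.
var encoderPool = sync.Pool{
	New: func() any {
		w, _ := zstd.NewWriter(nil)
		return w
	},
}

// pooledWriter returns its encoder to the pool once the message is done.
type pooledWriter struct {
	enc *zstd.Encoder
}

func (p *pooledWriter) Write(b []byte) (int, error) { return p.enc.Write(b) }

func (p *pooledWriter) Close() error {
	err := p.enc.Close() // flush and finish the zstd frame
	encoderPool.Put(p.enc)
	return err
}

type compressor struct{}

func (compressor) Name() string { return "zstd" }

func (compressor) Compress(w io.Writer) (io.WriteCloser, error) {
	enc := encoderPool.Get().(*zstd.Encoder)
	enc.Reset(w) // rebind the pooled encoder (and its buffers) to this message
	return &pooledWriter{enc: enc}, nil
}

func (compressor) Decompress(r io.Reader) (io.Reader, error) {
	return zstd.NewReader(r) // decoders could be pooled the same way
}
```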

@deniszh
Contributor

deniszh commented Sep 5, 2024

@aallawala
@aknuds1
@bboreham

I decided to experiment with zstd and S2 gRPC compression from https://github.com/mostynb/go-grpc-compression.
So, I created a patched dskit with zstd and S2 compression, and then built Mimir 2.13.0 against that patched dskit - nothing fancy, just dskit pointed at the patched version.
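(For anyone reproducing this: the usual mechanism for such a build is a replace directive in Mimir's go.mod; the fork path and pseudo-version below are illustrative, not the actual branch.)

```
// In the Mimir build's go.mod
replace github.com/grafana/dskit => github.com/yourname/dskit v0.0.0-20240905000000-abcdef123456
```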

Then I ran this patched version in our test cluster, first with zstd compression, then S2. Please note that before the test, the cluster was already running Mimir 2.13.0 with Snappy gRPC compression enabled, so the comparison below is against Snappy and NOT uncompressed:

Results:

Zstd results compared to Snappy:

  • incoming traffic dropped from 4MB/s per writer to 2.2MB/s; outgoing from 3MB/s to 1MB/s
    • i.e. 7MB/s down to 3.2MB/s versus Snappy - 54% less traffic
  • CPU didn't change significantly
  • memory increased a lot - from 6.5GB per writer to 15GB per writer - I think that's a showstopper
  • 99p write latency dropped from 90 to 50 ms

So, the results are good, but memory usage is tremendous. I checked the heap; the majority of memory is consumed by zstd.EnsureBlock:

[Image: heap profile]

But I have good news too: I tried switching to S2 compression, which is an improved version of Snappy.

S2 results compared to Snappy:

  • traffic almost similar to zstd - receive 2.4MB/s per writer, send 1.4MB/s per writer
    • i.e. 7MB/s down to 3.8MB/s versus Snappy - 45% less traffic
  • CPU didn't change significantly
  • memory even dropped compared to Snappy - from 6.5GB per writer to 5.8GB per writer
  • 99p write latency dropped from 90 to 60 ms

[Image: per-writer dashboards - left: Snappy, middle: zstd, right: S2]

So, after that I decided to drop zstd (first I wanted to test the Datadog wrapper or another one, but then I realized that Mimir is built without CGO), and S2 looks really promising.

I know that the Mimir architecture is being revamped and will migrate to Kafka, but it will take some time to implement that properly - and the S2 patch can shave ~50% off cross-AZ data costs right now.

Also, I don't like running a patched version and would prefer to port my changes upstream. I can clean up and submit a PR for experimental S2 support in dskit. Should I, or does that make no sense?
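For reference, S2's block-level API in klauspost/compress is about as simple as it gets; a minimal illustrative round-trip (the payload is a stand-in):

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/compress/s2"
)

func main() {
	payload := []byte("some remote-write payload") // stand-in data

	// S2's block API is stateless: no encoder or pool to manage.
	compressed := s2.Encode(nil, payload)

	out, err := s2.Decode(nil, compressed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d -> %d bytes, round-trip ok: %v\n",
		len(payload), len(compressed), bytes.Equal(out, payload))
}
```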

@aallawala
Contributor Author

@deniszh, thanks so much for trying it out and posting your results. The memory increase seems to match @bboreham's earlier findings too.

Thanks for also trying out S2. The results look much more favorable, and it's something I can help drive in order to get it ported upstream. Do you have a PR available on the dskit side for S2?

@deniszh
Contributor

deniszh commented Sep 5, 2024

@aallawala : grafana/dskit#582

But please note that the latest Mimir is not yet compatible with the latest dskit. So, if you want to test, you can build Mimir from this branch: https://github.com/deniszh/mimir/tree/2.13.0-grpc-s2

@deniszh
Contributor

deniszh commented Sep 5, 2024

Tried S2 on a bigger cluster - it's less impressive, but I see 70MB/s instead of 105MB/s - a 30% decrease.

@aknuds1
Contributor

aknuds1 commented Sep 11, 2024

Thanks for your testing @deniszh! The results look very promising. I'm asking the Mimir team what they think.

@aknuds1
Contributor

aknuds1 commented Sep 11, 2024

@deniszh the Mimir team is positive about using S2 for gRPC compression in Mimir. I guess we should start out by reviewing your dskit PR.

@deniszh
Contributor

deniszh commented Sep 25, 2024

👍
