Standardize compressed blob prefixes #135
base: main
Conversation
Generally I would follow the typical RFC format. The spec is currently outdated and not maintained by anyone.
That's some bad news. I mean, if I want to create my own implementation of Polkadot from scratch, where do I go?
The current approach to compressing binary blobs is defined in [subsection 2.6.2](https://spec.polkadot.network/chap-state#sect-loading-runtime-code) of the Polkadot spec. It involves using `zstd` compression, and the resulting compressed blob is prefixed with a unique 64-bit magic value specified in that subsection. Said subsection only defines the means of compressing Wasm code blobs; no other compression procedure is currently defined by the spec.
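To make the prefix mechanism concrete, here is a minimal sketch of the check an implementation might perform. The `MAGIC` value below is a placeholder, not the actual 64-bit constant from subsection 2.6.2, and the zstd decompression step itself is elided:

```rust
// Placeholder magic value; the real 64-bit prefix is defined in
// subsection 2.6.2 of the Polkadot spec.
const MAGIC: [u8; 8] = *b"EXAMPLE!";

/// Returns the inner (zstd-compressed) payload if `blob` carries the
/// magic prefix, or `None` if the blob should be treated as uncompressed.
fn strip_compression_prefix(blob: &[u8]) -> Option<&[u8]> {
    blob.strip_prefix(&MAGIC)
}

fn main() {
    // A blob carrying the magic prefix is recognized as compressed...
    let mut compressed = MAGIC.to_vec();
    compressed.extend_from_slice(&[1, 2, 3]);
    assert_eq!(strip_compression_prefix(&compressed), Some(&[1u8, 2, 3][..]));

    // ...while a plain Wasm blob (starting with the Wasm magic bytes,
    // which the prefix must not coincide with) is passed through as-is.
    assert_eq!(strip_compression_prefix(b"\0asm\x01\0\0\0"), None);
    println!("prefix check ok");
}
```

Note that the check says nothing about what is *inside* the compressed payload, which is exactly the ambiguity discussed below.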
However, in practice, the current de facto protocol uses the said procedure to compress not only Wasm code blobs but also proofs-of-validity. Such a usage is not stipulated by the spec. Currently, given only a compressed blob, it's impossible to tell without decompression whether it contains a Wasm blob or a PoV. That doesn't cause any problems in the current de facto protocol, as Wasm blobs and PoV blobs take completely different execution paths, and it's impossible to mix them.
The argument "it's impossible to mix them" is not really that strong :P
Well, I'm just describing the current situation here, and that's something that should be fixed by implementing this RFC as well. Surely, we currently can mix them in the code, we just don't want to 🙂
@bkchr, returning to that "let's follow the typical RFC format" part, and explaining why I'd like to keep it that formal. According to the Polkadot Whitepaper, "A system of RFCs, not unlike the Python Enhancement Proposals, will allow a means of publicly collaborating over protocol changes and upgrades". Thus, the RFCs are proposals to change the protocol. But what protocol? We must understand what we're changing.

I understand that currently we're more or less in limbo between the Polkadot Whitepaper and the JAM Gray Paper, and that would be acceptable if we found ourselves in this situation for, say, a couple of months. But JAM is not arriving in a couple of months, and if the spec is not currently maintained, that means we're flying in Bitcoin mode, with an implementation-defined spec. What protocol is changed by any given RFC, then? The one that had been implemented by the time the RFC was first published? Or will we count from the latest stable release?

Again, if a team wants to start its very own implementation of Polkadot tomorrow, where does it start? What should they implement? Should they just read our Rust code and do the same? They may not even know Rust, after all. Or should they take the latest outdated spec and apply all the RFCs on top of it? Even the infamous EU bureaucracy doesn't treat you like that: you can always get the full text of the legislation currently in force, instead of getting the very first version from the 1990s plus a hundred amendments and compiling everything yourself.

That's why I believe the spec is super important. The spec should be a representation of the current state, and RFCs should propose changes to the spec, not the code. Yes, we're not living in a perfect world, and it may become more or less stale sometimes, but if we drop its support altogether, a lot of things lose their meaning, including alternative implementations.
Alternatively, if we wanted to follow the IETF's RFC model, we could have a set of "root RFCs" defining the protocol and other RFCs amending the root ones, but I believe that for a protocol as complex as Polkadot this is a counterproductive approach, as the set of root RFCs would just become the spec in its own right. The IETF's TCP accumulated 10 RFCs before it was rewritten from scratch to include all the amendments, and that happened over a time span of 40+ years. We in Polkadot, on the other hand, have already had 60+ RFCs since the Fellowship was formed, roughly a year and a half ago. The IETF's approach wouldn't work here. I'll go ahead and tag @gavofyork here 'cause I believe it's a really bad situation. If the spec is not maintained, we should change that today, or tomorrow at the latest.
There are two things here: the spec not being maintained, and the standard RFC format. While I'm also not happy that the spec is not maintained and I would like to change this, it is complicated right now, and I don't think this will change until we have JAM. Better to accept the reality instead of denying it.
Polkadot Fellowship does not accept reality, it forms it! 😁 The Fellowship is a resourceful organization and could acquire both internal and external resources to get that job done. What happens now is just a lack of understanding of the importance of the task. Having an up-to-date spec is a basic need, not a luxury. Otherwise, we don't hold to our guarantees. One of them is "Polkadot is intended to be a free and open project, the protocol specification being under a Creative Commons license". And under that, we encourage people to create alternative implementations to incentivize decentralization. But if anyone wanted to do that right now, they would quickly learn that the only way to do it is in "just-ask-Basti-how-it-works" mode.

I heard you about this very RFC, and I'll fix it to be more typical, but I'm also willing to keep this discussion going and bring it up at the next in-person Fellowship meeting, as I believe it's a huge design flaw in the whole process. RFCs should propose changes to the spec, and implemented RFCs MUST be reflected in the spec. That MUST should just be a requirement for the implementation to pass review. Just imagine yourself implementing the initial Polkadot with only outdated and unsupported Wasm specs in your hands. That would be a hell of an implementation. Do we have political parties in the Fellowship yet? I'm ready to establish one. "Witnesses of the Spec" or something.
You are free to take this on and make it happen :)
Generally looks good.
We just need to be even more conservative when it comes to the timelines, as this affects not only the relay chain or the collators, but every node.
I don't think a single person can do it, because I don't believe there exists a single person who knows all the aspects of the current implementation in depth. I think I could design a pipeline that streamlines the spec updates, but first we should communicate and agree on the procedure anyway. Sounds like one more RFC in itself.
I did not mean that you should keep the spec updated on your own. Someone needs to set up the process, find the people to update the spec, find out what is outdated, etc.
Currently, the only requirement for a compressed blob prefix is that it not coincide with the Wasm magic bytes (as stated in code comments). The changes proposed here increase the prefix collision risk, given that arbitrary data may be compressed in the future. However, it must be taken into account that:
* The collision probability per arbitrary blob is ≈5.4×10⁻²⁰ for a single random 64-bit prefix (the current situation) and ≈2.17×10⁻¹⁹ for the proposed set of four 64-bit prefixes (the proposed situation), which is still low enough;
* The current de facto protocol uses the current compression implementation to compress PoVs, which are arbitrary binary data, so the collision risk already exists and is not introduced by the changes proposed here.
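The figures above follow from modeling an arbitrary blob's first 8 bytes as uniformly random: a single 64-bit prefix matches with probability 2⁻⁶⁴, and the union bound over four distinct prefixes gives four times that. A quick sanity check of the arithmetic:

```rust
fn main() {
    // Probability that 8 uniformly random bytes equal one specific
    // 64-bit magic value.
    let single = 1.0_f64 / 2.0_f64.powi(64);

    // Union bound over four distinct 64-bit magic values.
    let four = 4.0 * single;

    // Matches the figures quoted in the RFC text.
    assert!((single - 5.4e-20).abs() < 1e-21);
    assert!((four - 2.17e-19).abs() < 1e-21);
    println!("single prefix: {single:e}, four prefixes: {four:e}");
}
```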
Would it make sense to de-risk further, by deprecating blank/unprefixed payloads entirely?