Custom hash for intoto payload #1018
My read of the DigestSet spec is that it contains a lot of things that don't belong in modern cryptography, like GOST, SHA1, and MD5. I understand their rationale for doing this in the context of a generic file-matching tool, but exposing anything (roughly) weaker than SHA-2/256 in the context of digital signatures is IMO risky. So I don't want to remove the list outright, but I would be fine with adding to it, so long as the amendments are strong general-purpose cryptographic digests 🙂. Do you have specific additions that you want to make?
I definitely understand that. In fact they also allow non-cryptographic hashes, so long as the "digest" is "immutable".
Our use case is as follows: for model signing, we speed up hash calculation by splitting large files into multiple "shards" of X bytes each. Then we compute the hash of each shard in parallel and aggregate the per-shard digests into a final digest.

What would be a good way forward to support this "family" of hashes? I could see a "subtle" API like in Tink, i.e. an API for people who know what they are doing (this way you need not be involved if someone wants to use a custom hash, e.g. blake, a serialization construction using blake, etc.). Or we simply add the sha256p hash to the supported list.
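The comment above elides the exact aggregation step; a minimal sketch of one plausible construction is below (per-shard SHA-256 digests, computed in parallel, concatenated and hashed again). The function name, shard size, and aggregation choice are illustrative assumptions, not the model-signing library's actual code.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

SHARD_SIZE = 4096  # illustrative default; real deployments would tune this


def _hash_shard(shard: bytes) -> bytes:
    return hashlib.sha256(shard).digest()


def sharded_digest(data: bytes, shard_size: int = SHARD_SIZE) -> str:
    # Split the input into fixed-size shards.
    shards = [data[i:i + shard_size] for i in range(0, len(data), shard_size)]
    # Hash each shard in parallel; map() preserves shard order.
    with ThreadPoolExecutor() as pool:
        shard_digests = list(pool.map(_hash_shard, shards))
    # Aggregate by hashing the concatenation of the per-shard digests.
    return hashlib.sha256(b"".join(shard_digests)).hexdigest()
```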
I think this unfortunately ends up being somewhat complex 😅 -- sigstore-python is trying to closely align with other Sigstore clients in terms of the signature schemes/algorithmic suites we support, so I think we'd need cross-client consensus before adding a new algorithm.

But to take a step back: one of the "advantages" of DSSE is that the payload is flexible. So rather than encoding a new hash at the in-toto layer, maybe you could communicate this at the payload layer? In other words: wrap the final sharded digest in SHA-256 and advertise the construction's parameters in the statement itself, something like:
```json
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [
    {
      "name": "blahblahblahmodel",
      "digest": {"sha256": "dgst"},
      "x-sha256p-params": {
        "version": 1,
        "shard-size": 4096
      }
    }
  ],
  "predicateType": "blahblahblah",
  "predicate": { ... }
}
```

From there, your verification procedure would be to recompute the sharded digest using the advertised parameters and compare it against the subject's `sha256` value.
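The verification side of the wrapping idea above could be sketched as follows. The aggregation step (concatenate the per-shard digests, then hash) is an assumption on my part since the thread doesn't pin it down, and the function name is hypothetical.

```python
import hashlib


def verify_subject(data: bytes, subject: dict) -> bool:
    """Recompute the hypothetical sha256p construction from the statement's
    advertised parameters and compare against the subject's sha256 digest."""
    params = subject["x-sha256p-params"]
    shard_size = params["shard-size"]
    # Hash each shard (sequentially here for clarity; could be parallel).
    shard_digests = [
        hashlib.sha256(data[i:i + shard_size]).digest()
        for i in range(0, len(data), shard_size)
    ]
    # Aggregate and compare against the advertised digest.
    recomputed = hashlib.sha256(b"".join(shard_digests)).hexdigest()
    return recomputed == subject["digest"]["sha256"]
```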
I believe this would accommodate your use case, without requiring a new algorithmic selection at the Sigstore layer. The only "gotcha" is that this won't work with the current CLI out of the box. (Longer term, however, we could possibly make it compatible with the sigstore-python CLI as well.)
Yes, this solution would also work. I don't have a strong preference about what it ends up looking like, so long as the parameters and hash function are available. What you describe works, but feels like a hack to disguise the hash as sha256 while actually being something else. Also, it does not make it any safer, i.e. if the construction using sha256 is wrong, you end up with something weaker than sha256. I'll let @haydentherapper comment on the Sigstore side and the protobuf inclusion. I don't think this is needed, but I'll let the maintainers decide.
Yep, it's certainly a hack. I also don't know if it's a good idea 🙂 -- it's just what came to mind as a solution that doesn't require Sigstore to be aware of the underlying construction. (Unless I'm mistaken, the only way this can be weaker than SHA256 itself is if SHA256p's construction has a weakness, or an attacker could select weak parameters. But this would be true even if it wasn't wrapped in SHA256 at the end, I think?)
Agreed with all this.
Sorry, I just meant that in general, one way it could be weaker is if the construction is wrong. I think sha256p works as a construction, but I'm sure someone could come up with one that is not.
Ah, gotcha! Yeah, that's a great point -- this is sort of opening the gates/setting the precedent for people building hidden constructions of unknown quality. That's not great... As another random thought: there are strong, modern hash constructions that are embarrassingly parallel, like BLAKE3. Could you use one of these? I think it'd be much easier to make a strong argument for BLAKE3's inclusion as a supported digest, especially since it has a wealth of public implementations.
BLAKE3: great idea. We've had this tracking issue for a while: sigstore/model-transparency#13. I did not realize BLAKE3 does roughly what we're doing (tree-based hashing), which is also similar to dm-crypt.
One thing to be aware of is that Blake2 is not a NIST hash, and Blake3 seems based on a similar idea (a reduced number of rounds in the compression function). Blake3 also fixes the chunk size to 1KB, which we would likely want to be parameterizable in practice. If it were, I think it'd do exactly what we want.
Yeah, my recollection is that Blake2's predecessor BLAKE was a SHA-3 finalist but wasn't selected, while Blake3 postdates the competition by a few years. But both are widely adopted and considered strong constructions.
Just for my understanding: do you need to parameterize the shard size for performance reasons, or something else? (I don't know your performance constraints, but I'm wondering if the perfect might be the enemy of the good here 🙂 -- even Blake2b is really fast and much easier to add here + get integrated into the specs versus a new custom parameter set.)
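For reference, Blake2b is already available in Python's standard-library `hashlib` (no extra dependency), whereas BLAKE3 requires a third-party package. A quick sketch of using it, with an illustrative input:

```python
import hashlib

# hashlib.blake2b supports variable-length output via digest_size;
# 32 bytes gives a digest comparable in length to SHA-256.
digest = hashlib.blake2b(b"model weights go here", digest_size=32).hexdigest()
print(digest)  # 64 hex characters (a 32-byte digest)
```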
Yes. We suspect some teams will want to optimize it for their own machines. We need to benchmark Blake3: their evaluation seems to be on a small 16KB stream and does not read from disk, so their tests are only CPU-bound. Maybe Blake3 is good enough for a first release, but we may still need a way to allow customizing parameters in the future, so having a clear path for that would be useful. Thanks again for the tips and suggestions!
We should specify the hash in the protobuf-specs repo, because it is effectively the client spec-as-code. Ideally, each client would support every set of hashes specified in the protobuf-specs repo, or at least we would have a way to determine which features each client supports (sigstore/sigstore-conformance#144). Specifying this in the repo is also our way to drive consensus across clients, again because I'd like to see eventual consistency across clients in the algorithms (signature and hash) they support. Personally, I would prefer to see more hashes supported rather than the digest wrapping.

Also, this wasn't asked, but note that Rekor should be fine with any in-toto payload. As an aside, there is an argument for hashedrekord being the only structure across Sigstore clients, with each client providing some sort of plugin system that allows users to specify how to canonicalize their input and how to verify the artifact, and with clients providing defaults for binaries and DSSEs (or jars, helm charts, etc.). But for now, hashed artifacts + DSSEs is a good state to be in.
Another consideration is that the model-signing library won't only support Sigstore. I think we want to be able to support other digests that companies might want to use internally (even if we don't know about them), while allowing them to reuse the rest of the model-signing library. That probably just means that Sigstore will only support a pre-defined set of hashes, but other signers (PKISigner, etc.) should be able to use their own (advanced user mode).
That's great!
Closing in favor of #982 🙂
The current implementation hardcodes a list of hashes: https://github.com/sigstore/sigstore-python/blob/main/sigstore/dsse.py#L35.
in-toto allows any hashes, so I think we need to support use cases where users use other hashes.
Would removing the list of hardcoded values be enough?
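For context, the hardcoded check in question amounts to something like the following (an illustrative reconstruction with assumed names; the actual `dsse.py` source may differ):

```python
# Illustrative allowlist in the spirit of sigstore/dsse.py's hardcoded list.
_ALLOWED_DIGESTS = {"sha256", "sha384", "sha512"}


def check_digest_set(digest_set: dict) -> None:
    """Reject any digest algorithm not on the allowlist."""
    for alg in digest_set:
        if alg not in _ALLOWED_DIGESTS:
            raise ValueError(f"unsupported digest algorithm: {alg}")
```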
I'm happy to work on this.
/cc @mihaimaruseac