Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add host functions to support BLS12-381 and the required field arithmetics #779

Closed
tomerweller opened this issue Apr 27, 2023 · 28 comments
Closed
Labels
hostfunction Work on specific host functions

Comments

@tomerweller
Copy link

Pairing friendly elliptic curves are required for advanced zero knowledge proof systems including privacy preserving constructions and zkrollups. The Soroban ecosystem has already expressed interest in these type of systems but can not experiment with them due to the lack of host side support for pairing friendly curves.

The most widely used pairing friendly curve is BN254 (ake BN256 aka alt-bn128) due to the fact that ethereum has support for. However, BLS12-381 has been proven to be more secure and various projects in the zk space already use it or have signaled support for it and announced that they will migrate to it once available in Ethereum (See EIP-2537). There are other curves available like BLS12-377 but these appear to be less widely used.

I propose that we add the relevant host functions to support BLS12-381 and the underlying fields arithmetics.

Questions

  1. does anyone disagree?
  2. what is the exact set of required host functions?
  3. what implementation should we use?

cc @kwantam

@kwantam
Copy link

kwantam commented May 23, 2023

A couple other things to note: BLS12-381 is the curve that's used by the Eth beacon chain, and there's at least one very high quality implementation (blst, which has been thoroughly vetted / formally verified by Galois (not 100% sure the verification conditions, but I joined some of the early calls and it seems like they were proving low-level safety properties and conformation with the spec).

I'd definintely implement all of the EIP-2537 functionality. I would also consider implementing mappings from arbitrary strings to Fp and Fp2 ("hash to field" functions), which together with the BLS12_MAP_FP_TO_G1 and BLS12_MAP_FP2_TO_G2 functions from EIP-2537 give the full hash-to-curve functionality. If you go this way, the hash-to-field mappings should be the ones specified in the IETF hash-to-curve draft (which will soon be an RFC). Happy to give more details.

The only downside of using BLS12-381 instead of BN254 is that existing proof statements will need to be rewritten. Sometimes that's easy, but in general there's real porting effort involved. So, if you think it's important for existing proof systems already running on Eth to port directly over, it would make sense to also support BN254. But it's not obvious to me that's important...

@tomerweller
Copy link
Author

I'm comfortable with not having BN254 for now given that:

  1. BLS12-381 has been proven to be more secure
  2. Major ZK projects on ethereum have signaled support for BLS12-381 and indicated they will migrate once that's available
  3. I don't expect in the short term to see EVM constructions migrate as is to Soroban
  4. BN254 should be easy to add down the road

@tomerweller
Copy link
Author

List of host functions to be implemented

Collected from EIP-2537

  • BLS12_G1ADD
  • BLS12_G1MUL
  • BLS12_G1MULTIEXP
  • BLS12_G2ADD
  • BLS12_G2MUL
  • BLS12_G2MULTIEXP
  • BLS12_PAIRING
  • BLS12_MAP_FP_TO_G1
  • BLS12_MAP_FP2_TO_G2

Hash-to-curve functionality (see @kwantam's notes above)

  • BLS12_HASH_TO_G1
  • BLS12_HASH_TO_G2

All of the above are present in blst. See blst.h.

@kwantam
Copy link

kwantam commented May 26, 2023

I think what you've listed above is the right set of primitives.

One potential nit: BLS12_MAP_FP_TO_G1 and BLS12_HASH_TO_G1 aren't perfectly orthogonal: HASH_TO_Gx is a combination of HASH_TO_Fx and MAP_Fx_TO_Gx. So in principle it would be possible to just have the latter two and skip HASH_TO_Gx.

But realistically almost everyone will just want HASH_TO_Gx, so it makes sense to have that. And MAP_Fx_TO_Gx is really annoying to implement, so it's friendly to give that to users. Meanwhile, HASH_TO_Fx is reasonably easy to implement given hash function and bigint primitives. So even though it's not perfectly orthogonal, it seems like what you've listed above are the right choices.

(Happy to revisit this once the hash and field-op primitives are solidified to make sure that HASH_TO_Fx ends up as easy to implement as I'm hoping it will.)

@graydon
Copy link
Contributor

graydon commented May 26, 2023

@tomerweller this is .. a not completely trivial amount of work -- we can do it but it'll take a little bit of time to do well -- how would you prioritize this relative to other things on the near-term feature list for host functions, eg. the SHA3 function or the secp256k1 function or such?

@tomerweller
Copy link
Author

Sha3/keccak256 and secp256k1 are higher priority (also smaller in size because you don't have all the field arithmetics, I think).

@PlamenHristov
Copy link
Contributor

PlamenHristov commented Dec 11, 2023

@graydon / @kwantam
Is anyone working on this. Would you guys considering merging the implementation, if I open a PR?

@tomerweller
Copy link
Author

@PlamenHristov we'd love to see external contributions. Keep in mind that a review might take some time. New host functions are a protocol change and will be added to the next major version

@PlamenHristov
Copy link
Contributor

@tomerweller , got it. Will try and get something done in the coming couple of weeks.

@namankumar
Copy link

namankumar commented Dec 22, 2023

@PlamenHristov how's it going? Looking to understand how can we support you.

@PlamenHristov
Copy link
Contributor

PlamenHristov commented Dec 22, 2023

Hey @namankumar ,

Thank you for reaching out!

Nothing blocking in particular. Just need to get some free time, which hopefully I'll be able to get during the holidays. Will let you know how it goes.

Happy holidays!

@PlamenHristov
Copy link
Contributor

@namankumar / @tomerweller / @kwantam

Here is a draft PR that implements:

  • BLS12_G1ADD
  • BLS12_G1MUL
  • BLS12_G1MULTIEXP
  • BLS12_G2ADD
  • BLS12_G2MUL
  • BLS12_G2MULTIEXP

The idea behind this preliminary PR is to let me know if the approach is reasonable and there isn't a huge discrepancy between what it is and what it should be?

Once you've let me know the approach is reasonable, I'm going to implement the rest.

@namankumar
Copy link

@PlamenHristov thanks for the PR, this is fantastic. See dev comments in the PR and let us know how we can support you.

@PlamenHristov
Copy link
Contributor

@namankumar , yeah saw the response from the team. I think I have a good idea of what needs to be done to have it review ready. Will let you know once that's done.

@namankumar
Copy link

@PlamenHristov there is a soroban protocol call tomorrow, if you can make it: https://discord.com/events/897514728459468821/1194688312569495552

Btw, tried reaching you via discord and socials as well. I'm @namanthekumar and also at [email protected].

@PlamenHristov
Copy link
Contributor

Hey @namankumar ,

Yeah I'll make the call. Apologies - I'm not super active on Discord.

@namankumar
Copy link

namankumar commented Jan 11, 2024

No worries @PlamenHristov , chat soon! feel free to message me if you'd like so as to not pollute this thread.

@namankumar
Copy link

I've started a CAP for adding BLS12-381 host functions.

@PlamenHristov @kwantam I've made best effort to capture the interface and limited the math to what is absolutely required for the user to know. I've put the interface in the main CAP and the math + implementation details in supporting docs. Looking forward to collaborating with you!

stellar/stellar-protocol#1427

@namankumar
Copy link

namankumar commented Jan 15, 2024

Two (potentially three) ecosystem projects want to use these primitives right away (for confidentiality, MPC, rollups). I'll map out their requirements.

@PlamenHristov what sort of encoding/serialization does the implementation expect? I made some assumptions in the CAP, would be great if you could review for correctness. You can add comments or submit a PR, whatever convenient.

@kwantam would love your comments on the size of the data (modulus, field elements, points on the curve). There's some difference between our discussion, EIP, and the implementation. The CAP currently follows the implementation.

@PlamenHristov
Copy link
Contributor

PlamenHristov commented Jan 16, 2024

@namankumar so hopefully I've interpreted your question correctly, but any type of big endian byte encoding would work (or any one that is convertiable to such - hex, base58, base64).
If you ask whether the points should be passed around compressed or uncompressed for ergonomics, maybe it makes sense to keep them uncompressed, although obviously that comes with extra/storage transfer costs.

I should have the comments addressed on the PR and the rest of the implementation finished by the end of this week.

Let me now if I'd missed addressing something.

@rsinha
Copy link

rsinha commented Jan 17, 2024

Great to see this activity -- support for pairings really opens up a lot of applications!

BLS12-381 is great. I was wondering if you have plans for additional curves. For instance, members of the BW6 family of curves form 2-chains with some members in the BLS family (e.g. BLS12-377 with BW6-761, or BLS12-381 with BW6-767). Having BW6 would enable recursive SNARK proofs, and other applications explored in ZEXE (as developed by Aleo).

@rsinha
Copy link

rsinha commented Jun 13, 2024

There's a few projects underway that are looking to verify Groth16 proofs in a Soroban contract, that need host functions for BLS12-381 (or alt-bn128 curve). Without host functions, the contract exhausts resource budgets before completing a single pairing function.

@jayz22
Copy link
Contributor

jayz22 commented Jun 26, 2024

@PlamenHristov thanks for all the work making a working prototype! there are still many technical implementation details to address but this is a great start!

The fact blst is mostly assembly and C code via Rust FFI bindings is a bit uncomfortable. In general we would much prefer pure Rust libraries for safety and maintainability among other reasons.

I'm wondering if anyone has experience with, or have recommendations for any existing Rust libraries for BLS12-381. The two popular ones are ark-bls12-381 and zkcrypto/bls12_381, but apparently neither have been audited for production.
The other consideration is if we needed to expand to other ZK/cryptography primitives beyond bls12-381, an existing ecosystem like arkworks maybe a better choice for future proof (and maintainability).
@kwantam @PlamenHristov @rsinha

@rsinha
Copy link

rsinha commented Jun 27, 2024

The advantage with arkworks, as you point out, is the relative ease of supporting a family of curves. For instance, we can have jubjub curves for Pedersen hasing, MNT curves for two-cycle ZK proof systems, etc. Supporting these additional curves opens up several choices for ZK proof systems.

@jayz22
Copy link
Contributor

jayz22 commented Jul 11, 2024

I've started a discussion stellar/stellar-protocol#1500 on some of the higher level considerations such as curve choice

@jayz22
Copy link
Contributor

jayz22 commented Sep 5, 2024

@kwantam thanks for your suggestion of adding the hash-to-curve functions.

As we are close to the home stretch of the implementation effort, I have a few questions regarding the domain separation tag (DST) requirements. After reading the RFC a few times, there are still a few details unclear to me (which could very possibly due to my lack of cryptography background, and I apologize ahead of time).

From my understanding, the general recommended structure of the DST is as follows (detailed in section 3.1)

<appID>-V<xx>-CS<yy>-<suiteID>-<enc>

in which suiteID is already decided and hardwired by the implementation, which in our case, is BLS12381G1_XMD:SHA-256_SSWU_RO_ for bls12-381 G1 curve using expand_msg_xmd with sha256 hash function and hash-to-field as encoding. This part is clear I think.

So the smart contract application needs to be able to set the first part <appID>-V<xx>-CS<yy>-, in order to satisfy random oracle requirement. Which means our host function implementation for hash-to-curve must expose the DST as a parameter, i.e. (for G1):

fn bls12_381_hash_to_g1(msg: Bytes, dst: Bytes) -> G1

My main questions are

  1. should we expose the DST as a host function parameter? (I believe so but would like to confirm)
  2. what level of restrains should we enforce at the protocol level on the DST provided by the contract application?

From the RFC, it appears the only hard requirements are

  • length(dst_bytes) > 0
  • length(dst_bytes) < 256

but would you recommend implementing more constrains such as:

  • dst only contains ASCII characters?
  • dst should end with suiteID that matches exactly with BLS12381G1_XMD:SHA-256_SSWU_RO_ (the ciphersuite implementation of the host)?
  • is the encoding type <enc> still relevant, given the suiteID have contains _RO_ as the encoding var?

How can we find the balance point between steering applications to follow the recommended standard while not making the interface too restrictive. Would really appreciate your inputs!

@anupsdf
Copy link
Contributor

anupsdf commented Sep 23, 2024

PR is merged.

@anupsdf anupsdf closed this as completed Sep 23, 2024
@silence48
Copy link

silence48 commented Sep 26, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hostfunction Work on specific host functions
Projects
None yet
Development

No branches or pull requests

10 participants