-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compilation to Triton VM #200
Comments
Very nice work. Did you already see the attempt by @sshine and myself on writing a compiler from Rust to TASM (Triton VM assembly)? It' s a bit more of a "brute force" attempt than yours, as we are translating from the RUST AST, to our own AST, and then directly to TASM, we don't go through WASM as you do. This makes me think that your attempt could be a better generic solution than what we started on. Something that will be more extensible. We, for example, only plan to support a rather small subset of Rust, whereas I think your solution could support the entire Rust language, as you use a well-known intermediate language. I tried running your tests, and 2 out of 15 fail. Maybe because your test conditions are too strict? The way we test our compiler is by running its code on the actual Triton VM and inspecting its output. The wrapper functions for that test functionality was a bit cumbersome to write, but once you have it set up, it's a very convenient way of testing the generated code. Have a look here: Feel free to just copy the code or use it to write your own testers, if you're interested. You might also be interested in the TASM standard library that we wrote. That should make it fairly easy for you to add FYI I had to run this command before I could run your tests: |
Cool stuff, @greenhat!
Thanks for the suggestion. There are some vague plans to add a few registers to Triton VM, which could behave like local variables. However, due to how arithmetization of registers works in zkVMs, there would have to be a finite (and likely rather small) number of them, so I'm not sure they could be used for general compilation from WASM. Is there a limit to the number of local variables that can be declared in a WASM function? Is using the stack infeasible for local variables? |
It depends on how far you can get with it. I'd guess that complex types would be quite challenging. That was the first thing that came to my mind.
Oh, I was not prepared that someone will try to run my tests. :) It was most likely a rustc mangling symbols issue that I have not found time to fix until today (it's way too easy to re-generate with I employ a similar approach to test generated code, running it on the Triton VM and comparing the output with the one I got from running either Rust or Wasm code natively. See https://github.com/greenhat/wasm2zk/blob/15d6c02ab55db8c34bbaaf127c42cd15780e0c29/crates/codegen-tritonvm/src/codegen/sem_tests/fib.rs#L8-L20
Great! I'll use it for sure.
Thanks! I added it to the README. |
Thank you!
I don't think there is a limit besides that it's u32. Actually, when I asked for it I probably thought of Miden procedures that have local vars easily "mappable" from Wasm. My bad. I'm adding Miden to OmniZK now and got confused. I somehow forgot that in Triton VM functions are more like labels, so there is no notion of local space for functions besides jump stack which is not fit for this kind of task. If so, then please nevermind,
I suspect Wasm codegen in LLVM uses locals for almost all the SSA vars. I thought of putting them on the stack but the amount of tracking in order to access them in the function code was overwhelming so I opt out to mapping locals to memory at the end of the range so it's fewer chances to conflict with native Wasm memory ops. |
Awesome stuff, hats off ^^ To get a feel for useful programs that need to run on Triton VM, here is a snippet of rust code that we are struggling to compile to TASM. fn verify_raw(public_input: &[BFieldElement], secret_witness: &[BFieldElement]) {
let removal_records_integrity_witness =
*RemovalRecordsIntegrityWitness::decode(secret_witness).unwrap();
let items = removal_records_integrity_witness
.input_utxos
.iter()
.map(Hash::hash)
.collect_vec();
let mut digests_of_derived_index_sets = items
.iter()
.zip(removal_records_integrity_witness.membership_proofs.iter())
.map(|(utxo, msmp)| {
AbsoluteIndexSet::new(&get_swbf_indices::<Hash>(
&Hash::hash(utxo),
&msmp.sender_randomness,
&msmp.receiver_preimage,
msmp.auth_path_aocl.leaf_index,
))
.encode()
})
.map(|x| Hash::hash_varlen(&x))
.collect_vec();
digests_of_derived_index_sets.sort();
let mut digests_of_claimed_index_sets = removal_records_integrity_witness
.kernel
.inputs
.iter()
.map(|input| input.absolute_indices.encode())
.map(|e| Hash::hash_varlen(&e))
.collect_vec();
digests_of_claimed_index_sets.sort();
assert_eq!(digests_of_derived_index_sets, digests_of_claimed_index_sets);
assert!(items
.iter()
.zip(removal_records_integrity_witness.membership_proofs.iter())
.map(|(item, msmp)| {
(
commit::<Hash>(
item,
&msmp.sender_randomness,
&msmp.receiver_preimage.hash::<Hash>(),
),
&msmp.auth_path_aocl,
)
})
.all(|(cc, mp)| {
mp.verify(
&removal_records_integrity_witness
.mutator_set_accumulator
.kernel
.aocl
.get_peaks(),
&cc.canonical_commitment,
removal_records_integrity_witness
.mutator_set_accumulator
.kernel
.aocl
.count_leaves(),
)
.0
}));
assert_eq!(
removal_records_integrity_witness
.mutator_set_accumulator
.hash(),
removal_records_integrity_witness.kernel.mutator_set_hash
);
} |
Adding a bit of context to @aszepieniec's code example pub fn get_swbf_indices<H: AlgebraicHasher>(
item: &Digest,
sender_randomness: &Digest,
receiver_preimage: &Digest,
aocl_leaf_index: u64,
) -> [u128; NUM_TRIALS as usize] {
let batch_index: u128 = aocl_leaf_index as u128 / BATCH_SIZE as u128;
let batch_offset: u128 = batch_index * CHUNK_SIZE as u128;
let leaf_index_bfes = aocl_leaf_index.encode();
let leaf_index_bfes_len = leaf_index_bfes.len();
let input = [
item.encode(),
sender_randomness.encode(),
receiver_preimage.encode(),
leaf_index_bfes,
// Pad with zeros until length is a multiple of RATE; according to spec
vec![BFieldElement::zero(); DIGEST_LENGTH - leaf_index_bfes_len],
]
.concat();
assert_eq!(input.len() % DIGEST_LENGTH, 0);
let mut sponge = <H as SpongeHasher>::init();
H::absorb_repeatedly(&mut sponge, input.iter());
H::sample_indices(&mut sponge, WINDOW_SIZE, NUM_TRIALS as usize)
.into_iter()
.map(|sample_index| sample_index as u128 + batch_offset)
.collect_vec()
.try_into()
.unwrap()
}
pub struct AbsoluteIndexSet([u128; NUM_TRIALS as usize]);
pub fn commit<H: AlgebraicHasher>(
item: &Digest,
sender_randomness: &Digest,
receiver_digest: &Digest,
) -> AdditionRecord {
let canonical_commitment =
H::hash_pair(&H::hash_pair(item, sender_randomness), receiver_digest);
AdditionRecord::new(canonical_commitment)
} The I'll probably work on adding an implementation to |
Thank you, @aszepieniec and @Sword-Smith! I was unsure of what to tackle next after the Fibonacci sequence. It seems quite challenging, so I plan to break it into pieces and work my way up. |
Just so you know: we just pushed (tasm) code snippets implementing higher-order functions map and zip to tasm-lib, in case you're interested. We also have memcpy. And hash_varlen. |
@greenhat: Out of curiosity: how are you (planning to) represent lists? Here's how we do it in tasm-lib:
The reason why you might prefer safe lists to unsafe lists is because you might want to use memory located after the list. But if the list grows without a capacity to limit it, you risk overwriting that memory. So there is a lot more bounds checking going on in safe lists than in unsafe lists but in exchange you get better assurances of non-interfering data structures. In addition to safe and unsafe lists, we also use a static and a dynamic allocator. They simulate a heap, so yo can ask them for addresses in RAM that correspond to x-many contiguous words. When you build a new list, you allocate the requisite memory using one of these allocators. The reason why we are curious is because using the same list format would mean that algorithms and data structures building on top of that would be compatible across compilers. |
@aszepieniec Thanks for the detailed explanation! Good question! Wasm does not have lists - https://webassembly.github.io/spec/core/syntax/instructions.html# |
Hi! I've been working on a compiler framework for zero-knowledge VMs as a side project. It's called OmniZK, and the idea behind it is to help build compilers from high-level languages to various ZK VMs. Its design resembles the MLIR (LLVM) architecture, where IR transformations are implemented generically and reused with different custom IR dialects.
I've implemented a proof-of-concept Wasm frontend (parser, IR dialect, etc.) and Triton VM backend (IR dialect, codegen, etc.) with Wasm -> Triton VM compilation.
Check out how the following Fibonacci example in Rust:
is compiled to the following fully executable Triton VM code. Keep in mind that it's in the proof-of-concept stage, there are no optimizations yet so the code is quite verbose.
The most challenging part was to convert Wasm blocks/loops and branching into functions, jumps, and recursion. I used
skiz
,recurse
, and global var for tracking branching targets.Wasm global and local variables are converted into direct memory access. If you could add local variable support to Triton VM, converted to direct memory access in a predefined memory region under the hood, that would significantly reduce the number of emitted Triton VM instructions since local vars are widely used in Wasm codegen for Rust.
I/O is implemented via extern functions in Rust (Wasm imports) for now, with the idea of using the Wasm component model and WASI in the future.
I plan to implement the rest of Wasm and Triton VM instructions and compile more complex Rust programs.
I'm looking for feedback, and I'd be happy to answer any questions.
The text was updated successfully, but these errors were encountered: