Very preliminary outline. Subject to serious change as things develop.
General overview:
The idea is to define an intermediate language (tentatively called "Miden IR") as an easy compilation target for any high-level language we want to support. Miden IR subsequently gets compiled down to Miden Assembly:
Source lang 1 ->
Source lang 2 -> Miden IR -> ... -> MASM
Source lang 3 ->
...
Features of Miden IR:
Structured: While loops with break and continue, if-then-else, no jumps
Functions and anonymous lambdas (lambdas are convenient for the CPS transformation and can be eliminated during defunctionalization).
Recursion.
Variable-based, no explicit stack. Assignments.
Extensive arithmetic operations (needed to capture source language semantics: checked/unchecked, integer sizes, implicit casts, etc.). We need u8, u16, u32, and u64 (all of which exist in both Sway and Move), u128 and u256 (both exist in Move), and maybe others.
Code blocks (for scoping).
Some notion of "program to be executed" (script? begin function? main function?)
Arrays, vectors (Move has no arrays, so vectors will be needed), structs, tuples.
References, dereferencing.
Enums, probably in the form of inductive algebraic datatypes. (Convenient during defunctionalization and debugging. It is possible to use recursive structs for defunctionalization instead, but they would not be type-safe (in a C-like type system you would need a void*) and would thus be difficult to understand and debug.)
Match statements/expressions. (Convenient during defunctionalization)
Constants.
Hashes, cryptographic functions, calls to stdlib.
MASM-like libraries
Future:
Some yet-to-be-decided notion of native assets.
Some yet-to-be-decided notion of contracts+methods. This also includes blockchain network addresses, and Sway's notion of ContractID.
Some yet-to-be-decided notion of storage/state
Miden assembly blocks. Not obvious how to design this without it being variable-based.
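To make the feature list above a bit more concrete, here is a rough Rust sketch of how a subset of Miden IR's abstract syntax could look. All type names are hypothetical, and lambdas, references, constants, and the future features are left out. The `has_break` function is an example of a simple analysis that the structured form makes easy; it deliberately does not descend into nested loops, since their `break`s belong to the inner loop.

```rust
/// Integer types Miden IR must represent to capture source-language semantics.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IntType { U8, U16, U32, U64, U128, U256 }

/// Value types; structs and enums are referred to by name.
#[derive(Clone, Debug, PartialEq)]
pub enum Type {
    Bool,
    Int(IntType),
    Tuple(Vec<Type>),
    Array(Box<Type>, usize),
    Vector(Box<Type>),
    Named(String),
}

/// Whether an arithmetic operation aborts on overflow or wraps around.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Overflow { Checked, Wrapping }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BinOp { Add, Sub, Mul, Div, Eq, Lt, And, Or }

#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    /// Integer literal (sketch only: u256 literals would need a wider representation).
    Int { ty: IntType, value: u128 },
    Bool(bool),
    Var(String),
    Binary { op: BinOp, ty: IntType, overflow: Overflow, lhs: Box<Expr>, rhs: Box<Expr> },
    Call { callee: String, args: Vec<Expr> },
    /// Construct an enum variant, e.g. `Option::Some(x)`.
    Variant { enum_name: String, variant: String, args: Vec<Expr> },
}

#[derive(Clone, Debug, PartialEq)]
pub struct MatchArm { pub variant: String, pub bindings: Vec<String>, pub body: Vec<Stmt> }

/// Structured statements only: there are no jumps or labels.
#[derive(Clone, Debug, PartialEq)]
pub enum Stmt {
    Let { name: String, ty: Type, init: Expr },
    Assign { name: String, value: Expr },
    If { cond: Expr, then_branch: Vec<Stmt>, else_branch: Vec<Stmt> },
    While { cond: Expr, body: Vec<Stmt> },
    Break,
    Continue,
    Block(Vec<Stmt>),
    Match { scrutinee: Expr, arms: Vec<MatchArm> },
    Return(Option<Expr>),
}

#[derive(Clone, Debug, PartialEq)]
pub struct Function { pub name: String, pub params: Vec<(String, Type)>, pub ret: Type, pub body: Vec<Stmt> }

/// Does this loop body contain a `break` that exits the current loop?
pub fn has_break(body: &[Stmt]) -> bool {
    body.iter().any(|s| match s {
        Stmt::Break => true,
        Stmt::If { then_branch, else_branch, .. } => has_break(then_branch) || has_break(else_branch),
        Stmt::Block(stmts) => has_break(stmts),
        Stmt::Match { arms, .. } => arms.iter().any(|arm| has_break(&arm.body)),
        // Nested `While` loops are skipped: their breaks exit the inner loop.
        _ => false,
    })
}
```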
Compiler pipeline, Miden IR -> MASM:
Eliminate recursion and anonymous lambdas: CPS-transform, defunctionalize, tail-call optimize.
Eliminate break and continue.
Eliminate non-MASM integer sizes and arithmetic. Introduce explicit checks and modulo operations as needed.
Eliminate match statements. MidenVM's only branch mechanism is if-then-else, so matches must be translated to sequences of if-then-elses. Enums should be eliminated in the same pass, since matches on non-enum values may be open-ended and require default behavior, e.g., errors. Note that enums introduced by defunctionalization of continuations can be safely stored on the stack.
Eliminate structured values: Map structs, arrays, vectors, tuples to memory/stack, introduce pointers to same. Eliminate references and dereferencing.
Convert to SSA: This may not be needed, but probably helps when eliminating variables and for some optimizations.
Eliminate variables: Map variables to stack/memory. Inline code blocks. (Assembly blocks must be inlined here). Liveness analysis will help here.
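A minimal sketch of the match-elimination pass, assuming an enum value has already been lowered to an integer tag (payload handling is omitted). The hypothetical `lower_match` folds the arms into a nested if-then-else chain that tests the tag, with a default branch (here an abort) for tags not covered by any arm; `eval` is a small reference evaluator for checking the result.

```rust
/// Hypothetical lowered form: if-then-else is the only branching construct left.
#[derive(Clone, Debug, PartialEq)]
pub enum Lowered {
    /// Placeholder for the body of a match arm.
    Arm(String),
    /// Default behavior for uncovered tags, e.g. an error.
    Abort,
    /// `if tag_of(scrutinee) == tag { then } else { otherwise }`
    IfTagEq { tag: u32, then: Box<Lowered>, otherwise: Box<Lowered> },
}

/// Turns `match x { tag_0 => arm_0, ..., _ => default }` into an if-then-else chain.
pub fn lower_match(arms: &[(u32, String)], default: Lowered) -> Lowered {
    // Build from the last arm outwards so that the first arm is tested first.
    arms.iter().rev().fold(default, |otherwise, (tag, body)| Lowered::IfTagEq {
        tag: *tag,
        then: Box::new(Lowered::Arm(body.clone())),
        otherwise: Box::new(otherwise),
    })
}

/// Reference evaluator: returns the selected arm, or `None` if execution aborts.
pub fn eval(code: &Lowered, tag: u32) -> Option<&str> {
    match code {
        Lowered::Arm(body) => Some(body.as_str()),
        Lowered::Abort => None,
        Lowered::IfTagEq { tag: t, then, otherwise } => {
            if *t == tag { eval(then, tag) } else { eval(otherwise, tag) }
        }
    }
}
```

A linear chain costs one comparison per preceding arm, which is why match elimination produces exactly the if-then-else sequences that the conditional optimizations below would target.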
Optimizations:
Booleans: Bitwise ops are potentially expensive. Optimize where possible.
Arithmetic: Analyze arithmetic ops to convert to field or u32 ops if possible. Can be approximated using abstract interpretation with intervals. May be easier to perform if code is converted to SSA.
Constant folding.
Liveness analysis. Maybe easier to do if we make all variable names unique or convert to SSA.
Variables and stack: Approximate optimal memory usage.
Optimize conditionals: Eliminate overhead induced by if-then-else. We will see sequences of if-then-else from match elimination, so this will be a focus point. When possible and optimal, execute both branches then conditionally drop unneeded result.
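The interval-based analysis mentioned under Arithmetic could start from an abstract domain like the one below. This is a sketch under two assumptions: every u64 value already has a conservative `[lo, hi]` range (computing these across loops would need widening, which is not shown), and an addition whose result provably fits in 32 bits may be emitted as a u32 operation. `Interval`, `Lowering`, and `lower_add` are hypothetical names, and the field-arithmetic case is not covered.

```rust
/// Conservative range of values a u64 variable may hold.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Interval { pub lo: u64, pub hi: u64 }

/// Which arithmetic family a lowered operation could use (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Lowering { U32, U64Emulated }

impl Interval {
    pub fn new(lo: u64, hi: u64) -> Self {
        assert!(lo <= hi, "empty interval");
        Interval { lo, hi }
    }

    /// Abstract addition; `None` means the sum may overflow u64.
    pub fn add(self, other: Interval) -> Option<Interval> {
        let hi = self.hi.checked_add(other.hi)?;
        // lo <= hi on both sides, so this cannot overflow either.
        Some(Interval { lo: self.lo + other.lo, hi })
    }

    /// Abstract multiplication (all values are non-negative).
    pub fn mul(self, other: Interval) -> Option<Interval> {
        let hi = self.hi.checked_mul(other.hi)?;
        Some(Interval { lo: self.lo * other.lo, hi })
    }

    /// Least upper bound, used where control-flow paths merge.
    pub fn join(self, other: Interval) -> Interval {
        Interval { lo: self.lo.min(other.lo), hi: self.hi.max(other.hi) }
    }

    pub fn fits_u32(self) -> bool {
        self.hi <= u32::MAX as u64
    }
}

/// Picks a lowering for `a + b` from the operand ranges.
pub fn lower_add(a: Interval, b: Interval) -> Lowering {
    match a.add(b) {
        // Operands are non-negative, so a fitting sum implies fitting operands.
        Some(sum) if sum.fits_u32() => Lowering::U32,
        _ => Lowering::U64Emulated,
    }
}
```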
Sway -> Miden IR
Sway IR is unstructured, so start by reimplementing sway-core/src/ir-generation for the Miden target.
Scripts (and predicates) provide convenient notions of "the program to be executed".
Ensure arithmetic operations map to the correct MidenIR operations (I think Sway uses wrapping arithmetic, but the documentation doesn't specify).
Mapping of object-like types (structs+methods) to Miden IR types, probably using MidenIR libraries. Deal with traits. Deal with generics.
Deal with downcast operation. (Might not be necessary until contracts are supported)
Mapping of Sway stdlib to Miden IR stdlib, especially for cryptographic functions.
Mapping of Sway libraries to MidenIR libraries. Deal with function visibility if necessary.
Future:
Storage declaration and usage.
Mapping of contracts to Miden IR contracts. Deal with ABI declarations. Deal with addresses and ContractID.
Compiler intrinsics. Not obvious we need this, or how to do it. Some of the functions look very FuelVM specific, others very generic.
Sway assembly blocks. Not obvious we need this either, or how to do it. Sway is due to get conditional compilation based on target architecture, in which case we can safely choose not to support this.
Add Miden assembly blocks to Sway language. Not obvious how to do it, and not needed until we know how to deal with this from MidenIR.
Move -> Miden IR
Move IR and Move Assembly are unstructured, so start from language/move-compiler/src/hlir and reimplement language/move-compiler/src/cfgir/translate.rs to target Miden. Note that Jump and JumpIf are listed as AST node types, but they are not allowed to occur in HLIR (causes the Move compiler to panic when translating to CFGIR).
Scripts provide a convenient notion of "the program to be executed"
Ensure arithmetic operations map to the correct MidenIR operation. Move arithmetic is checked, and over- and underflows cause the program to abort. Integer downcasts are also checked, and cause aborts if truncation is needed. No implicit casts are performed.
Translate options to a suitably generated enum. Standard library functions should be mapped to suitable Miden behavior.
NOTE!!!: It seems that structs are not allowed in scripts, so structs seem to depend on modules, which in turn depend on addresses. This means that without modules, vectors are the only structured values. This makes the language somewhat limited. We may need to allow for "dummy" addresses and modules so that some module features can be supported.
Map the standard library vector functions to suitable Miden IR operations. Ensure that errors are translated correctly.
Map the standard library error functions to suitable Miden IR behavior.
NOTE!!!: There appear to be no cryptographic primitives in Move, and no hashing functions in the standard library. This is consistent with what Kostas from the Move team told us on 7 Feb: Move relies on a number of libraries, some of which are written in Rust. We will need to translate the use of those libraries, but how to deal with the libraries themselves is unclear to me.
Future:
"The purpose of Move programs is to read from and write to tree-shaped persistent global storage. " We will run into this issue very quickly, so we'll need to figure out how Miden interacts with global storage.
Mapping of modules and storage to Miden IR contract and storage. Deal with addresses and named addresses.
The Signer type: I have no idea to what extent this is needed, but presumably it is only necessary to support global storage.
fixed_point32. Not sure we really want this.
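A small reference model of the Move integer semantics described above (checked arithmetic, checked downcasts, no implicit casts) could serve both the Move front end and the test harness. The sketch assumes an abort is represented as an `Err` value, shows only u8 to u64, and uses hypothetical names (`MoveInt`, `Abort`, `Width`). Since Move performs no implicit casts, mixing widths is treated as a compiler bug (a panic) rather than a runtime abort.

```rust
/// Move integer widths covered by this sketch (u128 and u256 omitted).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Width { U8, U16, U32, U64 }

impl Width {
    /// Largest value representable at this width.
    pub fn max_value(self) -> u64 {
        match self {
            Width::U8 => u8::MAX as u64,
            Width::U16 => u16::MAX as u64,
            Width::U32 => u32::MAX as u64,
            Width::U64 => u64::MAX,
        }
    }
}

/// A Move integer value together with its static type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MoveInt { pub width: Width, pub value: u64 }

/// Reasons a Move program aborts (hypothetical classification).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Abort { Overflow, Underflow, CastTruncation }

/// Checked addition: aborts instead of wrapping.
pub fn add(a: MoveInt, b: MoveInt) -> Result<MoveInt, Abort> {
    // No implicit casts: the type checker should guarantee equal widths.
    assert_eq!(a.width, b.width, "Move has no implicit casts");
    // Compute in u128 so that u64 + u64 cannot overflow the host type.
    let sum = a.value as u128 + b.value as u128;
    if sum > a.width.max_value() as u128 {
        return Err(Abort::Overflow);
    }
    Ok(MoveInt { width: a.width, value: sum as u64 })
}

/// Checked subtraction: aborts on underflow.
pub fn sub(a: MoveInt, b: MoveInt) -> Result<MoveInt, Abort> {
    assert_eq!(a.width, b.width, "Move has no implicit casts");
    a.value
        .checked_sub(b.value)
        .map(|value| MoveInt { width: a.width, value })
        .ok_or(Abort::Underflow)
}

/// Explicit cast `(x as T)`: aborts if the value would be truncated.
pub fn cast(x: MoveInt, to: Width) -> Result<MoveInt, Abort> {
    if x.value > to.max_value() {
        Err(Abort::CastTruncation)
    } else {
        Ok(MoveInt { width: to, value: x.value })
    }
}
```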
Solidity -> Miden IR:
TODO
WASM -> Miden IR:
It might not be necessary to translate via Miden IR, though it's probably easy enough to embed WASM in Miden IR.
TODO
Development plan:
Two overall options:
Focus on supporting a large subset of one language before starting to support another one. In this case the plan should follow a path defined by source language features.
Focus on supporting increasing subsets of several languages simultaneously. In this case the plan should follow a path defined by Miden IR features. (This approach seems to be the consensus right now).
Tasks:
Milestones 1+2 (Test harness, pure code (no side-effects), simple datatypes, simple control flow):
(Milestones 1 and 2 have been combined, since a test harness needs something to test)
Test harness. Incorporate source language execution to establish the expected outcome of Miden execution.
MidenVM-supported integers (u32 and u64). Arithmetic operations.
Booleans. Supported bitwise operations.
Control flow operations. If-then-else, loops without break and continue.
Variables. Introduce mapping of variables to stack/memory.
Non-recursive (static) functions.
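One way to set up the test harness is differential testing: run each test case through a reference execution of the source language and through the compiled Miden program, and report any disagreement. In the Rust sketch below, `Executor`, `Outcome`, and `differential_test` are hypothetical; real executors would wrap the source-language toolchain and the Miden VM, which are not modelled here.

```rust
/// Result of running a program: its output values, or an abort.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Outcome { Values(Vec<u64>), Aborted }

/// Anything that can execute a named test program on some inputs (hypothetical).
pub trait Executor {
    fn run(&self, program: &str, inputs: &[u64]) -> Outcome;
}

/// A disagreement between the reference result and the Miden result.
#[derive(Debug, PartialEq, Eq)]
pub struct Mismatch {
    pub program: String,
    pub inputs: Vec<u64>,
    pub expected: Outcome,
    pub actual: Outcome,
}

/// Runs every (program, inputs) case on both executors and collects mismatches.
pub fn differential_test(
    reference: &dyn Executor,
    miden: &dyn Executor,
    cases: &[(&str, Vec<u64>)],
) -> Vec<Mismatch> {
    let mut mismatches = Vec::new();
    for (program, inputs) in cases {
        let expected = reference.run(program, inputs);
        let actual = miden.run(program, inputs);
        if expected != actual {
            mismatches.push(Mismatch {
                program: program.to_string(),
                inputs: inputs.clone(),
                expected,
                actual,
            });
        }
    }
    mismatches
}
```

Giving aborts their own `Outcome` lets the harness also check that checked arithmetic fails in exactly the same cases on both sides.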
Milestone 3 (Common high-level features):
Assignments, code blocks. Possibly introduce SSA.
Small non-Miden integers (u8, u16). Arithmetic operations. Introduce phase to translate non-Miden arithmetic.
break, continue. Introduce phase to eliminate break and continue.
Enums, match statements/expressions. Introduce phase to eliminate enums and match statements/expressions.
Constants.
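The break/continue elimination phase can replace `break` with a flag that is folded into the loop condition, and wrap the statements following a `continue` in an if-then-else. The Rust sketch below shows the transformation applied by hand to one example loop (not a general pass): it sums `1..=n`, skipping multiples of 3 with `continue`, and stops with `break` once the sum exceeds `limit`. The function names are hypothetical.

```rust
/// Original structured loop using `break` and `continue`.
pub fn with_break_continue(n: u32, limit: u32) -> u32 {
    let mut i = 0;
    let mut sum = 0;
    while i < n {
        i += 1;
        if i % 3 == 0 {
            continue;
        }
        sum += i;
        if sum > limit {
            break;
        }
    }
    sum
}

/// The same loop after elimination: no `break` or `continue` remains.
pub fn without_break_continue(n: u32, limit: u32) -> u32 {
    let mut i = 0;
    let mut sum = 0;
    let mut done = false; // set instead of `break`
    while !done && i < n {
        i += 1;
        // Everything after the `continue` moves into this branch.
        if i % 3 != 0 {
            sum += i;
            if sum > limit {
                done = true;
            }
        }
    }
    sum
}
```

In general the flag must also guard any statements that follow the `break` in the same loop body; here the `break` is the last statement, so folding the flag into the loop condition is enough.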
Milestone 4 (Compound data structures, code structure):
Structs, tuples, vectors, arrays. Introduce phase to map data structures to Miden VM memory/stack.
Referencing, dereferencing.
Recursion, anonymous lambdas. Introduce phase to eliminate recursion.
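The recursion-elimination phase (CPS transformation, defunctionalization, tail-call optimization) can be illustrated on a naive recursive Fibonacci function. In this Rust sketch, each anonymous continuation of the CPS version becomes a variant of the hypothetical `Cont` enum, and the tail calls between "evaluate" and "apply continuation" become a single loop over a `State` value. The chain of `Cont` values behaves like a stack, which fits the observation that enums introduced for continuations can be stored on the stack.

```rust
/// Direct recursive definition, used as the reference.
pub fn fib_rec(n: u64) -> u64 {
    if n < 2 { n } else { fib_rec(n - 1) + fib_rec(n - 2) }
}

/// Defunctionalized continuations: one variant per lambda in the CPS version
/// `fib_k(n, k) = if n < 2 { k(n) } else { fib_k(n-1, |a| fib_k(n-2, |b| k(a+b))) }`.
enum Cont {
    /// The initial continuation: return the result.
    Done,
    /// `|a| fib_k(n - 2, |b| next(a + b))`: waiting for fib(n - 1).
    AfterFirst { n: u64, next: Box<Cont> },
    /// `|b| next(a + b)`: holds a = fib(n - 1), waiting for fib(n - 2).
    AfterSecond { a: u64, next: Box<Cont> },
}

/// Either evaluate fib(n) under continuation k, or apply k to a value.
enum State {
    Eval(u64, Cont),
    Apply(Cont, u64),
}

/// CPS-transformed, defunctionalized, tail-call-optimized: no recursion left.
pub fn fib_loop(n: u64) -> u64 {
    let mut state = State::Eval(n, Cont::Done);
    loop {
        state = match state {
            State::Eval(n, k) if n < 2 => State::Apply(k, n),
            State::Eval(n, k) => State::Eval(n - 1, Cont::AfterFirst { n, next: Box::new(k) }),
            State::Apply(Cont::Done, v) => return v,
            State::Apply(Cont::AfterFirst { n, next }, a) => {
                State::Eval(n - 2, Cont::AfterSecond { a, next })
            }
            State::Apply(Cont::AfterSecond { a, next }, b) => State::Apply(*next, a + b),
        };
    }
}
```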
Milestone 5 (Libraries, hashing):
(I don't understand all the implications here, but we almost certainly won't be able to support hashing without support for source language libraries)
User-defined libraries.
Hashing functions, crypto primitives.
Standard library support. This one is pretty open-ended, since standard libraries differ wildly between source languages.
Additional integer sizes. Integer sizes above u64 require due care and attention, but are probably needed at this stage, if not before.
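Integer sizes above u64 will have to be emulated. One common approach, assumed here rather than decided, is to split values into 32-bit limbs and chain u32 additions through a carry, since MidenVM supports u32 operations directly. The sketch shows checked u128 addition over four little-endian u32 limbs; the limb order, the `None`-on-overflow convention, and the function names are assumptions, and multiplication and u256 are not covered.

```rust
/// A u128 value as four 32-bit limbs, least significant limb first.
pub type U128Limbs = [u32; 4];

pub fn to_limbs(x: u128) -> U128Limbs {
    [x as u32, (x >> 32) as u32, (x >> 64) as u32, (x >> 96) as u32]
}

pub fn from_limbs(l: U128Limbs) -> u128 {
    // Start from the most significant limb and shift it up.
    l.iter().rev().fold(0u128, |acc, &limb| (acc << 32) | limb as u128)
}

/// Checked addition using only 32-bit additions with carry; `None` on overflow.
pub fn checked_add_u128(a: U128Limbs, b: U128Limbs) -> Option<U128Limbs> {
    let mut out = [0u32; 4];
    let mut carry = false;
    for i in 0..4 {
        // Each step corresponds to one u32 add-with-carry on the VM.
        let (s1, c1) = a[i].overflowing_add(b[i]);
        let (s2, c2) = s1.overflowing_add(carry as u32);
        out[i] = s2;
        carry = c1 || c2;
    }
    // A carry out of the top limb means the u128 result overflowed.
    if carry { None } else { Some(out) }
}
```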
Future milestones:
Some yet-to-be-decided notion of native assets.
Some yet-to-be-decided notion of contracts+methods. This also includes blockchain network addresses, and Sway's notion of ContractID.
Some yet-to-be-decided notion of storage/state
Some yet-to-be-decided notion of gas.
Miden assembly blocks. Not obvious how to design this without it being variable-based.