As a Developer, I need to have a way to encode/decode Mina messages in Rust so I can implement an independent Mina peer #1

Closed
12 tasks done
akoptelov opened this issue Jul 20, 2022 · 27 comments
@akoptelov
Contributor

akoptelov commented Jul 20, 2022

Definition of "done"

A Rust library is developed that contains type definitions for all Mina gossip and RPC messages, with the ability to encode and decode them using the bin_prot schema.

@akoptelov akoptelov self-assigned this Jul 20, 2022
@akoptelov
Contributor Author

akoptelov commented Jul 20, 2022

Several crates are available that take care of bin_prot encoding:

  • bin-prot by ChainSafe uses serde
  • binprot is a standalone library that claims fine-grained control over encoding
  • serde-binprot is the serde-based variant of binprot above.

Initially we can use one of these.
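For context on what any of these crates has to implement, here is a minimal, standard-library-only sketch of the variable-length natural number (Nat0) encoding that bin_prot uses for sizes and lengths. The function names `write_nat0` and `read_nat0` and the helper `take` are made up for illustration; this is not the API of any crate listed above.

```rust
use std::convert::TryFrom;

/// Encode a natural number in the bin_prot variable-length scheme:
/// values below 0x80 take one byte, larger ones get a one-byte size
/// code followed by a little-endian integer.
pub fn write_nat0(value: u64, out: &mut Vec<u8>) {
    if value < 0x80 {
        out.push(value as u8);
    } else if value < 0x1_0000 {
        out.push(0xfe);
        out.extend_from_slice(&(value as u16).to_le_bytes());
    } else if value < 0x1_0000_0000 {
        out.push(0xfd);
        out.extend_from_slice(&(value as u32).to_le_bytes());
    } else {
        out.push(0xfc);
        out.extend_from_slice(&value.to_le_bytes());
    }
}

/// Copy the first `N` bytes of `bytes` into an array, if there are enough.
fn take<const N: usize>(bytes: &[u8]) -> Option<[u8; N]> {
    <[u8; N]>::try_from(bytes.get(..N)?).ok()
}

/// Decode a Nat0 from the front of `input`. Returns the value and the
/// number of bytes consumed, or `None` if the input is truncated or
/// starts with an unknown size code.
pub fn read_nat0(input: &[u8]) -> Option<(u64, usize)> {
    let (&first, rest) = input.split_first()?;
    match first {
        0x00..=0x7f => Some((u64::from(first), 1)),
        0xfe => Some((u64::from(u16::from_le_bytes(take(rest)?)), 3)),
        0xfd => Some((u64::from(u32::from_le_bytes(take(rest)?)), 5)),
        0xfc => Some((u64::from_le_bytes(take(rest)?), 9)),
        _ => None,
    }
}
```

Calling `write_nat0(5, &mut buf)` appends the single byte `0x05`, while `write_nat0(0x80, &mut buf)` appends `0xfe 0x80 0x00`.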

@akoptelov
Contributor Author

akoptelov commented Jul 21, 2022

It is possible to get sexp-based shapes for all versioned types used by Mina, by using the following command:

mina.exe internal dump-type-shapes

Here is the output of that command, and this one is a pretty-printed version of the Block type shape:
https://github.com/name-placeholder/mina-p2p-messages-rs/blob/main/block-shape-formatted.sexp

@akoptelov
Contributor Author

ChainSafe uses this JSON file as a schema for the block (or transition) message, but it might be out of date with respect to the actual data structure.

@akoptelov
Contributor Author

I'm in the process of generating code based on the JSON above, but it looks like I need to use the sexp generated from up-to-date Mina sources instead. @tizoc do you know by any chance how hard it would be to modify that command so that the sexp output includes more information, like type module names and recursion markers?

@akoptelov
Contributor Author

I think it should be possible to use the current structure of those shapes after all. The idea is this: each type has its "whole" shape, expanded down to primitive types (or ones without a shape), but we need to detect which versioned type is used, e.g. as a record field type. We can do that from the part of the shape representing that field type, by searching for a type with exactly the same shape.

Consider the following shapes:

src/lib/transaction_snark/transaction_snark.ml:Transaction_snark.Pending_coinbase_stack_state.Stable.V1.t, 01b6150b8dbee028561cd4e372263a3f, (Exp
 (Record
  ((source
    (Exp
     (Record
      ((data (Exp (Base kimchi_backend_bigint_32_V1 ())))
       (state
        (Exp
         (Record
          ((init (Exp (Base kimchi_backend_bigint_32_V1 ())))
           (curr (Exp (Base kimchi_backend_bigint_32_V1 ())))))))))))
   (target
    (Exp
     (Record
      ((data (Exp (Base kimchi_backend_bigint_32_V1 ())))
       (state
        (Exp
         (Record
          ((init (Exp (Base kimchi_backend_bigint_32_V1 ())))
           (curr (Exp (Base kimchi_backend_bigint_32_V1 ()))))))))))))))
...
src/lib/mina_base/pending_coinbase.ml:Mina_base__Pending_coinbase.Stack_versioned.Stable.V1.t, 4c1a055e7620944ec41531887dfe7d6f, (Exp
 (Record
  ((data (Exp (Base kimchi_backend_bigint_32_V1 ())))
   (state
    (Exp
     (Record
      ((init (Exp (Base kimchi_backend_bigint_32_V1 ())))
       (curr (Exp (Base kimchi_backend_bigint_32_V1 ()))))))))))

Here we can detect that both fields of Transaction_snark.Pending_coinbase_stack_state.Stable.V1.t have the type Mina_base__Pending_coinbase.Stack_versioned.Stable.V1.t.
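The lookup described above can be sketched as follows. This is only an illustration under simplifying assumptions: the `Shape` enum models just `Base` and `Record`, and the names `index_shapes`, `resolve_fields` and `example_types` are hypothetical, not taken from the actual code generator.

```rust
use std::collections::HashMap;

/// A much-simplified model of the shapes printed by `dump-type-shapes`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Shape {
    /// A primitive such as `kimchi_backend_bigint_32_V1`.
    Base(String),
    /// A record whose fields carry their fully expanded shapes.
    Record(Vec<(String, Shape)>),
}

/// Index every known versioned type by its expanded shape.
pub fn index_shapes<'a>(types: &'a [(String, Shape)]) -> HashMap<&'a Shape, &'a str> {
    let mut index = HashMap::new();
    for (name, shape) in types {
        // Two distinct types may share a shape; this sketch keeps the first.
        index.entry(shape).or_insert(name.as_str());
    }
    index
}

/// For a record shape, look up which known type (if any) each field has.
pub fn resolve_fields<'a>(
    shape: &'a Shape,
    index: &HashMap<&'a Shape, &'a str>,
) -> Vec<(&'a str, Option<&'a str>)> {
    match shape {
        Shape::Record(fields) => fields
            .iter()
            .map(|(field, field_shape)| (field.as_str(), index.get(field_shape).copied()))
            .collect(),
        Shape::Base(_) => Vec::new(),
    }
}

/// The two shapes quoted above, rebuilt by hand.
pub fn example_types() -> Vec<(String, Shape)> {
    let bigint = || Shape::Base("kimchi_backend_bigint_32_V1".to_string());
    let field = |name: &str, shape: Shape| (name.to_string(), shape);
    let stack = Shape::Record(vec![
        field("data", bigint()),
        field(
            "state",
            Shape::Record(vec![field("init", bigint()), field("curr", bigint())]),
        ),
    ]);
    let stack_state = Shape::Record(vec![
        field("source", stack.clone()),
        field("target", stack.clone()),
    ]);
    vec![
        ("Mina_base__Pending_coinbase.Stack_versioned.Stable.V1.t".to_string(), stack),
        ("Transaction_snark.Pending_coinbase_stack_state.Stable.V1.t".to_string(), stack_state),
    ]
}
```

Resolving the fields of the second type against the index maps both `source` and `target` to `Mina_base__Pending_coinbase.Stack_versioned.Stable.V1.t`. Fields whose shape matches no indexed type, such as the inner `state` record, come back as `None`.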

@tizoc

tizoc commented Jul 21, 2022

If needed, some extra processing can be added here to compare the shapes at each level and detect whether a child is an already-seen shape: https://github.com/MinaProtocol/mina/blob/1765ba6bdfd7c454e5ae836c49979fa076de1bea/src/app/cli/src/cli_entrypoint/mina_cli_entrypoint.ml#L1404

But in the end it is not very different from doing the same detection on the rendered result; at this point there is really no extra information that you don't already have in the rendered version.

@akoptelov
Contributor Author

Implementing RPC decoding.

@akoptelov
Contributor Author

RPC get_epoch_ledger is implemented. Working towards the other RPCs and a kind of registry for the defined methods.
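One possible shape for such a registry is a map from method name and version to a decoding function. The sketch below is an assumption about how it could look, not the library's actual design: `RpcRegistry`, `Decoder` and `decode_placeholder` are hypothetical names, the version number is illustrative, and decoded payloads are reduced to strings to keep the example short.

```rust
use std::collections::HashMap;

/// A decoder turns raw payload bytes into a decoded value. A real library
/// would return typed messages; a `String` stands in for them here.
pub type Decoder = fn(&[u8]) -> Result<String, String>;

/// Maps (method name, version) to the decoder for that method.
#[derive(Default)]
pub struct RpcRegistry {
    decoders: HashMap<(String, u32), Decoder>,
}

impl RpcRegistry {
    /// Register a decoder, returning the one it replaced, if any.
    pub fn register(&mut self, name: &str, version: u32, decoder: Decoder) -> Option<Decoder> {
        self.decoders.insert((name.to_string(), version), decoder)
    }

    /// Decode `payload` with the decoder registered for `name` and `version`.
    pub fn decode(&self, name: &str, version: u32, payload: &[u8]) -> Result<String, String> {
        match self.decoders.get(&(name.to_string(), version)) {
            Some(decoder) => decoder(payload),
            None => Err(format!("unknown RPC method {name} (version {version})")),
        }
    }
}

/// Placeholder decoder that only reports the payload length.
pub fn decode_placeholder(payload: &[u8]) -> Result<String, String> {
    Ok(format!("{} payload bytes", payload.len()))
}
```

After `registry.register("get_epoch_ledger", 1, decode_placeholder)`, a call to `registry.decode("get_epoch_ledger", 1, payload)` dispatches to that decoder, and an unregistered method yields an error instead of a panic.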

@akoptelov
Contributor Author

akoptelov commented Sep 12, 2022

The list of V1 RPCs:

  • get_some_initial_peers
  • get_staged_ledger_aux_and_pending_coinbases_at_hash
  • answer_sync_ledger_query
  • get_transition_chain
  • get_transition_chain_proof
  • Get_transition_knowledge
  • get_ancestry
  • ban_notify
  • get_best_tip
  • get_node_status
  • get_epoch_ledger

@akoptelov
Contributor Author

akoptelov commented Sep 19, 2022

V2 Gossip messages are done. Now working on RPCs for V2.

@akoptelov
Contributor Author

For RPCs we can't have get_transition_chain_proof for Mina V1 and V2 at the same time. The name and the version are the same, while the bin_prot encoding is different.

MinaProtocol/mina#11860
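To illustrate the clash: if decoders are looked up by name and version alone, the second registration silently replaces the first. The sketch below shows that, along with one conceivable workaround, an extra key component for the protocol generation. The `Generation` enum, the constant values and both functions are hypothetical and only illustrate the problem; they do not describe how the library or Mina resolves it.

```rust
use std::collections::HashMap;

/// Hypothetical tag for the protocol generation a message layout belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Generation {
    MinaV1,
    MinaV2,
}

const NAME: &str = "get_transition_chain_proof";
const VERSION: u32 = 1; // illustrative value, identical for both generations

/// Keyed by (name, version) only: returns true because the second insert
/// overwrites the first.
pub fn collides_without_generation() -> bool {
    let mut layouts: HashMap<(&str, u32), &str> = HashMap::new();
    layouts.insert((NAME, VERSION), "Mina V1 layout");
    layouts.insert((NAME, VERSION), "Mina V2 layout").is_some()
}

/// Keyed by (generation, name, version): returns false because both
/// layouts can be stored side by side.
pub fn collides_with_generation() -> bool {
    let mut layouts: HashMap<(Generation, &str, u32), &str> = HashMap::new();
    layouts.insert((Generation::MinaV1, NAME, VERSION), "Mina V1 layout");
    layouts.insert((Generation::MinaV2, NAME, VERSION), "Mina V2 layout").is_some()
}
```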

@akoptelov
Contributor Author

Adding benchmarking, for both native and wasm32.

@akoptelov
Contributor Author

Implementing memory consumption benchmarks.

@akoptelov
Contributor Author

Performance test
Decoding of input messages: incoming RPCs captured during the catch-up process on berkeleynet (~80M), repeated 10 times.

Native:

    mean: 0.26560021159999997, stdev: 0.0003018581737673776

Wasm32 (Firefox):

    mean: 0.457054, stdev: 0.030821437443052

Wasm32 (Chrome):

    mean: 0.4464769999, stdev: 0.055822759584718405

Wasm32 (Node):

    mean: 0.4218, stdev: 0.011541959999999999

In summary, there is a slowdown when running as Wasm32, but it is less than 2x (roughly 1.6x to 1.7x by the means above).
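A measurement of this kind can be sketched with the standard library alone. The helpers `time_runs` and `mean_and_stdev` are hypothetical names, the sketch reports wall-clock seconds, and it computes the population standard deviation. The thread does not say which harness, unit or estimator produced the numbers above, so all three are assumptions.

```rust
use std::time::Instant;

/// Run `work` `runs` times and return the wall-clock duration of each run
/// in seconds.
pub fn time_runs<F: FnMut()>(runs: usize, mut work: F) -> Vec<f64> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed().as_secs_f64()
        })
        .collect()
}

/// Mean and population standard deviation of the samples, or `None` if
/// there are no samples.
pub fn mean_and_stdev(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    Some((mean, variance.sqrt()))
}
```

In use, the closure passed to `time_runs(10, ...)` would decode the whole captured message set once, and the resulting samples would go to `mean_and_stdev`.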

@akoptelov
Contributor Author

Memory Consumption Test

Decoding of input messages: incoming RPCs captured during the catch-up process on berkeleynet (~80M).

Native:

Ratio: 1.5793496061764019
Encoded size (N): 64115
Currently allocated (B): 0
Maximum allocated (B): 101260
Total amount of claimed memory (B): 101260
Total number of allocations: (N): 2376
Reallocations (N): 0

Wasm32 (Firefox):

Decoded bytes: 89779897
Ratio: 1.3257730848142988
Total Allocated (Bytes): 208338833
Currently Allocated (Bytes): 1024
Allocation Peak (Bytes): 119027771
Number of reallocations (N): 24

Still, the in-memory representation is about 1.3 to 1.6 times bigger than the encoded size. That is mostly because of enums with variants of different sizes: every value of such an enum, even one holding an empty variant, occupies enough memory for the biggest variant.
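Counters like "currently allocated", "maximum allocated" and "total number of allocations" can be gathered by wrapping the system allocator. The thread does not say which tool produced the figures above, so the following is only one way to do it; `CountingAllocator` and `AllocStats` are hypothetical names.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Snapshot of the counters kept by `CountingAllocator`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AllocStats {
    pub current: usize,
    pub peak: usize,
    pub total: usize,
    pub allocations: usize,
}

/// Forwards to the system allocator while tracking live bytes, peak live
/// bytes, total bytes requested and the number of allocations.
pub struct CountingAllocator {
    current: AtomicUsize,
    peak: AtomicUsize,
    total: AtomicUsize,
    allocations: AtomicUsize,
}

impl CountingAllocator {
    pub const fn new() -> Self {
        Self {
            current: AtomicUsize::new(0),
            peak: AtomicUsize::new(0),
            total: AtomicUsize::new(0),
            allocations: AtomicUsize::new(0),
        }
    }

    pub fn stats(&self) -> AllocStats {
        AllocStats {
            current: self.current.load(Ordering::SeqCst),
            peak: self.peak.load(Ordering::SeqCst),
            total: self.total.load(Ordering::SeqCst),
            allocations: self.allocations.load(Ordering::SeqCst),
        }
    }
}

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            let size = layout.size();
            let now = self.current.fetch_add(size, Ordering::SeqCst) + size;
            self.peak.fetch_max(now, Ordering::SeqCst);
            self.total.fetch_add(size, Ordering::SeqCst);
            self.allocations.fetch_add(1, Ordering::SeqCst);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        self.current.fetch_sub(layout.size(), Ordering::SeqCst);
    }
}

// Route every heap allocation of the program through the counters.
#[global_allocator]
static ALLOCATOR: CountingAllocator = CountingAllocator::new();
```

Taking `ALLOCATOR.stats()` before and after decoding a message gives the deltas to compare against the encoded size. Reallocations are not tracked separately in this sketch: the default `realloc` shows up as one allocation plus one deallocation.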

@akoptelov
Contributor Author

BTW, before this fix the memory consumption ratio was ~8 (the runtime representation was more than 8 times bigger than the encoded size), and the wasm32 tests were 50% slower.

@tizoc

tizoc commented Oct 4, 2022

@akoptelov how did that change help? unless there is something I am missing, that code still allocates the same memory as before, plus space for the pointer to the bytes (which were inlined before), right?

@akoptelov
Contributor Author

akoptelov commented Oct 4, 2022

@tizoc For simplicity, imagine this:

struct BigInt([u8;32]);

enum Option {
    None,
    Some(BigInt),
}

Each instance of this Option takes at least 32 bytes (33 once the discriminant is counted), while in encoded form None takes only 1 byte. So in this case the in-memory size is about 32x the encoded size. Now, if it were like this,

struct BigInt(Box<[u8;32]>);

enum Option {
    None,
    Some(BigInt),
}

the Option will be only 8 bytes long on a 64-bit target (the null pointer can represent None), so decoding Option::None gives only an 8x increase.
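The size difference can be checked with `std::mem::size_of`. The sketch below uses the standard library's `Option` rather than the hand-written enum above, because `Option<Box<T>>` is guaranteed to be pointer-sized; the type aliases and the `sizes` function are illustrative names.

```rust
use std::mem::size_of;

/// Payload stored inline: every value reserves room for the 32 bytes.
pub type InlineOption = Option<[u8; 32]>;

/// Payload behind a pointer: `None` is represented by the null pointer,
/// so the whole value is the size of one pointer.
pub type BoxedOption = Option<Box<[u8; 32]>>;

/// Sizes in bytes of the inline and the boxed representation.
pub fn sizes() -> (usize, usize) {
    (size_of::<InlineOption>(), size_of::<BoxedOption>())
}
```

On a 64-bit target the boxed form is 8 bytes, and on wasm32 it is 4. The trade-off is that each `Some` value now costs a separate 32-byte heap allocation, so boxing pays off when the large variant is rare.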

@tizoc

tizoc commented Oct 4, 2022

@akoptelov ahh I see. Thanks.

@akoptelov
Contributor Author

#18 is to make the run-time size closer to the encoded one.
#19 is to make it simpler.

@akoptelov
Contributor Author

An update: #19 (comment)

@akoptelov
Contributor Author

V2 wire types are generated without parameters now.
Working on boxing enum variants if needed.

@akoptelov
Contributor Author

akoptelov commented Oct 19, 2022

Heap allocations while decoding a GetStagedLedgerAuxAndPendingCoinbasesAtHashV2 RPC response.

Before boxing alts:

Ratio: 10.53691387685571
Encoded size (N): 19003707
Currently allocated (B): 199575280
Maximum allocated (B): 199575280
Total amount of claimed memory (B): 200240424
Total number of allocations: (N): 831141
Reallocations (N): 2897

With boxed alts:

Ratio: 2.063837071367181
Encoded size (N): 19003707
Currently allocated (B): 38649787
Maximum allocated (B): 38649787
Total amount of claimed memory (B): 39220555
Total number of allocations: (N): 840462
Reallocations (N): 2895

In a nutshell, the peak (maximum allocated) size went from almost 200M down to 39M, which is only about 2x the encoded size.

With preallocated vectors in binprot-rs:

Ratio: 1.8691436886497987
Encoded size (N): 19003707
Currently allocated (B): 35513747
Maximum allocated (B): 35513747
Total amount of claimed memory (B): 35520659
Total number of allocations: (N): 837591
Reallocations (N): 24
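The drop in reallocations is what one would expect when a vector is sized up front from the length prefix instead of growing push by push. The sketch below shows the effect in isolation. It is not the binprot-rs code: `collect_counting_growth` and `MAX_PREALLOC` are hypothetical, and the cap on the preallocation is an added assumption so that an untrusted length prefix cannot request an arbitrarily large buffer.

```rust
/// Upper bound on speculative preallocation (arbitrary illustrative value).
const MAX_PREALLOC: usize = 4096;

/// Collect `len` elements and report how many times the vector's buffer
/// changed capacity along the way.
pub fn collect_counting_growth(len: usize, preallocate: bool) -> (Vec<u64>, usize) {
    let mut items = if preallocate {
        Vec::with_capacity(len.min(MAX_PREALLOC))
    } else {
        Vec::new()
    };
    let mut growths = 0;
    for i in 0..len {
        let before = items.capacity();
        items.push(i as u64); // stands in for decoding one element
        if items.capacity() != before {
            growths += 1;
        }
    }
    (items, growths)
}
```

For a 1000-element list the preallocated variant never grows its buffer, while the plain `Vec::new()` variant grows it several times.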

@akoptelov
Contributor Author

Moved out the security/safety items; they can be tracked elsewhere.

@akoptelov
Contributor Author

A fully functional library for Mina types has been developed.
