Celo: Serialization changes #130

ValarDragon · 2020-12-13T07:25:34Z

Description

Copies celo's serialization changes from #127

The key change is it adds serialize_uncompressed_unchecked.

Currently the implication of serialize_unchecked is to serialize in the uncompressed form, so this PR makes sense in the context of needing to serialize unchecked, compressed data.

The usecase for that is not clear to me, as the check needed for deserializing an affine curve point on a prime order curve is effectively the same as getting the affine representation of the point. (We currently don't implement arithmetic in montgomery form AFAIK)

Is the usecase for skipping subgroup checks for compressed curve points?

@jon-chuang, @Pratyush, is the above correct for where serializing unchecked compressed comes in handy?

I'll handle fixing the merge conflicts, and copying over the changes to make elliptic curves implement this if we agree on the need for the change.

We also have the question of whether the default should be for serialize_unchecked to be compressed or uncompressed. The current form of this PR is a breaking change that will fail silently in many places, by changing the implementation of serialize_unchecked.

Not sure what the best API to be using here right now is.

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

Targeted PR against correct branch (main)
Linked to Github issue with discussion and accepted design OR have an explanation in the PR that describes this work.
Wrote unit tests
Updated relevant documentation in the code
Added a relevant changelog entry to the Pending section in CHANGELOG.md
Re-reviewed Files changed in the Github PR explorer

Co-authored-by: jon-chuang <[email protected]>

Pratyush · 2020-12-13T17:15:41Z

Is the usecase for skipping subgroup checks for compressed curve points?

Yes, this is the intended usecase, I believe. I’m not sure what the overhead of decompression vs subgroup checks is for common curves

Pratyush · 2020-12-14T17:24:38Z

@jon-chuang Could you elaborate further on the applications/potential use cases for this? Isn't taking the square root pretty expensive?

jon-chuang · 2020-12-15T02:05:10Z

Hi, yes, the use case is indeed to skip the subgroup check. We can use a batched subgroup check instead which is about 30-75x faster, iirc.

jon-chuang · 2020-12-15T02:06:43Z

It would be nice to specialise the impl for vec or [T] where T: AffineCurve etc. to perform the batch subgroup check. Perhaps a change for the future.

Pratyush · 2020-12-16T03:08:42Z

Hmm since it's useful, I propose renaming (de)serialize_unchecked to (de)serialize_compressed_unchecked. I am undecided on whether we should also rename serialize to serialize_compressed in the process. WDYT @ValarDragon @jon-chuang @kobigurk?

jon-chuang · 2020-12-16T03:26:11Z

Upon further thought, I personally dont like the idea of having checked and unchecked in the API, and am more in favour of one of the two solutions:

Remove such checks entirely, then the responsibility/manner of checking falls on the programmer
create a trait for each type representing the batch version of checking functions, with a default impl doing nothing, and impl for curves doing a batch check, and have this function invoked for the slice and vec data types. Well, this is probably pretty orthogonal to the question though.

Pratyush · 2020-12-16T03:29:30Z

Remove such checks entirely, then the responsibility/manner of checking falls on the programmer

I feel like this is probably going to lead to protocol problems elsewhere, and furthermore breaks the contracts of the curve types (which assume that the input is in the prime order subgroup)

Pratyush · 2020-12-16T03:32:46Z

We could try playing around with adding a default parameter to the serialize and serialize_uncompressed methods, so that by default it performs the checks, but if you change the default type parameter to a different one (which avoids the checks), then it does not perform on-curve checks.

jon-chuang · 2020-12-16T03:37:19Z

Hmm yeah, I guess the solution proposed is meant to allow for minimal breakage.

I guess compression and data integrity checks do form a reasonable quadrant on the type of (de)serialisation one would like generically speaking. One however only cares about toggling data integrity checks if they are relatively expensive, so they matter little to e.g. a bool deserialization.

Would you mind giving an example of the type parameter method?

ValarDragon · 2020-12-16T03:37:34Z

we could separate the unchecked serialization methods into its own trait, that is auto-implemented if canonical serialize is implemented

ValarDragon · 2020-12-16T03:41:32Z

Im in favor of the approach of having 4 methods

* serialize_compressed
* serialize_uncompressed
* serialize_compressed_unchecked
* serialize_uncompressed_unchecked

I think its fine to expose the four options here, and let downstream applications decide which method they want, and which methods they want to expose.

jon-chuang · 2020-12-16T03:43:41Z

Hmm since it's useful, I propose renaming (de)serialize_unchecked to (de)serialize_compressed_unchecked. I am undecided on whether we should also rename serialize to serialize_compressed in the process. WDYT @ValarDragon @jon-chuang @kobigurk?

Wrt to this question, are you proposing to have e.g. deserialize_compressed.. alongside deserialize_uncompressed.., only for the unchecked version of the methods?

jon-chuang · 2020-12-16T03:47:51Z

Im in favor of the approach of having 4 methods
* serialize_compressed
* serialize_uncompressed
* serialize_compressed_unchecked
* serialize_uncompressed_unchecked
I think its fine to expose the four options here, and let downstream applications decide which method they want, and which methods they want to expose.

Hmm, well barring curves, most types do not have an option to compress, so shouldn't we have a default that leads to minimal breakage?

Pratyush · 2020-12-16T03:51:05Z

Here's some example code, that allows incorporating the batch checks as well:

trait CanonicalDeserialize {
	Specifies how the check should be performed
	type Validator: Checker<Self>;
	
	fn deserialize<R: Read, C: ShouldCheck = AlwaysCheck>(reader: R) -> Self {
		// get item from R
		if C::should_check() {
			assert!(Self::Validator::check(&[item]));
		}
		item
	}

	
	fn deserialize_uncompressed<R: Read, C: ShouldCheck = AlwaysCheck>(reader: R) -> Self {
		// get uncompressed item from R
		if C::should_check() {
			assert!(Self::Validator::check(&[item]));
		}
		item
	}
}

/// Actually performs the check
trait Checker<T> {
	fn check(&[T]) -> bool;
}

trait ShouldCheck {
	fn should_check() -> bool;
}

struct AlwaysCheck;

impl ShouldCheck for AlwaysCheck {
	fn should_check() -> bool {
		true
	}
}

struct NoCheck;

impl ShouldCheck for NoCheck {
	fn never_check() -> bool {
		false
	}
}

Here's how you would use it to do both checked and unchecked serialization:

let item = T::deserialize(&mut reader)?;
let item = T::deserialize::<_, NoCheck>(&mut reader)?;

and here's how you would use the Checker API to implement batch checking without specialization

impl<T: CanonicalDeserialize> CanonicalDeserialize for Vec<T> {
	type Validator = VecChecker<T>;
	
	fn deserialize<R: Read, C: ShouldCheck = AlwaysCheck>(reader: R) -> Self {
		// get items from R
		if C::should_check() {
			assert!(T::Validator::check(items.as_slice()));
		}
		item
	}
}

struct VecChecker<T>(PhantomData<T>);
impl<T: CanonicalDeserialize> Checker<Vec<T>> for VecChecker<T> {
	fn check(items:&[Vec<T>]) -> bool {
		for item in items {
			T::check(item.as_slice())
		}
	}
}

The idea would be to have different implementations of checkers for different types.

Pratyush · 2020-12-16T03:54:27Z

Note that the changes relating to the Validator associated type are completely orthogonal to the changes related to the ShouldCheck trait; we could adopt either, neither, or both.

jon-chuang · 2020-12-16T03:55:13Z

Here's some example code, that allows incorporating the batch checks as well:

I like this strategy as it exposes optional fine grained control with fewer breaking changes and more separation of duties.

I don't, however, like the use of type parameters as an optional argument...

jon-chuang · 2020-12-16T04:07:32Z

Instead of the above approach, I propose instead to have a trait with overridable default impls

trait ValidData {
    fn validate(value: Self) -> Result {
        Ok(())
    }
    fn validate_batch(values: &[Self]) -> Result {
        for value in values {
            Self::check(value)?;
        }
        Ok(())
    }
}

Then one writes for impl<T: ValidData> CanonicalDeserialize for Vec<T>

fn deserialize<.., C: ..>(...) {
   for ... {
       deserialize(..); // use default
   }
   if C::should_validate() {
       T::validate_batch(values)?;
   }
}

Also validate sounds more precise.

jon-chuang · 2020-12-21T02:15:49Z

Hi, @Pratyush @ValarDragon, could I get permission to push to non-protected branches?

jon-chuang · 2020-12-21T02:44:30Z

Btw, where is serialize_derive actually used? I can't seem to find it used anywhere. For instance, derive_canonical_deserialize isn't.

weikengchen · 2020-12-21T02:49:28Z

The serialize-derive repo is often used in a project by adding a dependency on serialize with the feature derive.
https://github.com/arkworks-rs/algebra/blob/master/serialize/src/lib.rs#L19

This allows the Rust to derive the canonical serialization and deserialization.
e.g., https://github.com/arkworks-rs/poly-commit/blob/master/src/data_structures.rs#L107

weikengchen · 2020-12-21T02:50:58Z

By the way, thanks a lot for the serialization system by Celo. It has helped one of my projects significantly!

jon-chuang · 2020-12-21T03:11:27Z

Hi @weikengchen thanks, I guess it's not used in algebra itself, with arrays and simple structs. I see that it's necessary for cases where there are nested structs.

That's great, but according to the above, though, the API seems like it's about to change again.

jon-chuang · 2020-12-21T03:18:26Z

@Pratyush unfortunately it seems that your strategy doesn't work, becuase:
defaults for type parameters are only allowed in struct, enum, type, or trait definitions.
So I'm still looking for a nice workaround to specify default behaviour without introducing breaking changes and/or clutter.

One workaround is to have the traits Canonical(de)Serialize take a type parameter. However, this is obviously not flexible enough as within the same file, one might want to invoke e.g. serialize both with and without validation.

While ugly, it does seem that exposing the 4 functions with _unchecked is the most straitforward way to implement the desired functionality. Alternatively, while introducing breaking changes, a cleaner method might be to create an optional enum selector Option<SerializationMode> (encompassing both compression and validation procedures). While on the semantic level, rustc understands this as invoking a mode at runtime, I believe LLVM will inline the branches based on the mode invoked.

jon-chuang · 2020-12-21T03:29:46Z

@weikengchen actually with regards to your example https://github.com/arkworks-rs/poly-commit/blob/master/src/data_structures.rs#L107, it does seem one can get rid of the ugly code bloat by implementing canonical (de)serialize for the String type.

weikengchen · 2020-12-21T04:26:11Z

Yes. I will take a look!

jon-chuang · 2020-12-21T07:48:42Z

Things I am currently unsure about: how to impl derive to incorporate (de)serialize_uncompressed_unchecked.

weikengchen · 2020-12-21T07:51:46Z

Is the problem due to whether it should default to (de)serialize_uncompressed or (de)serialize_unchecked when a specific implementation is not given?

Pratyush · 2020-12-21T07:59:01Z

btw Jon, an alternative to prevent downstream users from having to implement all methods manually would be to have a "Mode", as outlined in #145

The idea is to have a mode enum:

enum Compress {
	Yes,
	No,
}
enum Validate {
	Yes,
	No
}

And this enum is passed into a single method

trait Serialize {
	fn serialize_with_mode<R: Read>(self, reader: R, compression: Compression, validation: Validation) -> ioResult;
	
	fn serialize_compressed(...) {
		self.serialize_with_mode(reader, Compress::Yes, Validate::Yes)
	}
	
	fn serialize_compressed_unchecked(...) {
		self.serialize_with_mode(reader, Compress::Yes, Validate::No)
	}
	
	fn serialize_uncompressed(...) {
		self.serialize_with_mode(reader, Compress::No, Validate::Yes)
	}
	
	fn serialize_uncompressed_unchecked(...) {
		self.serialize_with_mode(reader, Compress::No, Validate::No)
	}
}

So users only have to implement the single method, and everything else is implemented by default?

This should also make writing the Derive macro implementation easier.

(This can be combined with the above Validator associated type idea, or your ValidData trait idea)

Pratyush · 2021-07-13T15:43:24Z

Incorporating the ideas above, let me propose the following design:

pub enum Compress {
	Yes,
	No,
}
pub enum Validate {
	Yes,
	No,
}

pub trait Valid {
	fn check(&self) -> Result<(), SerializationError>;
	fn batch_check(batch: &[self]) -> Result<(), SerializationError> {
		for item in batch {
			item.check()?;
		}
		Ok(())
	}
}

pub trait CanonicalSerialize {
	fn deserialize_with_mode<W: Write>(writer: W, compress: Compress) -> Result<(), SerializationError> {
		// write item to W, handling compression as per `compress`
	}
	
	fn serialize_compressed(...) {
		self.serialize_with_mode(reader, Compress::Yes)
	}
	
	fn serialize_uncompressed(...) {
		self.serialize_with_mode(reader, Compress::No)
	}
}

pub trait CanonicalDeserialize: Valid {
	fn deserialize_with_mode<R: Read>(reader: R, compress: Compress, validate: Validate) -> Result<Self, SerializationError> {
		// get item from R, handling compression as per `compress`
		if let Validate::Yes = validate {
			Self::check(&item)?
		}
		item
	}
	
	fn deserialize_compressed(...) -> ... {
		Self::serialize_with_mode(reader, Compress::Yes, Validate::Yes)
	}
	
	fn deserialize_compressed_unchecked(...) -> ... {
		self.serialize_with_mode(reader, Compress::Yes, Validate::No)
	}
	
	fn deserialize_uncompressed(...) -> ... {
		Self::deserialize_with_mode(reader, Compress::No, Validate::Yes)
	}
	
	fn deserialize_uncompressed_unchecked(...) -> ... {
		Self::deserialize_with_mode(reader, Compress::No, Validate::No)
	}
}

impl<T: Valid> CanonicalDeserialize for Vec<T> {
	fn deserialize_with_mode<R: Read>(reader: R, compress: Compress, validate: Validate) -> Self {
		// get items: Vec<T> from R, handling compression as per `compress`
		if let Validate::Yes = validate {
			T::batch_check(&items)?
		}
		items
	}
}

Copy celo serialization code

dbcb86c

Co-authored-by: jon-chuang <[email protected]>

ValarDragon marked this pull request as draft December 13, 2020 07:27

ValarDragon changed the title ~~Celo: Serialization~~ Celo: Serialization changes Dec 13, 2020

Pratyush mentioned this pull request Dec 17, 2020

Serialization with modes #145

Closed

jon-chuang mentioned this pull request Dec 21, 2020

serialize tests Dummy is poorly written #152

Closed

Pratyush added P-medium Priority: medium T-feature Type: new features T-performance Type: performance improvements T-refactor Type: cleanup/refactor labels Sep 9, 2021

Pratyush mentioned this pull request Oct 27, 2021

Add unchecked deserialization to Canonical{Serialize/Deserialize} AleoNet/snarkVM#459

Closed

ghost mentioned this pull request Apr 5, 2022

Batch subgroup checks anoma/ferveo#80

Open

Pratyush mentioned this pull request Aug 30, 2022

More flexible and deduplicated serialization #463

Merged

6 tasks

Pratyush closed this in #463 Sep 2, 2022

Pratyush deleted the celo_serialize_rebased branch October 26, 2022 20:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Celo: Serialization changes #130

Celo: Serialization changes #130

ValarDragon commented Dec 13, 2020 •

edited

Loading

Pratyush commented Dec 13, 2020

Pratyush commented Dec 14, 2020

jon-chuang commented Dec 15, 2020

jon-chuang commented Dec 15, 2020

Pratyush commented Dec 16, 2020 •

edited

Loading

jon-chuang commented Dec 16, 2020 •

edited

Loading

Pratyush commented Dec 16, 2020

Pratyush commented Dec 16, 2020

jon-chuang commented Dec 16, 2020 •

edited

Loading

ValarDragon commented Dec 16, 2020

ValarDragon commented Dec 16, 2020 •

edited

Loading

jon-chuang commented Dec 16, 2020 •

edited

Loading

jon-chuang commented Dec 16, 2020

Pratyush commented Dec 16, 2020 •

edited

Loading

Pratyush commented Dec 16, 2020 •

edited

Loading

jon-chuang commented Dec 16, 2020 •

edited

Loading

jon-chuang commented Dec 16, 2020 •

edited

Loading

jon-chuang commented Dec 21, 2020

jon-chuang commented Dec 21, 2020

weikengchen commented Dec 21, 2020

weikengchen commented Dec 21, 2020

jon-chuang commented Dec 21, 2020

jon-chuang commented Dec 21, 2020 •

edited

Loading

jon-chuang commented Dec 21, 2020

weikengchen commented Dec 21, 2020

jon-chuang commented Dec 21, 2020

weikengchen commented Dec 21, 2020 •

edited

Loading

Pratyush commented Dec 21, 2020 •

edited

Loading

Pratyush commented Jul 13, 2021 •

edited

Loading

Celo: Serialization changes #130

Celo: Serialization changes #130

Conversation

ValarDragon commented Dec 13, 2020 • edited Loading

Description

Pratyush commented Dec 13, 2020

Pratyush commented Dec 14, 2020

jon-chuang commented Dec 15, 2020

jon-chuang commented Dec 15, 2020

Pratyush commented Dec 16, 2020 • edited Loading

jon-chuang commented Dec 16, 2020 • edited Loading

Pratyush commented Dec 16, 2020

Pratyush commented Dec 16, 2020

jon-chuang commented Dec 16, 2020 • edited Loading

ValarDragon commented Dec 16, 2020

ValarDragon commented Dec 16, 2020 • edited Loading

jon-chuang commented Dec 16, 2020 • edited Loading

jon-chuang commented Dec 16, 2020

Pratyush commented Dec 16, 2020 • edited Loading

Pratyush commented Dec 16, 2020 • edited Loading

jon-chuang commented Dec 16, 2020 • edited Loading

jon-chuang commented Dec 16, 2020 • edited Loading

jon-chuang commented Dec 21, 2020

jon-chuang commented Dec 21, 2020

weikengchen commented Dec 21, 2020

weikengchen commented Dec 21, 2020

jon-chuang commented Dec 21, 2020

jon-chuang commented Dec 21, 2020 • edited Loading

jon-chuang commented Dec 21, 2020

weikengchen commented Dec 21, 2020

jon-chuang commented Dec 21, 2020

weikengchen commented Dec 21, 2020 • edited Loading

Pratyush commented Dec 21, 2020 • edited Loading

Pratyush commented Jul 13, 2021 • edited Loading

ValarDragon commented Dec 13, 2020 •

edited

Loading

Pratyush commented Dec 16, 2020 •

edited

Loading

jon-chuang commented Dec 16, 2020 •

edited

Loading

jon-chuang commented Dec 16, 2020 •

edited

Loading

ValarDragon commented Dec 16, 2020 •

edited

Loading

jon-chuang commented Dec 16, 2020 •

edited

Loading

Pratyush commented Dec 16, 2020 •

edited

Loading

Pratyush commented Dec 16, 2020 •

edited

Loading

jon-chuang commented Dec 16, 2020 •

edited

Loading

jon-chuang commented Dec 16, 2020 •

edited

Loading

jon-chuang commented Dec 21, 2020 •

edited

Loading

weikengchen commented Dec 21, 2020 •

edited

Loading

Pratyush commented Dec 21, 2020 •

edited

Loading

Pratyush commented Jul 13, 2021 •

edited

Loading