Why does the key generation performance of p521_mlkem1024 drop significantly when OQS_KEM_ENCODERS is set to ON? #726
Replies: 5 comments 3 replies
-
Thanks for the report, @youer0219: that's an interesting question indeed. I also cannot immediately think of a logical reason for this, given that the setting should only control the presence or absence of encoders, which should not run during keygen (at least I can't think of a reason why they should). Can I ask whether you see the same behaviour with …
The next question: have you already used a profiler to determine which functions account for most of the time in both cases?
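One way to get comparable numbers for the two builds before profiling is a small timing loop. The sketch below is a generic harness, assuming POSIX `clock_gettime()` with `CLOCK_MONOTONIC` is available; `ops_per_second()` and `placeholder_op()` are hypothetical names, and `placeholder_op()` only stands in for a real keygen call.

```c
#define _POSIX_C_SOURCE 200809L /* exposes clock_gettime() and CLOCK_MONOTONIC */
#include <time.h>

/* Operation under test: returns nonzero on success, 0 on failure. */
typedef int (*bench_op)(void *arg);

/* Hypothetical placeholder workload standing in for one keygen call. */
static int placeholder_op(void *arg)
{
    volatile unsigned long *acc = arg;
    for (unsigned long i = 0; i < 1000; i++)
        *acc += i;
    return 1;
}

/*
 * Runs op(arg) iters times and returns the throughput in operations
 * per second, or -1.0 if iters <= 0, an iteration fails, or the
 * elapsed time could not be measured.
 */
double ops_per_second(bench_op op, void *arg, int iters)
{
    struct timespec start, end;

    if (iters <= 0 || clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    for (int i = 0; i < iters; i++) {
        if (!op(arg))
            return -1.0;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    double secs = (double)(end.tv_sec - start.tv_sec)
                + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)iters / secs : -1.0;
}
```

For the keygen case, the callback would call `EVP_PKEY_generate()` on an `EVP_PKEY_CTX` created once with `EVP_PKEY_CTX_new_from_name(NULL, "p521_mlkem1024", NULL)` and `EVP_PKEY_keygen_init()` (after loading oqsprovider), freeing each generated key. Running the same loop against builds with OQS_KEM_ENCODERS=ON and OFF gives the two throughput figures, and the loop is also a convenient target for a sampling profiler.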
-
Hi, I reproduced this on Apple M-series (arm64, macOS 15.3.1, …).
Benchmark results
Summary
All P-curve hybrids lose ~51-53% of their keygen throughput.
Root cause from profiler
Profiling with macOS … Every P-curve hybrid keygen triggers a full encoder provider … x25519 hybrids avoid this because … The likely fix is caching the encoder context in the key or …
@baentsch: if the root cause analysis looks right, I'd be happy to work on a fix. Would caching the encoder context in the keygen context be the right direction, or are there thread-safety considerations I should be aware of first?
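On the thread-safety question, the usual C pattern for a lookup that should run only once and then be shared is `pthread_once()`. The sketch below shows that generic pattern with a hypothetical `do_expensive_lookup()` standing in for the encoder lookup. It assumes the cached result does not depend on the individual key and is read-only after initialisation; whether an OpenSSL encoder context can actually be reused across different keys is a separate assumption to check against the OSSL_ENCODER documentation.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical result of the expensive lookup (e.g. a resolved handle). */
typedef struct {
    int handle;
} lookup_result;

static lookup_result cached_result;                /* written exactly once */
static pthread_once_t cached_once = PTHREAD_ONCE_INIT;
static int lookup_calls = 0;                       /* instrumentation only */

/* Hypothetical stand-in for the costly provider/encoder scan. */
static void do_expensive_lookup(void)
{
    lookup_calls++;                 /* pthread_once runs this at most once */
    cached_result.handle = 42;      /* placeholder value */
}

/*
 * Thread-safe accessor: the first caller runs the lookup, concurrent
 * callers wait until it has finished, later callers return immediately.
 * Returns NULL if pthread_once() itself fails.
 */
const lookup_result *get_cached_lookup(void)
{
    if (pthread_once(&cached_once, do_expensive_lookup) != 0)
        return NULL;
    return &cached_result;
}
```

The catch with this design is that anything cached this way is shared by every thread and every key, so it must stay immutable after initialisation; per-key state still needs its own storage or locking.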
-
Hi @Vishnu2707! Thank you for the detailed report. Based on your analysis, the root cause is …
But all KEM hybrids call that procedure (see here), so maybe I am misunderstanding something.
-
Hi @RodriM11,
```c
pubkeylen = i2d_PublicKey(pkey, &pubkey_enc);     // serialise public key
privkeylen = i2d_PrivateKey(pkey, &privkey_enc);  // serialise private key
EVP_PKEY *ck2 = d2i_PrivateKey_ex(...);           // selftest round-trip
EVP_PKEY_free(ck2);
```
This serialise → deserialise self-test triggers OSSL_ENCODER_do_all_provided internally on every keygen call, which matches what my profiler showed: time spent in ossl_parse_property and pthread_rwlock operations.
In nids_ecx, the third field is 1 for x25519 and x448.
So the regression is the P-curve serialisation self-test running on every keygen. The fix direction would be caching the encoder context or moving the self-test out of the keygen hot path.
-
@RodriM11 @baentsch: I implemented a fix in PR #778. The P-curve KEM keygen path now uses i2o_ECPublicKey() and i2d_ECPrivateKey() directly, bypassing the provider encoder scan entirely. Benchmarks with OQS_KEM_ENCODERS=ON show p256_mlkem512 recovering from 194 to 21,470 keygen/s, matching the no-encoder baseline.