Why does the key generation performance of p521_mlkem1024 drop significantly when OQS_KEM_ENCODERS is set to ON? #726

youer0219 · 2025-11-23T04:21:20Z


I don't know if this is normal behavior or a bug.

If this is normal, then which piece of data is 'correct,' especially for projects that need to test TLS 1.3 handshake times? Also, should this be noted in the documentation?

no OQS_KEM_ENCODERS:

with OQS_KEM_ENCODERS:

For ML-KEM1024, the performance degradation is not noticeable, if there is any at all:
no OQS_KEM_ENCODERS:

with OQS_KEM_ENCODERS:

Testing on another device:
no OQS_KEM_ENCODERS:

with OQS_KEM_ENCODERS:

Thank you for reviewing and replying!

Answered by Vishnu2707

May 2, 2026

Implemented a fix in PR #778 @RodriM11 @baentsch. The P-curve KEM keygen path now uses i2o_ECPublicKey() and i2d_ECPrivateKey() directly, bypassing the provider encoder scan entirely. Benchmarks with OQS_KEM_ENCODERS=ON show p256_mlkem512 recovering from 194 to 21,470 keygen/s, matching the no-encoder baseline.


baentsch · 2025-11-23T07:17:16Z

Maintainer

Thanks for the report @youer0219. That's an interesting question indeed. I cannot immediately think of a logical reason for this either: the only difference should be the presence or absence of encoders, and I can't think of a reason why they should run during keygen. Can I ask whether you see the same behaviour with

other PQ hybrid KEM components, say Frodo
other classic hybrids, say p384 or x25519?

The next question: have you already tried using a profiler to determine which functions most of the time is spent in for each case?

2 replies

youer0219 Nov 23, 2025
Author

Additional Algorithm Tests:

I also tested hybrid combinations of x25519/p256 with mlkem/frodo640aes/bikel1.
Enabling OQS_KEM_ENCODERS has a significant impact on the key generation performance of algorithms hybridized with P256, while its impact on those hybridized with x25519 is not significant.

no KEM_ENCODERS:

with KEM_ENCODERS:

Regarding Profilers:

I have limited knowledge in this area.

baentsch Nov 24, 2025
Maintainer

Thanks for these additional checks. This seems to "teach" that

this is independent of PQ alg
independent of oqsprovider KEM encoding logic (which is identical for X and P EC)
somewhat related to classic EC P code.

> I have limited knowledge in this area.

Very well, then someone else should take a look using such tooling -- with a focus on EC P code paths as per the above. I myself currently only have very limited time for the project but will take a look as time permits.

Vishnu2707 · 2026-04-08T02:12:45Z


Hi, I reproduced this on Apple M-series (arm64, macOS 15.3.1,
OpenSSL 3.6.1, oqs-provider commit 334f9fc) and profiled the keygen
path using macOS sample. Here's the full picture.

Benchmark results

Summary

Algorithm	Without encoders (keygens/s)	With encoders (keygens/s)	Drop
p521_mlkem1024	394.1	185.2	53.0%
p256_mlkem512	406.9	191.5	52.9%
p384_mlkem768	383.2	188.9	50.7%
x25519_mlkem512	23727.0	23329.6	1.7%

All P-curve hybrids lose ~51-53% keygen throughput.
x25519 is unaffected. Encaps and decaps are unaffected
across all algorithms — confirming the issue is isolated
to keygen on EC P-curve paths.
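
For anyone re-checking the Drop column, here is a small sketch of the arithmetic, assuming the drop is defined as the relative loss against the no-encoder figure. The helper name `drop_percent` is made up for illustration and is not part of oqs-provider.

```c
#include <assert.h>

/*
 * Relative throughput loss in percent when going from the
 * "without encoders" rate to the "with encoders" rate.
 * Both rates are in keygens/s; the baseline must be positive.
 */
static double drop_percent(double without_enc, double with_enc)
{
    assert(without_enc > 0.0);
    return (without_enc - with_enc) / without_enc * 100.0;
}
```

Feeding in the p521_mlkem1024 row, `drop_percent(394.1, 185.2)` comes out at roughly 53%, and the x25519_mlkem512 row at under 2%, in line with the table.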

Root cause from profiler

Profiling with macOS sample shows the bottleneck in the
WITH encoders path:

Every P-curve hybrid keygen triggers a full encoder provider
scan via oqsx_key_gen_evp_key_kem → i2d_provided → OSSL_ENCODER_CTX_new_for_pkey → OSSL_ENCODER_do_all_provided,
spending significant time in ossl_parse_property, property
string hash lookups, and pthread_rwlock operations — on
every single keygen call.

x25519 hybrids avoid this because oqsx_key_gen takes a
different code path that does not invoke
oqsx_key_gen_evp_key_kem.

The likely fix is caching the encoder context in the key or
provider context so the full provider scan runs once rather
than per-keygen.
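
To make the caching idea concrete, here is a minimal sketch of the "scan once, reuse afterwards" pattern in plain C. Everything in it is hypothetical: `encoder_cache`, `expensive_encoder_scan` and `run_keygens` are stand-ins I made up, not oqs-provider or OpenSSL names, and the sketch is deliberately single-threaded. A real version would need a one-time-init primitive (OpenSSL offers CRYPTO_THREAD_run_once, for example) or a lock around the first-use check.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the slow path runs, so the effect of caching is visible. */
static int scan_count = 0;

/* Hypothetical cache that would live in the key or provider context. */
typedef struct {
    int ready;      /* non-zero once the scan result has been stored */
    int encoder_id; /* placeholder for whatever the scan resolves */
} encoder_cache;

/* Hypothetical stand-in for the expensive provider-wide encoder scan. */
static int expensive_encoder_scan(void)
{
    scan_count++;
    return 42; /* arbitrary placeholder result */
}

/* Return the cached result, running the scan only on first use.
 * NOT thread-safe as written; see the note above the block. */
static int cached_encoder(encoder_cache *cache)
{
    assert(cache != NULL);
    if (!cache->ready) {
        cache->encoder_id = expensive_encoder_scan();
        cache->ready = 1;
    }
    return cache->encoder_id;
}

/* Hypothetical keygen loop: n keygens sharing one cache. */
static int run_keygens(encoder_cache *cache, int n)
{
    int last = -1;
    for (int i = 0; i < n; i++)
        last = cached_encoder(cache);
    return last;
}
```

With a zero-initialised cache, calling `run_keygens(&cache, 1000)` leaves `scan_count` at 1, which is the behaviour the proposed fix is after.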

@baentsch — if the root cause analysis looks right, I'd be happy to work on a fix. Would caching the encoder context in the keygen context be the right direction, or are there thread safety considerations I should be aware of first?


RodriM11 · 2026-04-16T11:05:28Z

Maintainer

Hi @Vishnu2707 ! Thank you for the detailed report. Based on your analysis, the root cause is

> x25519 hybrids avoid this because oqsx_key_gen takes a
> different code path that does not invoke
> oqsx_key_gen_evp_key_kem.

But all KEM hybrids call that procedure (see here) so maybe I am understanding something incorrectly.


Vishnu2707 · 2026-04-28T18:53:36Z


Hi @RodriM11 ,
You're right to push back — let me correct this with the full source.
All three key types (ECP, ECBP, ECX) do call oqsx_key_gen_evp_key_kem. The divergence is inside that function, not before it.
When ctx->evp_info->raw_key_support = 0 (P-curves), the else branch at the bottom of oqsx_key_gen_evp_key_kem runs:

pubkeylen = i2d_PublicKey(pkey, &pubkey_enc);
privkeylen = i2d_PrivateKey(pkey, &privkey_enc);

EVP_PKEY *ck2 = d2i_PrivateKey_ex(...);  // selftest round-trip
EVP_PKEY_free(ck2);

This serialise → deserialise selftest fires OSSL_ENCODER_do_all_provided internally on every keygen call — which is what my profiler showed spending time in ossl_parse_property and pthread_rwlock operations.
When raw_key_support = 1 (x25519/ECX), this entire else block is skipped. Raw byte extraction via EVP_PKEY_get_raw_public_key / EVP_PKEY_get_raw_private_key is used instead — no serialisation, no selftest round-trip, no encoder scan.
The struct definitions confirm raw_key_support:

nids_ecx: third field = 1 for x25519 and x448
nids_ecp: third field = 0 for all P-curves
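
As a rough illustration of how that one flag decides the path, here is a simplified model in plain C. The struct layout, the table names `ecx_model` / `ecp_model` and the enum are my own simplification for this comment, not the actual oqs-provider definitions, and the `nid` values are placeholders rather than real OpenSSL NIDs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical model of one classical-component table entry. */
typedef struct {
    const char *name;
    int nid;             /* placeholder, not a real OpenSSL NID */
    int raw_key_support; /* third field: 1 = raw byte export is available */
} evp_info_model;

/* The two keygen export paths described above. */
typedef enum {
    PATH_RAW_EXPORT,    /* EVP_PKEY_get_raw_*_key style, no encoder scan */
    PATH_DER_ROUNDTRIP  /* i2d/d2i serialisation plus selftest, hits encoders */
} keygen_path;

static const evp_info_model ecx_model[] = {
    {"x25519", 0, 1},
    {"x448", 0, 1},
};

static const evp_info_model ecp_model[] = {
    {"p256", 0, 0},
    {"p384", 0, 0},
    {"p521", 0, 0},
};

/* Pick the export path purely from the raw_key_support flag. */
static keygen_path select_keygen_path(const evp_info_model *info)
{
    assert(info != NULL);
    return info->raw_key_support ? PATH_RAW_EXPORT : PATH_DER_ROUNDTRIP;
}
```

In this model every `ecx_model` entry selects `PATH_RAW_EXPORT` and every `ecp_model` entry selects `PATH_DER_ROUNDTRIP`, which mirrors why only the P-curve hybrids show the slowdown.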

So the regression is the P-curve serialisation selftest running per-keygen. The fix direction would be caching the encoder context or moving the selftest outside the keygen hot path.
— @Vishnu2707

1 reply

RodriM11 Apr 29, 2026
Maintainer

Thanks for the detail @Vishnu2707! Would you be willing to provide a candidate solution to the problem?

Vishnu2707 · 2026-05-02T14:40:23Z


Implemented a fix in PR #778 @RodriM11 @baentsch. The P-curve KEM keygen path now uses i2o_ECPublicKey() and i2d_ECPrivateKey() directly, bypassing the provider encoder scan entirely. Benchmarks with OQS_KEM_ENCODERS=ON show p256_mlkem512 recovering from 194 to 21,470 keygen/s, matching the no-encoder baseline.

