Merge BoringSSL through 167f1760ddfeaea1ee1a671ca88aafcccfe30ee0 #1645

briansmith · 2023-09-17T19:17:27Z

This is a no-op merge. More merges are coming.

This partially fixes a bug where, on x86_64, BN_mod_exp_mont_consttime would sometimes return m, the modulus, when it should have returned zero. Thanks to Guido Vranken for reporting it. It is only a partial fix because the same bug also exists in the "rsaz" codepath. That will be fixed in the subsequent CL. (See the commented out test.)

The bug only affects zero outputs (with non-zero inputs), so we believe it has no security impact on our cryptographic functions. BoringSSL calls BN_mod_exp_mont_consttime in the following cases:

- RSA private key operations
- Primality testing, raising the witness to the odd part of p-1
- DSA keygen and key import, pub = g^priv (mod p)
- DSA signing, r = g^k (mod p)
- DH keygen, pub = g^priv (mod p)
- Diffie-Hellman, secret = peer^priv (mod p)

It is not possible in the RSA private key operation, provided p and q are primes. If using CRT, we are working modulo a prime, so zero output with non-zero input is impossible. If not using CRT, we work mod n. While there are nilpotent values mod n, none of them hit zero by exponentiating. (Both p and q would need to divide the input, which means n divides the input.)

In primality testing, this can only be hit when the input was composite. But as the rest of the loop cannot then hit 1, we'll correctly report it as composite anyway.

DSA and DH work modulo a prime, where this case cannot happen.

Analysis:

This bug is the result of sloppiness with the looser bounds from "almost Montgomery multiplication", described in https://eprint.iacr.org/2011/239. Prior to upstream's ec9cc70f72454b8d4a84247c86159613cee83b81, I believe x86_64-mont5.pl implemented standard Montgomery reduction (the left half of figure 3 in the paper).

Though it did not document this, ec9cc70f7245 changed it to implement the "almost" variant (the right half of the figure.) The difference is that, rather than subtracting if T >= m, it subtracts if T >= R. In code, it is the difference between something like our bn_reduce_once, vs. subtracting based only on T's carry bit. (Interestingly, the .Lmul_enter branch of bn_mul_mont_gather5 seems to still implement normal reduction, but the .Lmul4x_enter branch is an almost reduction.)

That means none of the intermediate values here are bounded by m. They are only bounded by R. Accordingly, Figure 2 in the paper ends with step 10: REDUCE h modulo m. BN_mod_exp_mont_consttime is missing this step. The bn_from_montgomery call only implements step 9, AMM(h, 1). (x86_64-mont5.pl's bn_from_montgomery only implements an almost reduction.)

The impact depends on how unreduced AMM(h, 1) can be. Remark 1 of the paper discusses this, but is ambiguous about the scope of its 2^(n-1) < m < 2^n precondition. The m+1 bound appears to be unconditional:

Montgomery reduction ultimately adds some 0 <= Y < m*R to T, to get a multiple of R, and then divides by R. The output, pre-subtraction, is thus less than m + T/R. MM works because T < mR => T' < m + mR/R = 2m. A single subtraction of m if T' >= m gives T'' < m. AMM works because T < R^2 => T' < m + R^2/R = m + R. A single subtraction of m if T' >= R gives T'' < R. See also Lemma 1, Section 3 and Section 4 of the paper, though their formulation is more complicated to capture the word-by-word algorithm. It's ultimately the same adjustment to T.

But in AMM(h, 1), T = h*1 = h < R, so AMM(h, 1) < m + R/R = m + 1. That is, AMM(h, 1) <= m. So the only case when AMM(h, 1) isn't fully reduced is if it outputs m. Thus, our limited impact. Indeed, Remark 1 mentions step 10 isn't necessary because m is a prime and the inputs are non-zero. But that doesn't apply here because BN_mod_exp_mont_consttime may be called elsewhere.

Fix:

To fix this, we could add the missing step 10, but a full division would not be constant-time. The analysis above says it could be a single subtraction, bn_reduce_once, but then we could integrate it into the subtraction already in plain Montgomery reduction, implemented by uppercase BN_from_montgomery. h*1 = h < R <= m*R, so we are within bounds.

Thus, we delete lowercase bn_from_montgomery altogether, and have the mont5 path use the same BN_from_montgomery ending as the non-mont5 path. This only impacts the final step of the whole exponentiation and has no measurable perf impact.

In doing so, add comments describing these looser bounds. This includes one subtlety that BN_mod_exp_mont_consttime actually mixes bn_mul_mont (MM) with bn_mul_mont_gather5/bn_power5 (AMM). But this is fine because MM is AMM-compatible; when passed AMM's looser inputs, it will still produce a correct looser output.

Ideally we'd drop the "almost" reduction and stick to the more straightforward bounds. As this only impacts the final subtraction in each reduction, I would be surprised if it actually had a real performance impact. But this would involve deeper change to x86_64-mont5.pl, so I haven't tried this yet.

I believe this is basically the same bug as golang/go#13907 from Go.

Change-Id: I06f879777bb2ef181e9da7632ec858582e2afa38
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52825
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
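The m + 1 bound can be checked on toy numbers. The following Python sketch is not BoringSSL's code: `montgomery_reduce`, its `almost` flag, and the parameters m = 13, R = 16 are assumptions chosen for illustration, and the branching here is not constant-time. It applies both subtraction rules to T = h*1 for every h < R.

```python
def montgomery_reduce(T, m, R, almost):
    """Return a value congruent to T / R (mod m).

    almost=False subtracts m when the result is >= m (plain MM).
    almost=True  subtracts m only when the result is >= R (AMM).
    Requires m odd, R a power of two, Python 3.8+ for pow(m, -1, R).
    """
    neg_m_inv = (-pow(m, -1, R)) % R      # -m^(-1) mod R
    k = ((T % R) * neg_m_inv) % R         # chosen so T + k*m is a multiple of R
    t = (T + k * m) // R                  # pre-subtraction value, < m + T/R
    threshold = R if almost else m
    if t >= threshold:
        t -= m
    return t


M, R = 13, 16  # hypothetical toy modulus and Montgomery radix

# Converting out of Montgomery form is a reduction of T = h * 1 = h, h < R.
amm_outputs = [montgomery_reduce(h, M, R, almost=True) for h in range(R)]
mm_outputs = [montgomery_reduce(h, M, R, almost=False) for h in range(R)]

# h = 13 = m is the one input where AMM leaves the modulus unreduced.
print("AMM(13, 1) =", amm_outputs[13])
print("MM(13, 1)  =", mm_outputs[13])
```

With these parameters the almost variant never exceeds m, and it returns m exactly once, for the input that should map to zero. The plain variant's `t >= m` comparison absorbs that case, which is the reasoning behind routing the mont5 path through BN_from_montgomery.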

RSAZ has a very similar bug to mont5 from https://boringssl-review.googlesource.com/c/boringssl/+/52825 and may return the modulus when it should return zero. As in that CL, there is no security impact on our cryptographic primitives.

RSAZ is described in the paper "Software Implementation of Modular Exponentiation, Using Advanced Vector Instructions Architectures".

The bug comes from RSAZ's use of "NRMM" or "Non Reduced Montgomery Multiplication". This is like normal Montgomery multiplication, but skips the final subtraction altogether (whereas mont5's AMM still subtracts, but replaces MM's tighter bound with just the carry bit). This would normally not be stable, but RSAZ picks a larger R > 4M, and maintains looser bounds for modular arithmetic, a < 2M.

Lemma 1 from the paper proves that NRMM(a, b) preserves this 2M bound. It also claims NRMM(a, 1) < M. That is, conversion out of Montgomery form with NRMM is fully reduced. This second claim is wrong. The proof shows that NRMM(a, 1) < 1/2 + M, which only implies NRMM(a, 1) <= M, not NRMM(a, 1) < M. RSAZ relies on this to produce a reduced output (see Figure 7 in the paper).

Thus, like mont5 with AMM, RSAZ may return the modulus when it should return zero. Fix this by adding a bn_reduce_once_in_place call at the end of the operation.

Change-Id: If28bc49ae8dfbfb43bea02af5ea10c4209a1c6e6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52827
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
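The gap between NRMM(a, 1) <= M and NRMM(a, 1) < M shows up on small numbers. This Python sketch is illustrative only: `nrmm`, `reduce_once`, and the parameters M = 13, R = 64 (so R > 4M) are assumptions, not RSAZ's vectorized code, and `reduce_once` branches where bn_reduce_once_in_place would not.

```python
def nrmm(a, b, m, R):
    """Montgomery multiplication with no final subtraction.

    Returns a value congruent to a * b / R (mod m).
    Requires m odd, R a power of two, Python 3.8+ for pow(m, -1, R).
    """
    T = a * b
    neg_m_inv = (-pow(m, -1, R)) % R   # -m^(-1) mod R
    k = ((T % R) * neg_m_inv) % R      # makes T + k*m divisible by R
    return (T + k * m) // R


def reduce_once(x, m):
    """Single conditional subtraction; valid when 0 <= x < 2*m."""
    return x - m if x >= m else x


M, R = 13, 64  # hypothetical toy values with R > 4*M

# Conversion out of Montgomery form for every input under the 2M bound.
converted = [nrmm(a, 1, M, R) for a in range(2 * M)]

# a = M should convert to zero, but NRMM alone leaves the modulus.
print("NRMM(M, 1)              =", converted[M])
print("reduce_once(NRMM(M, 1)) =", reduce_once(converted[M], M))
```

For these parameters every product of inputs below 2M stays below 2M, the conversion step never exceeds M, and the only unreduced output is M itself, which one trailing conditional subtraction corrects.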

The unrolled loops appear to have negligible perf impact:

Before:
Did 18480 RSA 2048 signing operations in 10005085us (1847.1 ops/sec)
Did 2720 RSA 4096 signing operations in 10056337us (270.5 ops/sec)

After:
Did 18480 RSA 2048 signing operations in 10012218us (1845.7 ops/sec) [-0.1%]
Did 2700 RSA 4096 signing operations in 10003972us (269.9 ops/sec) [-0.2%]

Change-Id: I29073c373a03a9798f6e04016626e6ab910e893a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52826
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>

Both implementations need to compute the first 32 powers of a. There's a commented out naive version in rsaz_exp.c that claims to be smaller, but 1% slower. (It doesn't use squares when it otherwise could.) Instead, we can write out the square-based strategy as a loop. (I wasn't able to measure a difference between any of the three versions, but this one's compact enough and does let us square more and gather5 less.) Change-Id: I7015f2a78584cd97f29b54d0007479bdcc3a01ba Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52828 Reviewed-by: Adam Langley <[email protected]> Commit-Queue: David Benjamin <[email protected]>
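The commit message does not show the loop, so the following Python sketch is one assumed reading of the square-based strategy: even-indexed powers come from squaring the entry at half the index, and odd-indexed powers from one extra multiplication by a. `power_table` and its parameters are hypothetical names, and plain modular arithmetic stands in for the Montgomery-form squaring and multiplication routines.

```python
def power_table(a, m, size=32):
    """Return [a^0, a^1, ..., a^(size-1)] mod m; size must be even and >= 2."""
    table = [1 % m, a % m]
    for i in range(2, size, 2):
        # Even index: a^i = (a^(i/2))^2, a squaring instead of a general multiply.
        table.append(table[i // 2] * table[i // 2] % m)
        # Odd index: a^(i+1) = a^i * a.
        table.append(table[i] * a % m)
    return table


# Toy check against Python's built-in modular exponentiation.
table = power_table(7, 1009)
print(table[:6])
```

Of the 30 entries computed this way, half are squarings, which matches the stated goal of squaring more often than the naive multiply-by-a chain does.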

(It's a pain because we don't have it setup in CMake, but perhaps we should have a builder for the configuration that doesn't have bcm.c.) Change-Id: Ic408f0a86c9d42346244d6a7b7e9e664b58fc70c Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52845 Reviewed-by: David Benjamin <[email protected]> Commit-Queue: David Benjamin <[email protected]>

https://boringssl-review.googlesource.com/c/boringssl/+/52825 lost a tmp.width = top line. Without it, tmp.width was set by bn_one_to_montgomery. Since we always size modular arithmetic by the modulus, tmp.width (and am.width) will actually always be top, and there's actually no need to zero pad it. We don't capture this in the type system or BIGNUM width convention, so better to set the width explicitly. The original code did it at the end, but I think doing it right when we zero pad it is better, as that's when the size gets set. But we can go a step further. The manual zero padding code came from OpenSSL, which still had the bn_correct_top invariant. Our BIGNUMs are resizable, so just call bn_resize_words, immediately after the computation. (bn_resize_words will not reallocate the data because the BIGNUMs have the STATIC_DATA flag set. bn_wexpand will internally allow expanding up to dmax, or top.) Change-Id: I2403afa7381b8a407615c6730fba9edaa41125c6 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52906 Reviewed-by: Adam Langley <[email protected]> Commit-Queue: David Benjamin <[email protected]>

OpenSSL 1.1.1 added HKDF support, but by sticking it into EVP_PKEY_derive, the API meant for Diffie-Hellman-like primitives. Implement it for OpenSSL compatibility. This does unfortunately mean anything using EVP now pulls in HKDF. HKDF isn't much code, but we should make EVP more static-linker-friendly. (Filed https://crbug.com/boringssl/497) Change-Id: I90b9b0d918129829eb36ba9d50ff4d8580346ff0 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52829 Reviewed-by: Adam Langley <[email protected]> Commit-Queue: David Benjamin <[email protected]>

Signed-off-by: Rebecca Chang Swee Fun <[email protected]> Change-Id: If6424a3b268943a5e2dc847f94b509d4b509df79 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52765 Commit-Queue: Adam Langley <[email protected]> Reviewed-by: Adam Langley <[email protected]>

These are the X509_REQ analogs to https://boringssl-review.googlesource.com/c/boringssl/+/43784 and https://boringssl-review.googlesource.com/c/boringssl/+/47525. The one difference is CSRs do not have the redundant copy of the signature algorithm, so X509_REQ_set1_signature_algo is a little simpler. Change-Id: I5fe27b22b30520c71542d51044b0b16712021f59 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52945 Reviewed-by: Bob Beck <[email protected]> Commit-Queue: Bob Beck <[email protected]>

I don't think these are all UB by C's rules, but it's easier not to think about the pointers. Still more to go, but these were some easy ones. Bug: 301 Change-Id: Icdcb7fb40f85983cbf566786c5f7dbfd7bb06571 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52905 Reviewed-by: Bob Beck <[email protected]> Commit-Queue: Bob Beck <[email protected]>

Change-Id: Iddc34918dcd3e4a6e80c79e5b8efa11e846c73d1 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52885 Reviewed-by: David Benjamin <[email protected]>

This is required to run SHA3 tests otherwise we get an error of unknown algorithm. Change-Id: I085da2b6757ba1f452f33abc7f1bafc4a404e025 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52926 Reviewed-by: Adam Langley <[email protected]> Reviewed-by: Corie Pressley <[email protected]>

This split in methods probably needs rearranging. This means most callers (i.e. ones that don't use EVP_PKEY_CTX_new_id) will no longer pick up the HKDF EVP method. Bug: 497 Change-Id: I72ef255a472c458e8db62461f7b3ac65fa488535 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52830 Reviewed-by: Adam Langley <[email protected]>

Not that we get much type-checking from this, as an assembly function. Change-Id: I21643444cfc577e2d68f11891e602724ded52e7f Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52831 Reviewed-by: Adam Langley <[email protected]>

bind uses this function. Change-Id: I97ba86d9f75597bff125ae0b56952effc397e6b8 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53010 Commit-Queue: David Benjamin <[email protected]> Reviewed-by: Bob Beck <[email protected]> Commit-Queue: Bob Beck <[email protected]>

I'm not sure what the history of this is, but it seems to work just fine in MSVC now. Change-Id: Iebdc365486bb30a61a1001f705aef7dcaa2a9fcd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52985 Reviewed-by: Adam Langley <[email protected]>

MSVC 2022's C4191 warns on most function pointer casts. Fix and/or silence them:

connect.c is an unavoidable false positive. We're casting the pointer to the correct type. The problem was that the caller is required to first cast it to the wrong type in OpenSSL's API, due to the BIO_callback_ctrl calling convention. Suppress it.

LHASH_OF(T) and STACK_OF(T)'s definitions also have a false positive. Suppress that warning. Calling the functions through the casted types would indeed be UB, but we don't do that. We use them as goofy type-erased types. The problem is there is no function pointer equivalent of void*. (Using actual void* instead trips a GCC warning.)

The sk_sort instance is a true instance of UB. The problem is qsort lacks a context parameter. I've made sk_sort call qsort_s on _MSC_VER, to silence the warning. Ideally we'd fix it on other platforms, but qsort_r and qsort_s are a disaster. See https://stackoverflow.com/a/39561369

Fixed: 495
Change-Id: I0dca80670c74afaa03fc5c8fd7059b4cfadfac72
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53005
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: Adam Langley <[email protected]>

…end of RSAZ.

The test case that is enabled in the merged BoringSSL change was already enabled in *ring*, so nothing changed regarding the test cases.

*ring* doesn't use RSAZ.

```
git difftool \
    801a801:crypto/fipsmodule/bn/bn_tests.txt \
    .\src\arithmetic\bigint_elem_exp_consttime_tests.txt
```

*ring*'s Rust code already did this.

*ring* doesn't use RSAZ. *ring* already uses this simpler loop.

None of the changes seem useful to *ring* due to previous oxidation.

This was already done in *ring*'s Rust code.

davidben and others added 25 commits June 3, 2022 20:03

More includes for builds that don't use bcm.c

572c416

Change-Id: Iddc34918dcd3e4a6e80c79e5b8efa11e846c73d1 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52885 Reviewed-by: David Benjamin <[email protected]>

Const-correct bn_gather5.

efd09b7

Not that we get much type-checking from this, as an assembly function. Change-Id: I21643444cfc577e2d68f11891e602724ded52e7f Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52831 Reviewed-by: Adam Langley <[email protected]>

No-op merge of BoringSSL 'c7de4fe'.

73d9069

*ring*'s Rust code already did this.

No-op merge of BoringSSL 'b8a6514'.

e11e97e

*ring* doesn't use RSAZ. *ring* already uses this simpler loop.

Ignore BoringSSL 8ba90d1..f575d9b.

ce97384

Skip BoringSSL '1e469e4': Replace some more C unions.

98ae327

None of the changes seem useful to *ring* due to previous oxidation.

Ignore BoringSSL '572c416..d27a01e'.

c8bbdf9

No-op merge of BoringSSL 'efd09b7'.

29b441c

This was already done in *ring*'s Rust code.

Ignore BoringSSL '0ebd69b..167f176'.

ba41827

briansmith self-assigned this Sep 17, 2023

briansmith merged commit e48fdcf into main Sep 17, 2023
266 of 267 checks passed

briansmith deleted the b/merge-boringssl-8 branch September 17, 2023 19:18

briansmith restored the b/merge-boringssl-8 branch September 19, 2023 00:07

briansmith deleted the b/merge-boringssl-8 branch October 6, 2023 19:21
