Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge BoringSSL through 167f1760ddfeaea1ee1a671ca88aafcccfe30ee0 #1645

Merged
merged 25 commits into from
Sep 17, 2023

Conversation

briansmith
Copy link
Owner

This is a no-op merge. More merges are coming.

davidben and others added 25 commits June 3, 2022 20:03
This partially fixes a bug where, on x86_64, BN_mod_exp_mont_consttime
would sometimes return m, the modulus, when it should have returned
zero. Thanks to Guido Vranken for reporting it. It is only a partial fix
because the same bug also exists in the "rsaz" codepath. That will be
fixed in the subsequent CL. (See the commented out test.)

The bug only affects zero outputs (with non-zero inputs), so we believe
it has no security impact on our cryptographic functions. BoringSSL
calls BN_mod_exp_mont_consttime in the following cases:

- RSA private key operations
- Primality testing, raising the witness to the odd part of p-1
- DSA keygen and key import, pub = g^priv (mod p)
- DSA signing, r = g^k (mod p)
- DH keygen, pub = g^priv (mod p)
- Diffie-Hellman, secret = peer^priv (mod p)

It is not possible in the RSA private key operation, provided p and q
are primes. If using CRT, we are working modulo a prime, so zero output
with non-zero input is impossible. If not using CRT, we work mod n.
While there are nilpotent values mod n, none of them hit zero by
exponentiating. (Both p and q would need to divide the input, which
means n divides the input.)

In primality testing, this can only be hit when the input was composite.
But as the rest of the loop cannot then hit 1, we'll correctly report it
as composite anyway.

DSA and DH work modulo a prime, where this case cannot happen.

Analysis:

This bug is the result of sloppiness with the looser bounds from "almost
Montgomery multiplication", described in
https://eprint.iacr.org/2011/239. Prior to upstream's
ec9cc70f72454b8d4a84247c86159613cee83b81, I believe x86_64-mont5.pl
implemented standard Montgomery reduction (the left half of figure 3 in
the paper).

Though it did not document this, ec9cc70f7245 changed it to implement
the "almost" variant (the right half of the figure.) The difference is
that, rather than subtracting if T >= m, it subtracts if T >= R. In
code, it is the difference between something like our bn_reduce_once,
vs. subtracting based only on T's carry bit. (Interestingly, the
.Lmul_enter branch of bn_mul_mont_gather5 seems to still implement
normal reduction, but the .Lmul4x_enter branch is an almost reduction.)

That means none of the intermediate values here are bounded by m. They
are only bounded by R. Accordingly, Figure 2 in the paper ends with
step 10: REDUCE h modulo m. BN_mod_exp_mont_consttime is missing this
step. The bn_from_montgomery call only implements step 9, AMM(h, 1).
(x86_64-mont5.pl's bn_from_montgomery only implements an almost
reduction.)

The impact depends on how unreduced AMM(h, 1) can be. Remark 1 of the
paper discusses this, but is ambiguous about the scope of its 2^(n-1) <
m < 2^n precondition. The m+1 bound appears to be unconditional:

Montgomery reduction ultimately adds some 0 <= Y < m*R to T, to get a
multiple of R, and then divides by R. The output, pre-subtraction, is
thus less than m + T/R. MM works because T < mR => T' < m + mR/R = 2m.
A single subtraction of m if T' >= m gives T'' < m. AMM works because
T < R^2 => T' < m + R^2/R = m + R. A single subtraction of m if T' >= R
gives T'' < R. See also Lemma 1, Section 3 and Section 4 of the paper,
though their formulation is more complicated to capture the word-by-word
algorithm. It's ultimately the same adjustment to T.

But in AMM(h, 1), T = h*1 = h < R, so AMM(h, 1) < m + R/R = m + 1. That
is, AMM(h, 1) <= m. So the only case when AMM(h, 1) isn't fully reduced
is if it outputs m. Thus, our limited impact. Indeed, Remark 1 mentions
step 10 isn't necessary because m is a prime and the inputs are
non-zero. But that doesn't apply here because BN_mod_exp_mont_consttime
may be called elsewhere.

Fix:

To fix this, we could add the missing step 10, but a full division would
not be constant-time. The analysis above says it could be a single
subtraction, bn_reduce_once, but then we could integrate it into
the subtraction already in plain Montgomery reduction, implemented by
uppercase BN_from_montgomery. h*1 = h < R <= m*R, so we are within
bounds.

Thus, we delete lowercase bn_from_montgomery altogether, and have the
mont5 path use the same BN_from_montgomery ending as the non-mont5 path.
This only impacts the final step of the whole exponentiation and has no
measurable perf impact.

In doing so, add comments describing these looser bounds.  This includes
one subtlety that BN_mod_exp_mont_consttime actually mixes bn_mul_mont
(MM) with bn_mul_mont_gather5/bn_power5 (AMM). But this is fine because
MM is AMM-compatible; when passed AMM's looser inputs, it will still
produce a correct looser output.

Ideally we'd drop the "almost" reduction and stick to the more
straightforward bounds. As this only impacts the final subtraction in
each reduction, I would be surprised if it actually had a real
performance impact. But this would involve deeper change to
x86_64-mont5.pl, so I haven't tried this yet.

I believe this is basically the same bug as
golang/go#13907 from Go.

Change-Id: I06f879777bb2ef181e9da7632ec858582e2afa38
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52825
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
RSAZ has a very similar bug to mont5 from
https://boringssl-review.googlesource.com/c/boringssl/+/52825 and may
return the modulus when it should return zero. As in that CL, there is
no security impact on our cryptographic primitives.

RSAZ is described in the paper "Software Implementation of Modular
Exponentiation, Using Advanced Vector Instructions Architectures".

The bug comes from RSAZ's use of "NRMM" or "Non Reduced Montgomery
Multiplication". This is like normal Montgomery multiplication, but
skips the final subtraction altogether (whereas mont5's AMM still
subtracts, but replaces MM's tigher bound with just the carry bit). This
would normally not be stable, but RSAZ picks a larger R > 4M, and
maintains looser bounds for modular arithmetic, a < 2M.

Lemma 1 from the paper proves that NRMM(a, b) preserves this 2M bound.
It also claims NRMM(a, 1) < M. That is, conversion out of Montgomery
form with NRMM is fully reduced. This second claim is wrong. The proof
shows that NRMM(a, 1) < 1/2 + M, which only implies NRMM(a, 1) <= M, not
NRMM(a, 1) < M. RSAZ relies on this to produce a reduced output (see
Figure 7 in the paper).

Thus, like mont5 with AMM, RSAZ may return the modulus when it should
return zero. Fix this by adding a bn_reduce_once_in_place call at the
end of the operation.

Change-Id: If28bc49ae8dfbfb43bea02af5ea10c4209a1c6e6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52827
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
The unrolled loops appear to have negligible perf impact:

Before:
Did 18480 RSA 2048 signing operations in 10005085us (1847.1 ops/sec)
Did 2720 RSA 4096 signing operations in 10056337us (270.5 ops/sec)

After:
Did 18480 RSA 2048 signing operations in 10012218us (1845.7 ops/sec) [-0.1%]
Did 2700 RSA 4096 signing operations in 10003972us (269.9 ops/sec) [-0.2%]

Change-Id: I29073c373a03a9798f6e04016626e6ab910e893a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52826
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Both implementations need to compute the first 32 powers of a. There's a
commented out naive version in rsaz_exp.c that claims to be smaller, but
1% slower. (It doesn't use squares when it otherwise could.)

Instead, we can write out the square-based strategy as a loop. (I wasn't
able to measure a difference between any of the three versions, but this
one's compact enough and does let us square more and gather5 less.)

Change-Id: I7015f2a78584cd97f29b54d0007479bdcc3a01ba
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52828
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
(It's a pain because we don't have it setup in CMake, but perhaps we
should have a builder for the configuration that doesn't have bcm.c.)

Change-Id: Ic408f0a86c9d42346244d6a7b7e9e664b58fc70c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52845
Reviewed-by: David Benjamin <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
https://boringssl-review.googlesource.com/c/boringssl/+/52825 lost a
tmp.width = top line. Without it, tmp.width was set by
bn_one_to_montgomery. Since we always size modular arithmetic by the
modulus, tmp.width (and am.width) will actually always be top, and
there's actually no need to zero pad it.

We don't capture this in the type system or BIGNUM width convention, so
better to set the width explicitly. The original code did it at the end,
but I think doing it right when we zero pad it is better, as that's when
the size gets set.

But we can go a step further. The manual zero padding code came from
OpenSSL, which still had the bn_correct_top invariant. Our BIGNUMs are
resizable, so just call bn_resize_words, immediately after the
computation.

(bn_resize_words will not reallocate the data because the BIGNUMs have
the STATIC_DATA flag set. bn_wexpand will internally allow expanding up
to dmax, or top.)

Change-Id: I2403afa7381b8a407615c6730fba9edaa41125c6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52906
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
OpenSSL 1.1.1 added HKDF support, but by sticking it into
EVP_PKEY_derive, the API meant for Diffie-Hellman-like primitives.
Implement it for OpenSSL compatibility.

This does unfortunately mean anything using EVP now pulls in HKDF. HKDF
isn't much code, but we should make EVP more static-linker-friendly.
(Filed https://crbug.com/boringssl/497)

Change-Id: I90b9b0d918129829eb36ba9d50ff4d8580346ff0
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52829
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
Signed-off-by: Rebecca Chang Swee Fun <[email protected]>
Change-Id: If6424a3b268943a5e2dc847f94b509d4b509df79
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52765
Commit-Queue: Adam Langley <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
These are the X509_REQ analogs to
https://boringssl-review.googlesource.com/c/boringssl/+/43784 and
https://boringssl-review.googlesource.com/c/boringssl/+/47525.

The one difference is CSRs do not have the redundant copy of the
signature algorithm, so X509_REQ_set1_signature_algo is a little
simpler.

Change-Id: I5fe27b22b30520c71542d51044b0b16712021f59
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52945
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
I don't think these are all UB by C's rules, but it's easier not to
think about the pointers. Still more to go, but these were some easy
ones.

Bug: 301
Change-Id: Icdcb7fb40f85983cbf566786c5f7dbfd7bb06571
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52905
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
Change-Id: Iddc34918dcd3e4a6e80c79e5b8efa11e846c73d1
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52885
Reviewed-by: David Benjamin <[email protected]>
This is required to run SHA3 tests otherwise we get an error of unknown
algorithm.

Change-Id: I085da2b6757ba1f452f33abc7f1bafc4a404e025
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52926
Reviewed-by: Adam Langley <[email protected]>
Reviewed-by: Corie Pressley <[email protected]>
This split in methods probably needs rearranging. This means most
callers (i.e. ones that don't use EVP_PKEY_CTX_new_id) will no longer
pick up the HKDF EVP method.

Bug: 497
Change-Id: I72ef255a472c458e8db62461f7b3ac65fa488535
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52830
Reviewed-by: Adam Langley <[email protected]>
Not that we get much type-checking from this, as an assembly function.

Change-Id: I21643444cfc577e2d68f11891e602724ded52e7f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52831
Reviewed-by: Adam Langley <[email protected]>
bind uses this function.

Change-Id: I97ba86d9f75597bff125ae0b56952effc397e6b8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53010
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
I'm not sure what the history of this is, but it seems to work
just fine in MSVC now.

Change-Id: Iebdc365486bb30a61a1001f705aef7dcaa2a9fcd
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52985
Reviewed-by: Adam Langley <[email protected]>
MSVC 2022's C4191 warns on most function pointer casts. Fix and/or
silence them:

connect.c is an unavoidable false positive. We're casting the pointer to
the correct type. The problem was that the caller is required to first
cast it to the wrong type in OpenSSL's API, due to the BIO_callback_ctrl
calling convention. Suppress it.

LHASH_OF(T) and STACK_OF(T)'s defintions also have a false positive.
Suppress that warning. Calling the functions through the casted types
would indeed be UB, but we don't do that. We use them as goofy
type-erased types. The problem is there is no function pointer
equivalent of void*. (Using actual void* instead trips a GCC warning.)

The sk_sort instance is a true instance of UB. The problem is qsort
lacks a context parameter. I've made sk_sort call qsort_s on _MSC_VER,
to silence the warning. Ideally we'd fix it on other platforms, but
qsort_r and qsort_s are a disaster. See
https://stackoverflow.com/a/39561369

Fixed: 495
Change-Id: I0dca80670c74afaa03fc5c8fd7059b4cfadfac72
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53005
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: Adam Langley <[email protected]>
…end of RSAZ.

The test case that is enabled in the merged BoringSSL change was already enabled in
*ring*, so nothing changed regarding the test cases. *ring* doesn't use RSAZ.

```
git difftool \
    801a801:crypto/fipsmodule/bn/bn_tests.txt \
    .\src\arithmetic\bigint_elem_exp_consttime_tests.txt
```
*ring*'s Rust code already did this.
*ring* doesn't use RSAZ. *ring* already uses this simpler loop.
None of the changes seem useful to *ring* due to previous oxidation.
This was already done in *ring*'s Rust code.
@briansmith briansmith self-assigned this Sep 17, 2023
@briansmith briansmith merged commit e48fdcf into main Sep 17, 2023
266 of 267 checks passed
@briansmith briansmith deleted the b/merge-boringssl-8 branch September 17, 2023 19:18
@briansmith briansmith restored the b/merge-boringssl-8 branch September 19, 2023 00:07
@briansmith briansmith deleted the b/merge-boringssl-8 branch October 6, 2023 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants