From 13c9d5c69d04485a7a8840c12185c832026c8315 Mon Sep 17 00:00:00 2001 From: David Benjamin Date: Wed, 1 Jun 2022 14:42:33 -0400 Subject: [PATCH 01/17] Always end BN_mod_exp_mont_consttime with normal Montgomery reduction. This partially fixes a bug where, on x86_64, BN_mod_exp_mont_consttime would sometimes return m, the modulus, when it should have returned zero. Thanks to Guido Vranken for reporting it. It is only a partial fix because the same bug also exists in the "rsaz" codepath. That will be fixed in the subsequent CL. (See the commented out test.) The bug only affects zero outputs (with non-zero inputs), so we believe it has no security impact on our cryptographic functions. BoringSSL calls BN_mod_exp_mont_consttime in the following cases: - RSA private key operations - Primality testing, raising the witness to the odd part of p-1 - DSA keygen and key import, pub = g^priv (mod p) - DSA signing, r = g^k (mod p) - DH keygen, pub = g^priv (mod p) - Diffie-Hellman, secret = peer^priv (mod p) It is not possible in the RSA private key operation, provided p and q are primes. If using CRT, we are working modulo a prime, so zero output with non-zero input is impossible. If not using CRT, we work mod n. While there are nilpotent values mod n, none of them hit zero by exponentiating. (Both p and q would need to divide the input, which means n divides the input.) In primality testing, this can only be hit when the input was composite. But as the rest of the loop cannot then hit 1, we'll correctly report it as composite anyway. DSA and DH work modulo a prime, where this case cannot happen. Analysis: This bug is the result of sloppiness with the looser bounds from "almost Montgomery multiplication", described in https://eprint.iacr.org/2011/239. Prior to upstream's ec9cc70f72454b8d4a84247c86159613cee83b81, I believe x86_64-mont5.pl implemented standard Montgomery reduction (the left half of figure 3 in the paper). Though it did not document this, ec9cc70f7245 changed it to implement the "almost" variant (the right half of the figure.) The difference is that, rather than subtracting if T >= m, it subtracts if T >= R. In code, it is the difference between something like our bn_reduce_once, vs. subtracting based only on T's carry bit. (Interestingly, the .Lmul_enter branch of bn_mul_mont_gather5 seems to still implement normal reduction, but the .Lmul4x_enter branch is an almost reduction.) That means none of the intermediate values here are bounded by m. They are only bounded by R. Accordingly, Figure 2 in the paper ends with step 10: REDUCE h modulo m. BN_mod_exp_mont_consttime is missing this step. The bn_from_montgomery call only implements step 9, AMM(h, 1). (x86_64-mont5.pl's bn_from_montgomery only implements an almost reduction.) The impact depends on how unreduced AMM(h, 1) can be. Remark 1 of the paper discusses this, but is ambiguous about the scope of its 2^(n-1) < m < 2^n precondition. The m+1 bound appears to be unconditional: Montgomery reduction ultimately adds some 0 <= Y < m*R to T, to get a multiple of R, and then divides by R. The output, pre-subtraction, is thus less than m + T/R. MM works because T < mR => T' < m + mR/R = 2m. A single subtraction of m if T' >= m gives T'' < m. AMM works because T < R^2 => T' < m + R^2/R = m + R. A single subtraction of m if T' >= R gives T'' < R. See also Lemma 1, Section 3 and Section 4 of the paper, though their formulation is more complicated to capture the word-by-word algorithm. It's ultimately the same adjustment to T. But in AMM(h, 1), T = h*1 = h < R, so AMM(h, 1) < m + R/R = m + 1. That is, AMM(h, 1) <= m. So the only case when AMM(h, 1) isn't fully reduced is if it outputs m. Thus, our limited impact. Indeed, Remark 1 mentions step 10 isn't necessary because m is a prime and the inputs are non-zero. But that doesn't apply here because BN_mod_exp_mont_consttime may be called elsewhere. Fix: To fix this, we could add the missing step 10, but a full division would not be constant-time. The analysis above says it could be a single subtraction, bn_reduce_once, but then we could integrate it into the subtraction already in plain Montgomery reduction, implemented by uppercase BN_from_montgomery. h*1 = h < R <= m*R, so we are within bounds. Thus, we delete lowercase bn_from_montgomery altogether, and have the mont5 path use the same BN_from_montgomery ending as the non-mont5 path. This only impacts the final step of the whole exponentiation and has no measurable perf impact. In doing so, add comments describing these looser bounds. This includes one subtlety that BN_mod_exp_mont_consttime actually mixes bn_mul_mont (MM) with bn_mul_mont_gather5/bn_power5 (AMM). But this is fine because MM is AMM-compatible; when passed AMM's looser inputs, it will still produce a correct looser output. Ideally we'd drop the "almost" reduction and stick to the more straightforward bounds. As this only impacts the final subtraction in each reduction, I would be surprised if it actually had a real performance impact. But this would involve deeper change to x86_64-mont5.pl, so I haven't tried this yet. I believe this is basically the same bug as https://github.com/golang/go/issues/13907 from Go. Change-Id: I06f879777bb2ef181e9da7632ec858582e2afa38 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52825 Commit-Queue: David Benjamin Reviewed-by: Adam Langley --- crypto/fipsmodule/bn/asm/x86_64-mont5.pl | 197 ----------------------- crypto/fipsmodule/bn/bn_test.cc | 9 -- crypto/fipsmodule/bn/bn_tests.txt | 70 ++++++++ crypto/fipsmodule/bn/exponentiation.c | 74 +++++---- crypto/fipsmodule/bn/internal.h | 19 ++- 5 files changed, 126 insertions(+), 243 deletions(-) diff --git a/crypto/fipsmodule/bn/asm/x86_64-mont5.pl b/crypto/fipsmodule/bn/asm/x86_64-mont5.pl index 54335cc9a8..012e7aa509 100755 --- a/crypto/fipsmodule/bn/asm/x86_64-mont5.pl +++ b/crypto/fipsmodule/bn/asm/x86_64-mont5.pl @@ -2088,194 +2088,6 @@ .size __bn_post4x_internal,.-__bn_post4x_internal ___ } -{ -$code.=<<___; -.globl bn_from_montgomery -.type bn_from_montgomery,\@abi-omnipotent -.align 32 -bn_from_montgomery: -.cfi_startproc - testl \$7,`($win64?"48(%rsp)":"%r9d")` - jz bn_from_mont8x - xor %eax,%eax - ret -.cfi_endproc -.size bn_from_montgomery,.-bn_from_montgomery - -.type bn_from_mont8x,\@function,6 -.align 32 -bn_from_mont8x: -.cfi_startproc - .byte 0x67 - mov %rsp,%rax -.cfi_def_cfa_register %rax - push %rbx -.cfi_push %rbx - push %rbp -.cfi_push %rbp - push %r12 -.cfi_push %r12 - push %r13 -.cfi_push %r13 - push %r14 -.cfi_push %r14 - push %r15 -.cfi_push %r15 -.Lfrom_prologue: - - shl \$3,${num}d # convert $num to bytes - lea ($num,$num,2),%r10 # 3*$num in bytes - neg $num - mov ($n0),$n0 # *n0 - - ############################################################## - # Ensure that stack frame doesn't alias with $rptr+3*$num - # modulo 4096, which covers ret[num], am[num] and n[num] - # (see bn_exp.c). The stack is allocated to aligned with - # bn_power5's frame, and as bn_from_montgomery happens to be - # last operation, we use the opportunity to cleanse it. - # - lea -320(%rsp,$num,2),%r11 - mov %rsp,%rbp - sub $rptr,%r11 - and \$4095,%r11 - cmp %r11,%r10 - jb .Lfrom_sp_alt - sub %r11,%rbp # align with $aptr - lea -320(%rbp,$num,2),%rbp # future alloca(frame+2*$num*8+256) - jmp .Lfrom_sp_done - -.align 32 -.Lfrom_sp_alt: - lea 4096-320(,$num,2),%r10 - lea -320(%rbp,$num,2),%rbp # future alloca(frame+2*$num*8+256) - sub %r10,%r11 - mov \$0,%r10 - cmovc %r10,%r11 - sub %r11,%rbp -.Lfrom_sp_done: - and \$-64,%rbp - mov %rsp,%r11 - sub %rbp,%r11 - and \$-4096,%r11 - lea (%rbp,%r11),%rsp - mov (%rsp),%r10 - cmp %rbp,%rsp - ja .Lfrom_page_walk - jmp .Lfrom_page_walk_done - -.Lfrom_page_walk: - lea -4096(%rsp),%rsp - mov (%rsp),%r10 - cmp %rbp,%rsp - ja .Lfrom_page_walk -.Lfrom_page_walk_done: - - mov $num,%r10 - neg $num - - ############################################################## - # Stack layout - # - # +0 saved $num, used in reduction section - # +8 &t[2*$num], used in reduction section - # +32 saved *n0 - # +40 saved %rsp - # +48 t[2*$num] - # - mov $n0, 32(%rsp) - mov %rax, 40(%rsp) # save original %rsp -.cfi_cfa_expression %rsp+40,deref,+8 -.Lfrom_body: - mov $num,%r11 - lea 48(%rsp),%rax - pxor %xmm0,%xmm0 - jmp .Lmul_by_1 - -.align 32 -.Lmul_by_1: - movdqu ($aptr),%xmm1 - movdqu 16($aptr),%xmm2 - movdqu 32($aptr),%xmm3 - movdqa %xmm0,(%rax,$num) - movdqu 48($aptr),%xmm4 - movdqa %xmm0,16(%rax,$num) - .byte 0x48,0x8d,0xb6,0x40,0x00,0x00,0x00 # lea 64($aptr),$aptr - movdqa %xmm1,(%rax) - movdqa %xmm0,32(%rax,$num) - movdqa %xmm2,16(%rax) - movdqa %xmm0,48(%rax,$num) - movdqa %xmm3,32(%rax) - movdqa %xmm4,48(%rax) - lea 64(%rax),%rax - sub \$64,%r11 - jnz .Lmul_by_1 - - movq $rptr,%xmm1 - movq $nptr,%xmm2 - .byte 0x67 - mov $nptr,%rbp - movq %r10, %xmm3 # -num -___ -$code.=<<___ if ($addx); - leaq OPENSSL_ia32cap_P(%rip),%r11 - mov 8(%r11),%r11d - and \$0x80108,%r11d - cmp \$0x80108,%r11d # check for AD*X+BMI2+BMI1 - jne .Lfrom_mont_nox - - lea (%rax,$num),$rptr - call __bn_sqrx8x_reduction - call __bn_postx4x_internal - - pxor %xmm0,%xmm0 - lea 48(%rsp),%rax - jmp .Lfrom_mont_zero - -.align 32 -.Lfrom_mont_nox: -___ -$code.=<<___; - call __bn_sqr8x_reduction - call __bn_post4x_internal - - pxor %xmm0,%xmm0 - lea 48(%rsp),%rax - jmp .Lfrom_mont_zero - -.align 32 -.Lfrom_mont_zero: - mov 40(%rsp),%rsi # restore %rsp -.cfi_def_cfa %rsi,8 - movdqa %xmm0,16*0(%rax) - movdqa %xmm0,16*1(%rax) - movdqa %xmm0,16*2(%rax) - movdqa %xmm0,16*3(%rax) - lea 16*4(%rax),%rax - sub \$32,$num - jnz .Lfrom_mont_zero - - mov \$1,%rax - mov -48(%rsi),%r15 -.cfi_restore %r15 - mov -40(%rsi),%r14 -.cfi_restore %r14 - mov -32(%rsi),%r13 -.cfi_restore %r13 - mov -24(%rsi),%r12 -.cfi_restore %r12 - mov -16(%rsi),%rbp -.cfi_restore %rbp - mov -8(%rsi),%rbx -.cfi_restore %rbx - lea (%rsi),%rsp -.cfi_def_cfa_register %rsp -.Lfrom_epilogue: - ret -.cfi_endproc -.size bn_from_mont8x,.-bn_from_mont8x -___ -} }}} if ($addx) {{{ @@ -3864,10 +3676,6 @@ .rva .LSEH_begin_bn_power5 .rva .LSEH_end_bn_power5 .rva .LSEH_info_bn_power5 - - .rva .LSEH_begin_bn_from_mont8x - .rva .LSEH_end_bn_from_mont8x - .rva .LSEH_info_bn_from_mont8x ___ $code.=<<___ if ($addx); .rva .LSEH_begin_bn_mulx4x_mont_gather5 @@ -3899,11 +3707,6 @@ .byte 9,0,0,0 .rva mul_handler .rva .Lpower5_prologue,.Lpower5_body,.Lpower5_epilogue # HandlerData[] -.align 8 -.LSEH_info_bn_from_mont8x: - .byte 9,0,0,0 - .rva mul_handler - .rva .Lfrom_prologue,.Lfrom_body,.Lfrom_epilogue # HandlerData[] ___ $code.=<<___ if ($addx); .align 8 diff --git a/crypto/fipsmodule/bn/bn_test.cc b/crypto/fipsmodule/bn/bn_test.cc index 7d578028d2..3bcdd630ad 100644 --- a/crypto/fipsmodule/bn/bn_test.cc +++ b/crypto/fipsmodule/bn/bn_test.cc @@ -2788,15 +2788,6 @@ TEST_F(BNTest, BNMulMont5ABI) { words, 13); CHECK_ABI(bn_power5, r.data(), a.data(), table.data(), m->d, mont->n0, words, 13); - EXPECT_EQ(1, CHECK_ABI(bn_from_montgomery, r.data(), r.data(), nullptr, - m->d, mont->n0, words)); - EXPECT_EQ(1, CHECK_ABI(bn_from_montgomery, r.data(), a.data(), nullptr, - m->d, mont->n0, words)); - } else { - EXPECT_EQ(0, CHECK_ABI(bn_from_montgomery, r.data(), r.data(), nullptr, - m->d, mont->n0, words)); - EXPECT_EQ(0, CHECK_ABI(bn_from_montgomery, r.data(), a.data(), nullptr, - m->d, mont->n0, words)); } } } diff --git a/crypto/fipsmodule/bn/bn_tests.txt b/crypto/fipsmodule/bn/bn_tests.txt index 334ef8b687..4bc1fa2c59 100644 --- a/crypto/fipsmodule/bn/bn_tests.txt +++ b/crypto/fipsmodule/bn/bn_tests.txt @@ -10604,6 +10604,76 @@ A = 95564994a96c45954227b845a1e99cb939d5a1da99ee91acc962396ae999a9ee38603790448f E = f95dc0f980fbd22e90caa5a387cc4a369f3f830d50dd321c40db8c09a7e1a241a536e096622d3280c0c1ba849c1f4a79bf490f60006d081e8cf69960189f0d312cd9e17073a3fba7881b21474a13b334116cb2f5dbf3189a6de3515d0840f053c776d3982d391b6d04d642dda5cc6d1640174c09875addb70595658f89efb439dc6fbd55f903aadd307982d3f659207f265e1ec6271b274521b7a5e28e8fd7a55df089292820477802a43cf5b6b94e999e8c9944ddebb0d0e95a60f88cb7e813ba110d20e1024774107dd02949031864923b3cb8c3f7250d6d1287b0a40db6a47bd5a469518eb65aa207ddc47d8c6e5fc8e0c105be8fc1d4b57b2e27540471d5 M = fef15d5ce4625f1bccfbba49fc8439c72bf8202af039a2259678941b60bb4a8f2987e965d58fd8cf86a856674d519763d0e1211cc9f8596971050d56d9b35db3785866cfbca17cfdbed6060be3629d894f924a89fdc1efc624f80d41a22f19009503fcc3824ef62ccb9208430c26f2d8ceb2c63488ec4c07437aa4c96c43dd8b9289ed00a712ff66ee195dc71f5e4ead02172b63c543d69baf495f5fd63ba7bcc633bd309c016e37736da92129d0b053d4ab28d21ad7d8b6fab2a8bbdc8ee647d2fbcf2cf426cf892e6f5639e0252993965dfb73ccd277407014ea784aaa280cb7b03972bc8b0baa72360bdb44b82415b86b2f260f877791cd33ba8f2d65229b +# The following inputs trigger an edge case between Montgomery reduction and the +# "almost" reduction variant from https://eprint.iacr.org/2011/239 + +ModExp = 00 +A = 19c7bc9b97c6083cd7b8d1cd001452c9b67983247169c6532047eb7fc8933014dbf69fee7a358769f1429802c8ea89d4f9ca6ba6f368fbdb1fa5717b4a00 +E = bbc7e09147408571050e8d0c634682c5863b7e8a573626648902cff12e590c74f5a23ecce39732266bc15b8afbd6c48a48c83fbdc33947515cc0b6e4fb98ae2cd730e58f951fec8be7e2e3c74f4506c7fd7e29bdb28675fe8a59789ab1148e931a2ebd2d36f78bc241682a3d8083d8ff538858cd240c5a693936e5a391dc9d77118062a3f868c058440a4192267faaaba91112f45eee5842060febbf9353a6d3e7f7996573209136a5506062ea23d74067f08c613f3ff74bade25f8c3368e6dba84eae672eac11be1137fc514924fcab8c82e46d092bd047dcbadaa48c67a096ec1a04f392a8511e6acbad9954949b703e71ff837337b594055ae6f3c0fc154447a687c9ac8a2cdfd64a2e680c6ff21254735af7f5eb6b43f0bce86bda55a04143a991711081435ed4f4a89b23fc3a588022b7a8543db4bf5c8ac93603367c750ff2191f59a716340fab49bb7544759c8d846465eec1438e76395f73e7b5e945f31f1b87fefa854a0d208846eaab5fa27144fd039911608bab0eaee80f1d3553dfa2d9ba95268479b97a059613660df5ad79796e0b272244aca90ccc13449ec15c206eeed7b60405a4c5cfdf5da5d136c27fa9385d810ad198dfe794ffce9955e10520efea1e2eb794e379401b9affd863b9566ce941c4726755574a1b1946acf0090bfb93f37dd55f524485bbba7fa84b53addfde01ae1de9c57fe50d4b708dd0fa45d02af398b3d05c6d17f84c11e9aacdbe0b146cad6ddbd877731e26a17f3ebed459560d12ed7a6abc2ea6fe922e69d2622ef11b6b245b9ba8f0940faaa671a4beb727be5393a94dafaeff7221b29183e7418f4c5bb95a6a586c93dbc8ce0236d9dbe26c40513611b4141fed66599adbfb20fc30e09a4815e4159f65a6708f34584a7a77b3843941cd61a6917dcc3d07a3dfb5a2cb108bacea7e782f2111b4d22ecaaeff469ecd0da371df1ac5e9bf6df6ccba2d3a9f393d597499eaca2c206bfb81c3426c5fe45bcf16e38aecd246a319a1f37041c638b75a4839517e43a6d01bee7d85eaeedbce13cd15699d3ee42c7414cfed576590e4fb6ddb6edd3e1957efaf039bfe8b9dc75869b1f93abff15cae8b234161070fa3542303c2ed35ca66083d0ac299b81182317a2a3985269602b1fa1e822fcbda48e686d80b273f06b0a702ca7f42cbbbd2fc2b3601422c8bff6302eda3c61b293049636002649b16f3c1f0be2b6599d66493a4497cd795b10a2ab8220fafad24fa90e1bfcf39ecce337e705695c7a224bf9f445a287d6aab221341659ca4be7861f6ac4c9d33dac811e6 +M = 519b6e57781d40d897ec0c1b648d195526726b295438c9a70928ac25979563d72db91c8c42298a33b572edecdf40904c68a23337aa5341b56e92b0da5041 + +# To fully exercise BN_mod_exp_mont_consttime codepaths, we generate inputs at +# different bitwidths. rsaz-avx2.pl only runs at 1024-bit moduli, and +# x86_64-mont5.pl unrolls 8 64-bit words at a time, so we want to capture both +# multiples of 512- and non-multiples. Also include moduli that are not quite a +# full word. + +# 512-bit +ModExp = 00 +A = 8000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000e +E = ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff +M = 8f42c9e9e351ba9b32ab0cf69da43f4acf7028d19cff6e5059ea0e3fcc97c97f36a31470044737d4c0c933ac441ecb29e32c81401523afdac7de9c3fd8493c97 + +# 1024-bit +# TODO(davidben): This test breaks the RSAZ implementation. Fix it and enable +# this test. +# ModExp = 00 +# A = 800000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002f +# E = ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff +# M = 9da8dc26fdf4d2e49833b240ee552beb7a6e251caa91bfb5d6cafaf8ed9461877fda8f6ac299036d35806bc1ae7872e54eaac1ec6bee6d02c6621a9cf8883b3abc33c49b3e601203e0e86ef8f0562412cc689ee2670704583909ca6d7774c9f9f9f4d77d37fedef9cb51d207cb629ec02fa03b526fd6594bfa8f2da71238a0b7 + +# 1025-bit +ModExp = 00 +A = 010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011 +E = ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff +M = 010223abfdda02e84e11cec8ee7fc784fa135733935f7b9054bb70f1f06d234d76dcf3beed55c7f39e955dc1fef2b65009240fd02f7a1b27a78fc2867144bf666efb929856db9f671c356c4c67a068a70fe83c52eebda03668872fd270d0794f0771d217fb6b93b12529a944f7f0496a9158757c55b8ee14f803f1d2d887e2f561 + +# 1088-bit +ModExp = 00 +A = 8000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003d +E = ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff +M = e91f6d748773cb212a23aa348125615123b1800c9ea222c9374c757702ae4140fa333790ed8f6bf60a1d7dda65c2767cc5f33e32e333d19fbfb5a2b85795757c9ca070268763a618e9d33873d28a89bf88acd209efbb15b80cd33b92a6b3a682e1c91782fc24fb86ddff4f809219c977b54b99359094bbcc51dfe17b992ab24b74a17950ad754281 + +# 1472-bit +ModExp = 00 +A = 8000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001d +E = ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff +M = a8770362f4bfe4fc1ab0e52705c11a9b6ba235d5a5f22197c2d68e27ed18426ede3316af706aa79bcf943dbd51459eb15ae1f9386216b3f3a847f94440a65b97659bc5ba2adb67173714ecaa886c0b926d7a64ea45576f9d2171784ce7e801724d5b0abfd93357d538ea7ad3ad89a74f4660bdb66dfb5f684dcf00402e3cdf0ab58afd867c943c8f47b80268a789456aa7c50a619dd2f9f5e3f74b5d810f0f8dadbf4ad5b917cdcb156c4c132611c8b3b035118a9e03551f + +# 1536-bit +ModExp = 00 +A = 800000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002 +E = ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff +M = 878cd000778f927b2f1a4b8bac86efd282079a7ac0d25e09ffd2f72fbc282e65e233929d2457c7b1d63c56fb706cdfa04fb87e654c578c98d7cf59c2293dc5641086b68db4867105981daaf147a0ee91f6932ef064deae4142c19e58d50c0686f0eaf778be72450f89a98b4680bbc5ffab942195e44dd20616150fd1deca058068ca31ab2f861e99082588f17a2025bf5e536150142fca3187a259c791fc721430f24d7e338f8dc02e693a7e694d42775e80f7f7c03600b6ae86b4aba2b0e991 + +# 2048-bit +ModExp = 00 +A = 8000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000f +E = ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff +M = 9f40a7535c561208ecb38e17c9336d9bc8484d335901b2cd42759cf03689227f6992f10cb6b586d767fbcdf30e9d82a0eda60d2694ccd0194fa96b50b56e0cdeec1951ea9e58b07e334a7f108841a0ab28256917fecea561388807ed124a17386a7a7b501f9cbf3404247a76948d0561e48137d3f9669e36f175731796aeaf78851f7d866917f661422186a4814aa35c066b5a90b9cfc918af769a9f0bb30c12581027df64ac328a0f07dbd20adb704479f6d0f233a131828c71bab19c3c34795ea4fb68aa632c6f688e5b3b84413c9031d8dc251003a590dec0dd09bfa6109ed4570701439b6f265b84ac2170c317357b5fbe5535e2bbdd93c1aacfdaa28c85 + +# 3072-bit +ModExp = 00 +A = 80000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001d +E = ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff +M = c23dfd244a58a668d514498a705c8f8f548311b24f0f98b023d2d33632534c2ae948d6641d41fd7a29fbbd594bfc7fdd6e8162cbb3056af3075347b6fc8876458d33a9d0ffdbcdf482de0c73d1310fd8fa8f9f92dd0dbb0e2034e98a30f6c11b482f7476c5b593f673a322b1130daa4314e9074270dce1076436f0d56cf196afcbb235a9a7b3ac85b9062e85fc0e63a12c468c787019f6805f9faab64fc6a0babc80785d88740243f11366bffb40ccbe8b2bb7a99a2c8238a6f656bb0117d7b2602aa400f4d77de5f93c673f13264ca70de949454e3e3f261993c1aa427e8ef4f507af744f71f3b4aaf3c981d44cc1bfb1eb1151168762b242b740573df698e500d99612e17dc760f7b3bf7c235e39e81ad7edbe6c07dbb8b139745bb394d61cb799bcafec5de074932b0b2d74797e779ac8d81f63a2b2e9baa229dfaa7f90f34ffade1d2ad022a3407d35eb2d7477c6ae8ad100f6e95c05b4f947c1fabfb11a17add384e6b4cd3a02fd9b43f46805c6c74e366b74aa3b766be7a5fbbd67fa81 + +# 4096-bit +ModExp = 00 +A = 8000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001 +E = ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff +M = 8030411ecbddcb0fe4e76fd6b5bf542e8b015d1610cf96130ded12ba2cda0641bd9692080f218ea8b0d751845b519d95b843542ec8d2a07f1f93afe3189b69a4f35c983011c7f7928c3df458cc3eae85c36e6934a4b1bc0a67c8a521de336642c49e10a7ffa8d0af911aacc19e3900449161940f139220e099a150dcaf0ff96ffff6e726c1ac139969103cf6a828ac3adf0301506aa02787b4f570d5dde53a34acab8fec6fa94760abf16ee99954371ad65a6e899daab87b95811d069404991de9abe064ebbddf886e970f10d260c899dda940191a82d4c8bd36651363aff5493f4f59e700007dcadf37ebea7fcfd7600d16617ffea0d9ae659446d851d93c564e50e558f734c894d735fa273770703dab62844d9f01badf632f3d14a00f739c022c9be95f54e9cea46ec6da7cb11f4602e06962951c48204726b7f120ddbd0eb3566dc8d1e6f195a9196e96db33322d088b43aecffe9b4df182dd016aca0bd14f1c56cd1a18b89165c027029862b09ffd78e92ab614349c4fd67f49cb12cd33d0728930d0538bda57acef1365a73cc8fbac7d463b9e3c3bae0bb6224b080cdb8b5cd47d546d53111fdc22b7ff679bcfe27192920ee163b2be337d8cccc93b4de7d2d31934b9c0e97af291dcc1135b4a473bd37114eec3ba75c411887b57799d3188e7353f33a4d31735ebfc9fcfc044985148dd96da3876a5ab7ea7a404b411 + # RSAZ 512-bit. # diff --git a/crypto/fipsmodule/bn/exponentiation.c b/crypto/fipsmodule/bn/exponentiation.c index f5f9dc2b31..38013ed6f6 100644 --- a/crypto/fipsmodule/bn/exponentiation.c +++ b/crypto/fipsmodule/bn/exponentiation.c @@ -979,13 +979,13 @@ int BN_mod_exp_mont_consttime(BIGNUM *rr, const BIGNUM *a, const BIGNUM *p, #if defined(OPENSSL_BN_ASM_MONT5) if (window >= 5) { window = 5; // ~5% improvement for RSA2048 sign, and even for RSA4096 - // reserve space for mont->N.d[] copy + // Reserve space for the |mont->N| copy. powerbufLen += top * sizeof(mont->N.d[0]); } #endif // Allocate a buffer large enough to hold all of the pre-computed - // powers of am, am itself and tmp. + // powers of |am|, |am| itself, and |tmp|. numPowers = 1 << window; powerbufLen += sizeof(m->d[0]) * @@ -1008,7 +1008,7 @@ int BN_mod_exp_mont_consttime(BIGNUM *rr, const BIGNUM *a, const BIGNUM *p, } OPENSSL_memset(powerbuf, 0, powerbufLen); - // lay down tmp and am right after powers table + // Place |tmp| and |am| right after powers table. tmp.d = powerbuf + top * numPowers; am.d = tmp.d + top; tmp.width = am.width = 0; @@ -1028,18 +1028,26 @@ int BN_mod_exp_mont_consttime(BIGNUM *rr, const BIGNUM *a, const BIGNUM *p, } #if defined(OPENSSL_BN_ASM_MONT5) - // This optimization uses ideas from http://eprint.iacr.org/2011/239, - // specifically optimization of cache-timing attack countermeasures - // and pre-computation optimization. - - // Dedicated window==4 case improves 512-bit RSA sign by ~15%, but as - // 512-bit RSA is hardly relevant, we omit it to spare size... + // This optimization uses ideas from https://eprint.iacr.org/2011/239, + // specifically optimization of cache-timing attack countermeasures, + // pre-computation optimization, and Almost Montgomery Multiplication. + // + // The paper discusses a 4-bit window to optimize 512-bit modular + // exponentiation, used in RSA-1024 with CRT, but RSA-1024 is no longer + // important. + // + // |bn_mul_mont_gather5| and |bn_power5| implement the "almost" reduction + // variant, so the values here may not be fully reduced. They are bounded by R + // (i.e. they fit in |top| words), not |m|. Additionally, we pass these + // "almost" reduced inputs into |bn_mul_mont|, which implements the normal + // reduction variant. Given those inputs, |bn_mul_mont| may not give reduced + // output, but it will still produce "almost" reduced output. + // + // TODO(davidben): Using "almost" reduction complicates analysis of this code, + // and its interaction with other parts of the project. Determine whether this + // is actually necessary for performance. if (window == 5 && top > 1) { - const BN_ULONG *n0 = mont->n0; - BN_ULONG *np; - - // BN_to_montgomery can contaminate words above .top - // [in BN_DEBUG[_DEBUG] build]... + // Ensure |am| and |tmp| are padded to the right width. for (i = am.width; i < top; i++) { am.d[i] = 0; } @@ -1047,30 +1055,34 @@ int BN_mod_exp_mont_consttime(BIGNUM *rr, const BIGNUM *a, const BIGNUM *p, tmp.d[i] = 0; } - // copy mont->N.d[] to improve cache locality - for (np = am.d + top, i = 0; i < top; i++) { + // Copy |mont->N| to improve cache locality. + BN_ULONG *np = am.d + top; + for (i = 0; i < top; i++) { np[i] = mont->N.d[i]; } + // Fill |powerbuf| with the first 32 powers of |am|. + const BN_ULONG *n0 = mont->n0; bn_scatter5(tmp.d, top, powerbuf, 0); bn_scatter5(am.d, am.width, powerbuf, 1); bn_mul_mont(tmp.d, am.d, am.d, np, n0, top); bn_scatter5(tmp.d, top, powerbuf, 2); - // same as above, but uses squaring for 1/2 of operations + // Square to compute powers of two. for (i = 4; i < 32; i *= 2) { bn_mul_mont(tmp.d, tmp.d, tmp.d, np, n0, top); bn_scatter5(tmp.d, top, powerbuf, i); } + // Compute odd powers |i| based on |i - 1|, then all powers |i * 2^j|. for (i = 3; i < 8; i += 2) { - int j; bn_mul_mont_gather5(tmp.d, am.d, powerbuf, np, n0, top, i - 1); bn_scatter5(tmp.d, top, powerbuf, i); - for (j = 2 * i; j < 32; j *= 2) { + for (int j = 2 * i; j < 32; j *= 2) { bn_mul_mont(tmp.d, tmp.d, tmp.d, np, n0, top); bn_scatter5(tmp.d, top, powerbuf, j); } } + // These two loops are the above with the inner loop unrolled. for (; i < 16; i += 2) { bn_mul_mont_gather5(tmp.d, am.d, powerbuf, np, n0, top, i - 1); bn_scatter5(tmp.d, top, powerbuf, i); @@ -1137,15 +1149,15 @@ int BN_mod_exp_mont_consttime(BIGNUM *rr, const BIGNUM *a, const BIGNUM *p, bn_power5(tmp.d, tmp.d, powerbuf, np, n0, top, val); } } - - ret = bn_from_montgomery(tmp.d, tmp.d, NULL, np, n0, top); - tmp.width = top; - if (ret) { - if (!BN_copy(rr, &tmp)) { - ret = 0; - } - goto err; // non-zero ret means it's not error - } + // The result is now in |tmp| in Montgomery form, but it may not be fully + // reduced. This is within bounds for |BN_from_montgomery| (tmp < R <= m*R) + // so it will, when converting from Montgomery form, produce a fully reduced + // result. + // + // This differs from Figure 2 of the paper, which uses AMM(h, 1) to convert + // from Montgomery form with unreduced output, followed by an extra + // reduction step. In the paper's terminology, we replace steps 9 and 10 + // with MM(h, 1). } else #endif { @@ -1206,7 +1218,11 @@ int BN_mod_exp_mont_consttime(BIGNUM *rr, const BIGNUM *a, const BIGNUM *p, } } - // Convert the final result from montgomery to standard format + // Convert the final result from Montgomery to standard format. If we used the + // |OPENSSL_BN_ASM_MONT5| codepath, |tmp| may not be fully reduced. It is only + // bounded by R rather than |m|. However, that is still within bounds for + // |BN_from_montgomery|, which implements full Montgomery reduction, not + // "almost" Montgomery reduction. if (!BN_from_montgomery(rr, &tmp, mont, ctx)) { goto err; } diff --git a/crypto/fipsmodule/bn/internal.h b/crypto/fipsmodule/bn/internal.h index d6eb684028..79b25b043b 100644 --- a/crypto/fipsmodule/bn/internal.h +++ b/crypto/fipsmodule/bn/internal.h @@ -353,6 +353,9 @@ int bn_rand_secret_range(BIGNUM *r, int *out_is_uniform, BN_ULONG min_inclusive, // corresponding field in |BN_MONT_CTX|. It returns one if |bn_mul_mont| handles // inputs of this size and zero otherwise. // +// If at least one of |ap| or |bp| is fully reduced, |rp| will be fully reduced. +// If neither is fully-reduced, the output may not be either. +// // TODO(davidben): The x86_64 implementation expects a 32-bit input and masks // off upper bits. The aarch64 implementation expects a 64-bit input and does // not. |size_t| is the safer option but not strictly correct for x86_64. But @@ -372,6 +375,10 @@ int bn_mul_mont(BN_ULONG *rp, const BN_ULONG *ap, const BN_ULONG *bp, // by |ap| modulo |np|, and stores the result in |rp|. The values are |num| // words long and represented in Montgomery form. |n0| is a pointer to the // corresponding field in |BN_MONT_CTX|. +// +// WARNING: This function implements Almost Montgomery Multiplication from +// https://eprint.iacr.org/2011/239. The inputs do not need to be fully reduced. +// However, even if they are fully reduced, the output may not be. void bn_mul_mont_gather5(BN_ULONG *rp, const BN_ULONG *ap, const BN_ULONG *table, const BN_ULONG *np, const BN_ULONG *n0, int num, int power); @@ -391,16 +398,12 @@ void bn_gather5(BN_ULONG *out, size_t num, BN_ULONG *table, size_t power); // values are |num| words long and represented in Montgomery form. |n0| is a // pointer to the corresponding field in |BN_MONT_CTX|. |num| must be divisible // by 8. +// +// WARNING: This function implements Almost Montgomery Multiplication from +// https://eprint.iacr.org/2011/239. The inputs do not need to be fully reduced. +// However, even if they are fully reduced, the output may not be. void bn_power5(BN_ULONG *rp, const BN_ULONG *ap, const BN_ULONG *table, const BN_ULONG *np, const BN_ULONG *n0, int num, int power); - -// bn_from_montgomery converts |ap| from Montgomery form modulo |np| and writes -// the result in |rp|, each of which is |num| words long. It returns one on -// success and zero if it cannot handle inputs of length |num|. |n0| is a -// pointer to the corresponding field in |BN_MONT_CTX|. -int bn_from_montgomery(BN_ULONG *rp, const BN_ULONG *ap, - const BN_ULONG *not_used, const BN_ULONG *np, - const BN_ULONG *n0, int num); #endif // !OPENSSL_NO_ASM && OPENSSL_X86_64 uint64_t bn_mont_n0(const BIGNUM *n); From 801a801024febe1a33add5ddaa719e257d97aba5 Mon Sep 17 00:00:00 2001 From: David Benjamin Date: Fri, 3 Jun 2022 14:51:45 -0400 Subject: [PATCH 02/17] Add an extra reduction step to the end of RSAZ. RSAZ has a very similar bug to mont5 from https://boringssl-review.googlesource.com/c/boringssl/+/52825 and may return the modulus when it should return zero. As in that CL, there is no security impact on our cryptographic primitives. RSAZ is described in the paper "Software Implementation of Modular Exponentiation, Using Advanced Vector Instructions Architectures". The bug comes from RSAZ's use of "NRMM" or "Non Reduced Montgomery Multiplication". This is like normal Montgomery multiplication, but skips the final subtraction altogether (whereas mont5's AMM still subtracts, but replaces MM's tigher bound with just the carry bit). This would normally not be stable, but RSAZ picks a larger R > 4M, and maintains looser bounds for modular arithmetic, a < 2M. Lemma 1 from the paper proves that NRMM(a, b) preserves this 2M bound. It also claims NRMM(a, 1) < M. That is, conversion out of Montgomery form with NRMM is fully reduced. This second claim is wrong. The proof shows that NRMM(a, 1) < 1/2 + M, which only implies NRMM(a, 1) <= M, not NRMM(a, 1) < M. RSAZ relies on this to produce a reduced output (see Figure 7 in the paper). Thus, like mont5 with AMM, RSAZ may return the modulus when it should return zero. Fix this by adding a bn_reduce_once_in_place call at the end of the operation. Change-Id: If28bc49ae8dfbfb43bea02af5ea10c4209a1c6e6 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52827 Reviewed-by: Adam Langley Commit-Queue: David Benjamin --- crypto/fipsmodule/bn/bn_tests.txt | 10 ++++------ crypto/fipsmodule/bn/rsaz_exp.c | 2 ++ crypto/fipsmodule/bn/rsaz_exp.h | 6 +++++- 3 files changed, 11 insertions(+), 7 deletions(-) diff --git a/crypto/fipsmodule/bn/bn_tests.txt b/crypto/fipsmodule/bn/bn_tests.txt index 4bc1fa2c59..9a1a5db929 100644 --- a/crypto/fipsmodule/bn/bn_tests.txt +++ b/crypto/fipsmodule/bn/bn_tests.txt @@ -10625,12 +10625,10 @@ E = ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff M = 8f42c9e9e351ba9b32ab0cf69da43f4acf7028d19cff6e5059ea0e3fcc97c97f36a31470044737d4c0c933ac441ecb29e32c81401523afdac7de9c3fd8493c97 # 1024-bit -# TODO(davidben): This test breaks the RSAZ implementation. Fix it and enable -# this test. -# ModExp = 00 -# A = 800000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002f -# E = ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff -# M = 9da8dc26fdf4d2e49833b240ee552beb7a6e251caa91bfb5d6cafaf8ed9461877fda8f6ac299036d35806bc1ae7872e54eaac1ec6bee6d02c6621a9cf8883b3abc33c49b3e601203e0e86ef8f0562412cc689ee2670704583909ca6d7774c9f9f9f4d77d37fedef9cb51d207cb629ec02fa03b526fd6594bfa8f2da71238a0b7 +ModExp = 00 +A = 800000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002f +E = ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff +M = 9da8dc26fdf4d2e49833b240ee552beb7a6e251caa91bfb5d6cafaf8ed9461877fda8f6ac299036d35806bc1ae7872e54eaac1ec6bee6d02c6621a9cf8883b3abc33c49b3e601203e0e86ef8f0562412cc689ee2670704583909ca6d7774c9f9f9f4d77d37fedef9cb51d207cb629ec02fa03b526fd6594bfa8f2da71238a0b7 # 1025-bit ModExp = 00 diff --git a/crypto/fipsmodule/bn/rsaz_exp.c b/crypto/fipsmodule/bn/rsaz_exp.c index 7e15aaf948..074f05d895 100644 --- a/crypto/fipsmodule/bn/rsaz_exp.c +++ b/crypto/fipsmodule/bn/rsaz_exp.c @@ -219,6 +219,8 @@ void RSAZ_1024_mod_exp_avx2(BN_ULONG result_norm[16], rsaz_1024_mul_avx2(result, result, one, m, k0); rsaz_1024_red2norm_avx2(result_norm, result); + BN_ULONG scratch[16]; + bn_reduce_once_in_place(result_norm, /*carry=*/0, m_norm, scratch, 16); OPENSSL_cleanse(storage, MOD_EXP_CTIME_STORAGE_LEN * sizeof(BN_ULONG)); } diff --git a/crypto/fipsmodule/bn/rsaz_exp.h b/crypto/fipsmodule/bn/rsaz_exp.h index 104bb7a41c..bc7a439e75 100644 --- a/crypto/fipsmodule/bn/rsaz_exp.h +++ b/crypto/fipsmodule/bn/rsaz_exp.h @@ -90,7 +90,11 @@ void rsaz_1024_gather5_avx2(BN_ULONG val[40], const BN_ULONG tbl[32 * 18], int i); // rsaz_1024_red2norm_avx2 converts |red| from RSAZ to |BIGNUM| representation -// and writes the result to |norm|. +// and writes the result to |norm|. The result will be <= the modulus. +// +// WARNING: The result of this operation may not be fully reduced. |norm| may be +// the modulus instead of zero. This function should be followed by a call to +// |bn_reduce_once|. void rsaz_1024_red2norm_avx2(BN_ULONG norm[16], const BN_ULONG red[40]); From c7de4fe0bd28f61d6eb62023c08d2605f2955285 Mon Sep 17 00:00:00 2001 From: David Benjamin Date: Wed, 1 Jun 2022 16:28:42 -0400 Subject: [PATCH 03/17] Simplify mont5 table computation. The unrolled loops appear to have negligible perf impact: Before: Did 18480 RSA 2048 signing operations in 10005085us (1847.1 ops/sec) Did 2720 RSA 4096 signing operations in 10056337us (270.5 ops/sec) After: Did 18480 RSA 2048 signing operations in 10012218us (1845.7 ops/sec) [-0.1%] Did 2700 RSA 4096 signing operations in 10003972us (269.9 ops/sec) [-0.2%] Change-Id: I29073c373a03a9798f6e04016626e6ab910e893a Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52826 Commit-Queue: David Benjamin Reviewed-by: Adam Langley --- crypto/fipsmodule/bn/exponentiation.c | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/crypto/fipsmodule/bn/exponentiation.c b/crypto/fipsmodule/bn/exponentiation.c index 38013ed6f6..e2e0d12c17 100644 --- a/crypto/fipsmodule/bn/exponentiation.c +++ b/crypto/fipsmodule/bn/exponentiation.c @@ -1074,7 +1074,7 @@ int BN_mod_exp_mont_consttime(BIGNUM *rr, const BIGNUM *a, const BIGNUM *p, bn_scatter5(tmp.d, top, powerbuf, i); } // Compute odd powers |i| based on |i - 1|, then all powers |i * 2^j|. - for (i = 3; i < 8; i += 2) { + for (i = 3; i < 32; i += 2) { bn_mul_mont_gather5(tmp.d, am.d, powerbuf, np, n0, top, i - 1); bn_scatter5(tmp.d, top, powerbuf, i); for (int j = 2 * i; j < 32; j *= 2) { @@ -1082,17 +1082,6 @@ int BN_mod_exp_mont_consttime(BIGNUM *rr, const BIGNUM *a, const BIGNUM *p, bn_scatter5(tmp.d, top, powerbuf, j); } } - // These two loops are the above with the inner loop unrolled. - for (; i < 16; i += 2) { - bn_mul_mont_gather5(tmp.d, am.d, powerbuf, np, n0, top, i - 1); - bn_scatter5(tmp.d, top, powerbuf, i); - bn_mul_mont(tmp.d, tmp.d, tmp.d, np, n0, top); - bn_scatter5(tmp.d, top, powerbuf, 2 * i); - } - for (; i < 32; i += 2) { - bn_mul_mont_gather5(tmp.d, am.d, powerbuf, np, n0, top, i - 1); - bn_scatter5(tmp.d, top, powerbuf, i); - } bits--; for (wvalue = 0, i = bits % 5; i >= 0; i--, bits--) { From b8a651439b1848fec42e2d27037de3a31d1f3e2f Mon Sep 17 00:00:00 2001 From: David Benjamin Date: Fri, 3 Jun 2022 15:27:33 -0400 Subject: [PATCH 04/17] Align rsaz and mont5 table construction. Both implementations need to compute the first 32 powers of a. There's a commented out naive version in rsaz_exp.c that claims to be smaller, but 1% slower. (It doesn't use squares when it otherwise could.) Instead, we can write out the square-based strategy as a loop. (I wasn't able to measure a difference between any of the three versions, but this one's compact enough and does let us square more and gather5 less.) Change-Id: I7015f2a78584cd97f29b54d0007479bdcc3a01ba Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52828 Reviewed-by: Adam Langley Commit-Queue: David Benjamin --- crypto/fipsmodule/bn/rsaz_exp.c | 119 ++++---------------------------- 1 file changed, 13 insertions(+), 106 deletions(-) diff --git a/crypto/fipsmodule/bn/rsaz_exp.c b/crypto/fipsmodule/bn/rsaz_exp.c index 074f05d895..f4e50a6e75 100644 --- a/crypto/fipsmodule/bn/rsaz_exp.c +++ b/crypto/fipsmodule/bn/rsaz_exp.c @@ -66,23 +66,14 @@ void RSAZ_1024_mod_exp_avx2(BN_ULONG result_norm[16], // R2 = 2^3052 * 2^80 / 2^1044 = 2^2088 = (2^1044)^2 // table[0] = 1 - rsaz_1024_mul_avx2(result, R2, one, m, k0); // table[1] = a_inv^1 + rsaz_1024_mul_avx2(result, R2, one, m, k0); rsaz_1024_mul_avx2(a_inv, a_inv, R2, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, 0); rsaz_1024_scatter5_avx2(table_s, a_inv, 1); - // table[2] = a_inv^2 rsaz_1024_sqr_avx2(result, a_inv, m, k0, 1); rsaz_1024_scatter5_avx2(table_s, result, 2); -#if 0 - // This is almost 2x smaller and less than 1% slower. - for (int index = 3; index < 32; index++) { - rsaz_1024_mul_avx2(result, result, a_inv, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, index); - } -#else // table[4] = a_inv^4 rsaz_1024_sqr_avx2(result, result, m, k0, 1); rsaz_1024_scatter5_avx2(table_s, result, 4); @@ -92,109 +83,25 @@ void RSAZ_1024_mod_exp_avx2(BN_ULONG result_norm[16], // table[16] = a_inv^16 rsaz_1024_sqr_avx2(result, result, m, k0, 1); rsaz_1024_scatter5_avx2(table_s, result, 16); - // table[17] = a_inv^17 - rsaz_1024_mul_avx2(result, result, a_inv, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, 17); - - // table[3] - rsaz_1024_gather5_avx2(result, table_s, 2); - rsaz_1024_mul_avx2(result, result, a_inv, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, 3); - // table[6] - rsaz_1024_sqr_avx2(result, result, m, k0, 1); - rsaz_1024_scatter5_avx2(table_s, result, 6); - // table[12] - rsaz_1024_sqr_avx2(result, result, m, k0, 1); - rsaz_1024_scatter5_avx2(table_s, result, 12); - // table[24] - rsaz_1024_sqr_avx2(result, result, m, k0, 1); - rsaz_1024_scatter5_avx2(table_s, result, 24); - // table[25] - rsaz_1024_mul_avx2(result, result, a_inv, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, 25); - - // table[5] - rsaz_1024_gather5_avx2(result, table_s, 4); - rsaz_1024_mul_avx2(result, result, a_inv, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, 5); - // table[10] - rsaz_1024_sqr_avx2(result, result, m, k0, 1); - rsaz_1024_scatter5_avx2(table_s, result, 10); - // table[20] - rsaz_1024_sqr_avx2(result, result, m, k0, 1); - rsaz_1024_scatter5_avx2(table_s, result, 20); - // table[21] - rsaz_1024_mul_avx2(result, result, a_inv, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, 21); - - // table[7] - rsaz_1024_gather5_avx2(result, table_s, 6); - rsaz_1024_mul_avx2(result, result, a_inv, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, 7); - // table[14] - rsaz_1024_sqr_avx2(result, result, m, k0, 1); - rsaz_1024_scatter5_avx2(table_s, result, 14); - // table[28] - rsaz_1024_sqr_avx2(result, result, m, k0, 1); - rsaz_1024_scatter5_avx2(table_s, result, 28); - // table[29] - rsaz_1024_mul_avx2(result, result, a_inv, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, 29); - - // table[9] - rsaz_1024_gather5_avx2(result, table_s, 8); - rsaz_1024_mul_avx2(result, result, a_inv, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, 9); - // table[18] - rsaz_1024_sqr_avx2(result, result, m, k0, 1); - rsaz_1024_scatter5_avx2(table_s, result, 18); - // table[19] - rsaz_1024_mul_avx2(result, result, a_inv, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, 19); - - // table[11] - rsaz_1024_gather5_avx2(result, table_s, 10); - rsaz_1024_mul_avx2(result, result, a_inv, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, 11); - // table[22] - rsaz_1024_sqr_avx2(result, result, m, k0, 1); - rsaz_1024_scatter5_avx2(table_s, result, 22); - // table[23] - rsaz_1024_mul_avx2(result, result, a_inv, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, 23); - - // table[13] - rsaz_1024_gather5_avx2(result, table_s, 12); - rsaz_1024_mul_avx2(result, result, a_inv, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, 13); - // table[26] - rsaz_1024_sqr_avx2(result, result, m, k0, 1); - rsaz_1024_scatter5_avx2(table_s, result, 26); - // table[27] - rsaz_1024_mul_avx2(result, result, a_inv, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, 27); - - // table[15] - rsaz_1024_gather5_avx2(result, table_s, 14); - rsaz_1024_mul_avx2(result, result, a_inv, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, 15); - // table[30] - rsaz_1024_sqr_avx2(result, result, m, k0, 1); - rsaz_1024_scatter5_avx2(table_s, result, 30); - // table[31] - rsaz_1024_mul_avx2(result, result, a_inv, m, k0); - rsaz_1024_scatter5_avx2(table_s, result, 31); -#endif + for (int i = 3; i < 32; i += 2) { + // table[i] = table[i-1] * a_inv = a_inv^i + rsaz_1024_gather5_avx2(result, table_s, i - 1); + rsaz_1024_mul_avx2(result, result, a_inv, m, k0); + rsaz_1024_scatter5_avx2(table_s, result, i); + for (int j = 2 * i; j < 32; j *= 2) { + // table[j] = table[j/2]^2 = a_inv^j + rsaz_1024_sqr_avx2(result, result, m, k0, 1); + rsaz_1024_scatter5_avx2(table_s, result, j); + } + } + // Load the first window. const uint8_t *p_str = (const uint8_t *)exponent; - - // load first window int wvalue = p_str[127] >> 3; rsaz_1024_gather5_avx2(result, table_s, wvalue); int index = 1014; while (index > -1) { // Loop for the remaining 127 windows. - rsaz_1024_sqr_avx2(result, result, m, k0, 5); uint16_t wvalue_16; From 8ba90d18175ed1b591c16ddedc95f5c51da56d22 Mon Sep 17 00:00:00 2001 From: Adam Langley Date: Mon, 6 Jun 2022 09:36:35 -0700 Subject: [PATCH 05/17] Add some more includes. (It's a pain because we don't have it setup in CMake, but perhaps we should have a builder for the configuration that doesn't have bcm.c.) Change-Id: Ic408f0a86c9d42346244d6a7b7e9e664b58fc70c Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52845 Reviewed-by: David Benjamin Commit-Queue: David Benjamin --- crypto/fipsmodule/ecdh/ecdh.c | 3 ++- crypto/fipsmodule/hmac/hmac.c | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/crypto/fipsmodule/ecdh/ecdh.c b/crypto/fipsmodule/ecdh/ecdh.c index dfea1e1994..25d0702386 100644 --- a/crypto/fipsmodule/ecdh/ecdh.c +++ b/crypto/fipsmodule/ecdh/ecdh.c @@ -74,8 +74,9 @@ #include #include -#include "../ec/internal.h" #include "../../internal.h" +#include "../ec/internal.h" +#include "../service_indicator/internal.h" int ECDH_compute_key_fips(uint8_t *out, size_t out_len, const EC_POINT *pub_key, diff --git a/crypto/fipsmodule/hmac/hmac.c b/crypto/fipsmodule/hmac/hmac.c index 454d0c0d2d..56e21b04ff 100644 --- a/crypto/fipsmodule/hmac/hmac.c +++ b/crypto/fipsmodule/hmac/hmac.c @@ -63,6 +63,7 @@ #include #include "../../internal.h" +#include "../service_indicator/internal.h" uint8_t *HMAC(const EVP_MD *evp_md, const void *key, size_t key_len, From 77dc23983f004056dbcd95ae96922be107365190 Mon Sep 17 00:00:00 2001 From: David Benjamin Date: Mon, 6 Jun 2022 17:18:39 -0400 Subject: [PATCH 06/17] Make it more obvious that am and tmp's widths are accurate. https://boringssl-review.googlesource.com/c/boringssl/+/52825 lost a tmp.width = top line. Without it, tmp.width was set by bn_one_to_montgomery. Since we always size modular arithmetic by the modulus, tmp.width (and am.width) will actually always be top, and there's actually no need to zero pad it. We don't capture this in the type system or BIGNUM width convention, so better to set the width explicitly. The original code did it at the end, but I think doing it right when we zero pad it is better, as that's when the size gets set. But we can go a step further. The manual zero padding code came from OpenSSL, which still had the bn_correct_top invariant. Our BIGNUMs are resizable, so just call bn_resize_words, immediately after the computation. (bn_resize_words will not reallocate the data because the BIGNUMs have the STATIC_DATA flag set. bn_wexpand will internally allow expanding up to dmax, or top.) Change-Id: I2403afa7381b8a407615c6730fba9edaa41125c6 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52906 Reviewed-by: Adam Langley Commit-Queue: David Benjamin --- crypto/fipsmodule/bn/exponentiation.c | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-) diff --git a/crypto/fipsmodule/bn/exponentiation.c b/crypto/fipsmodule/bn/exponentiation.c index e2e0d12c17..9b609b3a7a 100644 --- a/crypto/fipsmodule/bn/exponentiation.c +++ b/crypto/fipsmodule/bn/exponentiation.c @@ -1016,14 +1016,16 @@ int BN_mod_exp_mont_consttime(BIGNUM *rr, const BIGNUM *a, const BIGNUM *p, tmp.neg = am.neg = 0; tmp.flags = am.flags = BN_FLG_STATIC_DATA; - if (!bn_one_to_montgomery(&tmp, mont, ctx)) { + if (!bn_one_to_montgomery(&tmp, mont, ctx) || + !bn_resize_words(&tmp, top)) { goto err; } - // prepare a^1 in Montgomery domain + // Prepare a^1 in the Montgomery domain. assert(!a->neg); assert(BN_ucmp(a, m) < 0); - if (!BN_to_montgomery(&am, a, mont, ctx)) { + if (!BN_to_montgomery(&am, a, mont, ctx) || + !bn_resize_words(&am, top)) { goto err; } @@ -1047,14 +1049,6 @@ int BN_mod_exp_mont_consttime(BIGNUM *rr, const BIGNUM *a, const BIGNUM *p, // and its interaction with other parts of the project. Determine whether this // is actually necessary for performance. if (window == 5 && top > 1) { - // Ensure |am| and |tmp| are padded to the right width. - for (i = am.width; i < top; i++) { - am.d[i] = 0; - } - for (i = tmp.width; i < top; i++) { - tmp.d[i] = 0; - } - // Copy |mont->N| to improve cache locality. BN_ULONG *np = am.d + top; for (i = 0; i < top; i++) { From 44872e1c74cbac6a0772dd588a7693bffbdade17 Mon Sep 17 00:00:00 2001 From: David Benjamin Date: Sun, 5 Jun 2022 00:31:29 -0400 Subject: [PATCH 07/17] Add EVP_PKEY_HKDF for OpenSSL compatibility. OpenSSL 1.1.1 added HKDF support, but by sticking it into EVP_PKEY_derive, the API meant for Diffie-Hellman-like primitives. Implement it for OpenSSL compatibility. This does unfortunately mean anything using EVP now pulls in HKDF. HKDF isn't much code, but we should make EVP more static-linker-friendly. (Filed https://crbug.com/boringssl/497) Change-Id: I90b9b0d918129829eb36ba9d50ff4d8580346ff0 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52829 Reviewed-by: Adam Langley Commit-Queue: David Benjamin --- crypto/CMakeLists.txt | 1 + crypto/evp/evp_ctx.c | 1 + crypto/evp/internal.h | 6 + crypto/evp/p_hkdf.c | 241 +++++++++++++++++++++++++++++++++++++++ crypto/hkdf/hkdf_test.cc | 104 +++++++++++++++++ crypto/obj/obj_dat.h | 5 +- crypto/obj/obj_mac.num | 1 + crypto/obj/objects.txt | 2 + include/openssl/evp.h | 11 +- include/openssl/kdf.h | 91 +++++++++++++++ include/openssl/nid.h | 4 + util/doc.config | 3 +- 12 files changed, 463 insertions(+), 7 deletions(-) create mode 100644 crypto/evp/p_hkdf.c create mode 100644 include/openssl/kdf.h diff --git a/crypto/CMakeLists.txt b/crypto/CMakeLists.txt index cf7199cb83..80752d116d 100644 --- a/crypto/CMakeLists.txt +++ b/crypto/CMakeLists.txt @@ -296,6 +296,7 @@ add_library( evp/p_ec_asn1.c evp/p_ed25519.c evp/p_ed25519_asn1.c + evp/p_hkdf.c evp/p_rsa.c evp/p_rsa_asn1.c evp/p_x25519.c diff --git a/crypto/evp/evp_ctx.c b/crypto/evp/evp_ctx.c index 9ca2c55815..fda6536228 100644 --- a/crypto/evp/evp_ctx.c +++ b/crypto/evp/evp_ctx.c @@ -71,6 +71,7 @@ static const EVP_PKEY_METHOD *const evp_methods[] = { &ec_pkey_meth, &ed25519_pkey_meth, &x25519_pkey_meth, + &hkdf_pkey_meth, }; static const EVP_PKEY_METHOD *evp_pkey_meth_find(int type) { diff --git a/crypto/evp/internal.h b/crypto/evp/internal.h index 8b6a583630..95d5917471 100644 --- a/crypto/evp/internal.h +++ b/crypto/evp/internal.h @@ -178,6 +178,11 @@ OPENSSL_EXPORT int EVP_PKEY_CTX_ctrl(EVP_PKEY_CTX *ctx, int keytype, int optype, #define EVP_PKEY_CTRL_RSA_OAEP_LABEL (EVP_PKEY_ALG_CTRL + 11) #define EVP_PKEY_CTRL_GET_RSA_OAEP_LABEL (EVP_PKEY_ALG_CTRL + 12) #define EVP_PKEY_CTRL_EC_PARAMGEN_CURVE_NID (EVP_PKEY_ALG_CTRL + 13) +#define EVP_PKEY_CTRL_HKDF_MODE (EVP_PKEY_ALG_CTRL + 14) +#define EVP_PKEY_CTRL_HKDF_MD (EVP_PKEY_ALG_CTRL + 15) +#define EVP_PKEY_CTRL_HKDF_KEY (EVP_PKEY_ALG_CTRL + 16) +#define EVP_PKEY_CTRL_HKDF_SALT (EVP_PKEY_ALG_CTRL + 17) +#define EVP_PKEY_CTRL_HKDF_INFO (EVP_PKEY_ALG_CTRL + 18) struct evp_pkey_ctx_st { // Method associated with this operation @@ -260,6 +265,7 @@ extern const EVP_PKEY_METHOD rsa_pkey_meth; extern const EVP_PKEY_METHOD ec_pkey_meth; extern const EVP_PKEY_METHOD ed25519_pkey_meth; extern const EVP_PKEY_METHOD x25519_pkey_meth; +extern const EVP_PKEY_METHOD hkdf_pkey_meth; #if defined(__cplusplus) diff --git a/crypto/evp/p_hkdf.c b/crypto/evp/p_hkdf.c new file mode 100644 index 0000000000..932372dfdd --- /dev/null +++ b/crypto/evp/p_hkdf.c @@ -0,0 +1,241 @@ +/* Copyright (c) 2022, Google Inc. + * + * Permission to use, copy, modify, and/or distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY + * SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION + * OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN + * CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ + +#include + +#include +#include +#include +#include +#include + +#include "../internal.h" +#include "internal.h" + + +typedef struct { + int mode; + const EVP_MD *md; + uint8_t *key; + size_t key_len; + uint8_t *salt; + size_t salt_len; + CBB info; +} HKDF_PKEY_CTX; + +static int pkey_hkdf_init(EVP_PKEY_CTX *ctx) { + HKDF_PKEY_CTX *hctx = OPENSSL_malloc(sizeof(HKDF_PKEY_CTX)); + if (hctx == NULL) { + OPENSSL_PUT_ERROR(EVP, ERR_R_MALLOC_FAILURE); + return 0; + } + + OPENSSL_memset(hctx, 0, sizeof(HKDF_PKEY_CTX)); + if (!CBB_init(&hctx->info, 0)) { + OPENSSL_PUT_ERROR(EVP, ERR_R_MALLOC_FAILURE); + OPENSSL_free(hctx); + return 0; + } + + ctx->data = hctx; + return 1; +} + +static int pkey_hkdf_copy(EVP_PKEY_CTX *dst, EVP_PKEY_CTX *src) { + if (!pkey_hkdf_init(dst)) { + return 0; + } + + HKDF_PKEY_CTX *hctx_dst = dst->data; + const HKDF_PKEY_CTX *hctx_src = src->data; + hctx_dst->mode = hctx_src->mode; + hctx_dst->md = hctx_src->md; + + if (hctx_src->key_len != 0) { + hctx_dst->key = OPENSSL_memdup(hctx_src->key, hctx_src->key_len); + if (hctx_src->key == NULL) { + OPENSSL_PUT_ERROR(EVP, ERR_R_MALLOC_FAILURE); + return 0; + } + hctx_dst->key_len = hctx_src->key_len; + } + + if (hctx_src->salt_len != 0) { + hctx_dst->salt = OPENSSL_memdup(hctx_src->salt, hctx_src->salt_len); + if (hctx_src->salt == NULL) { + OPENSSL_PUT_ERROR(EVP, ERR_R_MALLOC_FAILURE); + return 0; + } + hctx_dst->salt_len = hctx_src->salt_len; + } + + if (!CBB_add_bytes(&hctx_dst->info, CBB_data(&hctx_src->info), + CBB_len(&hctx_src->info))) { + OPENSSL_PUT_ERROR(EVP, ERR_R_MALLOC_FAILURE); + return 0; + } + + return 1; +} + +static void pkey_hkdf_cleanup(EVP_PKEY_CTX *ctx) { + HKDF_PKEY_CTX *hctx = ctx->data; + if (hctx != NULL) { + OPENSSL_free(hctx->key); + OPENSSL_free(hctx->salt); + CBB_cleanup(&hctx->info); + OPENSSL_free(hctx); + ctx->data = NULL; + } +} + +static int pkey_hkdf_derive(EVP_PKEY_CTX *ctx, uint8_t *out, size_t *out_len) { + HKDF_PKEY_CTX *hctx = ctx->data; + if (hctx->md == NULL) { + OPENSSL_PUT_ERROR(EVP, EVP_R_MISSING_PARAMETERS); + return 0; + } + if (hctx->key_len == 0) { + OPENSSL_PUT_ERROR(EVP, EVP_R_NO_KEY_SET); + return 0; + } + + if (out == NULL) { + if (hctx->mode == EVP_PKEY_HKDEF_MODE_EXTRACT_ONLY) { + *out_len = EVP_MD_size(hctx->md); + } + // HKDF-Expand is variable-length and returns |*out_len| bytes. "Output" the + // input length by leaving it alone. + return 1; + } + + switch (hctx->mode) { + case EVP_PKEY_HKDEF_MODE_EXTRACT_AND_EXPAND: + return HKDF(out, *out_len, hctx->md, hctx->key, hctx->key_len, hctx->salt, + hctx->salt_len, CBB_data(&hctx->info), CBB_len(&hctx->info)); + + case EVP_PKEY_HKDEF_MODE_EXTRACT_ONLY: + if (*out_len < EVP_MD_size(hctx->md)) { + OPENSSL_PUT_ERROR(EVP, EVP_R_BUFFER_TOO_SMALL); + return 0; + } + return HKDF_extract(out, out_len, hctx->md, hctx->key, hctx->key_len, + hctx->salt, hctx->salt_len); + + case EVP_PKEY_HKDEF_MODE_EXPAND_ONLY: + return HKDF_expand(out, *out_len, hctx->md, hctx->key, hctx->key_len, + CBB_data(&hctx->info), CBB_len(&hctx->info)); + } + OPENSSL_PUT_ERROR(EVP, ERR_R_INTERNAL_ERROR); + return 0; +} + +static int pkey_hkdf_ctrl(EVP_PKEY_CTX *ctx, int type, int p1, void *p2) { + HKDF_PKEY_CTX *hctx = ctx->data; + switch (type) { + case EVP_PKEY_CTRL_HKDF_MODE: + if (p1 != EVP_PKEY_HKDEF_MODE_EXTRACT_AND_EXPAND && + p1 != EVP_PKEY_HKDEF_MODE_EXTRACT_ONLY && + p1 != EVP_PKEY_HKDEF_MODE_EXPAND_ONLY) { + OPENSSL_PUT_ERROR(EVP, EVP_R_INVALID_OPERATION); + return 0; + } + hctx->mode = p1; + return 1; + case EVP_PKEY_CTRL_HKDF_MD: + hctx->md = p2; + return 1; + case EVP_PKEY_CTRL_HKDF_KEY: { + const CBS *key = p2; + if (!CBS_stow(key, &hctx->key, &hctx->key_len)) { + OPENSSL_PUT_ERROR(EVP, ERR_R_MALLOC_FAILURE); + return 0; + } + return 1; + } + case EVP_PKEY_CTRL_HKDF_SALT: { + const CBS *salt = p2; + if (!CBS_stow(salt, &hctx->salt, &hctx->salt_len)) { + OPENSSL_PUT_ERROR(EVP, ERR_R_MALLOC_FAILURE); + return 0; + } + return 1; + } + case EVP_PKEY_CTRL_HKDF_INFO: { + const CBS *info = p2; + // |EVP_PKEY_CTX_add1_hkdf_info| appends to the info string, rather than + // replacing it. + if (!CBB_add_bytes(&hctx->info, CBS_data(info), CBS_len(info))) { + OPENSSL_PUT_ERROR(EVP, ERR_R_MALLOC_FAILURE); + return 0; + } + return 1; + } + default: + OPENSSL_PUT_ERROR(EVP, EVP_R_COMMAND_NOT_SUPPORTED); + return 0; + } +} + +const EVP_PKEY_METHOD hkdf_pkey_meth = { + EVP_PKEY_HKDF, + pkey_hkdf_init, + pkey_hkdf_copy, + pkey_hkdf_cleanup, + /*keygen=*/NULL, + /*sign=*/NULL, + /*sign_message=*/NULL, + /*verify=*/NULL, + /*verify_message=*/NULL, + /*verify_recover=*/NULL, + /*encrypt=*/NULL, + /*decrypt=*/NULL, + pkey_hkdf_derive, + /*paramgen=*/NULL, + pkey_hkdf_ctrl, +}; + +int EVP_PKEY_CTX_hkdf_mode(EVP_PKEY_CTX *ctx, int mode) { + return EVP_PKEY_CTX_ctrl(ctx, EVP_PKEY_HKDF, EVP_PKEY_OP_DERIVE, + EVP_PKEY_CTRL_HKDF_MODE, mode, NULL); +} + +int EVP_PKEY_CTX_set_hkdf_md(EVP_PKEY_CTX *ctx, const EVP_MD *md) { + return EVP_PKEY_CTX_ctrl(ctx, EVP_PKEY_HKDF, EVP_PKEY_OP_DERIVE, + EVP_PKEY_CTRL_HKDF_MD, 0, (void *)md); +} + +int EVP_PKEY_CTX_set1_hkdf_key(EVP_PKEY_CTX *ctx, const uint8_t *key, + size_t key_len) { + CBS cbs; + CBS_init(&cbs, key, key_len); + return EVP_PKEY_CTX_ctrl(ctx, EVP_PKEY_HKDF, EVP_PKEY_OP_DERIVE, + EVP_PKEY_CTRL_HKDF_KEY, 0, &cbs); +} + +int EVP_PKEY_CTX_set1_hkdf_salt(EVP_PKEY_CTX *ctx, const uint8_t *salt, + size_t salt_len) { + CBS cbs; + CBS_init(&cbs, salt, salt_len); + return EVP_PKEY_CTX_ctrl(ctx, EVP_PKEY_HKDF, EVP_PKEY_OP_DERIVE, + EVP_PKEY_CTRL_HKDF_SALT, 0, &cbs); +} + +int EVP_PKEY_CTX_add1_hkdf_info(EVP_PKEY_CTX *ctx, const uint8_t *info, + size_t info_len) { + CBS cbs; + CBS_init(&cbs, info, info_len); + return EVP_PKEY_CTX_ctrl(ctx, EVP_PKEY_HKDF, EVP_PKEY_OP_DERIVE, + EVP_PKEY_CTRL_HKDF_INFO, 0, &cbs); +} diff --git a/crypto/hkdf/hkdf_test.cc b/crypto/hkdf/hkdf_test.cc index 793506ffa4..8aad6c8540 100644 --- a/crypto/hkdf/hkdf_test.cc +++ b/crypto/hkdf/hkdf_test.cc @@ -14,7 +14,9 @@ #include #include +#include #include +#include #include @@ -266,6 +268,108 @@ TEST(HKDFTest, TestVectors) { test->ikm_len, test->salt, test->salt_len, test->info, test->info_len)); EXPECT_EQ(Bytes(test->out, test->out_len), Bytes(buf, test->out_len)); + + // Repeat the test with the OpenSSL compatibility |EVP_PKEY_derive| API. + bssl::UniquePtr ctx( + EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, nullptr)); + ASSERT_TRUE(ctx); + ASSERT_TRUE(EVP_PKEY_derive_init(ctx.get())); + ASSERT_TRUE( + EVP_PKEY_CTX_hkdf_mode(ctx.get(), EVP_PKEY_HKDEF_MODE_EXTRACT_ONLY)); + ASSERT_TRUE(EVP_PKEY_CTX_set_hkdf_md(ctx.get(), test->md_func())); + ASSERT_TRUE( + EVP_PKEY_CTX_set1_hkdf_key(ctx.get(), test->ikm, test->ikm_len)); + ASSERT_TRUE( + EVP_PKEY_CTX_set1_hkdf_salt(ctx.get(), test->salt, test->salt_len)); + for (bool copy_ctx : {false, true}) { + SCOPED_TRACE(copy_ctx); + bssl::UniquePtr copy; + EVP_PKEY_CTX *use_ctx = ctx.get(); + if (copy_ctx) { + copy.reset(EVP_PKEY_CTX_dup(ctx.get())); + ASSERT_TRUE(copy); + use_ctx = copy.get(); + } + + // A null output should report the length. + prk_len = 0; + ASSERT_TRUE(EVP_PKEY_derive(use_ctx, nullptr, &prk_len)); + EXPECT_EQ(prk_len, test->prk_len); + + // Too small of a buffer should cleanly fail. + prk_len = test->prk_len - 1; + EXPECT_FALSE(EVP_PKEY_derive(use_ctx, prk, &prk_len)); + ERR_clear_error(); + + // Test the correct buffer size. + OPENSSL_memset(prk, 0, sizeof(prk)); + prk_len = test->prk_len; + ASSERT_TRUE(EVP_PKEY_derive(use_ctx, prk, &prk_len)); + EXPECT_EQ(Bytes(test->prk, test->prk_len), Bytes(prk, prk_len)); + + // Test a larger buffer than necessary. + OPENSSL_memset(prk, 0, sizeof(prk)); + prk_len = test->prk_len + 1; + ASSERT_TRUE(EVP_PKEY_derive(use_ctx, prk, &prk_len)); + EXPECT_EQ(Bytes(test->prk, test->prk_len), Bytes(prk, prk_len)); + } + + ctx.reset(EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, nullptr)); + ASSERT_TRUE(ctx); + ASSERT_TRUE(EVP_PKEY_derive_init(ctx.get())); + ASSERT_TRUE( + EVP_PKEY_CTX_hkdf_mode(ctx.get(), EVP_PKEY_HKDEF_MODE_EXPAND_ONLY)); + ASSERT_TRUE(EVP_PKEY_CTX_set_hkdf_md(ctx.get(), test->md_func())); + ASSERT_TRUE( + EVP_PKEY_CTX_set1_hkdf_key(ctx.get(), test->prk, test->prk_len)); + // |info| can be passed in multiple parts. + size_t half = test->info_len / 2; + ASSERT_TRUE(EVP_PKEY_CTX_add1_hkdf_info(ctx.get(), test->info, half)); + ASSERT_TRUE(EVP_PKEY_CTX_add1_hkdf_info(ctx.get(), test->info + half, + test->info_len - half)); + for (bool copy_ctx : {false, true}) { + SCOPED_TRACE(copy_ctx); + bssl::UniquePtr copy; + EVP_PKEY_CTX *use_ctx = ctx.get(); + if (copy_ctx) { + copy.reset(EVP_PKEY_CTX_dup(ctx.get())); + ASSERT_TRUE(copy); + use_ctx = copy.get(); + } + OPENSSL_memset(buf, 0, sizeof(buf)); + size_t len = test->out_len; + ASSERT_TRUE(EVP_PKEY_derive(use_ctx, buf, &len)); + EXPECT_EQ(Bytes(test->out, test->out_len), Bytes(buf, len)); + } + + ctx.reset(EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, nullptr)); + ASSERT_TRUE(ctx); + ASSERT_TRUE(EVP_PKEY_derive_init(ctx.get())); + ASSERT_TRUE(EVP_PKEY_CTX_hkdf_mode(ctx.get(), + EVP_PKEY_HKDEF_MODE_EXTRACT_AND_EXPAND)); + ASSERT_TRUE(EVP_PKEY_CTX_set_hkdf_md(ctx.get(), test->md_func())); + ASSERT_TRUE( + EVP_PKEY_CTX_set1_hkdf_key(ctx.get(), test->ikm, test->ikm_len)); + ASSERT_TRUE( + EVP_PKEY_CTX_set1_hkdf_salt(ctx.get(), test->salt, test->salt_len)); + // |info| can be passed in multiple parts. + ASSERT_TRUE(EVP_PKEY_CTX_add1_hkdf_info(ctx.get(), test->info, half)); + ASSERT_TRUE(EVP_PKEY_CTX_add1_hkdf_info(ctx.get(), test->info + half, + test->info_len - half)); + for (bool copy_ctx : {false, true}) { + SCOPED_TRACE(copy_ctx); + bssl::UniquePtr copy; + EVP_PKEY_CTX *use_ctx = ctx.get(); + if (copy_ctx) { + copy.reset(EVP_PKEY_CTX_dup(ctx.get())); + ASSERT_TRUE(copy); + use_ctx = copy.get(); + } + OPENSSL_memset(buf, 0, sizeof(buf)); + size_t len = test->out_len; + ASSERT_TRUE(EVP_PKEY_derive(use_ctx, buf, &len)); + EXPECT_EQ(Bytes(test->out, test->out_len), Bytes(buf, len)); + } } } diff --git a/crypto/obj/obj_dat.h b/crypto/obj/obj_dat.h index 778d8e3cf8..cc185f1bf1 100644 --- a/crypto/obj/obj_dat.h +++ b/crypto/obj/obj_dat.h @@ -57,7 +57,7 @@ /* This file is generated by crypto/obj/objects.go. */ -#define NUM_NID 963 +#define NUM_NID 964 static const uint8_t kObjectData[] = { /* NID_rsadsi */ @@ -8781,6 +8781,7 @@ static const ASN1_OBJECT kObjects[NUM_NID] = { {"ED448", "ED448", NID_ED448, 3, &kObjectData[6181], 0}, {"X448", "X448", NID_X448, 3, &kObjectData[6184], 0}, {"SHA512-256", "sha512-256", NID_sha512_256, 9, &kObjectData[6187], 0}, + {"HKDF", "hkdf", NID_hkdf, 0, NULL, 0}, }; static const uint16_t kNIDsInShortNameOrder[] = { @@ -8878,6 +8879,7 @@ static const uint16_t kNIDsInShortNameOrder[] = { 949 /* ED25519 */, 960 /* ED448 */, 99 /* GN */, + 963 /* HKDF */, 855 /* HMAC */, 780 /* HMAC-MD5 */, 781 /* HMAC-SHA1 */, @@ -10096,6 +10098,7 @@ static const uint16_t kNIDsInLongNameOrder[] = { 601 /* generic cryptogram */, 99 /* givenName */, 814 /* gost89-cnt */, + 963 /* hkdf */, 855 /* hmac */, 780 /* hmac-md5 */, 781 /* hmac-sha1 */, diff --git a/crypto/obj/obj_mac.num b/crypto/obj/obj_mac.num index f110ee9296..6367cc3645 100644 --- a/crypto/obj/obj_mac.num +++ b/crypto/obj/obj_mac.num @@ -951,3 +951,4 @@ CECPQ2 959 ED448 960 X448 961 sha512_256 962 +hkdf 963 diff --git a/crypto/obj/objects.txt b/crypto/obj/objects.txt index 25c2430535..67b76c7734 100644 --- a/crypto/obj/objects.txt +++ b/crypto/obj/objects.txt @@ -1356,3 +1356,5 @@ secg-scheme 14 3 : dhSinglePass-cofactorDH-sha512kdf-scheme # TLS 1.3 cipher suites do not specify key exchange or authentication. : KxANY : kx-any : AuthANY : auth-any + + : HKDF : hkdf diff --git a/include/openssl/evp.h b/include/openssl/evp.h index e195907a20..5f3b03cfc7 100644 --- a/include/openssl/evp.h +++ b/include/openssl/evp.h @@ -178,6 +178,7 @@ OPENSSL_EXPORT EC_KEY *EVP_PKEY_get1_EC_KEY(const EVP_PKEY *pkey); #define EVP_PKEY_EC NID_X9_62_id_ecPublicKey #define EVP_PKEY_ED25519 NID_ED25519 #define EVP_PKEY_X25519 NID_X25519 +#define EVP_PKEY_HKDF NID_hkdf // EVP_PKEY_assign sets the underlying key of |pkey| to |key|, which must be of // the given type. It returns one if successful or zero if the |type| argument @@ -665,11 +666,11 @@ OPENSSL_EXPORT int EVP_PKEY_derive_init(EVP_PKEY_CTX *ctx); // success and zero on error. OPENSSL_EXPORT int EVP_PKEY_derive_set_peer(EVP_PKEY_CTX *ctx, EVP_PKEY *peer); -// EVP_PKEY_derive derives a shared key between the two keys configured in -// |ctx|. If |key| is non-NULL then, on entry, |out_key_len| must contain the -// amount of space at |key|. If sufficient then the shared key will be written -// to |key| and |*out_key_len| will be set to the length. If |key| is NULL then -// |out_key_len| will be set to the maximum length. +// EVP_PKEY_derive derives a shared key from |ctx|. If |key| is non-NULL then, +// on entry, |out_key_len| must contain the amount of space at |key|. If +// sufficient then the shared key will be written to |key| and |*out_key_len| +// will be set to the length. If |key| is NULL then |out_key_len| will be set to +// the maximum length. // // WARNING: Setting |out| to NULL only gives the maximum size of the key. The // actual key may be smaller. diff --git a/include/openssl/kdf.h b/include/openssl/kdf.h new file mode 100644 index 0000000000..7adad38103 --- /dev/null +++ b/include/openssl/kdf.h @@ -0,0 +1,91 @@ +/* Copyright (c) 2022, Google Inc. + * + * Permission to use, copy, modify, and/or distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY + * SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION + * OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN + * CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ + +#ifndef OPENSSL_HEADER_KDF_H +#define OPENSSL_HEADER_KDF_H + +#include + +#if defined(__cplusplus) +extern "C" { +#endif + + +// KDF support for EVP. + + +// HKDF-specific functions. +// +// The following functions are provided for OpenSSL compatibility. Prefer the +// HKDF functions in . In each, |ctx| must be created with +// |EVP_PKEY_CTX_new_id| with |EVP_PKEY_HKDF| and then initialized with +// |EVP_PKEY_derive_init|. + +// EVP_PKEY_HKDEF_MODE_* define "modes" for use with |EVP_PKEY_CTX_hkdf_mode|. +// The mispelling of "HKDF" as "HKDEF" is intentional for OpenSSL compatibility. +#define EVP_PKEY_HKDEF_MODE_EXTRACT_AND_EXPAND 0 +#define EVP_PKEY_HKDEF_MODE_EXTRACT_ONLY 1 +#define EVP_PKEY_HKDEF_MODE_EXPAND_ONLY 2 + +// EVP_PKEY_CTX_hkdf_mode configures which HKDF operation to run. It returns one +// on success and zero on error. |mode| must be one of |EVP_PKEY_HKDEF_MODE_*|. +// By default, the mode is |EVP_PKEY_HKDEF_MODE_EXTRACT_AND_EXPAND|. +// +// If |mode| is |EVP_PKEY_HKDEF_MODE_EXTRACT_AND_EXPAND| or +// |EVP_PKEY_HKDEF_MODE_EXPAND_ONLY|, the output is variable-length. +// |EVP_PKEY_derive| uses the size of the output buffer as the output length for +// HKDF-Expand. +// +// WARNING: Although this API calls it a "mode", HKDF-Extract and HKDF-Expand +// are distinct operations with distinct inputs and distinct kinds of keys. +// Callers should not pass input secrets for one operation into the other. +OPENSSL_EXPORT int EVP_PKEY_CTX_hkdf_mode(EVP_PKEY_CTX *ctx, int mode); + +// EVP_PKEY_CTX_set_hkdf_md sets |md| as the digest to use with HKDF. It returns +// one on success and zero on error. +OPENSSL_EXPORT int EVP_PKEY_CTX_set_hkdf_md(EVP_PKEY_CTX *ctx, + const EVP_MD *md); + +// EVP_PKEY_CTX_set1_hkdf_key configures HKDF to use |key_len| bytes from |key| +// as the "key", described below. It returns one on success and zero on error. +// +// Which input is the key depends on the "mode" (see |EVP_PKEY_CTX_hkdf_mode|). +// If |EVP_PKEY_HKDEF_MODE_EXTRACT_AND_EXPAND| or +// |EVP_PKEY_HKDEF_MODE_EXTRACT_ONLY|, this function specifies the input keying +// material (IKM) for HKDF-Extract. If |EVP_PKEY_HKDEF_MODE_EXPAND_ONLY|, it +// instead specifies the pseudorandom key (PRK) for HKDF-Expand. +OPENSSL_EXPORT int EVP_PKEY_CTX_set1_hkdf_key(EVP_PKEY_CTX *ctx, + const uint8_t *key, + size_t key_len); + +// EVP_PKEY_CTX_set1_hkdf_salt configures HKDF to use |salt_len| bytes from +// |salt| as the salt parameter to HKDF-Extract. It returns one on success and +// zero on error. If performing HKDF-Expand only, this parameter is ignored. +OPENSSL_EXPORT int EVP_PKEY_CTX_set1_hkdf_salt(EVP_PKEY_CTX *ctx, + const uint8_t *salt, + size_t salt_len); + +// EVP_PKEY_CTX_add1_hkdf_info appends |info_len| bytes from |info| to the info +// parameter used with HKDF-Expand. It returns one on success and zero on error. +// If performing HKDF-Extract only, this parameter is ignored. +OPENSSL_EXPORT int EVP_PKEY_CTX_add1_hkdf_info(EVP_PKEY_CTX *ctx, + const uint8_t *info, + size_t info_len); + + +#if defined(__cplusplus) +} // extern C +#endif + +#endif // OPENSSL_HEADER_KDF_H diff --git a/include/openssl/nid.h b/include/openssl/nid.h index bf7f3da5f8..54ecc5e26b 100644 --- a/include/openssl/nid.h +++ b/include/openssl/nid.h @@ -4251,6 +4251,10 @@ extern "C" { #define NID_sha512_256 962 #define OBJ_sha512_256 2L, 16L, 840L, 1L, 101L, 3L, 4L, 2L, 6L +#define SN_hkdf "HKDF" +#define LN_hkdf "hkdf" +#define NID_hkdf 963 + #if defined(__cplusplus) } /* extern C */ diff --git a/util/doc.config b/util/doc.config index 81da8bddbf..8de5f1decf 100644 --- a/util/doc.config +++ b/util/doc.config @@ -48,7 +48,8 @@ "include/openssl/cipher.h", "include/openssl/aead.h", "include/openssl/evp.h", - "include/openssl/hpke.h" + "include/openssl/hpke.h", + "include/openssl/kdf.h" ] },{ "Name": "Legacy ASN.1 and X.509 implementation (documentation in progress)", From 4566bb5fe517f7f141b5fe935c559fc4311af35d Mon Sep 17 00:00:00 2001 From: Rebecca Chang Swee Fun Date: Thu, 2 Jun 2022 07:18:58 +0000 Subject: [PATCH 08/17] Add support for RISC-V 64-bit architecture Signed-off-by: Rebecca Chang Swee Fun Change-Id: If6424a3b268943a5e2dc847f94b509d4b509df79 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52765 Commit-Queue: Adam Langley Reviewed-by: Adam Langley --- CMakeLists.txt | 2 ++ include/openssl/base.h | 2 ++ 2 files changed, 4 insertions(+) diff --git a/CMakeLists.txt b/CMakeLists.txt index aadcb9ba1c..766c839779 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -494,6 +494,8 @@ elseif(CMAKE_SYSTEM_PROCESSOR STREQUAL "mips") set(ARCH "generic") elseif(CMAKE_SYSTEM_PROCESSOR STREQUAL "ppc64le") set(ARCH "ppc64le") +elseif (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "riscv64") + set(ARCH "riscv64") else() message(FATAL_ERROR "Unknown processor:" ${CMAKE_SYSTEM_PROCESSOR}) endif() diff --git a/include/openssl/base.h b/include/openssl/base.h index 4ab9eca7ad..a4a91f5403 100644 --- a/include/openssl/base.h +++ b/include/openssl/base.h @@ -118,6 +118,8 @@ extern "C" { #define OPENSSL_32_BIT #elif defined(__myriad2__) #define OPENSSL_32_BIT +#elif defined(__riscv) && __riscv_xlen == 64 +#define OPENSSL_64_BIT #else // Note BoringSSL only supports standard 32-bit and 64-bit two's-complement, // little-endian architectures. Functions will not produce the correct answer From f575d9b3632cc6bcaa14e8bb0328c9f68579a342 Mon Sep 17 00:00:00 2001 From: David Benjamin Date: Wed, 8 Jun 2022 13:33:00 -0400 Subject: [PATCH 09/17] Add X509_REQ_set1_signature_algo and X509_REQ_set1_signature_value. These are the X509_REQ analogs to https://boringssl-review.googlesource.com/c/boringssl/+/43784 and https://boringssl-review.googlesource.com/c/boringssl/+/47525. The one difference is CSRs do not have the redundant copy of the signature algorithm, so X509_REQ_set1_signature_algo is a little simpler. Change-Id: I5fe27b22b30520c71542d51044b0b16712021f59 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52945 Reviewed-by: Bob Beck Commit-Queue: Bob Beck --- crypto/x509/x509cset.c | 4 ++-- crypto/x509/x509rset.c | 26 ++++++++++++++++++++++++++ crypto/x509/x_x509.c | 4 ++-- include/openssl/x509.h | 18 ++++++++++++++++++ 4 files changed, 48 insertions(+), 4 deletions(-) diff --git a/crypto/x509/x509cset.c b/crypto/x509/x509cset.c index 1671f352cc..a51406bcfc 100644 --- a/crypto/x509/x509cset.c +++ b/crypto/x509/x509cset.c @@ -262,8 +262,8 @@ int i2d_X509_CRL_tbs(X509_CRL *crl, unsigned char **outp) int X509_CRL_set1_signature_algo(X509_CRL *crl, const X509_ALGOR *algo) { - /* TODO(davidben): Const-correct generated ASN.1 dup functions. - * Alternatively, when the types are hidden and we can embed required fields + /* TODO(https://crbug.com/boringssl/407): Generated ASN.1 dup functions + * should be const. Alternatively, when we can embed required fields * directly in structs, import |X509_ALGOR_copy| from upstream. */ X509_ALGOR *copy1 = X509_ALGOR_dup((X509_ALGOR *)algo); X509_ALGOR *copy2 = X509_ALGOR_dup((X509_ALGOR *)algo); diff --git a/crypto/x509/x509rset.c b/crypto/x509/x509rset.c index c69f8cb15f..108485c3a6 100644 --- a/crypto/x509/x509rset.c +++ b/crypto/x509/x509rset.c @@ -87,3 +87,29 @@ int X509_REQ_set_pubkey(X509_REQ *x, EVP_PKEY *pkey) return (0); return (X509_PUBKEY_set(&x->req_info->pubkey, pkey)); } + +int X509_REQ_set1_signature_algo(X509_REQ *req, const X509_ALGOR *algo) +{ + /* TODO(https://crbug.com/boringssl/407): Generated ASN.1 dup functions + * should be const. Alternatively, when we can embed required fields + * directly in structs, import |X509_ALGOR_copy| from upstream. */ + X509_ALGOR *copy = X509_ALGOR_dup((X509_ALGOR *)algo); + if (copy == NULL) { + return 0; + } + + X509_ALGOR_free(req->sig_alg); + req->sig_alg = copy; + return 1; +} + +int X509_REQ_set1_signature_value(X509_REQ *req, const uint8_t *sig, + size_t sig_len) +{ + if (!ASN1_STRING_set(req->signature, sig, sig_len)) { + return 0; + } + req->signature->flags &= ~(ASN1_STRING_FLAG_BITS_LEFT | 0x07); + req->signature->flags |= ASN1_STRING_FLAG_BITS_LEFT; + return 1; +} diff --git a/crypto/x509/x_x509.c b/crypto/x509/x_x509.c index dfcc72d3b4..37ee975219 100644 --- a/crypto/x509/x_x509.c +++ b/crypto/x509/x_x509.c @@ -351,8 +351,8 @@ int i2d_X509_tbs(X509 *x509, unsigned char **outp) int X509_set1_signature_algo(X509 *x509, const X509_ALGOR *algo) { - /* TODO(davidben): Const-correct generated ASN.1 dup functions. - * Alternatively, when the types are hidden and we can embed required fields + /* TODO(https://crbug.com/boringssl/407): Generated ASN.1 dup functions + * should be const. Alternatively, when we can embed required fields * directly in structs, import |X509_ALGOR_copy| from upstream. */ X509_ALGOR *copy1 = X509_ALGOR_dup((X509_ALGOR *)algo); X509_ALGOR *copy2 = X509_ALGOR_dup((X509_ALGOR *)algo); diff --git a/include/openssl/x509.h b/include/openssl/x509.h index 4d312c7e9b..945355f879 100644 --- a/include/openssl/x509.h +++ b/include/openssl/x509.h @@ -1191,6 +1191,24 @@ OPENSSL_EXPORT int X509_REQ_add1_attr_by_txt(X509_REQ *req, const unsigned char *data, int len); +// X509_REQ_set1_signature_algo sets |req|'s signature algorithm to |algo| and +// returns one on success or zero on error. +OPENSSL_EXPORT int X509_REQ_set1_signature_algo(X509_REQ *req, + const X509_ALGOR *algo); + +// X509_REQ_set1_signature_value sets |req|'s signature to a copy of the +// |sig_len| bytes pointed by |sig|. It returns one on success and zero on +// error. +// +// Due to a specification error, PKCS#10 certificate requests store signatures +// in ASN.1 BIT STRINGs, but signature algorithms return byte strings rather +// than bit strings. This function creates a BIT STRING containing a whole +// number of bytes, with the bit order matching the DER encoding. This matches +// the encoding used by all X.509 signature algorithms. +OPENSSL_EXPORT int X509_REQ_set1_signature_value(X509_REQ *req, + const uint8_t *sig, + size_t sig_len); + // X509_CRL_set_version sets |crl|'s version to |version|, which should be one // of the |X509_CRL_VERSION_*| constants. It returns one on success and zero on // error. From 1e469e45a46ff580899cbef939babe02ad916c85 Mon Sep 17 00:00:00 2001 From: David Benjamin Date: Fri, 13 May 2022 17:06:56 -0400 Subject: [PATCH 10/17] Replace some more C unions. I don't think these are all UB by C's rules, but it's easier not to think about the pointers. Still more to go, but these were some easy ones. Bug: 301 Change-Id: Icdcb7fb40f85983cbf566786c5f7dbfd7bb06571 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52905 Reviewed-by: Bob Beck Commit-Queue: Bob Beck --- crypto/bytestring/cbb.c | 11 ++--- crypto/bytestring/cbs.c | 11 ++--- crypto/cipher_extra/e_aesgcmsiv.c | 73 ++++++++++--------------------- crypto/curve25519/curve25519.c | 14 +++--- crypto/fipsmodule/cipher/e_aes.c | 13 +++--- crypto/fipsmodule/modes/gcm.c | 35 ++++++--------- crypto/internal.h | 10 +++++ crypto/siphash/siphash.c | 19 ++++---- crypto/siphash/siphash_test.cc | 12 +++-- 9 files changed, 79 insertions(+), 119 deletions(-) diff --git a/crypto/bytestring/cbb.c b/crypto/bytestring/cbb.c index 12587cdf10..6ce20ad7ab 100644 --- a/crypto/bytestring/cbb.c +++ b/crypto/bytestring/cbb.c @@ -542,14 +542,11 @@ int CBB_add_asn1_int64(CBB *cbb, int64_t value) { return CBB_add_asn1_uint64(cbb, value); } - union { - int64_t i; - uint8_t bytes[sizeof(int64_t)]; - } u; - u.i = value; + uint8_t bytes[sizeof(int64_t)]; + memcpy(bytes, &value, sizeof(value)); int start = 7; // Skip leading sign-extension bytes unless they are necessary. - while (start > 0 && (u.bytes[start] == 0xff && (u.bytes[start - 1] & 0x80))) { + while (start > 0 && (bytes[start] == 0xff && (bytes[start - 1] & 0x80))) { start--; } @@ -558,7 +555,7 @@ int CBB_add_asn1_int64(CBB *cbb, int64_t value) { return 0; } for (int i = start; i >= 0; i--) { - if (!CBB_add_u8(&child, u.bytes[i])) { + if (!CBB_add_u8(&child, bytes[i])) { return 0; } } diff --git a/crypto/bytestring/cbs.c b/crypto/bytestring/cbs.c index 010897b107..741ecfbc56 100644 --- a/crypto/bytestring/cbs.c +++ b/crypto/bytestring/cbs.c @@ -496,15 +496,12 @@ int CBS_get_asn1_int64(CBS *cbs, int64_t *out) { if (len > sizeof(int64_t)) { return 0; } - union { - int64_t i; - uint8_t bytes[sizeof(int64_t)]; - } u; - memset(u.bytes, is_negative ? 0xff : 0, sizeof(u.bytes)); // Sign-extend. + uint8_t sign_extend[sizeof(int64_t)]; + memset(sign_extend, is_negative ? 0xff : 0, sizeof(sign_extend)); for (size_t i = 0; i < len; i++) { - u.bytes[i] = data[len - i - 1]; + sign_extend[i] = data[len - i - 1]; } - *out = u.i; + memcpy(out, sign_extend, sizeof(sign_extend)); return 1; } diff --git a/crypto/cipher_extra/e_aesgcmsiv.c b/crypto/cipher_extra/e_aesgcmsiv.c index 387eafff02..555922dd74 100644 --- a/crypto/cipher_extra/e_aesgcmsiv.c +++ b/crypto/cipher_extra/e_aesgcmsiv.c @@ -262,17 +262,10 @@ static void gcm_siv_asm_polyval(uint8_t out_tag[16], const uint8_t *in, aesgcmsiv_polyval_horner(out_tag, auth_key, scratch, 1); } - union { - uint8_t c[16]; - struct { - uint64_t ad; - uint64_t in; - } bitlens; - } length_block; - - length_block.bitlens.ad = ad_len * 8; - length_block.bitlens.in = in_len * 8; - aesgcmsiv_polyval_horner(out_tag, auth_key, length_block.c, 1); + uint8_t length_block[16]; + CRYPTO_store_u64_le(length_block, ad_len * 8); + CRYPTO_store_u64_le(length_block + 8, in_len * 8); + aesgcmsiv_polyval_horner(out_tag, auth_key, length_block, 1); for (size_t i = 0; i < 12; i++) { out_tag[i] ^= nonce[i]; @@ -289,18 +282,15 @@ static void aead_aes_gcm_siv_asm_crypt_last_block( int is_128_bit, uint8_t *out, const uint8_t *in, size_t in_len, const uint8_t tag[16], const struct aead_aes_gcm_siv_asm_ctx *enc_key_expanded) { - alignas(16) union { - uint8_t c[16]; - uint32_t u32[4]; - } counter; + alignas(16) uint8_t counter[16]; OPENSSL_memcpy(&counter, tag, sizeof(counter)); - counter.c[15] |= 0x80; - counter.u32[0] += in_len / 16; + counter[15] |= 0x80; + CRYPTO_store_u32_le(counter, CRYPTO_load_u32_le(counter) + in_len / 16); if (is_128_bit) { - aes128gcmsiv_ecb_enc_block(&counter.c[0], &counter.c[0], enc_key_expanded); + aes128gcmsiv_ecb_enc_block(counter, counter, enc_key_expanded); } else { - aes256gcmsiv_ecb_enc_block(&counter.c[0], &counter.c[0], enc_key_expanded); + aes256gcmsiv_ecb_enc_block(counter, counter, enc_key_expanded); } const size_t last_bytes_offset = in_len & ~15; @@ -308,7 +298,7 @@ static void aead_aes_gcm_siv_asm_crypt_last_block( uint8_t *last_bytes_out = &out[last_bytes_offset]; const uint8_t *last_bytes_in = &in[last_bytes_offset]; for (size_t i = 0; i < last_bytes_len; i++) { - last_bytes_out[i] = last_bytes_in[i] ^ counter.c[i]; + last_bytes_out[i] = last_bytes_in[i] ^ counter[i]; } } @@ -489,18 +479,11 @@ static int aead_aes_gcm_siv_asm_open(const EVP_AEAD_CTX *ctx, uint8_t *out, scratch, 1); } - union { - uint8_t c[16]; - struct { - uint64_t ad; - uint64_t in; - } bitlens; - } length_block; - - length_block.bitlens.ad = ad_len * 8; - length_block.bitlens.in = plaintext_len * 8; + uint8_t length_block[16]; + CRYPTO_store_u64_le(length_block, ad_len * 8); + CRYPTO_store_u64_le(length_block + 8, plaintext_len * 8); aesgcmsiv_polyval_horner(calculated_tag, (const uint8_t *)record_auth_key, - length_block.c, 1); + length_block, 1); for (size_t i = 0; i < 12; i++) { calculated_tag[i] ^= nonce[i]; @@ -619,18 +602,15 @@ static void aead_aes_gcm_siv_cleanup(EVP_AEAD_CTX *ctx) {} static void gcm_siv_crypt(uint8_t *out, const uint8_t *in, size_t in_len, const uint8_t initial_counter[AES_BLOCK_SIZE], block128_f enc_block, const AES_KEY *key) { - union { - uint32_t w[4]; - uint8_t c[16]; - } counter; + uint8_t counter[16]; - OPENSSL_memcpy(counter.c, initial_counter, AES_BLOCK_SIZE); - counter.c[15] |= 0x80; + OPENSSL_memcpy(counter, initial_counter, AES_BLOCK_SIZE); + counter[15] |= 0x80; for (size_t done = 0; done < in_len;) { uint8_t keystream[AES_BLOCK_SIZE]; - enc_block(counter.c, keystream, key); - counter.w[0]++; + enc_block(counter, keystream, key); + CRYPTO_store_u32_le(counter, CRYPTO_load_u32_le(counter) + 1); size_t todo = AES_BLOCK_SIZE; if (in_len - done < todo) { @@ -670,17 +650,10 @@ static void gcm_siv_polyval( CRYPTO_POLYVAL_update_blocks(&polyval_ctx, scratch, sizeof(scratch)); } - union { - uint8_t c[16]; - struct { - uint64_t ad; - uint64_t in; - } bitlens; - } length_block; - - length_block.bitlens.ad = ad_len * 8; - length_block.bitlens.in = in_len * 8; - CRYPTO_POLYVAL_update_blocks(&polyval_ctx, length_block.c, + uint8_t length_block[16]; + CRYPTO_store_u64_le(length_block, ad_len * 8); + CRYPTO_store_u64_le(length_block + 8, in_len * 8); + CRYPTO_POLYVAL_update_blocks(&polyval_ctx, length_block, sizeof(length_block)); CRYPTO_POLYVAL_finish(&polyval_ctx, out_tag); diff --git a/crypto/curve25519/curve25519.c b/crypto/curve25519/curve25519.c index e316acd79c..4bf37b9fa8 100644 --- a/crypto/curve25519/curve25519.c +++ b/crypto/curve25519/curve25519.c @@ -1936,11 +1936,8 @@ int ED25519_verify(const uint8_t *message, size_t message_len, OPENSSL_memcpy(pkcopy, public_key, 32); uint8_t rcopy[32]; OPENSSL_memcpy(rcopy, signature, 32); - union { - uint64_t u64[4]; - uint8_t u8[32]; - } scopy; - OPENSSL_memcpy(&scopy.u8[0], signature + 32, 32); + uint8_t scopy[32]; + OPENSSL_memcpy(scopy, signature + 32, 32); // https://tools.ietf.org/html/rfc8032#section-5.1.7 requires that s be in // the range [0, order) in order to prevent signature malleability. @@ -1953,9 +1950,10 @@ int ED25519_verify(const uint8_t *message, size_t message_len, UINT64_C(0x1000000000000000), }; for (size_t i = 3;; i--) { - if (scopy.u64[i] > kOrder[i]) { + uint64_t word = CRYPTO_load_u64_le(scopy + i * 8); + if (word > kOrder[i]) { return 0; - } else if (scopy.u64[i] < kOrder[i]) { + } else if (word < kOrder[i]) { break; } else if (i == 0) { return 0; @@ -1973,7 +1971,7 @@ int ED25519_verify(const uint8_t *message, size_t message_len, x25519_sc_reduce(h); ge_p2 R; - ge_double_scalarmult_vartime(&R, h, &A, scopy.u8); + ge_double_scalarmult_vartime(&R, h, &A, scopy); uint8_t rcheck[32]; x25519_ge_tobytes(rcheck, &R); diff --git a/crypto/fipsmodule/cipher/e_aes.c b/crypto/fipsmodule/cipher/e_aes.c index 7677355a35..61f4ef355d 100644 --- a/crypto/fipsmodule/cipher/e_aes.c +++ b/crypto/fipsmodule/cipher/e_aes.c @@ -99,16 +99,13 @@ static void vpaes_ctr32_encrypt_blocks_with_bsaes(const uint8_t *in, out += 16 * bsaes_blocks; blocks -= bsaes_blocks; - union { - uint32_t u32[4]; - uint8_t u8[16]; - } new_ivec; - memcpy(new_ivec.u8, ivec, 16); - uint32_t ctr = CRYPTO_bswap4(new_ivec.u32[3]) + bsaes_blocks; - new_ivec.u32[3] = CRYPTO_bswap4(ctr); + uint8_t new_ivec[16]; + memcpy(new_ivec, ivec, 12); + uint32_t ctr = CRYPTO_load_u32_be(ivec + 12) + bsaes_blocks; + CRYPTO_store_u32_be(new_ivec + 12, ctr); // Finish any remaining blocks with |vpaes_ctr32_encrypt_blocks|. - vpaes_ctr32_encrypt_blocks(in, out, blocks, key, new_ivec.u8); + vpaes_ctr32_encrypt_blocks(in, out, blocks, key, new_ivec); } #endif // BSAES diff --git a/crypto/fipsmodule/modes/gcm.c b/crypto/fipsmodule/modes/gcm.c index 5b909aa653..1a77ec0e4c 100644 --- a/crypto/fipsmodule/modes/gcm.c +++ b/crypto/fipsmodule/modes/gcm.c @@ -137,76 +137,69 @@ void CRYPTO_ghash_init(gmult_func *out_mult, ghash_func *out_hash, const uint8_t gcm_key[16]) { *out_is_avx = 0; - union { - uint64_t u[2]; - uint8_t c[16]; - } H; - - OPENSSL_memcpy(H.c, gcm_key, 16); - - // H is stored in host byte order - H.u[0] = CRYPTO_bswap8(H.u[0]); - H.u[1] = CRYPTO_bswap8(H.u[1]); - - OPENSSL_memcpy(out_key, H.c, 16); + // H is stored in host byte order. + uint64_t H[2] = {CRYPTO_load_u64_be(gcm_key), + CRYPTO_load_u64_be(gcm_key + 8)}; + out_key->hi = H[0]; + out_key->lo = H[1]; #if defined(GHASH_ASM_X86_64) if (crypto_gcm_clmul_enabled()) { if (CRYPTO_is_AVX_capable() && CRYPTO_is_MOVBE_capable()) { - gcm_init_avx(out_table, H.u); + gcm_init_avx(out_table, H); *out_mult = gcm_gmult_avx; *out_hash = gcm_ghash_avx; *out_is_avx = 1; return; } - gcm_init_clmul(out_table, H.u); + gcm_init_clmul(out_table, H); *out_mult = gcm_gmult_clmul; *out_hash = gcm_ghash_clmul; return; } if (CRYPTO_is_SSSE3_capable()) { - gcm_init_ssse3(out_table, H.u); + gcm_init_ssse3(out_table, H); *out_mult = gcm_gmult_ssse3; *out_hash = gcm_ghash_ssse3; return; } #elif defined(GHASH_ASM_X86) if (crypto_gcm_clmul_enabled()) { - gcm_init_clmul(out_table, H.u); + gcm_init_clmul(out_table, H); *out_mult = gcm_gmult_clmul; *out_hash = gcm_ghash_clmul; return; } if (CRYPTO_is_SSSE3_capable()) { - gcm_init_ssse3(out_table, H.u); + gcm_init_ssse3(out_table, H); *out_mult = gcm_gmult_ssse3; *out_hash = gcm_ghash_ssse3; return; } #elif defined(GHASH_ASM_ARM) if (gcm_pmull_capable()) { - gcm_init_v8(out_table, H.u); + gcm_init_v8(out_table, H); *out_mult = gcm_gmult_v8; *out_hash = gcm_ghash_v8; return; } if (gcm_neon_capable()) { - gcm_init_neon(out_table, H.u); + gcm_init_neon(out_table, H); *out_mult = gcm_gmult_neon; *out_hash = gcm_ghash_neon; return; } #elif defined(GHASH_ASM_PPC64LE) if (CRYPTO_is_PPC64LE_vcrypto_capable()) { - gcm_init_p8(out_table, H.u); + gcm_init_p8(out_table, H); *out_mult = gcm_gmult_p8; *out_hash = gcm_ghash_p8; return; } #endif - gcm_init_nohw(out_table, H.u); + gcm_init_nohw(out_table, H); *out_mult = gcm_gmult_nohw; *out_hash = gcm_ghash_nohw; } diff --git a/crypto/internal.h b/crypto/internal.h index c9b5e8e4e9..2e94399c24 100644 --- a/crypto/internal.h +++ b/crypto/internal.h @@ -881,6 +881,16 @@ static inline void CRYPTO_store_u32_be(void *out, uint32_t v) { OPENSSL_memcpy(out, &v, sizeof(v)); } +static inline uint64_t CRYPTO_load_u64_le(const void *in) { + uint64_t v; + OPENSSL_memcpy(&v, in, sizeof(v)); + return v; +} + +static inline void CRYPTO_store_u64_le(void *out, uint64_t v) { + OPENSSL_memcpy(out, &v, sizeof(v)); +} + static inline uint64_t CRYPTO_load_u64_be(const void *ptr) { uint64_t ret; OPENSSL_memcpy(&ret, ptr, sizeof(ret)); diff --git a/crypto/siphash/siphash.c b/crypto/siphash/siphash.c index bb9a0c15a2..0921eac2b4 100644 --- a/crypto/siphash/siphash.c +++ b/crypto/siphash/siphash.c @@ -48,8 +48,7 @@ uint64_t SIPHASH_24(const uint64_t key[2], const uint8_t *input, v[3] = key[1] ^ UINT64_C(0x7465646279746573); while (input_len >= sizeof(uint64_t)) { - uint64_t m; - memcpy(&m, input, sizeof(m)); + uint64_t m = CRYPTO_load_u64_le(input); v[3] ^= m; siphash_round(v); siphash_round(v); @@ -59,18 +58,16 @@ uint64_t SIPHASH_24(const uint64_t key[2], const uint8_t *input, input_len -= sizeof(uint64_t); } - union { - uint8_t bytes[8]; - uint64_t word; - } last_block; - last_block.word = 0; - OPENSSL_memcpy(last_block.bytes, input, input_len); - last_block.bytes[7] = orig_input_len & 0xff; + uint8_t last_block[8]; + OPENSSL_memset(last_block, 0, sizeof(last_block)); + OPENSSL_memcpy(last_block, input, input_len); + last_block[7] = orig_input_len & 0xff; - v[3] ^= last_block.word; + uint64_t last_block_word = CRYPTO_load_u64_le(last_block); + v[3] ^= last_block_word; siphash_round(v); siphash_round(v); - v[0] ^= last_block.word; + v[0] ^= last_block_word; v[2] ^= 0xff; siphash_round(v); diff --git a/crypto/siphash/siphash_test.cc b/crypto/siphash/siphash_test.cc index 6d8f9e7a5a..6407e0da56 100644 --- a/crypto/siphash/siphash_test.cc +++ b/crypto/siphash/siphash_test.cc @@ -23,14 +23,12 @@ TEST(SipHash, Basic) { // This is the example from appendix A of the SipHash paper. - union { - uint8_t bytes[16]; - uint64_t words[2]; - } key; - + uint8_t key_bytes[16]; for (unsigned i = 0; i < 16; i++) { - key.bytes[i] = i; + key_bytes[i] = i; } + uint64_t key[2]; + memcpy(key, key_bytes, sizeof(key)); uint8_t input[15]; for (unsigned i = 0; i < sizeof(input); i++) { @@ -38,7 +36,7 @@ TEST(SipHash, Basic) { } EXPECT_EQ(UINT64_C(0xa129ca6149be45e5), - SIPHASH_24(key.words, input, sizeof(input))); + SIPHASH_24(key, input, sizeof(input))); } TEST(SipHash, Vectors) { From 572c416b257991ad1433b907bf827824509e9f3e Mon Sep 17 00:00:00 2001 From: Adam Langley Date: Mon, 6 Jun 2022 13:14:24 -0700 Subject: [PATCH 11/17] More includes for builds that don't use bcm.c Change-Id: Iddc34918dcd3e4a6e80c79e5b8efa11e846c73d1 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52885 Reviewed-by: David Benjamin --- crypto/fipsmodule/cipher/e_aesccm.c | 3 ++- crypto/fipsmodule/digestsign/digestsign.c | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/crypto/fipsmodule/cipher/e_aesccm.c b/crypto/fipsmodule/cipher/e_aesccm.c index 8fb72cb67e..fcbd20b32d 100644 --- a/crypto/fipsmodule/cipher/e_aesccm.c +++ b/crypto/fipsmodule/cipher/e_aesccm.c @@ -54,8 +54,9 @@ #include #include -#include "internal.h" +#include "../delocate.h" #include "../service_indicator/internal.h" +#include "internal.h" struct ccm128_context { diff --git a/crypto/fipsmodule/digestsign/digestsign.c b/crypto/fipsmodule/digestsign/digestsign.c index ec3744c947..e5b6bc762f 100644 --- a/crypto/fipsmodule/digestsign/digestsign.c +++ b/crypto/fipsmodule/digestsign/digestsign.c @@ -58,6 +58,7 @@ #include #include "../../evp/internal.h" +#include "../delocate.h" #include "../digest/internal.h" #include "../service_indicator/internal.h" From 7c2323820e0d456fa64b4c6d7a3b0b52eebf566c Mon Sep 17 00:00:00 2001 From: Sureshbhai Dave Date: Tue, 7 Jun 2022 13:29:07 +0530 Subject: [PATCH 12/17] Add SHA3 support This is required to run SHA3 tests otherwise we get an error of unknown algorithm. Change-Id: I085da2b6757ba1f452f33abc7f1bafc4a404e025 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52926 Reviewed-by: Adam Langley Reviewed-by: Corie Pressley --- util/fipstools/acvp/acvptool/subprocess/subprocess.go | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/util/fipstools/acvp/acvptool/subprocess/subprocess.go b/util/fipstools/acvp/acvptool/subprocess/subprocess.go index c5003e1ad6..5f7a273ecf 100644 --- a/util/fipstools/acvp/acvptool/subprocess/subprocess.go +++ b/util/fipstools/acvp/acvptool/subprocess/subprocess.go @@ -77,6 +77,10 @@ func NewWithIO(cmd *exec.Cmd, in io.WriteCloser, out io.ReadCloser) *Subprocess "SHA2-384": &hashPrimitive{"SHA2-384", 48}, "SHA2-512": &hashPrimitive{"SHA2-512", 64}, "SHA2-512/256": &hashPrimitive{"SHA2-512/256", 32}, + "SHA3-224": &hashPrimitive{"SHA3-224", 28}, + "SHA3-256": &hashPrimitive{"SHA3-256", 32}, + "SHA3-384": &hashPrimitive{"SHA3-384", 48}, + "SHA3-512": &hashPrimitive{"SHA3-512", 64}, "ACVP-AES-ECB": &blockCipher{"AES", 16, 2, true, false, iterateAES}, "ACVP-AES-CBC": &blockCipher{"AES-CBC", 16, 2, true, true, iterateAESCBC}, "ACVP-AES-CBC-CS3": &blockCipher{"AES-CBC-CS3", 16, 1, false, true, iterateAESCBC}, @@ -93,6 +97,10 @@ func NewWithIO(cmd *exec.Cmd, in io.WriteCloser, out io.ReadCloser) *Subprocess "HMAC-SHA2-384": &hmacPrimitive{"HMAC-SHA2-384", 48}, "HMAC-SHA2-512": &hmacPrimitive{"HMAC-SHA2-512", 64}, "HMAC-SHA2-512/256": &hmacPrimitive{"HMAC-SHA2-512/256", 32}, + "HMAC-SHA3-224": &hmacPrimitive{"HMAC-SHA3-224", 28}, + "HMAC-SHA3-256": &hmacPrimitive{"HMAC-SHA3-256", 32}, + "HMAC-SHA3-384": &hmacPrimitive{"HMAC-SHA3-384", 48}, + "HMAC-SHA3-512": &hmacPrimitive{"HMAC-SHA3-512", 64}, "ctrDRBG": &drbg{"ctrDRBG", map[string]bool{"AES-128": true, "AES-192": true, "AES-256": true}}, "hmacDRBG": &drbg{"hmacDRBG", map[string]bool{"SHA-1": true, "SHA2-224": true, "SHA2-256": true, "SHA2-384": true, "SHA2-512": true}}, "KDF": &kdfPrimitive{}, From d27a01eb9048c35acd99b5e7463162f0cf4cad73 Mon Sep 17 00:00:00 2001 From: David Benjamin Date: Mon, 6 Jun 2022 11:25:08 -0400 Subject: [PATCH 13/17] Get the EVP_PKEY_METHOD from EVP_PKEY_ASN1_METHOD. This split in methods probably needs rearranging. This means most callers (i.e. ones that don't use EVP_PKEY_CTX_new_id) will no longer pick up the HKDF EVP method. Bug: 497 Change-Id: I72ef255a472c458e8db62461f7b3ac65fa488535 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52830 Reviewed-by: Adam Langley --- crypto/evp/evp_ctx.c | 46 +++++++++++++++++++------------------ crypto/evp/internal.h | 2 ++ crypto/evp/p_dsa_asn1.c | 2 ++ crypto/evp/p_ec_asn1.c | 2 ++ crypto/evp/p_ed25519_asn1.c | 1 + crypto/evp/p_rsa_asn1.c | 2 ++ crypto/evp/p_x25519_asn1.c | 1 + 7 files changed, 34 insertions(+), 22 deletions(-) diff --git a/crypto/evp/evp_ctx.c b/crypto/evp/evp_ctx.c index fda6536228..5f31ddb398 100644 --- a/crypto/evp/evp_ctx.c +++ b/crypto/evp/evp_ctx.c @@ -84,26 +84,9 @@ static const EVP_PKEY_METHOD *evp_pkey_meth_find(int type) { return NULL; } -static EVP_PKEY_CTX *evp_pkey_ctx_new(EVP_PKEY *pkey, ENGINE *e, int id) { - EVP_PKEY_CTX *ret; - const EVP_PKEY_METHOD *pmeth; - - if (id == -1) { - if (!pkey || !pkey->ameth) { - return NULL; - } - id = pkey->ameth->pkey_id; - } - - pmeth = evp_pkey_meth_find(id); - - if (pmeth == NULL) { - OPENSSL_PUT_ERROR(EVP, EVP_R_UNSUPPORTED_ALGORITHM); - ERR_add_error_dataf("algorithm %d", id); - return NULL; - } - - ret = OPENSSL_malloc(sizeof(EVP_PKEY_CTX)); +static EVP_PKEY_CTX *evp_pkey_ctx_new(EVP_PKEY *pkey, ENGINE *e, + const EVP_PKEY_METHOD *pmeth) { + EVP_PKEY_CTX *ret = OPENSSL_malloc(sizeof(EVP_PKEY_CTX)); if (!ret) { OPENSSL_PUT_ERROR(EVP, ERR_R_MALLOC_FAILURE); return NULL; @@ -131,11 +114,30 @@ static EVP_PKEY_CTX *evp_pkey_ctx_new(EVP_PKEY *pkey, ENGINE *e, int id) { } EVP_PKEY_CTX *EVP_PKEY_CTX_new(EVP_PKEY *pkey, ENGINE *e) { - return evp_pkey_ctx_new(pkey, e, -1); + if (pkey == NULL || pkey->ameth == NULL) { + OPENSSL_PUT_ERROR(EVP, ERR_R_PASSED_NULL_PARAMETER); + return NULL; + } + + const EVP_PKEY_METHOD *pkey_method = pkey->ameth->pkey_method; + if (pkey_method == NULL) { + OPENSSL_PUT_ERROR(EVP, EVP_R_UNSUPPORTED_ALGORITHM); + ERR_add_error_dataf("algorithm %d", pkey->ameth->pkey_id); + return NULL; + } + + return evp_pkey_ctx_new(pkey, e, pkey_method); } EVP_PKEY_CTX *EVP_PKEY_CTX_new_id(int id, ENGINE *e) { - return evp_pkey_ctx_new(NULL, e, id); + const EVP_PKEY_METHOD *pkey_method = evp_pkey_meth_find(id); + if (pkey_method == NULL) { + OPENSSL_PUT_ERROR(EVP, EVP_R_UNSUPPORTED_ALGORITHM); + ERR_add_error_dataf("algorithm %d", id); + return NULL; + } + + return evp_pkey_ctx_new(NULL, e, pkey_method); } void EVP_PKEY_CTX_free(EVP_PKEY_CTX *ctx) { diff --git a/crypto/evp/internal.h b/crypto/evp/internal.h index 95d5917471..c8fce5727e 100644 --- a/crypto/evp/internal.h +++ b/crypto/evp/internal.h @@ -71,6 +71,8 @@ struct evp_pkey_asn1_method_st { uint8_t oid[9]; uint8_t oid_len; + const EVP_PKEY_METHOD *pkey_method; + // pub_decode decodes |params| and |key| as a SubjectPublicKeyInfo // and writes the result into |out|. It returns one on success and zero on // error. |params| is the AlgorithmIdentifier after the OBJECT IDENTIFIER diff --git a/crypto/evp/p_dsa_asn1.c b/crypto/evp/p_dsa_asn1.c index ac91127b69..ebc50c853b 100644 --- a/crypto/evp/p_dsa_asn1.c +++ b/crypto/evp/p_dsa_asn1.c @@ -252,6 +252,8 @@ const EVP_PKEY_ASN1_METHOD dsa_asn1_meth = { // 1.2.840.10040.4.1 {0x2a, 0x86, 0x48, 0xce, 0x38, 0x04, 0x01}, 7, + NULL /* pkey_method */, + dsa_pub_decode, dsa_pub_encode, dsa_pub_cmp, diff --git a/crypto/evp/p_ec_asn1.c b/crypto/evp/p_ec_asn1.c index dedc5e0ae5..dd421217c7 100644 --- a/crypto/evp/p_ec_asn1.c +++ b/crypto/evp/p_ec_asn1.c @@ -230,6 +230,8 @@ const EVP_PKEY_ASN1_METHOD ec_asn1_meth = { // 1.2.840.10045.2.1 {0x2a, 0x86, 0x48, 0xce, 0x3d, 0x02, 0x01}, 7, + &ec_pkey_meth, + eckey_pub_decode, eckey_pub_encode, eckey_pub_cmp, diff --git a/crypto/evp/p_ed25519_asn1.c b/crypto/evp/p_ed25519_asn1.c index 1f996cf6de..8d68592873 100644 --- a/crypto/evp/p_ed25519_asn1.c +++ b/crypto/evp/p_ed25519_asn1.c @@ -202,6 +202,7 @@ const EVP_PKEY_ASN1_METHOD ed25519_asn1_meth = { EVP_PKEY_ED25519, {0x2b, 0x65, 0x70}, 3, + &ed25519_pkey_meth, ed25519_pub_decode, ed25519_pub_encode, ed25519_pub_cmp, diff --git a/crypto/evp/p_rsa_asn1.c b/crypto/evp/p_rsa_asn1.c index c0971038ee..2e4942a122 100644 --- a/crypto/evp/p_rsa_asn1.c +++ b/crypto/evp/p_rsa_asn1.c @@ -171,6 +171,8 @@ const EVP_PKEY_ASN1_METHOD rsa_asn1_meth = { // 1.2.840.113549.1.1.1 {0x2a, 0x86, 0x48, 0x86, 0xf7, 0x0d, 0x01, 0x01, 0x01}, 9, + &rsa_pkey_meth, + rsa_pub_decode, rsa_pub_encode, rsa_pub_cmp, diff --git a/crypto/evp/p_x25519_asn1.c b/crypto/evp/p_x25519_asn1.c index 99b4cc9c9f..182f6a2d99 100644 --- a/crypto/evp/p_x25519_asn1.c +++ b/crypto/evp/p_x25519_asn1.c @@ -199,6 +199,7 @@ const EVP_PKEY_ASN1_METHOD x25519_asn1_meth = { EVP_PKEY_X25519, {0x2b, 0x65, 0x6e}, 3, + &x25519_pkey_meth, x25519_pub_decode, x25519_pub_encode, x25519_pub_cmp, From efd09b7e370e6975f795cdbc54ff6e0941ebf274 Mon Sep 17 00:00:00 2001 From: David Benjamin Date: Tue, 31 May 2022 11:02:59 -0400 Subject: [PATCH 14/17] Const-correct bn_gather5. Not that we get much type-checking from this, as an assembly function. Change-Id: I21643444cfc577e2d68f11891e602724ded52e7f Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52831 Reviewed-by: Adam Langley --- crypto/fipsmodule/bn/internal.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/crypto/fipsmodule/bn/internal.h b/crypto/fipsmodule/bn/internal.h index 79b25b043b..50fd362d3f 100644 --- a/crypto/fipsmodule/bn/internal.h +++ b/crypto/fipsmodule/bn/internal.h @@ -391,7 +391,7 @@ void bn_scatter5(const BN_ULONG *inp, size_t num, BN_ULONG *table, // bn_gather5 loads index |power| of |table| and stores it in |out|. |out| and // each entry of |table| are |num| words long. |power| must be less than 32. -void bn_gather5(BN_ULONG *out, size_t num, BN_ULONG *table, size_t power); +void bn_gather5(BN_ULONG *out, size_t num, const BN_ULONG *table, size_t power); // bn_power5 squares |ap| five times and multiplies it by the value stored at // index |power| of |table|, modulo |np|. It stores the result in |rp|. The From 0ebd69bd1e0ae834e01935ad0c5cfac63a5aea32 Mon Sep 17 00:00:00 2001 From: David Benjamin Date: Mon, 13 Jun 2022 17:47:39 -0400 Subject: [PATCH 15/17] Add BN_GENCB_get_arg. bind uses this function. Change-Id: I97ba86d9f75597bff125ae0b56952effc397e6b8 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53010 Commit-Queue: David Benjamin Reviewed-by: Bob Beck Commit-Queue: Bob Beck --- crypto/fipsmodule/bn/prime.c | 2 ++ include/openssl/bn.h | 3 +++ 2 files changed, 5 insertions(+) diff --git a/crypto/fipsmodule/bn/prime.c b/crypto/fipsmodule/bn/prime.c index 0c5edfeeba..0578558296 100644 --- a/crypto/fipsmodule/bn/prime.c +++ b/crypto/fipsmodule/bn/prime.c @@ -386,6 +386,8 @@ int BN_GENCB_call(BN_GENCB *callback, int event, int n) { return callback->callback(event, n, callback); } +void *BN_GENCB_get_arg(const BN_GENCB *callback) { return callback->arg; } + int BN_generate_prime_ex(BIGNUM *ret, int bits, int safe, const BIGNUM *add, const BIGNUM *rem, BN_GENCB *cb) { BIGNUM *t; diff --git a/include/openssl/bn.h b/include/openssl/bn.h index d9491a936b..9abccaffcd 100644 --- a/include/openssl/bn.h +++ b/include/openssl/bn.h @@ -681,6 +681,9 @@ OPENSSL_EXPORT void BN_GENCB_set(BN_GENCB *callback, // the callback, or 1 if |callback| is NULL. OPENSSL_EXPORT int BN_GENCB_call(BN_GENCB *callback, int event, int n); +// BN_GENCB_get_arg returns |callback->arg|. +OPENSSL_EXPORT void *BN_GENCB_get_arg(const BN_GENCB *callback); + // BN_generate_prime_ex sets |ret| to a prime number of |bits| length. If safe // is non-zero then the prime will be such that (ret-1)/2 is also a prime. // (This is needed for Diffie-Hellman groups to ensure that the only subgroups From f8f97bfcbb97c8bd14c4cb6942dd7b0c492cb3e0 Mon Sep 17 00:00:00 2001 From: David Benjamin Date: Sun, 12 Jun 2022 11:10:08 -0400 Subject: [PATCH 16/17] Don't guard alignof static asserts on GCC/Clang. I'm not sure what the history of this is, but it seems to work just fine in MSVC now. Change-Id: Iebdc365486bb30a61a1001f705aef7dcaa2a9fcd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52985 Reviewed-by: Adam Langley --- crypto/cipher_extra/e_aesctrhmac.c | 2 -- crypto/cipher_extra/e_aesgcmsiv.c | 4 ---- crypto/cipher_extra/e_chacha20poly1305.c | 2 -- crypto/cipher_extra/e_tls.c | 2 -- crypto/fipsmodule/cipher/e_aes.c | 8 -------- crypto/fipsmodule/cipher/e_aesccm.c | 2 -- crypto/thread_pthread.c | 2 -- crypto/thread_win.c | 2 -- 8 files changed, 24 deletions(-) diff --git a/crypto/cipher_extra/e_aesctrhmac.c b/crypto/cipher_extra/e_aesctrhmac.c index 8c45c8111a..c75f8a836e 100644 --- a/crypto/cipher_extra/e_aesctrhmac.c +++ b/crypto/cipher_extra/e_aesctrhmac.c @@ -38,11 +38,9 @@ struct aead_aes_ctr_hmac_sha256_ctx { OPENSSL_STATIC_ASSERT(sizeof(((EVP_AEAD_CTX *)NULL)->state) >= sizeof(struct aead_aes_ctr_hmac_sha256_ctx), "AEAD state is too small"); -#if defined(__GNUC__) || defined(__clang__) OPENSSL_STATIC_ASSERT(alignof(union evp_aead_ctx_st_state) >= alignof(struct aead_aes_ctr_hmac_sha256_ctx), "AEAD state has insufficient alignment"); -#endif static void hmac_init(SHA256_CTX *out_inner, SHA256_CTX *out_outer, const uint8_t hmac_key[32]) { diff --git a/crypto/cipher_extra/e_aesgcmsiv.c b/crypto/cipher_extra/e_aesgcmsiv.c index 555922dd74..ee20e1bb1d 100644 --- a/crypto/cipher_extra/e_aesgcmsiv.c +++ b/crypto/cipher_extra/e_aesgcmsiv.c @@ -45,10 +45,8 @@ struct aead_aes_gcm_siv_asm_ctx { OPENSSL_STATIC_ASSERT(sizeof(((EVP_AEAD_CTX *)NULL)->state) + 8 >= sizeof(struct aead_aes_gcm_siv_asm_ctx), "AEAD state is too small"); -#if defined(__GNUC__) || defined(__clang__) OPENSSL_STATIC_ASSERT(alignof(union evp_aead_ctx_st_state) >= 8, "AEAD state has insufficient alignment"); -#endif // asm_ctx_from_ctx returns a 16-byte aligned context pointer from |ctx|. static struct aead_aes_gcm_siv_asm_ctx *asm_ctx_from_ctx( @@ -555,11 +553,9 @@ struct aead_aes_gcm_siv_ctx { OPENSSL_STATIC_ASSERT(sizeof(((EVP_AEAD_CTX *)NULL)->state) >= sizeof(struct aead_aes_gcm_siv_ctx), "AEAD state is too small"); -#if defined(__GNUC__) || defined(__clang__) OPENSSL_STATIC_ASSERT(alignof(union evp_aead_ctx_st_state) >= alignof(struct aead_aes_gcm_siv_ctx), "AEAD state has insufficient alignment"); -#endif static int aead_aes_gcm_siv_init(EVP_AEAD_CTX *ctx, const uint8_t *key, size_t key_len, size_t tag_len) { diff --git a/crypto/cipher_extra/e_chacha20poly1305.c b/crypto/cipher_extra/e_chacha20poly1305.c index 1650188958..a9fce3e88e 100644 --- a/crypto/cipher_extra/e_chacha20poly1305.c +++ b/crypto/cipher_extra/e_chacha20poly1305.c @@ -35,11 +35,9 @@ struct aead_chacha20_poly1305_ctx { OPENSSL_STATIC_ASSERT(sizeof(((EVP_AEAD_CTX *)NULL)->state) >= sizeof(struct aead_chacha20_poly1305_ctx), "AEAD state is too small"); -#if defined(__GNUC__) || defined(__clang__) OPENSSL_STATIC_ASSERT(alignof(union evp_aead_ctx_st_state) >= alignof(struct aead_chacha20_poly1305_ctx), "AEAD state has insufficient alignment"); -#endif static int aead_chacha20_poly1305_init(EVP_AEAD_CTX *ctx, const uint8_t *key, size_t key_len, size_t tag_len) { diff --git a/crypto/cipher_extra/e_tls.c b/crypto/cipher_extra/e_tls.c index 6d84f7f02c..9713dc7eaa 100644 --- a/crypto/cipher_extra/e_tls.c +++ b/crypto/cipher_extra/e_tls.c @@ -48,11 +48,9 @@ OPENSSL_STATIC_ASSERT(EVP_MAX_MD_SIZE < 256, OPENSSL_STATIC_ASSERT(sizeof(((EVP_AEAD_CTX *)NULL)->state) >= sizeof(AEAD_TLS_CTX), "AEAD state is too small"); -#if defined(__GNUC__) || defined(__clang__) OPENSSL_STATIC_ASSERT(alignof(union evp_aead_ctx_st_state) >= alignof(AEAD_TLS_CTX), "AEAD state has insufficient alignment"); -#endif static void aead_tls_cleanup(EVP_AEAD_CTX *ctx) { AEAD_TLS_CTX *tls_ctx = (AEAD_TLS_CTX *)&ctx->state; diff --git a/crypto/fipsmodule/cipher/e_aes.c b/crypto/fipsmodule/cipher/e_aes.c index 61f4ef355d..ad64a39c3f 100644 --- a/crypto/fipsmodule/cipher/e_aes.c +++ b/crypto/fipsmodule/cipher/e_aes.c @@ -335,11 +335,9 @@ ctr128_f aes_ctr_set_key(AES_KEY *aes_key, GCM128_KEY *gcm_key, #endif static EVP_AES_GCM_CTX *aes_gcm_from_cipher_ctx(EVP_CIPHER_CTX *ctx) { -#if defined(__GNUC__) || defined(__clang__) OPENSSL_STATIC_ASSERT( alignof(EVP_AES_GCM_CTX) <= 16, "EVP_AES_GCM_CTX needs more alignment than this function provides"); -#endif // |malloc| guarantees up to 4-byte alignment on 32-bit and 8-byte alignment // on 64-bit systems, so we need to adjust to reach 16-byte alignment. @@ -926,11 +924,9 @@ static int aead_aes_gcm_init_impl(struct aead_aes_gcm_ctx *gcm_ctx, OPENSSL_STATIC_ASSERT(sizeof(((EVP_AEAD_CTX *)NULL)->state) >= sizeof(struct aead_aes_gcm_ctx), "AEAD state is too small"); -#if defined(__GNUC__) || defined(__clang__) OPENSSL_STATIC_ASSERT(alignof(union evp_aead_ctx_st_state) >= alignof(struct aead_aes_gcm_ctx), "AEAD state has insufficient alignment"); -#endif static int aead_aes_gcm_init(EVP_AEAD_CTX *ctx, const uint8_t *key, size_t key_len, size_t requested_tag_len) { @@ -1266,11 +1262,9 @@ struct aead_aes_gcm_tls12_ctx { OPENSSL_STATIC_ASSERT(sizeof(((EVP_AEAD_CTX *)NULL)->state) >= sizeof(struct aead_aes_gcm_tls12_ctx), "AEAD state is too small"); -#if defined(__GNUC__) || defined(__clang__) OPENSSL_STATIC_ASSERT(alignof(union evp_aead_ctx_st_state) >= alignof(struct aead_aes_gcm_tls12_ctx), "AEAD state has insufficient alignment"); -#endif static int aead_aes_gcm_tls12_init(EVP_AEAD_CTX *ctx, const uint8_t *key, size_t key_len, size_t requested_tag_len) { @@ -1365,11 +1359,9 @@ struct aead_aes_gcm_tls13_ctx { OPENSSL_STATIC_ASSERT(sizeof(((EVP_AEAD_CTX *)NULL)->state) >= sizeof(struct aead_aes_gcm_tls13_ctx), "AEAD state is too small"); -#if defined(__GNUC__) || defined(__clang__) OPENSSL_STATIC_ASSERT(alignof(union evp_aead_ctx_st_state) >= alignof(struct aead_aes_gcm_tls13_ctx), "AEAD state has insufficient alignment"); -#endif static int aead_aes_gcm_tls13_init(EVP_AEAD_CTX *ctx, const uint8_t *key, size_t key_len, size_t requested_tag_len) { diff --git a/crypto/fipsmodule/cipher/e_aesccm.c b/crypto/fipsmodule/cipher/e_aesccm.c index fcbd20b32d..c9d52e0a5b 100644 --- a/crypto/fipsmodule/cipher/e_aesccm.c +++ b/crypto/fipsmodule/cipher/e_aesccm.c @@ -279,11 +279,9 @@ struct aead_aes_ccm_ctx { OPENSSL_STATIC_ASSERT(sizeof(((EVP_AEAD_CTX *)NULL)->state) >= sizeof(struct aead_aes_ccm_ctx), "AEAD state is too small"); -#if defined(__GNUC__) || defined(__clang__) OPENSSL_STATIC_ASSERT(alignof(union evp_aead_ctx_st_state) >= alignof(struct aead_aes_ccm_ctx), "AEAD state has insufficient alignment"); -#endif static int aead_aes_ccm_init(EVP_AEAD_CTX *ctx, const uint8_t *key, size_t key_len, size_t tag_len, unsigned M, diff --git a/crypto/thread_pthread.c b/crypto/thread_pthread.c index e873d04909..1b8329429c 100644 --- a/crypto/thread_pthread.c +++ b/crypto/thread_pthread.c @@ -26,10 +26,8 @@ OPENSSL_STATIC_ASSERT(sizeof(CRYPTO_MUTEX) >= sizeof(pthread_rwlock_t), "CRYPTO_MUTEX is too small"); -#if defined(__GNUC__) || defined(__clang__) OPENSSL_STATIC_ASSERT(alignof(CRYPTO_MUTEX) >= alignof(pthread_rwlock_t), "CRYPTO_MUTEX has insufficient alignment"); -#endif void CRYPTO_MUTEX_init(CRYPTO_MUTEX *lock) { if (pthread_rwlock_init((pthread_rwlock_t *) lock, NULL) != 0) { diff --git a/crypto/thread_win.c b/crypto/thread_win.c index 49ecc12a11..1065884f11 100644 --- a/crypto/thread_win.c +++ b/crypto/thread_win.c @@ -29,10 +29,8 @@ OPENSSL_MSVC_PRAGMA(warning(pop)) OPENSSL_STATIC_ASSERT(sizeof(CRYPTO_MUTEX) >= sizeof(SRWLOCK), "CRYPTO_MUTEX is too small"); -#if defined(__GNUC__) || defined(__clang__) OPENSSL_STATIC_ASSERT(alignof(CRYPTO_MUTEX) >= alignof(SRWLOCK), "CRYPTO_MUTEX has insufficient alignment"); -#endif static BOOL CALLBACK call_once_init(INIT_ONCE *once, void *arg, void **out) { void (**init)(void) = (void (**)(void))arg; From 167f1760ddfeaea1ee1a671ca88aafcccfe30ee0 Mon Sep 17 00:00:00 2001 From: David Benjamin Date: Sun, 12 Jun 2022 13:47:08 -0400 Subject: [PATCH 17/17] Fix build with MSVC 2022. MSVC 2022's C4191 warns on most function pointer casts. Fix and/or silence them: connect.c is an unavoidable false positive. We're casting the pointer to the correct type. The problem was that the caller is required to first cast it to the wrong type in OpenSSL's API, due to the BIO_callback_ctrl calling convention. Suppress it. LHASH_OF(T) and STACK_OF(T)'s defintions also have a false positive. Suppress that warning. Calling the functions through the casted types would indeed be UB, but we don't do that. We use them as goofy type-erased types. The problem is there is no function pointer equivalent of void*. (Using actual void* instead trips a GCC warning.) The sk_sort instance is a true instance of UB. The problem is qsort lacks a context parameter. I've made sk_sort call qsort_s on _MSC_VER, to silence the warning. Ideally we'd fix it on other platforms, but qsort_r and qsort_s are a disaster. See https://stackoverflow.com/a/39561369 Fixed: 495 Change-Id: I0dca80670c74afaa03fc5c8fd7059b4cfadfac72 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53005 Reviewed-by: Adam Langley Commit-Queue: Adam Langley --- crypto/bio/connect.c | 12 +++++-- crypto/lhash/internal.h | 14 ++++++++- crypto/stack/stack.c | 49 +++++++++++++++++++---------- include/openssl/stack.h | 69 +++++++++++++++++++++++++++-------------- 4 files changed, 101 insertions(+), 43 deletions(-) diff --git a/crypto/bio/connect.c b/crypto/bio/connect.c index 3b65acfca7..9b86e5138a 100644 --- a/crypto/bio/connect.c +++ b/crypto/bio/connect.c @@ -117,7 +117,8 @@ static int closesocket(int sock) { // split_host_and_port sets |*out_host| and |*out_port| to the host and port // parsed from |name|. It returns one on success or zero on error. Even when // successful, |*out_port| may be NULL on return if no port was specified. -static int split_host_and_port(char **out_host, char **out_port, const char *name) { +static int split_host_and_port(char **out_host, char **out_port, + const char *name) { const char *host, *port = NULL; size_t host_len = 0; @@ -466,8 +467,7 @@ static long conn_ctrl(BIO *bio, int cmd, long num, void *ptr) { case BIO_CTRL_FLUSH: break; case BIO_CTRL_GET_CALLBACK: { - int (**fptr)(const BIO *bio, int state, int xret); - fptr = (int (**)(const BIO *bio, int state, int xret))ptr; + int (**fptr)(const BIO *bio, int state, int xret) = ptr; *fptr = data->info_callback; } break; default: @@ -485,7 +485,13 @@ static long conn_callback_ctrl(BIO *bio, int cmd, bio_info_cb fp) { switch (cmd) { case BIO_CTRL_SET_CALLBACK: + // This is the actual type signature of |fp|. The caller is expected to + // cast it to |bio_info_cb| due to the |BIO_callback_ctrl| calling + // convention. + OPENSSL_MSVC_PRAGMA(warning(push)) + OPENSSL_MSVC_PRAGMA(warning(disable : 4191)) data->info_callback = (int (*)(const struct bio_st *, int, int))fp; + OPENSSL_MSVC_PRAGMA(warning(pop)) break; default: ret = 0; diff --git a/crypto/lhash/internal.h b/crypto/lhash/internal.h index 64dca1d364..512f06df44 100644 --- a/crypto/lhash/internal.h +++ b/crypto/lhash/internal.h @@ -157,6 +157,16 @@ OPENSSL_EXPORT void OPENSSL_lh_doall_arg(_LHASH *lh, void *arg); #define DEFINE_LHASH_OF(type) \ + /* We disable MSVC C4191 in this macro, which warns when pointers are cast \ + * to the wrong type. While the cast itself is valid, it is often a bug \ + * because calling it through the cast is UB. However, we never actually \ + * call functions as |lhash_cmp_func|. The type is just a type-erased \ + * function pointer. (C does not guarantee function pointers fit in \ + * |void*|, and GCC will warn on this.) Thus we just disable the false \ + * positive warning. */ \ + OPENSSL_MSVC_PRAGMA(warning(push)) \ + OPENSSL_MSVC_PRAGMA(warning(disable : 4191)) \ + \ DECLARE_LHASH_OF(type) \ \ typedef int (*lhash_##type##_cmp_func)(const type *, const type *); \ @@ -243,7 +253,9 @@ OPENSSL_EXPORT void OPENSSL_lh_doall_arg(_LHASH *lh, LHASH_OF(type) *lh, void (*func)(type *, void *), void *arg) { \ LHASH_DOALL_##type cb = {func, arg}; \ OPENSSL_lh_doall_arg((_LHASH *)lh, lh_##type##_call_doall_arg, &cb); \ - } + } \ + \ + OPENSSL_MSVC_PRAGMA(warning(pop)) #if defined(__cplusplus) diff --git a/crypto/stack/stack.c b/crypto/stack/stack.c index 6da6e3b232..6412e07bd1 100644 --- a/crypto/stack/stack.c +++ b/crypto/stack/stack.c @@ -131,7 +131,7 @@ void sk_free(_STACK *sk) { OPENSSL_free(sk); } -void sk_pop_free_ex(_STACK *sk, void (*call_free_func)(stack_free_func, void *), +void sk_pop_free_ex(_STACK *sk, stack_call_free_func call_free_func, stack_free_func free_func) { if (sk == NULL) { return; @@ -234,8 +234,7 @@ void *sk_delete_ptr(_STACK *sk, const void *p) { } int sk_find(const _STACK *sk, size_t *out_index, const void *p, - int (*call_cmp_func)(stack_cmp_func, const void **, - const void **)) { + stack_call_cmp_func call_cmp_func) { if (sk == NULL) { return 0; } @@ -355,24 +354,43 @@ _STACK *sk_dup(const _STACK *sk) { return NULL; } -void sk_sort(_STACK *sk) { +#if defined(_MSC_VER) +struct sort_compare_ctx { + stack_call_cmp_func call_cmp_func; + stack_cmp_func cmp_func; +}; + +static int sort_compare(void *ctx_v, const void *a, const void *b) { + struct sort_compare_ctx *ctx = ctx_v; + return ctx->call_cmp_func(ctx->cmp_func, a, b); +} +#endif + +void sk_sort(_STACK *sk, stack_call_cmp_func call_cmp_func) { if (sk == NULL || sk->comp == NULL || sk->sorted) { return; } - // sk->comp is a function that takes pointers to pointers to elements, but - // qsort take a comparison function that just takes pointers to elements. - // However, since we're passing an array of pointers to qsort, we can just - // cast the comparison function and everything works. - // - // TODO(davidben): This is undefined behavior, but the call is in libc so, - // e.g., CFI does not notice. Unfortunately, |qsort| is missing a void* - // parameter in its callback and |qsort_s| / |qsort_r| are a mess of - // incompatibility. if (sk->num >= 2) { +#if defined(_MSC_VER) + // MSVC's |qsort_s| is different from the C11 one. + // https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/qsort-s?view=msvc-170 + struct sort_compare_ctx ctx = {call_cmp_func, sk->comp}; + qsort_s(sk->data, sk->num, sizeof(void *), sort_compare, &ctx); +#else + // sk->comp is a function that takes pointers to pointers to elements, but + // qsort take a comparison function that just takes pointers to elements. + // However, since we're passing an array of pointers to qsort, we can just + // cast the comparison function and everything works. + // + // TODO(davidben): This is undefined behavior, but the call is in libc so, + // e.g., CFI does not notice. |qsort| is missing a void* parameter in its + // callback, while no one defines |qsort_r| or |qsort_s| consistently. See + // https://stackoverflow.com/a/39561369 int (*comp_func)(const void *, const void *) = (int (*)(const void *, const void *))(sk->comp); qsort(sk->data, sk->num, sizeof(void *), comp_func); +#endif } sk->sorted = 1; } @@ -395,10 +413,9 @@ stack_cmp_func sk_set_cmp_func(_STACK *sk, stack_cmp_func comp) { return old; } -_STACK *sk_deep_copy(const _STACK *sk, - void *(*call_copy_func)(stack_copy_func, void *), +_STACK *sk_deep_copy(const _STACK *sk, stack_call_copy_func call_copy_func, stack_copy_func copy_func, - void (*call_free_func)(stack_free_func, void *), + stack_call_free_func call_free_func, stack_free_func free_func) { _STACK *ret = sk_dup(sk); if (ret == NULL) { diff --git a/include/openssl/stack.h b/include/openssl/stack.h index df54713f18..87403678a3 100644 --- a/include/openssl/stack.h +++ b/include/openssl/stack.h @@ -101,10 +101,20 @@ typedef void *(*stack_copy_func)(void *ptr); // extra indirection - the function is given a pointer to a pointer to the // element. This differs from the usual qsort/bsearch comparison function. // -// Note its actual type is int (*)(const T **, const T **). Low-level |sk_*| +// Note its actual type is |int (*)(const T **a, const T **b)|. Low-level |sk_*| // functions will be passed a type-specific wrapper to call it correctly. +// +// TODO(davidben): This type should be |const T *const *|. It is already fixed +// in OpenSSL 1.1.1, so hopefully we can fix this compatibly. typedef int (*stack_cmp_func)(const void **a, const void **b); +// The following function types call the above type-erased signatures with the +// true types. +typedef void (*stack_call_free_func)(stack_free_func, void *); +typedef void *(*stack_call_copy_func)(stack_copy_func, void *); +typedef int (*stack_call_cmp_func)(stack_cmp_func, const void *const *, + const void *const *); + // stack_st contains an array of pointers. It is not designed to be used // directly, rather the wrapper macros should be used. typedef struct stack_st { @@ -161,8 +171,7 @@ OPENSSL_EXPORT void sk_free(_STACK *sk); // |sk_pop_free_ex| as a workaround for existing code calling an older version // of |sk_pop_free|. OPENSSL_EXPORT void sk_pop_free_ex(_STACK *sk, - void (*call_free_func)(stack_free_func, - void *), + stack_call_free_func call_free_func, stack_free_func free_func); // sk_insert inserts |p| into the stack at index |where|, moving existing @@ -192,8 +201,7 @@ OPENSSL_EXPORT void *sk_delete_ptr(_STACK *sk, const void *p); // OpenSSL's sk_find will implicitly sort |sk| if it has a comparison function // defined. OPENSSL_EXPORT int sk_find(const _STACK *sk, size_t *out_index, const void *p, - int (*call_cmp_func)(stack_cmp_func, const void **, - const void **)); + stack_call_cmp_func call_cmp_func); // sk_shift removes and returns the first element in the stack, or returns NULL // if the stack is empty. @@ -214,7 +222,7 @@ OPENSSL_EXPORT _STACK *sk_dup(const _STACK *sk); // sk_sort sorts the elements of |sk| into ascending order based on the // comparison function. The stack maintains a |sorted| flag and sorting an // already sorted stack is a no-op. -OPENSSL_EXPORT void sk_sort(_STACK *sk); +OPENSSL_EXPORT void sk_sort(_STACK *sk, stack_call_cmp_func call_cmp_func); // sk_is_sorted returns one if |sk| is known to be sorted and zero // otherwise. @@ -227,10 +235,11 @@ OPENSSL_EXPORT stack_cmp_func sk_set_cmp_func(_STACK *sk, stack_cmp_func comp); // sk_deep_copy performs a copy of |sk| and of each of the non-NULL elements in // |sk| by using |copy_func|. If an error occurs, |free_func| is used to free // any copies already made and NULL is returned. -OPENSSL_EXPORT _STACK *sk_deep_copy( - const _STACK *sk, void *(*call_copy_func)(stack_copy_func, void *), - stack_copy_func copy_func, void (*call_free_func)(stack_free_func, void *), - stack_free_func free_func); +OPENSSL_EXPORT _STACK *sk_deep_copy(const _STACK *sk, + stack_call_copy_func call_copy_func, + stack_copy_func copy_func, + stack_call_free_func call_free_func, + stack_free_func free_func); // Deprecated functions. @@ -277,6 +286,16 @@ BSSL_NAMESPACE_END #endif #define BORINGSSL_DEFINE_STACK_OF_IMPL(name, ptrtype, constptrtype) \ + /* We disable MSVC C4191 in this macro, which warns when pointers are cast \ + * to the wrong type. While the cast itself is valid, it is often a bug \ + * because calling it through the cast is UB. However, we never actually \ + * call functions as |stack_cmp_func|. The type is just a type-erased \ + * function pointer. (C does not guarantee function pointers fit in \ + * |void*|, and GCC will warn on this.) Thus we just disable the false \ + * positive warning. */ \ + OPENSSL_MSVC_PRAGMA(warning(push)) \ + OPENSSL_MSVC_PRAGMA(warning(disable : 4191)) \ + \ DECLARE_STACK_OF(name) \ \ typedef void (*stack_##name##_free_func)(ptrtype); \ @@ -294,14 +313,17 @@ BSSL_NAMESPACE_END } \ \ OPENSSL_INLINE int sk_##name##_call_cmp_func( \ - stack_cmp_func cmp_func, const void **a, const void **b) { \ + stack_cmp_func cmp_func, const void *const *a, const void *const *b) { \ + /* The data is actually stored as |void*| pointers, so read the pointer \ + * as |void*| and then pass the corrected type into the caller-supplied \ + * function, which expects |constptrtype*|. */ \ constptrtype a_ptr = (constptrtype)*a; \ constptrtype b_ptr = (constptrtype)*b; \ return ((stack_##name##_cmp_func)cmp_func)(&a_ptr, &b_ptr); \ } \ \ - OPENSSL_INLINE STACK_OF(name) * \ - sk_##name##_new(stack_##name##_cmp_func comp) { \ + OPENSSL_INLINE STACK_OF(name) *sk_##name##_new( \ + stack_##name##_cmp_func comp) { \ return (STACK_OF(name) *)sk_new((stack_cmp_func)comp); \ } \ \ @@ -327,12 +349,12 @@ BSSL_NAMESPACE_END return (ptrtype)sk_set((_STACK *)sk, i, (void *)p); \ } \ \ - OPENSSL_INLINE void sk_##name##_free(STACK_OF(name) * sk) { \ + OPENSSL_INLINE void sk_##name##_free(STACK_OF(name) *sk) { \ sk_free((_STACK *)sk); \ } \ \ OPENSSL_INLINE void sk_##name##_pop_free( \ - STACK_OF(name) * sk, stack_##name##_free_func free_func) { \ + STACK_OF(name) *sk, stack_##name##_free_func free_func) { \ sk_pop_free_ex((_STACK *)sk, sk_##name##_call_free_func, \ (stack_free_func)free_func); \ } \ @@ -353,7 +375,7 @@ BSSL_NAMESPACE_END } \ \ OPENSSL_INLINE int sk_##name##_find(const STACK_OF(name) *sk, \ - size_t * out_index, constptrtype p) { \ + size_t *out_index, constptrtype p) { \ return sk_find((const _STACK *)sk, out_index, (const void *)p, \ sk_##name##_call_cmp_func); \ } \ @@ -370,12 +392,12 @@ BSSL_NAMESPACE_END return (ptrtype)sk_pop((_STACK *)sk); \ } \ \ - OPENSSL_INLINE STACK_OF(name) * sk_##name##_dup(const STACK_OF(name) *sk) { \ + OPENSSL_INLINE STACK_OF(name) *sk_##name##_dup(const STACK_OF(name) *sk) { \ return (STACK_OF(name) *)sk_dup((const _STACK *)sk); \ } \ \ OPENSSL_INLINE void sk_##name##_sort(STACK_OF(name) *sk) { \ - sk_sort((_STACK *)sk); \ + sk_sort((_STACK *)sk, sk_##name##_call_cmp_func); \ } \ \ OPENSSL_INLINE int sk_##name##_is_sorted(const STACK_OF(name) *sk) { \ @@ -388,15 +410,16 @@ BSSL_NAMESPACE_END (stack_cmp_func)comp); \ } \ \ - OPENSSL_INLINE STACK_OF(name) * \ - sk_##name##_deep_copy(const STACK_OF(name) *sk, \ - ptrtype(*copy_func)(ptrtype), \ - void (*free_func)(ptrtype)) { \ + OPENSSL_INLINE STACK_OF(name) *sk_##name##_deep_copy( \ + const STACK_OF(name) *sk, ptrtype (*copy_func)(ptrtype), \ + void (*free_func)(ptrtype)) { \ return (STACK_OF(name) *)sk_deep_copy( \ (const _STACK *)sk, sk_##name##_call_copy_func, \ (stack_copy_func)copy_func, sk_##name##_call_free_func, \ (stack_free_func)free_func); \ - } + } \ + \ + OPENSSL_MSVC_PRAGMA(warning(pop)) // DEFINE_NAMED_STACK_OF defines |STACK_OF(name)| to be a stack whose elements // are |type| *.