STM32 Hash Accelerator #2528

caleb-garrett · 2024-02-03T22:24:56Z

Adds STM32 Hash accelerator HAL to embassy-stm32. This includes support for the MD5, SHA1, SHA224, SHA256, SHA-512/224, SHA-512/256, and SHA-512 algorithms (depending on device). HMAC support is the only feature of this peripheral that is not yet implemented.

The API mimics the start/update/finish of other popular crypto libraries, so integration is theoretically straightforward. This includes use of the context switching feature where one digest calculation may be paused for another, then later resumed. This is particularly useful when digests are computed in a concurrent TX/RX scenario.

An example for the STM32F7 series is included that demonstrates use and compares execution time of the SHA256 algorithm with the sha2 crate. The hardware calculation runs >3x faster than the sha2 crate's implementation in my tests.

caleb-garrett · 2024-02-04T22:28:50Z

Note that the hash_v1 implementation differs from the hash_v2 and hash_v3 implementations in that it does not integrate DMA. While the peripheral supports DMA, it does not support multiple DMA transfers (MDMAT). This makes it impractical for DMA to be used in a context-switching scenario. Thus, DMA is only implemented for hash_v2 and hash_v3.

Dirbaio · 2024-02-04T22:42:42Z

Very cool!

Can you add on-hardware tests for HASH here? This is a prime candidate for testing since it needs no connections to external hardware.

caleb-garrett · 2024-02-04T23:10:10Z

Will do!

Dirbaio · 2024-02-05T19:46:44Z

the trusted label makes CI run the tests on hardware every time you push :)

caleb-garrett · 2024-02-05T23:17:41Z

50% of the 8 supported boards failed on the first HIL round :( Two of the boards failed due to an apparent error in the quantity of context registers that needs to be corrected in the stm32-data repo. I will open a PR for that.

The other two failures were the two STM32U5 boards. I don't currently have this hardware on hand and the log is a little vague:

ERROR teleprobe::client > === nucleo-stm32u5a5zj out/tests/stm32u5a5zj/hash: FAILED: HTTP request failed with status code: 400: Bad Request
ERROR teleprobe::client > INFO - run_from_ram: true
INFO - flashing program...
INFO - flashing done!
TRACE - 0.000000 BDCR ok: 0c00a200
DEBUG - 0.000000 rcc: Clocks { audioclk: None, hclk1: Some(Hertz(48000000)), hclk2: Some(Hertz(48000000)), hclk3: Some(Hertz(48000000)), hse: None, hsi: None, hsi48: Some(Hertz(48000000)), hsi48_div_2: None, lse: None, lsi: None, msik: None, pclk1: Some(Hertz(48000000)), pclk1_tim: Some(Hertz(48000000)), pclk2: Some(Hertz(48000000)), pclk2_tim: Some(Hertz(48000000)), pclk3: Some(Hertz(48000000)), pll1_p: None, pll1_q: None, pll2_p: None, pll3_q: None, rtc: Some(Hertz(32000)), sys: Some(Hertz(48000000)) }
WARN - Deadline exceeded!
INFO - R0: 00000001 R1: 00000000 R2: 00000004 R3: 00000000
INFO - R4: 20007fe0 R5: 20007fe0 R6: 20004db0 R7: 20007fd8
INFO - R8: ffffffff R9: 00000000 R10: 00000001 R11: 00000101
INFO - R12: 4000000c SP: 20007fd0 LR: 20002df5 PC: 20001710
INFO - XPSR: 29000000
INFO -
INFO - Stack:
INFO - 20007fd0: 20007fe0 20007fe8 20007ff0 2000181b
INFO - 20007fe0: 00000000 ffffffff 00000000 5cc6e200
INFO - 20007ff0: 20007ff8 20001805 00000000 20000277
INFO - 20008000: c1a55e6c f19f6939 77c5c992 c338d887
INFO - 20008010: 24103718 3f7f34f3 2f9a8695 fbda6252
INFO - 20008020: d0c8dbc0 01121e5f 304fb27a d6f8d21d
INFO - 20008030: 1e0a47db 6af3cd36 2739c5c0 6f964eba
INFO - 20008040: 488f00be 7cdfab77 adeb4b11 e7790f73
INFO -
INFO - Backtrace:
WARN - UNWIND: Error while checking for exception context. The stack trace will not include the calling frames. : Register error: No value for Return Address register. Please report this as a bug.
INFO - Frame 0: <unknown function @ 0x20001710> @ 0x20001710
/ci/code/embassy-executor/src/arch/cortex_m.rs:106:21
INFO - Frame 1: __cortex_m_rt_main @ 0x2000181a
/ci/code/tests/stm32/src/bin/hash.rs:34:1
INFO - Frame 2: __cortex_m_rt_main_trampoline @ 0x20001804
/ci/code/tests/stm32/src/bin/hash.rs:34:1
INFO - Frame 3: <unknown function @ 0x20000276> @ 0x20000276
INFO - Frame 4: <unknown function @ 0x20000276> @ 0x20000276
ERROR - Run failed: Deadline exceeded

The affected line in source is:

let dma = peri!(p, HASH_DMA); // HASH_DMA set to GPDMA1_CH0
let mut hw_hasher = Hash::new(p.HASH, dma);

I assume it has something to do with the DMA configuration. I can work on getting hold of an STM32U5 Nucleo board to debug. In the meantime, if anyone has any insight I would appreciate it :)

Dirbaio · 2024-02-05T23:32:12Z

the "WARN - UNWIND: Error while checking for exception context" is a red herring, it's an error trying to print the stacktrace. by then the test has already failed. The "real" error is " WARN - Deadline exceeded!". Tests have a timeout of 10 seconds by default, "deadline exceeded" means the test took longer than that (i.e. it's hanging)

Dirbaio · 2024-02-06T00:05:35Z

if you want, I can give you a teleprobe token so you can locally cargo run remotely into the test farm, so you don't have to wait for CI. Helps with fast troubleshooting if you want to e.g. add more log printing. Ping me on matrix if you're interested!

caleb-garrett · 2024-02-08T17:41:18Z

I was able to debug the HIL build issue on a NUCLEO-U5A5ZJ and uncovered a very frustrating issue. There is in fact a 4th variant of this peripheral that is used on the STM32WBA, STM32H5, and STM32U5 series. From the STM32 reference manual, there is virtually no difference in this peripheral from the v2 variant besides the memory mapping of ALGO bits. (I suspect the real difference is under the hood to support GPDMA on these chips.)

The HIL test is timing out on these chips because when multiple DMA transfers are enabled (MDMAT=1), the data input interrupt status bit (DINIS) is never asserted after a DMA transfer. This prevents context switching because the CSRx registers are never loaded. Note that the peripheral otherwise works exactly as expected as long as you don't attempt to context switch between DMA transfers. I wonder if this is a silicon issue and I'm just the first person to try this particular feature on these chips...

My current plan is to add a v4 variant of this peripheral without DMA support. Not ideal, but still considerably faster than a software implementation.

Dirbaio · 2024-02-09T23:33:15Z

the L0 ADC failures should be fixed in git main, could you rebase?

Dirbaio · 2024-02-09T23:40:18Z

embassy-stm32/src/hash/v1v3v4.rs

+
+    /// Computes a digest for the given context. A slice of the provided digest buffer is returned.
+    /// The length of the returned slice is dependent on the digest length of the selected algorithm.
+    pub async fn finish<'a>(&mut self, mut ctx: Context, digest: &'a mut [u8; MAX_DIGEST_SIZE]) -> &'a [u8] {


this signature could be improved. We shouldn't require the user to allocate MAX_DIGEST_SIZE bytes if they're using an algorithm with a smaller digest.

Also, returning a subslice of the passed-in slice sometimes run into trouble with the borrow checker. In the rest of Embassy we follow the other convention, which is returning the written length as usize.

so I suggest changing it to:

pub async fn finish(&mut self, mut ctx: Context, digest: &mut [u8]) -> usize {

it could either panic or return an error if the buffer is too short.

Dirbaio · 2024-02-09T23:46:12Z

embassy-stm32/src/hash/v2.rs

+    /// Instantiates, resets, and enables the HASH peripheral.
+    pub fn new(
+        peripheral: impl Peripheral<P = T> + 'd,
+        dma: impl Peripheral<P = D> + 'd,


i'm somewhat worried about code duplication here, also the fact that the Hash::new() signature changes between chip families which is something we try to avoid, since ot helps writing portable code.

I think it'd be better to have 2 constructors: new with DMA and new_blocking without (which sets the D generic param to NoDma). So the same driver can both be used with and without DMA. Then you can cfg the DMA one to only v2.

This is what other drivers like UART, SPI etc already do, you might want to look at their code to see how it works.

The advantage is now you're not forced to use DMA on v2, so you can write code that works on all families as long as you don't use DMA.

It looks like other STM32 drivers are only implementing new and not new_blocking. Could you clarify how this should look? For instance this doesn't compile:

pub struct Hash<'d, T: Instance, D = NoDma> { _peripheral: PeripheralRef<'d, T>, dma: PeripheralRef<'d, D>, } impl<'d, T: Instance, D> Hash<'d, T, D> { pub fn new_blocking( peripheral: impl Peripheral<P = T> + 'd, irq: impl interrupt::typelevel::Binding<T::Interrupt, InterruptHandler<T>> + 'd, ) -> Self { HASH::enable_and_reset(); let no_dma = NoDma; into_ref!(peripheral, no_dma); let instance = Self { _peripheral: peripheral, dma: no_dma, // ERROR: expected struct `PeripheralRef<'d, D>', found struct `PeripheralRef<'_, NoDma>` }; T::Interrupt::unpend(); unsafe { T::Interrupt::enable() }; instance } }

Forgive me if this is a dumb question, I'm still new to Rust.

Dirbaio · 2024-02-10T00:00:42Z

embassy-stm32/src/hash/v1v3v4.rs

+        T::regs().str().write(|w| w.set_dcal(true));
+
+        // Wait for completion.
+        poll_fn(|cx| {


IMO these should just do a blockingbusy-loop wait. The manual says these computations take ~60 cycles. The cost of an async context switch is ~100 cycles in the very best case, likely more. So using async waits here is likely to make things slower for no benefit. (same for the poll_fn in store_context).

Another benefit of doing blocking waits is now the entire hash computation can be blocking (when not using DMA), which is required in some cases, for example when implementing the RustCrypto traits.

That makes sense. I will make that adjustment.

I was actually looking at the RustCrypto traits today. If implementing a trait such as Digest is desired or appropriate to include within this project I can certainly contribute towards that.

caleb-garrett · 2024-02-11T16:55:10Z

I removed the hash test from the STM32U5A5ZJ. The application won't fit due to incorrect memory size for this chip noted here: embassy-rs/stm32-data#301

Dirbaio

very nice, thank you! :)

Dirbaio · 2024-02-13T00:36:07Z

f103 failure is unrelated (I bricked it refactoring RCC 😭 )

caleb-garrett added 5 commits January 31, 2024 21:21

Added hash module with blocking implementation. Included SHA256 example.

6e9ddd4

Added hash v1/v2 configs.

1dbfa5a

Added hash interrupts for async.

1027530

Added hash DMA implementation.

72bbfec

Merge branch 'main' into hash

1f940bf

caleb-garrett marked this pull request as draft February 4, 2024 11:55

caleb-garrett added 4 commits February 4, 2024 17:16

Addressed hash CI build issues.

66f44b9

Merge commit '1f940bf9e868438090ea126eb2267f5e9325fbd4' into hash

059d8a8

Format hash example.

a260c0a

Merge branch 'main' into hash

e1f6f4b

caleb-garrett marked this pull request as ready for review February 4, 2024 22:32

caleb-garrett added 2 commits February 5, 2024 14:36

Added STM32 hash test.

079bb7b

Corrected hash CI build issues.

09973ad

Dirbaio added the trusted label Feb 5, 2024

Separated hash interrupt bindings.

0b70d67

caleb-garrett added 2 commits February 6, 2024 10:44

Updated stm32-metapac.

b7db75a

Fix digest interrupt enable.

bfa67c2

Removed hash DMA from unsupported configs.

f664575

Dirbaio reviewed Feb 9, 2024

View reviewed changes

Dirbaio reviewed Feb 10, 2024

View reviewed changes

caleb-garrett added 3 commits February 9, 2024 19:24

Merge branch 'main' into hash

0c9661a

Consolidated hash drivers.

eb64d71

fmt

7bf0442

Dirbaio approved these changes Feb 13, 2024

View reviewed changes

Dirbaio added this pull request to the merge queue Feb 13, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 13, 2024

Dirbaio merged commit 8c82d1b into embassy-rs:main Feb 13, 2024
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

STM32 Hash Accelerator #2528

STM32 Hash Accelerator #2528

caleb-garrett commented Feb 3, 2024 •

edited

Loading

caleb-garrett commented Feb 4, 2024

Dirbaio commented Feb 4, 2024

caleb-garrett commented Feb 4, 2024

Dirbaio commented Feb 5, 2024

caleb-garrett commented Feb 5, 2024

Dirbaio commented Feb 5, 2024

Dirbaio commented Feb 6, 2024

caleb-garrett commented Feb 8, 2024

Dirbaio commented Feb 9, 2024

Dirbaio Feb 9, 2024

Dirbaio Feb 9, 2024

caleb-garrett Feb 10, 2024

Dirbaio Feb 10, 2024

caleb-garrett Feb 10, 2024

caleb-garrett commented Feb 11, 2024

Dirbaio left a comment

Dirbaio commented Feb 13, 2024

STM32 Hash Accelerator #2528

STM32 Hash Accelerator #2528

Conversation

caleb-garrett commented Feb 3, 2024 • edited Loading

caleb-garrett commented Feb 4, 2024

Dirbaio commented Feb 4, 2024

caleb-garrett commented Feb 4, 2024

Dirbaio commented Feb 5, 2024

caleb-garrett commented Feb 5, 2024

Dirbaio commented Feb 5, 2024

Dirbaio commented Feb 6, 2024

caleb-garrett commented Feb 8, 2024

Dirbaio commented Feb 9, 2024

Dirbaio Feb 9, 2024

Choose a reason for hiding this comment

Dirbaio Feb 9, 2024

Choose a reason for hiding this comment

caleb-garrett Feb 10, 2024

Choose a reason for hiding this comment

Dirbaio Feb 10, 2024

Choose a reason for hiding this comment

caleb-garrett Feb 10, 2024

Choose a reason for hiding this comment

caleb-garrett commented Feb 11, 2024

Dirbaio left a comment

Choose a reason for hiding this comment

Dirbaio commented Feb 13, 2024

caleb-garrett commented Feb 3, 2024 •

edited

Loading