flash algorithm execution speed #1079

@nerdralph

I've been trying to optimize pyOCD flash write speed with my debug probe. After configuring trace debug logs (thanks Chris!), I noticed less than half the time is spent doing block writes. Most of the time seems to be spent generating the AP read and write commands in between block transfers. Here's an excerpt from the logs:

0002482:DEBUG:ap:_write_block32:001014 }
0002482:DEBUG:ap:read_mem:001018 (ap=0x0; addr=0xe000edf0, size=32) {
0002482:DEBUG:ap:write_ap:001019 cached (ap=0x0; addr=0x00000000) = 0x23000012
0002482:DEBUG:dap:write_ap:001020 (addr=0x00000004) = 0xe000edf0
0002482:DEBUG:dap_access_cmsis_dap:get_request_space(1, 05:w)[wc=7, rc=0, ba=1->0] -> (sz=1, free=5, delta=-247)
0002482:DEBUG:dap_access_cmsis_dap:add(1, 05:w) -> [wc=8, rc=0, ba=0]
0002482:DEBUG:dap_access_cmsis_dap:get_request_space(1, 0f:r)[wc=8, rc=0, ba=0->0] -> (sz=1, free=15, delta=-246)
0002482:DEBUG:dap_access_cmsis_dap:add(1, 0f:r) -> [wc=8, rc=1, ba=0]
0002483:DEBUG:dap:read_ap:001021 (addr=0x0000000c) -> ...
0002483:DEBUG:dap_access_cmsis_dap:New _Command
0002485:DEBUG:dap:read_ap:001021 ...(addr=0x0000000c) -> 0x01030003
0002485:DEBUG:ap:read_mem:001018 (ap=0x0; addr=0xe000edf0, size=32) -> 0x01030003 }
0002485:DEBUG:ap:read_mem:001022 (ap=0x0; addr=0xe000edf0, size=32) {
0002485:DEBUG:ap:write_ap:001023 cached (ap=0x0; addr=0x00000000) = 0x23000012
0002485:DEBUG:dap:write_ap:001024 (addr=0x00000004) = 0xe000edf0
0002485:DEBUG:dap_access_cmsis_dap:get_request_space(1, 05:w)[wc=0, rc=0, ba=1->1] -> (sz=1, free=14)
0002485:DEBUG:dap_access_cmsis_dap:add(1, 05:w) -> [wc=1, rc=0, ba=1]
0002485:DEBUG:dap_access_cmsis_dap:get_request_space(1, 0f:r)[wc=1, rc=0, ba=1->0] -> (sz=1, free=15, delta=-253)
0002485:DEBUG:dap_access_cmsis_dap:add(1, 0f:r) -> [wc=1, rc=1, ba=0]
0002485:DEBUG:dap:read_ap:001025 (addr=0x0000000c) -> ...
0002486:DEBUG:dap_access_cmsis_dap:New _Command
0002488:DEBUG:dap:read_ap:001025 ...(addr=0x0000000c) -> 0x00030003
0002488:DEBUG:ap:read_mem:001022 (ap=0x0; addr=0xe000edf0, size=32) -> 0x00030003 }
0002488:DEBUG:ap:write_mem:001026 (ap=0x0; addr=0xe000edf4, size=32) = 0x00000000 {
0002488:DEBUG:ap:write_ap:001027 cached (ap=0x0; addr=0x00000000) = 0x23000012
0002488:DEBUG:dap:write_ap:001028 (addr=0x00000004) = 0xe000edf4
0002488:DEBUG:dap_access_cmsis_dap:get_request_space(1, 05:w)[wc=0, rc=0, ba=1->1] -> (sz=1, free=14)

It takes until the 2504 ms timestamp to get a full packet to write. In total, 45 ms elapse between the end of one write_block32 and the start of the next. pyOCD seems to cache the parsed flash algorithm, since the gap between the 1st and 2nd block writes is about twice as long (~100 ms). Is there an easy way to speed this up? The host CPU is a 3.2 GHz 64-bit Intel Core.
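Timings like these can be pulled out of the trace log with a small script instead of being read off by eye. The sketch below assumes the line shape seen in the excerpt (`<ms>:<LEVEL>:<module>:<function>:<sequence> ...`, with `{` closing the line that opens a call and `}` closing the line that ends it) and pairs the two by sequence number. The `call_durations` helper and the regular expression are illustrative, not part of pyOCD.

```python
import re

# Assumed line shape, inferred from the excerpt:
#   "<ms>:<LEVEL>:<module>:<function>:<sequence> <rest>"
LINE_RE = re.compile(r"^(\d+):\w+:(\w+):(\w+):(\d+) (.*)$")


def call_durations(lines):
    """Pair each '{' line with its '}' line by sequence number.

    Returns a list of (function, sequence, elapsed_ms) tuples in the
    order the calls completed. Unmatched lines are ignored.
    """
    opened = {}
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts, _module, func, seq, rest = m.groups()
        if rest.endswith("{"):
            opened[seq] = int(ts)
        elif rest.endswith("}") and seq in opened:
            results.append((func, seq, int(ts) - opened.pop(seq)))
    return results


# Lines copied from the log excerpt in this issue.
SAMPLE = [
    "0002482:DEBUG:ap:read_mem:001018 (ap=0x0; addr=0xe000edf0, size=32) {",
    "0002485:DEBUG:ap:read_mem:001018 (ap=0x0; addr=0xe000edf0, size=32) -> 0x01030003 }",
    "0002485:DEBUG:ap:read_mem:001022 (ap=0x0; addr=0xe000edf0, size=32) {",
    "0002488:DEBUG:ap:read_mem:001022 (ap=0x0; addr=0xe000edf0, size=32) -> 0x00030003 }",
    "0002488:DEBUG:ap:write_mem:001026 (ap=0x0; addr=0xe000edf4, size=32) = 0x00000000 {",
]

for func, seq, ms in call_durations(SAMPLE):
    print(f"{func} #{seq}: {ms} ms")
```

On the excerpt, each blocking `read_mem` takes about 3 ms from open to close. The same approach could measure the gap between the `}` of one `_write_block32` and the `{` of the next, provided the opening lines follow the same shape, which the excerpt does not show.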
