[RFC] Document Performance Issues of DSP_FlushDataCache #556

Open
Wyatt-James opened this issue Dec 18, 2024 · 0 comments

Feature Request

What feature are you suggesting?

Overview:

The typical way to flush data to the DSP is with DSP_FlushDataCache. However, its performance is very poor when the syscore's App CPU Time Limit is set to a high value: in my application, a call goes from taking roughly 0.1 ms to a very jittery 1-5 ms. svcFlushProcessDataCache does not suffer from this slowdown and seems to work perfectly. This should be documented or fixed, though I assume a fix is infeasible or impossible.

Smaller Details:

By default, DSP_FlushDataCache is very fast. However, once APT_SetAppCpuTimeLimit has been called, it can become quite slow.

Without calling APT_SetAppCpuTimeLimit, it takes around 0.1 milliseconds.
At the maximum CPU time limit of 80%, it takes 1-5 milliseconds, averaging 2.5.
At a CPU time limit of 50%, it takes anywhere from 0.1 milliseconds to 1.2 milliseconds.
At a CPU time limit of 10%, it takes around 0.1 milliseconds.

The size of the data flushed does not affect the time taken.
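
For anyone who wants to reproduce numbers like these, here is a minimal sketch of the summary step only. It is not the code used for the measurements above. `FlushTimingStats` and `summarize_flush_timings` are hypothetical names, and the tick-to-millisecond factor is passed in by the caller so the block builds on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of a series of per-call timings, in milliseconds. */
typedef struct {
    double min_ms;
    double max_ms;
    double mean_ms;
} FlushTimingStats;

/* Hypothetical helper: reduces raw tick deltas (one per flush call)
 * to min / max / mean in milliseconds. */
static FlushTimingStats summarize_flush_timings(const uint64_t *tick_deltas,
                                                size_t count,
                                                double ticks_per_ms)
{
    assert(tick_deltas != NULL && count > 0 && ticks_per_ms > 0.0);

    uint64_t min_ticks = tick_deltas[0];
    uint64_t max_ticks = tick_deltas[0];
    double sum_ticks = 0.0;

    for (size_t i = 0; i < count; i++) {
        if (tick_deltas[i] < min_ticks) min_ticks = tick_deltas[i];
        if (tick_deltas[i] > max_ticks) max_ticks = tick_deltas[i];
        sum_ticks += (double)tick_deltas[i];
    }

    FlushTimingStats stats;
    stats.min_ms  = (double)min_ticks / ticks_per_ms;
    stats.max_ms  = (double)max_ticks / ticks_per_ms;
    stats.mean_ms = sum_ticks / (double)count / ticks_per_ms;
    return stats;
}
```

On hardware, each delta would presumably be the difference between two `svcGetSystemTick()` readings taken around the flush call, with libctru's `CPU_TICKS_PER_MSEC` as `ticks_per_ms`. A large gap between `min_ms` and `max_ms` over a run of frames is the jitter described above.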

The attached image shows the time taken over 90 frames of real game time, in identical scenarios. On the left is pure single-threaded operation, where APT_SetAppCpuTimeLimit was never called. On the right is a CPU time limit of 80%, with the DSP flush occurring on the second CPU core. The timing of DSP_FlushDataCache appears to be very jittery. svcFlushProcessDataCache gives results essentially identical to the left graph.
[Image: dsp export graph]

Here is a comparison between multi-threaded audio with DSP cache flush enabled/disabled (obviously, this causes audio issues). These measurements are sum totals of the entire audio frame's time for a specific benchmark. svcFlushProcessDataCache gives results essentially identical to the right graph.
[Image: aggregate flush vs noflush]

Nature of Request:

  • Addition
    • After confirming its behavior, add documentation to DSP_FlushDataCache to explain its performance issues. Use of DSP_FlushDataCache should be discouraged in favor of svcFlushProcessDataCache.
    • Documentation for svcFlushProcessDataCache should explain that it accepts a virtual address, not a physical address.
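
A rough sketch of the substitution suggested above, for illustration only. `flush_audio_buffer_for_dsp` is a hypothetical wrapper name. The `svcFlushProcessDataCache(Handle, u32, u32)` signature is assumed from recent libctru releases (older releases may take a pointer instead), and the `__3DS__` guard assumes the define used by recent devkitPro Makefile templates. The host-side stand-ins exist only so the block compiles off-device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __3DS__
#include <3ds.h>
#else
/* Host-side stand-ins so the sketch builds without libctru.
 * They only mimic the assumed shapes of the libctru declarations. */
typedef int32_t  Result;
typedef uint32_t Handle;
typedef uint32_t u32;
#define CUR_PROCESS_HANDLE 0xFFFF8001u

static u32 g_last_flush_size; /* records the last requested size (stub only) */

static Result svcFlushProcessDataCache(Handle process, u32 addr, u32 size)
{
    (void)process;
    (void)addr;
    g_last_flush_size = size;
    return 0;
}
#endif

/* Hypothetical wrapper: flushes a buffer by its VIRTUAL address in the
 * current process, instead of calling DSP_FlushDataCache(vaddr, size). */
static Result flush_audio_buffer_for_dsp(const void *vaddr, size_t size)
{
    assert(vaddr != NULL && size <= UINT32_MAX);
    return svcFlushProcessDataCache(CUR_PROCESS_HANDLE,
                                    (u32)(uintptr_t)vaddr,
                                    (u32)size);
}
```

The pointer passed in is the ordinary pointer to the buffer as the application sees it. No conversion to a physical address is performed, which is the point the second bullet asks the documentation to state.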

Why would this feature be useful?

This should be self-evident.

Additional Context

  • I made a devkitPro forum post covering this exact topic in January, but it still has yet to be approved. Do moderators even check the 3DS board? However, because it's been so long, I can't quite remember all of the performance details. Most of the nitty-gritty was copied directly from that post.
  • I do not have access to a New 3DS to test its performance characteristics. All testing was done on an old model.
@Wyatt-James Wyatt-James changed the title Document Performance Issues of DSP_FlushDataCache [RFC] Document Performance Issues of DSP_FlushDataCache Dec 19, 2024