Feature Request
What feature are you suggesting?
Overview:
The typical way to flush data out of the CPU cache so the DSP can see it is DSP_FlushDataCache. However, its performance degrades badly when the syscore's app CPU time limit is set to a high value: in my application, a single call goes from about 0.1 ms to a very jittery 1 to 5 ms. svcFlushProcessDataCache does not suffer from these performance issues and seems to work perfectly. This should be documented or fixed, though I assume a fix is infeasible or impossible.
Smaller Details:
By default, DSP_FlushDataCache is very fast. However, once APT_SetAppCpuTimeLimit has been called, it can become quite slow:
Without calling APT_SetAppCpuTimeLimit, it takes around 0.1 ms.
At the maximum CPU time limit of 80%, it takes 1 to 5 ms, averaging about 2.5 ms.
At a CPU time limit of 50%, it takes anywhere from 0.1 ms to 1.2 ms.
At a CPU time limit of 10%, it takes around 0.1 ms.
The size of the data flushed does not affect the time taken.
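For anyone wanting to reproduce numbers like these, here is a minimal timing sketch. It is not necessarily how the measurements above were taken. It assumes the flush call and the tick source are passed in as callbacks (the names `time_flush`, `summarize`, `flush_fn`, `tick_fn` and `flush_stats` are hypothetical), and it restates the ARM11 tick rate that libctru exposes as `SYSCLOCK_ARM11` so the sketch also builds on a desktop host.

```c
#include <assert.h>
#include <stdint.h>

/* ARM11 system tick rate in Hz. libctru exposes the same value as
   SYSCLOCK_ARM11; it is restated here (assumption) so this builds off-device. */
#define ARM11_TICKS_PER_SEC 268111856.0

/* Hypothetical callback types: on a 3DS, `now` would be svcGetSystemTick and
   `flush` a thin void wrapper around one of the two cache-flush calls. */
typedef void (*flush_fn)(const void *addr, uint32_t size);
typedef uint64_t (*tick_fn)(void);

typedef struct {
    double min_ms;
    double max_ms;
    double avg_ms;
} flush_stats;

/* Times `n` consecutive calls to `flush` on the same buffer and stores each
   call's duration, in ticks, into `deltas`. */
void time_flush(flush_fn flush, tick_fn now, const void *buf, uint32_t size,
                uint64_t *deltas, int n)
{
    for (int i = 0; i < n; i++) {
        uint64_t start = now();
        flush(buf, size);
        deltas[i] = now() - start;
    }
}

/* Converts tick deltas to milliseconds and reports min / max / mean, which is
   enough to tell a steady ~0.1 ms call from a jittery 1 to 5 ms one. */
flush_stats summarize(const uint64_t *deltas, int n)
{
    assert(n > 0);
    flush_stats s = { 0.0, 0.0, 0.0 };
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        double ms = (double)deltas[i] * 1000.0 / ARM11_TICKS_PER_SEC;
        if (i == 0 || ms < s.min_ms) s.min_ms = ms;
        if (i == 0 || ms > s.max_ms) s.max_ms = ms;
        total += ms;
    }
    s.avg_ms = total / n;
    return s;
}
```

On hardware, `now` would be libctru's `svcGetSystemTick`, and `flush` a small `void` wrapper around either `DSP_FlushDataCache(buf, size)` or `svcFlushProcessDataCache(CUR_PROCESS_HANDLE, (u32)buf, size)`. Repeating the run after `APT_SetAppCpuTimeLimit` with different percentages should show whether the min/max spread widens.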
The attached image shows the time taken over 90 frames of real game time, in identical scenarios. On the left is pure single-threaded operation, where APT_SetAppCpuTimeLimit was never called. On the right is a CPU time limit of 80%, with the DSP flush occurring on the second CPU core. The timing of DSP_FlushDataCache is very jittery there. svcFlushProcessDataCache gives results essentially identical to the left graph.
Here is a comparison of multi-threaded audio with the DSP cache flush enabled and disabled (disabling the flush obviously causes audio issues). Each measurement is the total time taken by an entire audio frame in a specific benchmark. svcFlushProcessDataCache gives results essentially identical to the right graph.
Nature of Request:
Addition
After confirming this behavior, document DSP_FlushDataCache's performance issues and discourage its use in favor of svcFlushProcessDataCache.
Documentation for svcFlushProcessDataCache should explain that it accepts a virtual address, not a physical address.
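A minimal sketch of the substitution suggested above, assuming the buffer belongs to the calling process. `flush_for_dsp` is a hypothetical helper name; the libctru pieces it relies on are `svcFlushProcessDataCache(Handle process, u32 addr, u32 size)` and `CUR_PROCESS_HANDLE`. The address passed is the buffer's virtual address, i.e. the pointer value itself, not the result of `osConvertVirtToPhys`. The `#else` branch is a host-only stand-in (an assumption, not part of libctru) so the sketch can be compiled and checked off-device.

```c
#include <stdint.h>

#ifdef __3DS__
#include <3ds.h>
#else
/* Host-only stand-ins (assumption) so this compiles and can be checked
   off-device; on a 3DS these all come from libctru via <3ds.h>. */
#include <assert.h> /* used by host-side checks of the stub */
typedef int32_t Result;
typedef uint32_t u32;
typedef u32 Handle;
#define CUR_PROCESS_HANDLE 0xFFFF8001u

static u32 last_flush_addr;
static u32 last_flush_size;

/* Records its arguments instead of touching any cache. */
static Result svcFlushProcessDataCache(Handle process, u32 addr, u32 size)
{
    (void)process;
    last_flush_addr = addr;
    last_flush_size = size;
    return 0;
}
#endif

/* Hypothetical drop-in replacement for DSP_FlushDataCache(buf, size).
   The SVC expects a virtual address in the given process, so the pointer is
   passed through unchanged (3DS pointers are 32-bit). */
Result flush_for_dsp(const void *buf, u32 size)
{
    return svcFlushProcessDataCache(CUR_PROCESS_HANDLE, (u32)(uintptr_t)buf, size);
}
```

In an audio loop this would sit wherever `DSP_FlushDataCache` is currently called, e.g. right before a filled buffer is handed to the DSP.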
Why would this feature be useful?
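This should be self-evident: code that flushes audio buffers every frame can lose up to 5 ms per call to DSP_FlushDataCache when the app CPU time limit is high, and documenting svcFlushProcessDataCache as the alternative avoids that.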
Additional Context
I made a devkitPro forum post covering this exact topic in January, but it has still not been approved. Do moderators even check the 3DS board? Because it's been so long, I can't quite remember all of the performance details; most of the nitty-gritty was copied directly from that post.
I do not have access to a New 3DS to test its performance characteristics. All testing was done on an old model.
Wyatt-James changed the title from "Document Performance Issues of DSP_FlushDataCache" to "[RFC] Document Performance Issues of DSP_FlushDataCache" on Dec 19, 2024.