Conversation

ylpoonlg

@ylpoonlg ylpoonlg commented Sep 4, 2025

Add micro benchmark for the partition function in scalar and SVE.

@dotnet/arm64-contrib @a74nh

Only contains scalar and SVE versions for now since Neon is much more
complicated to implement.
@ylpoonlg
Author

ylpoonlg commented Sep 4, 2025

@dotnet-policy-service agree company="Arm"

@ylpoonlg
Author

ylpoonlg commented Sep 4, 2025

Performance results (run on Nvidia Grace)

| Method       | Size  | Mean         | Error      | StdDev     | Median       | Min          | Max          | Allocated |
|------------- |------ |-------------:|-----------:|-----------:|-------------:|-------------:|-------------:|----------:|
| Scalar       | 15    |    10.995 ns |  0.2522 ns |  0.2477 ns |    10.844 ns |    10.827 ns |    11.568 ns |         - |
| SvePartition | 15    |     9.756 ns |  0.0081 ns |  0.0076 ns |     9.752 ns |     9.749 ns |     9.772 ns |         - |
| Scalar       | 127   |   100.488 ns |  0.0617 ns |  0.0515 ns |   100.472 ns |   100.419 ns |   100.615 ns |         - |
| SvePartition | 127   |    85.037 ns |  0.0778 ns |  0.0650 ns |    85.032 ns |    84.953 ns |    85.192 ns |         - |
| Scalar       | 527   |   431.728 ns |  0.2506 ns |  0.2093 ns |   431.621 ns |   431.529 ns |   432.207 ns |         - |
| SvePartition | 527   |   357.896 ns |  0.0638 ns |  0.0533 ns |   357.912 ns |   357.796 ns |   357.982 ns |         - |
| Scalar       | 10015 | 8,338.500 ns |  5.8883 ns |  5.2198 ns | 8,338.681 ns | 8,328.850 ns | 8,346.849 ns |         - |
| SvePartition | 10015 | 6,968.263 ns | 11.8289 ns | 11.0648 ns | 6,972.462 ns | 6,932.894 ns | 6,978.159 ns |         - |

@a74nh
Contributor

a74nh commented Sep 4, 2025

Roughly 11% to 17% improvement in mean time with SVE over scalar, generally getting better as the size increases, though it levels off at the largest size. Not as fast as I was hoping for, but it's a definite gain. We should probably do some comparisons against C++ to see whether the limit is the algorithm/hardware or the CoreCLR codegen.
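
For anyone who wants to check the percentages, here is a minimal sketch that recomputes the improvement from the Mean column of the results table in this thread. The helper name `improvement_pct` and the dictionary layout are illustrative choices, not part of the benchmark code in this PR.

```python
# Mean times in nanoseconds, copied from the results table in this thread:
# size -> (Scalar mean, SvePartition mean)
means_ns = {
    15: (10.995, 9.756),
    127: (100.488, 85.037),
    527: (431.728, 357.896),
    10015: (8338.500, 6968.263),
}


def improvement_pct(scalar_ns, sve_ns):
    """Percentage reduction in mean time of SVE relative to scalar (hypothetical helper)."""
    return (scalar_ns - sve_ns) / scalar_ns * 100.0


# Per-size improvement of SvePartition over Scalar.
improvements = {
    size: improvement_pct(scalar, sve) for size, (scalar, sve) in means_ns.items()
}

for size, pct in improvements.items():
    print(f"Size {size:>5}: SVE is {pct:.1f}% faster than scalar")
```

This only compares the reported means; it does not account for the Error or StdDev columns, so small differences between sizes should not be over-read.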

@a74nh
Copy link
Contributor

a74nh commented Sep 4, 2025

@LoopedBard3 @DrewScoggins @adamsitnik - not sure who best to ping for this, so going off who replied on #4841.

@ylpoonlg is with us for a few months, so I've asked him to make some improvements/additions to the SVE tests as we could do with improving the coverage.
