Daily Perf Improver - Enable additional vector benchmarks #16

github-actions · 2025-10-10T23:51:06Z

Summary

This PR enables three previously commented-out vector benchmarks as part of Phase 1 (Quick Wins) of the performance improvement plan. This expands benchmark coverage to measure performance of additional core vector operations.

Performance Goal

Goal Selected: Expand benchmark coverage for all vector operations (Phase 1, Priority: HIGH)

Rationale: The research identified that many important operations (multiply, dot product, norm) were commented out in the benchmark suite. Having comprehensive benchmarks is essential for:

Establishing performance baselines
Identifying performance regressions
Validating future optimizations
Understanding performance characteristics across different vector sizes

Changes Made

Enabled Benchmarks

Multiply - Element-wise vector multiplication
DotProduct - SIMD-optimized dot product computation
Norm - Euclidean norm calculation

Files Modified

benchmarks/FsMath.Benchmarks/Vector.fs - Uncommented three benchmark methods
build.sh - Made executable (required by build steps)

Approach

✅ Verified all Vector operations exist (Vector.multiply, Vector.dot, Vector.norm)
✅ Removed cross product benchmark (not implemented in Vector module)
✅ Uncommented and renamed benchmarks following existing conventions
✅ Verified compilation succeeds
✅ Verified benchmarks can be listed and discovered
✅ Ran complete benchmark suite with --job short

Performance Measurements

Test Environment

Platform: Linux Ubuntu 24.04.3 LTS (virtualized)
CPU: AMD EPYC 7763, 2 physical cores (4 logical)
Runtime: .NET 8.0.20 with AVX2 SIMD support
Job: ShortRun (3 warmup, 3 iterations, 1 launch)

Results Summary

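| Operation  | Size=10  | Size=100 | Size=1000 | Size=10000 |
|----------- |---------:|---------:|----------:|-----------:|
| Multiply   | 14.04 ns | 65.91 ns | 595.54 ns |    5.10 μs |
| DotProduct |  7.58 ns | 27.28 ns | 239.51 ns |    2.36 μs |
| Norm       |  5.33 ns | 21.99 ns | 228.74 ns |    2.35 μs |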
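The summary figures are easier to compare once they are normalised. The sketch below is an illustration only: it copies the mean times from the table (with the μs values converted to ns) and derives the cost per element and the ratio of Multiply to each reduction. The helper names are hypothetical and not part of the benchmark project.

```python
# Mean times in nanoseconds, copied from the Results Summary table
# (the Size=10000 column is converted from microseconds).
SIZES = [10, 100, 1000, 10000]
MEAN_NS = {
    "Multiply":   [14.04, 65.91, 595.54, 5100.0],
    "DotProduct": [7.58, 27.28, 239.51, 2360.0],
    "Norm":       [5.33, 21.99, 228.74, 2350.0],
}

def per_element_ns(op):
    """Mean time divided by vector size, one value per benchmarked size."""
    return [t / n for t, n in zip(MEAN_NS[op], SIZES)]

def speedup_over_multiply(op):
    """How many times faster `op` is than Multiply at each size."""
    return [m / t for m, t in zip(MEAN_NS["Multiply"], MEAN_NS[op])]

for op in MEAN_NS:
    costs = ", ".join(f"{c:.3f}" for c in per_element_ns(op))
    print(f"{op:<10} ns/element: {costs}")

for op in ("DotProduct", "Norm"):
    ratios = ", ".join(f"{r:.2f}x" for r in speedup_over_multiply(op))
    print(f"{op:<10} vs Multiply: {ratios}")
```

Per-element cost falls as the size grows because the fixed per-call overhead is spread over more elements, so the small sizes are the least representative of steady-state throughput.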

Key Observations

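SIMD Effectiveness: DotProduct and Norm run with zero managed allocations and settle at roughly 0.23 to 0.24 ns per element at Size=1000 and Size=10000, which is consistent with SIMD-accelerated loops
Memory Allocation: Multiply allocates one result array per call, 8 bytes per element plus 24 bytes of overhead (104 B at Size=10, 80024 B at Size=10000), the same as Add and Sub; this is expected for element-wise operations that return a new vector
Performance Scaling: From Size=100 upward, time grows approximately linearly with vector size; at Size=10, fixed per-call overhead dominates (Multiply costs about 1.4 ns per element there versus about 0.5 ns at Size=10000)
Reduction Operations: DotProduct and Norm are roughly 2x to 3x faster than Multiply at every size, likely because they neither allocate nor write a result array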
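The allocation figures in the Detailed Results can be checked with simple arithmetic. The sketch below assumes 8-byte (float64) elements and a 24-byte array header on 64-bit .NET. Both constants are assumptions inferred from the reported numbers, not values taken from the FsMath source.

```python
ELEMENT_BYTES = 8    # assumption: vectors hold 64-bit floats
ARRAY_OVERHEAD = 24  # assumption: array header size on 64-bit .NET

def expected_allocation(size):
    """Bytes needed for one freshly allocated result array of `size` elements."""
    return ARRAY_OVERHEAD + ELEMENT_BYTES * size

# "Allocated" column for Add, Sub and Multiply in the Detailed Results table.
REPORTED_BYTES = {10: 104, 100: 824, 1000: 8024, 10000: 80024}

for size, reported in REPORTED_BYTES.items():
    print(f"Size={size}: expected {expected_allocation(size)} B, reported {reported} B")
```

Under these assumptions every reported value matches exactly one result array per call, with no additional temporary allocations.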

Detailed Results

| Method     | Size  | Mean         | Error         | StdDev     | Gen0   | Allocated |
|----------- |------ |-------------:|--------------:|-----------:|-------:|----------:|
| Add        | 10    |    13.418 ns |     1.7189 ns |  0.0942 ns | 0.0062 |     104 B |
| Sub        | 10    |    15.287 ns |     5.7870 ns |  0.3172 ns | 0.0062 |     104 B |
| Multiply   | 10    |    14.039 ns |     4.8120 ns |  0.2638 ns | 0.0062 |     104 B |
| DotProduct | 10    |     7.580 ns |     0.0802 ns |  0.0044 ns |      - |         - |
| Norm       | 10    |     5.325 ns |     0.2020 ns |  0.0111 ns |      - |         - |
| Add        | 100   |    69.179 ns |    27.3849 ns |  1.5011 ns | 0.0492 |     824 B |
| Sub        | 100   |    64.633 ns |    20.3229 ns |  1.1140 ns | 0.0492 |     824 B |
| Multiply   | 100   |    65.909 ns |    30.3342 ns |  1.6627 ns | 0.0492 |     824 B |
| DotProduct | 100   |    27.283 ns |     0.8278 ns |  0.0454 ns |      - |         - |
| Norm       | 100   |    21.987 ns |     0.3608 ns |  0.0198 ns |      - |         - |
| Add        | 1000  |   598.511 ns |   303.1359 ns | 16.6159 ns | 0.4787 |    8024 B |
| Sub        | 1000  |   583.309 ns |   204.4454 ns | 11.2063 ns | 0.4787 |    8024 B |
| Multiply   | 1000  |   595.542 ns |    14.9460 ns |  0.8192 ns | 0.4787 |    8024 B |
| DotProduct | 1000  |   239.512 ns |     1.7292 ns |  0.0948 ns |      - |         - |
| Norm       | 1000  |   228.741 ns |     1.1622 ns |  0.0637 ns |      - |         - |
| Add        | 10000 | 4,889.167 ns |   538.7803 ns | 29.5324 ns | 4.7607 |   80024 B |
| Sub        | 10000 | 5,065.203 ns | 1,134.9065 ns | 62.2081 ns | 4.7607 |   80024 B |
| Multiply   | 10000 | 5,098.030 ns |   628.1919 ns | 34.4333 ns | 4.7607 |   80024 B |
| DotProduct | 10000 | 2,355.899 ns |    16.7396 ns |  0.9176 ns |      - |         - |
| Norm       | 10000 | 2,347.459 ns |    67.4266 ns |  3.6959 ns |      - |         - |
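The Error column is large relative to StdDev because the ShortRun job collects only three iterations. The sketch below assumes Error is half of a 99.9% confidence interval computed from Student's t distribution, which is how BenchmarkDotNet describes the column. With three samples there are two degrees of freedom, and for that case the t quantile has a closed form, so no statistics library is needed. The function names are hypothetical.

```python
import math

def t_quantile_df2(p):
    """Quantile of Student's t distribution with 2 degrees of freedom.

    Inverts the closed-form CDF F(t) = 1/2 + t / (2 * sqrt(t^2 + 2)).
    """
    u = 2.0 * p - 1.0
    return u * math.sqrt(2.0 / (1.0 - u * u))

def error_for_three_iterations(stddev, confidence=0.999):
    """Half-width of the confidence interval for the mean of 3 samples."""
    n = 3
    t = t_quantile_df2(0.5 + confidence / 2.0)
    return t * stddev / math.sqrt(n)

# (label, StdDev in ns, reported Error in ns) copied from the table above.
ROWS = [
    ("Add, Size=10", 0.0942, 1.7189),
    ("DotProduct, Size=10", 0.0044, 0.0802),
    ("Multiply, Size=10000", 34.4333, 628.1919),
]

for label, stddev, reported in ROWS:
    computed = error_for_three_iterations(stddev)
    print(f"{label}: computed {computed:.4f} ns, reported {reported} ns")
```

The multiplier works out to about 18.2 times StdDev, so even a small spread across three iterations produces a wide interval. Running with the default job collects more iterations and narrows it.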

Replicating the Performance Measurements

To replicate these benchmarks:

# 1. Build the project
./build.sh

# 2. Run the benchmarks with short job (quick validation, ~3 minutes)
dotnet run -c Release --project benchmarks/FsMath.Benchmarks/FsMath.Benchmarks.fsproj -- --job short

# 3. For more accurate measurements, run with default settings (~15-20 minutes)
dotnet run -c Release --project benchmarks/FsMath.Benchmarks/FsMath.Benchmarks.fsproj
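After a replication run, the Mean column can be compared against the baseline in this PR. The sketch below is one possible way to do that and is not part of the repository: it parses a BenchmarkDotNet-style markdown table and flags entries that are slower than the baseline by more than a tolerance. The function names and the 10% tolerance are hypothetical, and the "current" values are synthetic, produced by scaling the baseline purely for illustration.

```python
# Nanoseconds per unit; both the Greek mu and the micro sign are accepted.
UNIT_TO_NS = {"ns": 1.0, "us": 1e3, "μs": 1e3, "µs": 1e3, "ms": 1e6, "s": 1e9}

def parse_ns(cell):
    """Convert a cell such as '4,889.167 ns' or '5.10 μs' to nanoseconds."""
    value, unit = cell.strip().split()
    return float(value.replace(",", "")) * UNIT_TO_NS[unit]

def parse_table(markdown):
    """Map (Method, Size) to mean nanoseconds for a pipe-delimited table."""
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip().startswith("|")]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    means = {}
    for ln in lines[2:]:  # skip the header row and the separator row
        row = dict(zip(header, (c.strip() for c in ln.strip("|").split("|"))))
        means[(row["Method"], int(row["Size"]))] = parse_ns(row["Mean"])
    return means

def regressions(baseline, current, tolerance=0.10):
    """Entries whose current mean exceeds the baseline by more than `tolerance`."""
    return {
        key: (baseline[key], current[key])
        for key in baseline
        if key in current and current[key] > baseline[key] * (1.0 + tolerance)
    }

BASELINE_TABLE = """
| Method     | Size  | Mean         |
|----------- |------ |-------------:|
| Multiply   | 10000 | 5,098.030 ns |
| DotProduct | 10000 | 2,355.899 ns |
"""

baseline = parse_table(BASELINE_TABLE)
# Synthetic "current" run: DotProduct slowed by 25% for demonstration only.
current = {k: v * 1.25 if k[0] == "DotProduct" else v for k, v in baseline.items()}

for key, (old, new) in regressions(baseline, current).items():
    print(f"{key}: {old:.1f} ns -> {new:.1f} ns")
```

Given the wide ShortRun error margins, a tolerance this tight is only meaningful when both runs use the default job settings.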

Testing

✅ All benchmarks compile successfully
✅ All benchmarks can be discovered via --list flat
✅ All benchmarks execute without errors
✅ Existing tests still pass (132 tests passed)

Next Steps

This PR establishes initial ShortRun baseline measurements for the Multiply, DotProduct, and Norm vector operations. Future work from the performance plan includes:

Add matrix operation benchmarks (Phase 1)
Optimize outer product implementation (Phase 1)
Implement blocked matrix multiplication (Phase 2)
Add parallel operations for large matrices (Phase 3)

Related Issues/Discussions

Performance Research: https://github.com/fslaborg/FsMath/discussions/11

🤖 Generated with Claude Code

AI generated by Daily Perf Improver

…ting

This change enables three previously commented-out benchmarks:
- Multiply: Tests element-wise vector multiplication
- DotProduct: Tests SIMD-optimized dot product computation
- Norm: Tests Euclidean norm calculation

These benchmarks are part of Phase 1 (Quick Wins) from the performance improvement plan to expand benchmark coverage for all core vector operations.

Initial benchmark results (ShortRun on AMD EPYC 7763, .NET 8.0.20):

Size 10:
- Multiply: 14.04 ns
- DotProduct: 7.58 ns
- Norm: 5.33 ns

Size 100:
- Multiply: 65.91 ns
- DotProduct: 27.28 ns
- Norm: 21.99 ns

Size 1000:
- Multiply: 595.54 ns
- DotProduct: 239.51 ns
- Norm: 228.74 ns

Size 10000:
- Multiply: 5.10 μs
- DotProduct: 2.36 μs
- Norm: 2.35 μs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

dsyme closed this Oct 10, 2025

dsyme reopened this Oct 10, 2025

dsyme marked this pull request as ready for review October 12, 2025 12:59

dsyme merged commit d937954 into main Oct 12, 2025
2 checks passed
