github-actions[bot]

Summary

This PR enables three previously commented-out vector benchmarks as part of Phase 1 (Quick Wins) of the performance improvement plan. This expands benchmark coverage to measure performance of additional core vector operations.

Performance Goal

Goal Selected: Expand benchmark coverage for all vector operations (Phase 1, Priority: HIGH)

Rationale: The research found that benchmarks for several important operations (multiply, dot product, norm) were commented out in the benchmark suite. Comprehensive benchmarks are essential for:

  1. Establishing performance baselines
  2. Identifying performance regressions
  3. Validating future optimizations
  4. Understanding performance characteristics across different vector sizes

Changes Made

Enabled Benchmarks

  1. Multiply - Element-wise vector multiplication
  2. DotProduct - SIMD-optimized dot product computation
  3. Norm - Euclidean norm calculation

Files Modified

  • benchmarks/FsMath.Benchmarks/Vector.fs - Uncommented three benchmark methods
  • build.sh - Made executable (required by build steps)

Approach

  1. ✅ Verified all Vector operations exist (Vector.multiply, Vector.dot, Vector.norm)
  2. ✅ Removed cross product benchmark (not implemented in Vector module)
  3. ✅ Uncommented and renamed benchmarks following existing conventions
  4. ✅ Verified compilation succeeds
  5. ✅ Verified benchmarks can be listed and discovered
  6. ✅ Ran complete benchmark suite with --job short

Performance Measurements

Test Environment

  • Platform: Linux Ubuntu 24.04.3 LTS (virtualized)
  • CPU: AMD EPYC 7763, 2 physical cores (4 logical)
  • Runtime: .NET 8.0.20 with AVX2 SIMD support
  • Job: ShortRun (3 warmup, 3 iterations, 1 launch)

Results Summary

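| Operation  | Size=10  | Size=100 | Size=1000 | Size=10000 |
|----------- |---------:|---------:|----------:|-----------:|
| Multiply   | 14.04 ns | 65.91 ns | 595.54 ns |    5.10 μs |
| DotProduct |  7.58 ns | 27.28 ns | 239.51 ns |    2.36 μs |
| Norm       |  5.33 ns | 21.99 ns | 228.74 ns |    2.35 μs |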

Key Observations

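  1. SIMD Effectiveness: Dot product and norm allocate nothing and take about 0.23 to 0.24 ns per element at sizes 1000 and 10000, consistent with their SIMD-optimized implementations
  2. Memory Allocation: Multiply allocates a new result array on every call (104 B at size 10, up to 80024 B at size 10000), the same as Add and Sub; this is expected behavior for element-wise operations
  3. Performance Scaling: From size 100 upward, mean time grows roughly 8x to 10x for each 10x increase in vector size, which is close to linear; at size 10, fixed per-call overhead dominates
  4. Reduction Operations: Dot product and norm are roughly 2x to 3x faster than the element-wise operations, which is consistent with reduced memory traffic: they write no output array and allocate nothing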
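
The scaling and speedup observations can be checked directly from the Results Summary. The following is a minimal sketch in Python, used here only as a calculator; the names `MEAN_NS`, `growth_factors`, and `speedup_over_multiply` are illustrative and not part of FsMath or the benchmark project. The 10000-element means are the rounded μs values from the summary, converted to ns.

```python
# Mean times in nanoseconds, copied from the Results Summary table
# (the Size=10000 column is converted from the rounded microsecond values).
SIZES = [10, 100, 1000, 10000]
MEAN_NS = {
    "Multiply":   [14.04, 65.91, 595.54, 5100.0],
    "DotProduct": [7.58, 27.28, 239.51, 2360.0],
    "Norm":       [5.33, 21.99, 228.74, 2350.0],
}


def growth_factors(times):
    """Ratio of each mean to the previous one (sizes grow 10x per step)."""
    return [later / earlier for earlier, later in zip(times, times[1:])]


def speedup_over_multiply(name):
    """How many times faster `name` is than Multiply at each size."""
    return [m / t for m, t in zip(MEAN_NS["Multiply"], MEAN_NS[name])]


for name, times in MEAN_NS.items():
    factors = ", ".join(f"{f:.1f}x" for f in growth_factors(times))
    print(f"{name:<10} growth per 10x size: {factors}")

for name in ("DotProduct", "Norm"):
    ratios = ", ".join(f"{r:.2f}x" for r in speedup_over_multiply(name))
    print(f"{name:<10} speedup vs Multiply: {ratios}")
```

A growth factor near 10x per step indicates linear scaling. The first step (size 10 to 100) comes out well below 10x for all three operations, which is what a fixed per-call cost would produce.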

Detailed Results

| Method     | Size  | Mean         | Error         | StdDev     | Gen0   | Allocated |
|----------- |------ |-------------:|--------------:|-----------:|-------:|----------:|
| Add        | 10    |    13.418 ns |     1.7189 ns |  0.0942 ns | 0.0062 |     104 B |
| Sub        | 10    |    15.287 ns |     5.7870 ns |  0.3172 ns | 0.0062 |     104 B |
| Multiply   | 10    |    14.039 ns |     4.8120 ns |  0.2638 ns | 0.0062 |     104 B |
| DotProduct | 10    |     7.580 ns |     0.0802 ns |  0.0044 ns |      - |         - |
| Norm       | 10    |     5.325 ns |     0.2020 ns |  0.0111 ns |      - |         - |
| Add        | 100   |    69.179 ns |    27.3849 ns |  1.5011 ns | 0.0492 |     824 B |
| Sub        | 100   |    64.633 ns |    20.3229 ns |  1.1140 ns | 0.0492 |     824 B |
| Multiply   | 100   |    65.909 ns |    30.3342 ns |  1.6627 ns | 0.0492 |     824 B |
| DotProduct | 100   |    27.283 ns |     0.8278 ns |  0.0454 ns |      - |         - |
| Norm       | 100   |    21.987 ns |     0.3608 ns |  0.0198 ns |      - |         - |
| Add        | 1000  |   598.511 ns |   303.1359 ns | 16.6159 ns | 0.4787 |    8024 B |
| Sub        | 1000  |   583.309 ns |   204.4454 ns | 11.2063 ns | 0.4787 |    8024 B |
| Multiply   | 1000  |   595.542 ns |    14.9460 ns |  0.8192 ns | 0.4787 |    8024 B |
| DotProduct | 1000  |   239.512 ns |     1.7292 ns |  0.0948 ns |      - |         - |
| Norm       | 1000  |   228.741 ns |     1.1622 ns |  0.0637 ns |      - |         - |
| Add        | 10000 | 4,889.167 ns |   538.7803 ns | 29.5324 ns | 4.7607 |   80024 B |
| Sub        | 10000 | 5,065.203 ns | 1,134.9065 ns | 62.2081 ns | 4.7607 |   80024 B |
| Multiply   | 10000 | 5,098.030 ns |   628.1919 ns | 34.4333 ns | 4.7607 |   80024 B |
| DotProduct | 10000 | 2,355.899 ns |    16.7396 ns |  0.9176 ns |      - |         - |
| Norm       | 10000 | 2,347.459 ns |    67.4266 ns |  3.6959 ns |      - |         - |
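
The Allocated column for Add, Sub, and Multiply follows a simple pattern. The sketch below assumes 8-byte elements and a 24-byte array header on 64-bit .NET; the PR does not state the element type, so both constants are assumptions that happen to reproduce the reported values exactly. The names are illustrative only.

```python
# Assumptions (not stated in the PR): each element is an 8-byte float,
# and a 64-bit .NET array carries 24 bytes of header overhead.
ARRAY_HEADER_BYTES = 24
ELEMENT_BYTES = 8

# Allocated bytes per call, copied from the detailed results table
# (identical for Add, Sub, and Multiply at each size).
REPORTED_BYTES = {10: 104, 100: 824, 1000: 8024, 10000: 80024}


def expected_allocation(size):
    """Bytes for one freshly allocated result array of `size` elements."""
    return ARRAY_HEADER_BYTES + ELEMENT_BYTES * size


for size, reported in REPORTED_BYTES.items():
    predicted = expected_allocation(size)
    status = "match" if predicted == reported else "MISMATCH"
    print(f"size={size:<6} predicted={predicted:<6} reported={reported:<6} {status}")
```

Under these assumptions, each element-wise call allocates exactly one result array and nothing else, while DotProduct and Norm allocate nothing because they return a scalar.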

Replicating the Performance Measurements

To replicate these benchmarks:

# 1. Build the project
./build.sh

# 2. Run the benchmarks with short job (quick validation, ~3 minutes)
dotnet run -c Release --project benchmarks/FsMath.Benchmarks/FsMath.Benchmarks.fsproj -- --job short

# 3. For more accurate measurements, run with default settings (~15-20 minutes)
dotnet run -c Release --project benchmarks/FsMath.Benchmarks/FsMath.Benchmarks.fsproj
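
One reason the default settings give more accurate measurements is the number of iterations. BenchmarkDotNet's result legend describes the Error column as half of a 99.9% confidence interval; assuming it is computed from Student's t distribution, 3 iterations give a multiplier of about 18x on the StdDev. The sketch below checks that assumption against three rows of the detailed table. The critical value is a standard table value for 2 degrees of freedom, and the function name is illustrative.

```python
import math

# Two-sided 99.9% critical value of Student's t with 2 degrees of freedom
# (3 iterations minus 1). Valid only for the ShortRun job's 3 iterations.
T_CRIT_999_DF2 = 31.599
ITERATIONS = 3


def error_from_stddev(stddev_ns):
    """Half-width of the assumed 99.9% confidence interval for 3 iterations."""
    return T_CRIT_999_DF2 * stddev_ns / math.sqrt(ITERATIONS)


# (StdDev, reported Error) pairs in ns, copied from the detailed results table.
ROWS = {
    ("Add", 10): (0.0942, 1.7189),
    ("Multiply", 1000): (0.8192, 14.9460),
    ("Norm", 10000): (3.6959, 67.4266),
}

for (method, size), (stddev, reported) in ROWS.items():
    predicted = error_from_stddev(stddev)
    print(f"{method:<8} size={size:<6} predicted={predicted:.4f} reported={reported:.4f}")
```

With more iterations, both the critical value and the 1/sqrt(n) factor shrink, so the same StdDev yields a much narrower error margin. This is why the large Error values for Add and Sub in the ShortRun results should be read as a limit of the short job rather than as a property of the operations.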

Testing

✅ All benchmarks compile successfully
✅ All benchmarks can be discovered via --list flat
✅ All benchmarks execute without errors
✅ Existing tests still pass (132 tests passed)

Next Steps

This PR establishes comprehensive baseline measurements for vector operations. Future work from the performance plan includes:

  1. Add matrix operation benchmarks (Phase 1)
  2. Optimize outer product implementation (Phase 1)
  3. Implement blocked matrix multiplication (Phase 2)
  4. Add parallel operations for large matrices (Phase 3)

Related Issues/Discussions


🤖 Generated with Claude Code

AI generated by Daily Perf Improver

…ting

This change enables three previously commented-out benchmarks:
- Multiply: Tests element-wise vector multiplication
- DotProduct: Tests SIMD-optimized dot product computation
- Norm: Tests Euclidean norm calculation

These benchmarks are part of Phase 1 (Quick Wins) from the performance
improvement plan to expand benchmark coverage for all core vector operations.

Initial benchmark results (ShortRun on AMD EPYC 7763, .NET 8.0.20):

Size 10:
- Multiply: 14.04 ns
- DotProduct: 7.58 ns
- Norm: 5.33 ns

Size 100:
- Multiply: 65.91 ns
- DotProduct: 27.28 ns
- Norm: 21.99 ns

Size 1000:
- Multiply: 595.54 ns
- DotProduct: 239.51 ns
- Norm: 228.74 ns

Size 10000:
- Multiply: 5.10 μs
- DotProduct: 2.36 μs
- Norm: 2.35 μs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>