github-actions[bot]

Summary

This PR adds comprehensive benchmarking infrastructure for linear algebra operations as part of Phase 1 (Quick Wins) of the performance improvement plan. This establishes baseline performance metrics for critical linear algebra operations that were previously not benchmarked.

Performance Goal

Goal Selected: Add benchmarks for linear algebra operations (Phase 1, Priority: HIGH)

Rationale: The performance research plan identified that while vector and matrix operations had benchmarks, linear algebra operations (QR, LU, Cholesky, EVD, linear system solving) had no benchmarking coverage. These operations are fundamental to scientific computing, statistics, and machine learning applications, making their performance critical for the library's value proposition.

Changes Made

New Benchmarks Added (21 total)

All benchmarks test three matrix sizes (10×10, 30×30, 50×50) and use MemoryDiagnoser to track allocations.

QR Decomposition (3 benchmarks):

  • Householder-based QR factorization
  • Critical for least squares and eigenvalue algorithms

LU Decomposition (3 benchmarks):

  • LU factorization with partial pivoting
  • Foundation for linear system solving and matrix inversion

Cholesky Decomposition (3 benchmarks):

  • Specialized for symmetric positive-definite matrices
  • 2× faster than general LU for applicable cases

Eigenvalue Decomposition (3 benchmarks):

  • Symmetric eigenvalue decomposition
  • Essential for PCA, spectral methods, and stability analysis

Linear System Solving (3 benchmarks):

  • Solve Ax = b using LU decomposition
  • Common operation in numerical methods

Matrix Inverse (3 benchmarks):

  • Matrix inversion via LU decomposition
  • Used in least squares and statistical computations

Least Squares (3 benchmarks):

  • QR-based least squares solver
  • Fundamental for regression and curve fitting

Files Modified

  • benchmarks/FsMath.Benchmarks/LinearAlgebra.fs (new file, 174 lines)
  • benchmarks/FsMath.Benchmarks/FsMath.Benchmarks.fsproj - Added LinearAlgebra.fs
  • benchmarks/FsMath.Benchmarks/Program.fs - Registered LinearAlgebraBenchmarks

Approach

  1. ✅ Reviewed performance research plan to identify missing benchmarks
  2. ✅ Analyzed existing linear algebra operations in src/FsMath/Algebra/
  3. ✅ Created comprehensive benchmark suite following existing patterns
  4. ✅ Selected appropriate matrix sizes (10, 30, 50) to capture scaling behavior
  5. ✅ Verified compilation and benchmark discovery
  6. ✅ Ran complete benchmark suite with --job short
  7. ✅ Collected baseline performance metrics
  8. ✅ Verified all 132 tests pass

Performance Measurements

Test Environment

  • Platform: Linux Ubuntu 24.04.3 LTS (virtualized)
  • CPU: AMD EPYC 7763, 2 physical cores (4 logical) with AVX2 SIMD
  • Runtime: .NET 8.0.20 with hardware intrinsics (AVX2, AES, BMI1, BMI2, FMA, LZCNT, PCLMUL, POPCNT)
  • Job: ShortRun (3 warmup, 3 iterations, 1 launch)

Results Summary by Operation Type

QR Decomposition

  • 10×10: 4.63 μs
  • 30×30: 68.6 μs (~15× from 10×10)
  • 50×50: 288 μs (~4.2× from 30×30)

Shows expected O(mn²) scaling for m×n matrices.

LU Decomposition

  • 10×10: 1.11 μs
  • 30×30: 15.7 μs (~14× from 10×10)
  • 50×50: 68.1 μs (~4.3× from 30×30)

About 4× faster than QR at every size tested, with the expected O(n³) scaling.

Cholesky Decomposition

  • 10×10: 457 ns
  • 30×30: 7.31 μs (~16× from 10×10)
  • 50×50: 32.0 μs (~4.4× from 30×30)

Fastest decomposition for symmetric positive-definite matrices: roughly 2.1× to 2.4× faster than LU across the three sizes.
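The roughly 2× gap matches the textbook operation counts: Cholesky needs about n³/3 floating-point operations against about 2n³/3 for LU, because symmetry means only the lower triangle has to be computed. The following is a minimal dependency-free Python sketch of the standard Cholesky algorithm, written for illustration only. It is not the FsMath implementation, and the function name `cholesky` is hypothetical.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T == a.

    Illustrative sketch only (not the FsMath code). `a` is a list of rows
    and must be symmetric positive-definite.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: what remains of a[j][j] after removing the
        # contribution of the columns already computed.
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        # Only entries below the diagonal are computed; the upper triangle
        # is implied by symmetry, which is where the savings over LU come from.
        for i in range(j + 1, n):
            partial = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (a[i][j] - partial) / L[j][j]
    return L


if __name__ == "__main__":
    print(cholesky([[4.0, 2.0], [2.0, 3.0]]))
```

Note that no pivoting is needed for positive-definite input, which also removes the row-swap bookkeeping that LU carries.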

Eigenvalue Decomposition (EVD)

  • 10×10: 5.68 μs
  • 30×30: 163 μs (~29× from 10×10)
  • 50×50: 1.18 ms (~7.2× from 30×30)

Most expensive operation, involves iterative algorithms.

Linear System Solving

  • 10×10: 1.69 μs
  • 30×30: 25.7 μs (~15× from 10×10)
  • 50×50: 102 μs (~4.0× from 30×30)

Dominated by LU decomposition cost.

Matrix Inverse

  • 10×10: 2.32 μs
  • 30×30: 41.9 μs (~18× from 10×10)
  • 50×50: 170 μs (~4.1× from 30×30)

More expensive than single system solve (solves n systems).
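To make the "solves n systems" remark concrete, here is a small dependency-free Python sketch of LU with partial pivoting, a single solve, and an inverse built by solving against each column of the identity. It is an illustration under assumptions, not the FsMath code: the names `lu_decompose`, `lu_solve`, and `inverse` are hypothetical.

```python
def lu_decompose(a):
    """Packed LU factors of a copy of `a` with partial pivoting.

    Returns (lu, perm) where row k of the factored matrix corresponds to
    row perm[k] of the input. Illustrative sketch only.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: move the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b given the factors from lu_decompose."""
    n = len(lu)
    x = [b[perm[i]] for i in range(n)]  # apply the row permutation to b
    for i in range(n):  # forward substitution (L has a unit diagonal)
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


def inverse(a):
    """Invert `a` by factoring once and solving n unit-vector systems."""
    n = len(a)
    lu, perm = lu_decompose(a)
    cols = [lu_solve(lu, perm, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    # cols[j] is column j of the inverse; transpose back to a list of rows.
    return [[cols[j][i] for j in range(n)] for i in range(n)]
```

The factorization is done once and reused, so the extra cost of the inverse over a single solve comes from the n pairs of triangular substitutions.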

Least Squares

  • 10×10: 5.78 μs
  • 30×30: 83.1 μs (~14× from 10×10)
  • 50×50: 330 μs (~4.0× from 30×30)

Dominated by QR decomposition cost.
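As an illustration of why the QR step dominates, the sketch below solves a least squares problem with Householder reflections in plain Python: the reflections triangularize the matrix (the expensive part), after which only a cheap back substitution remains. This is a generic textbook formulation under assumptions, not the FsMath solver, and the name `householder_lstsq` is hypothetical.

```python
import math


def householder_lstsq(A, b):
    """Minimize ||A x - b||_2 for a full-column-rank m x n matrix (m >= n).

    Illustrative sketch only. Works on copies of the inputs.
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    y = b[:]
    for k in range(n):
        norm = math.sqrt(sum(R[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            raise ValueError("matrix is rank deficient")
        # Choose the sign that avoids cancellation when forming v.
        alpha = -math.copysign(norm, R[k][k])
        v = [0.0] * m
        for i in range(k, m):
            v[i] = R[i][k]
        v[k] -= alpha
        vnorm2 = sum(v[i] * v[i] for i in range(k, m))
        # Apply H = I - 2 v v^T / (v^T v) to the remaining columns of R ...
        for j in range(k, n):
            f = 2.0 * sum(v[i] * R[i][j] for i in range(k, m)) / vnorm2
            for i in range(k, m):
                R[i][j] -= f * v[i]
        # ... and to the right-hand side.
        f = 2.0 * sum(v[i] * y[i] for i in range(k, m)) / vnorm2
        for i in range(k, m):
            y[i] -= f * v[i]
    # Back substitution on the upper-triangular n x n block of R.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / R[i][i]
    return x


if __name__ == "__main__":
    # Fit y = c0 + c1 * t to the points (0, 1), (1, 3), (2, 5).
    print(householder_lstsq([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]],
                            [1.0, 3.0, 5.0]))
```

The example data lie exactly on the line y = 1 + 2t, so the fitted coefficients should come out close to 1 and 2 up to rounding.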

Key Observations

  1. Algorithmic Complexity: All operations scale roughly cubically with matrix size; EVD grows faster than the others between 30×30 and 50×50
  2. Cholesky Advantage: roughly 2× faster than LU for applicable matrices (symmetric positive-definite)
  3. EVD Cost: Most expensive operation, but necessary for many applications
  4. Memory Efficiency: Allocations scale appropriately with problem size
  5. Baseline Established: Now have quantitative data for future optimization work

Performance Scaling Analysis

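| Operation | 10→30 Ratio | 30→50 Ratio | Expected | Assessment |
|-----------|-------------|-------------|----------|------------|
| QR | 15× | 4.2× | O(mn²) | ✅ Good |
| LU | 14× | 4.3× | O(n³) | ✅ Good |
| Cholesky | 16× | 4.4× | O(n³), about half the flops of LU | ✅ Good |
| EVD | 29× | 7.2× | O(n³) iterative | ⚠️ Steeper than cubic |
| Solve | 15× | 4.0× | O(n³) | ✅ Good |
| Inverse | 18× | 4.1× | O(n³) | ✅ Good |
| LeastSq | 14× | 4.0× | O(mn²) | ✅ Good |

For reference, ideal cubic scaling predicts 27× for 10→30 and about 4.6× for 30→50. Most operations come in below 27× on the first step, which is consistent with fixed per-call overhead weighing on the 10×10 case (an interpretation, not something measured here), and close to 4.6× on the second. EVD is the exception: its 7.2× for 30→50 exceeds the cubic prediction, which may reflect its iterative nature or ShortRun noise.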
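The ratios can be turned into an empirical scaling exponent, which is easier to compare against the expected n³. The short Python sketch below does this using the mean times reported in this PR. The helper names `empirical_exponent` and `cubic_ratio` are made up for illustration, and the output is only as reliable as the ShortRun measurements it is fed.

```python
import math

# Mean times in microseconds, copied from the results in this PR (ShortRun job).
timings_us = {
    "QR":       {10: 4.63,  30: 68.6,  50: 288.0},
    "LU":       {10: 1.11,  30: 15.7,  50: 68.1},
    "Cholesky": {10: 0.457, 30: 7.31,  50: 32.0},
    "EVD":      {10: 5.68,  30: 163.0, 50: 1180.0},
    "Solve":    {10: 1.69,  30: 25.7,  50: 102.0},
    "Inverse":  {10: 2.32,  30: 41.9,  50: 170.0},
    "LeastSq":  {10: 5.78,  30: 83.1,  50: 330.0},
}


def empirical_exponent(t_small, t_large, n_small, n_large):
    """Exponent p such that t_large / t_small == (n_large / n_small) ** p."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


def cubic_ratio(n_small, n_large):
    """Time ratio predicted by pure O(n^3) scaling."""
    return (n_large / n_small) ** 3


if __name__ == "__main__":
    print(f"cubic prediction: 10->30 = {cubic_ratio(10, 30):.1f}x, "
          f"30->50 = {cubic_ratio(30, 50):.2f}x")
    for name, t in timings_us.items():
        p1 = empirical_exponent(t[10], t[30], 10, 30)
        p2 = empirical_exponent(t[30], t[50], 30, 50)
        print(f"{name:9s} 10->30: n^{p1:.2f}   30->50: n^{p2:.2f}")
```

An exponent well below 3 on the 10→30 step suggests overhead that does not grow with n, while an exponent above 3 (as EVD shows on 30→50) is the signal worth a closer look with a full-length benchmark run.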

Replicating the Performance Measurements

To replicate these benchmarks:

# 1. Build the project
./build.sh

# 2. Run linear algebra benchmarks with short job (~5-7 minutes)
dotnet run -c Release --project benchmarks/FsMath.Benchmarks/FsMath.Benchmarks.fsproj -- --filter "*LinearAlgebraBenchmarks*" --job short

# 3. For production-quality measurements, run with default settings (~30-40 minutes)
dotnet run -c Release --project benchmarks/FsMath.Benchmarks/FsMath.Benchmarks.fsproj -- --filter "*LinearAlgebraBenchmarks*"

# 4. Run all benchmarks (vector + linear algebra):
dotnet run -c Release --project benchmarks/FsMath.Benchmarks/FsMath.Benchmarks.fsproj -- --job short

Results are saved to BenchmarkDotNet.Artifacts/results/ in multiple formats (GitHub MD, HTML, CSV).

Testing

✅ All 132 tests pass
✅ All 21 new benchmarks compile successfully
✅ All benchmarks are discoverable via --list flat
✅ All benchmarks execute without errors
✅ Memory diagnostics enabled and working

Next Steps

This PR establishes comprehensive baseline measurements for linear algebra operations. Based on these measurements and the performance plan, future work includes:

Phase 1 (remaining):

  • Document comprehensive performance characteristics
  • Identify optimization opportunities from baseline data

Phase 2 (algorithmic improvements):

  1. Optimize QR decomposition (currently ~290 μs for 50×50)
  2. Investigate EVD performance variance
  3. Consider specialized paths for small matrices (<10×10)
  4. Evaluate cache-aware blocking for larger matrices

Phase 3 (advanced optimizations):

  • Parallel linear algebra for large matrices
  • SIMD optimization opportunities in decompositions
  • Benchmark against established libraries (Math.NET Numerics)

Related Issues/Discussions


🤖 Generated with Claude Code

AI generated by Daily Perf Improver

This commit adds comprehensive benchmarking infrastructure for linear algebra operations, addressing Phase 1 of the performance improvement plan.

**Benchmarks Added:**
- QR Decomposition (10×10, 30×30, 50×50)
- LU Decomposition (10×10, 30×30, 50×50)
- Cholesky Decomposition (10×10, 30×30, 50×50)
- Eigenvalue Decomposition (10×10, 30×30, 50×50)
- Linear System Solving (10×10, 30×30, 50×50)
- Matrix Inverse (10×10, 30×30, 50×50)
- Least Squares (10×10, 30×30, 50×50)

Total: 21 new benchmarks covering fundamental linear algebra operations.

**Files Modified:**
- benchmarks/FsMath.Benchmarks/LinearAlgebra.fs (new file)
- benchmarks/FsMath.Benchmarks/FsMath.Benchmarks.fsproj
- benchmarks/FsMath.Benchmarks/Program.fs

**Test Results:**
✅ All 132 existing tests pass
✅ All 21 new benchmarks compile and run successfully
✅ Benchmarks are discoverable via --list flat

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

dsyme commented Oct 12, 2025

@kMutagene Because this is adding benchmarks, I will merge this. It's useful for later automated performance work and will prevent needless duplication.

@dsyme dsyme marked this pull request as ready for review October 12, 2025 12:25
@dsyme dsyme merged commit 740933b into main Oct 12, 2025
2 checks passed