Performance Research and Improvement Plan
Executive Summary
FSharp.Control.AsyncSeq has several critical performance issues that significantly impact its usability with large datasets. The primary concerns are memory leaks in append operations and O(n²) performance in recursive patterns. This research provides a comprehensive analysis and improvement plan.
Current Performance Testing Infrastructure
Testing Setup
- Performance Test File: tests/FSharp.Control.AsyncSeq.Tests/AsyncSeqPerf.fsx
- Test Framework: Basic F# script with System.Diagnostics.Stopwatch
- Test Scale: N=1,000,000 elements
- Major Gap: No modern benchmarking framework (BenchmarkDotNet)
 
Historical Performance Results
unfoldIter (N=1,000,000):
- Real: 00:00:08.565, GC gen0: 889, gen1: 2, gen2: 0
unfoldChooseIter (N=1,000,000):  
- Real: 00:00:10.818, GC gen0: 1114, gen1: 3, gen2: 0
replicate (N=1,000,000):
- Real: 00:00:01.530, GC gen0: 156, gen1: 1, gen2: 0
Critical Performance Issues Identified
🔴 HIGH PRIORITY: Memory Leaks in append Operations
- Issue: A chain of append closures causes memory accumulation (Issue Memory leak when unfolding #35)
- Impact: O(n) memory usage instead of O(1) for streaming, causing OutOfMemoryException
- Root Cause: append in AsyncSeq.fs creates generator chains that prevent garbage collection (see the sketch below)
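A minimal, hypothetical sketch of the pattern that produces this kind of append chain (not the exact repro from issue #35):

```fsharp
open FSharp.Control

// Hypothetical sketch: folding singletons with append builds a left-nested
// chain (((empty ++ s0) ++ s1) ++ ...). If each append node keeps its
// predecessor reachable, memory grows O(n) even though the consumer only
// ever needs the current element.
let buildByAppend (n: int) : AsyncSeq<int> =
    Seq.init n AsyncSeq.singleton
    |> Seq.fold AsyncSeq.append AsyncSeq.empty

// Streaming consumption should be O(1) in memory, but the append chain
// above keeps the already-consumed prefix from being collected.
buildByAppend 1_000_000
|> AsyncSeq.iter ignore
|> Async.RunSynchronously
```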
🔴 HIGH PRIORITY: O(n²) Performance in Recursive Patterns
- Issue: The computation builder creates nested continuations (Issue Unexpected iteration performance drop when recursive loops are used. #57)
- Impact: A 4-second operation becomes 18+ seconds with 4x more data
- Root Cause: AsyncSeq computation expressions with recursive yield! (see the sketch below)
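A hypothetical sketch of the recursive shape issue #57 describes, together with an unfold-based reformulation that avoids the nested continuations (the rewrite is a common workaround, not the library's internal fix):

```fsharp
open FSharp.Control

// Recursive loop via yield! inside the asyncSeq builder: each step adds
// another nested continuation, so yielding element k can cost O(k),
// giving O(n^2) overall (the behaviour reported in issue #57).
let countUpRecursive (limit: int) : AsyncSeq<int> =
    let rec loop i = asyncSeq {
        if i < limit then
            yield i
            yield! loop (i + 1)
    }
    loop 0

// The same stream expressed with unfoldAsync: state is threaded explicitly,
// so there is no continuation nesting and the cost per element stays constant.
let countUpUnfold (limit: int) : AsyncSeq<int> =
    (0, limit) |> AsyncSeq.unfoldAsync (fun (i, n) ->
        async { return if i < n then Some (i, (i + 1, n)) else None })
```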
🟡 MEDIUM PRIORITY: Excessive append Function Calls
- Issue: Processing 3,000 elements results in roughly 13 million append calls (Issue AsyncSeq.append called enormous number of times #50)
- Impact: Affects all sequence operations
- Root Cause: The implementation relies heavily on append for composition
 
🟡 MEDIUM PRIORITY: Parallelism Issues
- Issue: mapAsyncParallel doesn't provide true parallelism (Issues Parallel sequence runs consequentially (non-parallel) #74, Lack of parallelism in AsyncSeq.cache #95)
- Impact: Sequential processing instead of parallel processing
- Root Cause: The implementation serializes work instead of parallelizing it (see the workaround sketch below)
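Until this is fixed, one possible user-side workaround (a sketch assuming AsyncSeq.bufferByCount and AsyncSeq.collect from the current API; mapParallelBatched is an illustrative name) is to batch the stream and fan each batch out with Async.Parallel, trading per-element streaming for real concurrency:

```fsharp
open FSharp.Control

// Illustrative workaround, not the library's implementation: buffer the input
// into fixed-size batches and run each batch with Async.Parallel.
let mapParallelBatched (batchSize: int) (f: 'a -> Async<'b>) (source: AsyncSeq<'a>) : AsyncSeq<'b> =
    source
    |> AsyncSeq.bufferByCount batchSize          // AsyncSeq<'a[]>
    |> AsyncSeq.collect (fun batch ->
        asyncSeq {
            // All items in the batch run concurrently; order within the batch is preserved.
            let! results = batch |> Array.map f |> Async.Parallel
            for r in results do
                yield r
        })
```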
 
Performance Characteristics
Bottleneck Analysis
- Primarily I/O Bound: Designed for async I/O scenarios
 - High CPU Overhead: From continuations and allocations
 - Memory Issues: Major leaks in core operations, high GC pressure
 - Threading: Non-blocking but doesn't effectively utilize multiple cores
 
Allocation Patterns
- High Allocation: Each MoveNext() requires several allocations
- GC Pressure: Heavy generation 0 collections (889 gen0 collections for 1M elements)
 - Long-lived Objects: Continuation chains prevent garbage collection
 
Typical Workloads
- Streaming I/O Processing: Large files, network streams, database cursors
 - Event Processing: Event streams with backpressure control
 - ETL Pipelines: Extract-Transform-Load workflows
 - Paging Data: Consuming paginated APIs
 - Parallel Processing: Throttled parallel processing
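For instance, the paging workload maps naturally onto unfoldAsync; in the sketch below, fetchPage and PageState are hypothetical stand-ins for a real paginated API client:

```fsharp
open FSharp.Control

// Hypothetical paginated API call: returns one page of items plus an
// optional continuation token for the next page.
let fetchPage (token: string option) : Async<string[] * string option> =
    async { return [| "item1"; "item2" |], None }   // placeholder implementation

type PageState =
    | Start
    | Next of string
    | Finished

// unfoldAsync drives the paging loop; enumeration stops when the API stops
// returning a continuation token.
let allItems : AsyncSeq<string> =
    AsyncSeq.unfoldAsync
        (fun state -> async {
            match state with
            | Finished -> return None
            | Start | Next _ ->
                let token = match state with Next t -> Some t | _ -> None
                let! page, nextToken = fetchPage token
                let nextState = match nextToken with Some t -> Next t | None -> Finished
                return Some (page, nextState) })
        Start
    |> AsyncSeq.collect (fun page -> AsyncSeq.ofSeq page)   // flatten pages into a stream of items
```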
 
Performance-Critical Operations (Priority Order)
1. unfoldAsync - Core sequence generation, used by most other functions
2. append - Used extensively, major memory leak source
3. collect - Monadic bind, complex state management
4. mapAsync - Most common transformation operation
5. iterAsync - Terminal operation, recursion performance issue
Build and Testing Commands
Current Commands
# Build and test
dotnet build -c Release
dotnet test -c Release
# Run performance tests (manual)
dotnet fsi tests/FSharp.Control.AsyncSeq.Tests/AsyncSeqPerf.fsx
# Fable tests (JavaScript compatibility)  
cd tests/fable && npm i && npm test
Recommended Performance Setup
# Add benchmarking framework
dotnet add package BenchmarkDotNet
# Profiling tools
dotnet tool install -g dotMemory.Unit
dotnet tool install -g JetBrains.dotTrace.GlobalTools
# Memory/CPU profiling
dotnet trace collect -- dotnet run YourApp.exe
Performance Improvement Plan
Round 1: Foundation & Critical Fixes
Goal: Establish measurement infrastructure and fix critical memory leaks
- Set up BenchmarkDotNet for systematic benchmarking (see the sketch after this list)
- Fix memory leaks in append operations (Issue Memory leak when unfolding #35)
- Add performance regression tests to CI
 - Document baseline performance for all core operations
 - Expected Outcome: Enable reliable performance work, fix crashes with large datasets
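A minimal sketch of what the Round 1 BenchmarkDotNet setup could look like (the benchmark class, parameters, and unfold-based workload are illustrative, not the final suite):

```fsharp
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
open FSharp.Control

[<MemoryDiagnoser>]                       // report allocations and GC counts
type AsyncSeqBenchmarks () =

    [<Params(10_000, 100_000)>]
    member val N = 0 with get, set

    // Mirrors the existing unfoldIter script test at a smaller scale.
    [<Benchmark>]
    member this.UnfoldIter () =
        AsyncSeq.unfoldAsync
            (fun i -> async { return if i < this.N then Some (i, i + 1) else None })
            0
        |> AsyncSeq.iter ignore
        |> Async.RunSynchronously

[<EntryPoint>]
let main _ =
    BenchmarkRunner.Run<AsyncSeqBenchmarks>() |> ignore
    0
```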
 
Round 2: Core Algorithm Optimization
Goal: Address O(n²) performance issues and optimize hot paths
- Eliminate O(n²) behavior in recursive patterns (Issue Unexpected iteration performance drop when recursive loops are used. #57)
- Optimize the unfoldAsync implementation for better memory usage
- Add fast paths for common operations using AsyncSeqOp
- Reduce per-operation allocations by 50%
 - Expected Outcome: 5-10x performance improvement for typical workloads
 
Round 3: Parallelism & Advanced Features
Goal: Enable true parallelism and advanced optimizations
- Fix mapAsyncParallel to achieve true parallelism (Issues Parallel sequence runs consequentially (non-parallel) #74, Lack of parallelism in AsyncSeq.cache #95)
- Implement efficient AsyncSeq.cache for multiple consumers (see the sketch after this list)
- Add vectorization support for mathematical operations
 - Optimize continuation chains to prevent deep nesting
 - Expected Outcome: True parallel processing, better resource utilization
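For reference, the multiple-consumer scenario an efficient AsyncSeq.cache should serve looks roughly like this (a sketch assuming the existing cache, fold, and toArrayAsync functions; the expensive source is simulated with Async.Sleep):

```fsharp
open FSharp.Control

// Simulated expensive source: each element costs one async I/O-like delay.
let expensiveSource : AsyncSeq<int> =
    asyncSeq {
        for i in 1 .. 1_000 do
            do! Async.Sleep 1
            yield i
    }

// cache is intended to let multiple consumers share a single enumeration of
// the source; issue #95 discusses the limitations of the current behaviour.
let shared = AsyncSeq.cache expensiveSource

let consumeTwice = async {
    let! sum   = shared |> AsyncSeq.fold (+) 0       // first pass enumerates the source
    let! items = shared |> AsyncSeq.toArrayAsync     // second pass should be served from the cache
    return sum, items.Length
}
```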
 
Round 4: Performance Parity & Advanced Optimizations
Goal: Match or exceed alternative implementations
- Zero-copy operations where possible
 - Operation fusion to eliminate intermediate collections
 - Advanced lazy evaluation improvements
- Benchmark against IObservable<T> and seq<'T> alternatives
- Expected Outcome: Performance parity with competing approaches
 
Environment Requirements
Development Setup
- .NET 8.0: Current test target framework
 - Release Configuration: Essential for accurate measurement
 - BenchmarkDotNet: Statistical benchmarking framework
 - Profiling Tools: dotTrace (CPU), dotMemory (memory), PerfView (free alternative)
 
Measurement Guidelines
- Use a dedicated test machine for reproducible results
- Run long-duration tests to surface issues that only appear under sustained load
 - Focus on one major bottleneck per improvement round
 - Systematic before/after measurement for all changes
 
Success Metrics
Primary Metrics
- Memory Usage: Constant O(1) memory for streaming operations
 - Throughput: Elements processed per second
 - Latency: Time to first element in streaming scenarios
 - GC Pressure: Reduction in generation 0 collections
 
Baseline Targets (Round 1)
- No OutOfMemoryException with any dataset size
 - Linear O(n) performance for all operations
 - 50% reduction in allocations per operation
 - Reliable 1-minute benchmark suite
 
AI-generated content by Daily Perf Improver may contain mistakes.