Conversation

@jackwatters45
Owner

Summary

  • Add CacheService with R2-backed caching and ETags/conditional request support
  • Extend StatsService to use CacheService for efficient data fetching
  • Add comprehensive tests for cache service functionality

WIP

This is a work-in-progress PR - lint/tests may not pass yet.

@github-actions
Contributor

github-actions bot commented Jan 21, 2026

🚀 Preview Deployed

Your changes have been deployed to a preview environment:

🌐 Docs: https://laxdb-docs-pr-101.jack-watters.workers.dev
🌐 Marketing: https://laxdb-marketing-pr-101.jack-watters.workers.dev
🌐 Website: https://laxdb-web-pr-101.jack-watters.workers.dev

Built from commit 6d10335

🤖 This comment updates automatically with each push.

jackwatters45 and others added 15 commits January 22, 2026 19:52
- Add test:unit script excluding *.integration.test.ts
- Add test:integration script for integration tests only
- Add turbo test:unit task with caching
- Rename data-anomaly.test.ts to integration test (requires extracted data)

Workflow: Ralph loop uses fast test:unit, pre-merge uses full test suite.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Implement CacheService for R2-backed caching with conditional fetching
- Add ETag and Last-Modified support for cache validation
- Extend StatsService to use CacheService for efficient data fetching
- Add comprehensive tests for cache service functionality
jackwatters45 force-pushed the feat/backend-services-schema branch from 8e01a1e to 43e026b on January 22, 2026 09:01
github-actions bot changed the title from "[WIP] feat: Backend services and CacheService implementation" to "feat(pipeline): add database schema, repositories, services, and loader for pipeline data" on Jan 22, 2026
@github-actions
Contributor

Code Review

Summary

This PR implements the database schema, repositories, services, and loader for the pipeline data ingestion system. It adds comprehensive support for storing and managing lacrosse data from multiple leagues, including player identities, stats, teams, games, standings, and scrape runs. The implementation follows Effect-TS patterns with typed errors, includes a caching layer with R2 backing, and provides services for player identity resolution and stats aggregation.

Title Convention

✅ Title follows conventional commit format correctly (all lowercase).

Feedback

Strengths:

  • Excellent test coverage with comprehensive unit tests for all services and repositories
  • Proper Effect-TS patterns throughout with tagged errors and service composition
  • Well-designed database schema with appropriate indexes, foreign keys, and constraints
  • Smart caching strategy with TTL configuration and stale-while-revalidate support
  • Clean separation of concerns: repos handle data access, services handle business logic
  • Identity resolution service properly handles normalization with special character mapping
  • Good use of cursor-based pagination for performance

Issues:

  1. Type Safety Violation in loader.service.ts

    • Line 115: Object.keys(data as object).toSorted() - uses as object type assertion
    • This violates the project's strict "no as Type" rule per CLAUDE.md
    • Should use proper type guards or Schema validation instead
  2. Missing Error Handling in Cache Service

    • cache.service.ts doesn't handle R2 errors gracefully in all cases
    • R2 operations can fail with network errors, quota limits, etc., but some paths don't wrap them in Effect error handling
    • Should use Effect.tryPromise with proper error mapping for all R2 operations (see the sketch after this list)
  3. Potential SQL Injection Risk

    • stats.repo.ts uses the sql template tag in some queries; verify that all user inputs are properly parameterized
    • Review buildCursorCondition and ensure cursor values can't be manipulated
  4. Missing Index in Migration

    • The pipeline_player_stat table has a compound unique constraint on (source_player_id, season_id, game_id) but no dedicated index for (source_player_id, season_id) queries, which are common
    • It already has separate indexes, but consider whether a compound index would serve these queries better
  5. Hardcoded Values

    • loader.service.ts:73-97 - LEAGUE_CONFIGS is hardcoded
    • Should be loaded from config or database for flexibility
    • What happens when a new league is added?
  6. Inconsistent Null Handling

    • Some services use Option from Effect for nullable values, others use | null
    • Should standardize on Effect's Option type throughout for consistency with Effect patterns
  7. Missing Documentation

    • While the code has some JSDoc comments, complex business logic like the identity resolution algorithm needs more explanation
    • normalizeName function is well-documented, but the overall matching strategy in IdentityService needs similar treatment
  8. Performance Concern

    • loader.service.ts reads entire JSON files into memory
    • For large datasets (thousands of players), should consider streaming or batching
    • No pagination mentioned for bulk loads
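
Regarding issue 2 (error handling in the cache service), a minimal sketch of wrapping an R2 read in Effect.tryPromise with a typed error. The error class, binding shape, and names here are illustrative assumptions, not the PR's actual types:

    import { Data, Effect } from "effect";

    // Illustrative tagged error; the PR's real errors use its own error classes.
    class CacheReadError extends Data.TaggedError("CacheReadError")<{
      readonly key: string;
      readonly cause: unknown;
    }> {}

    // Minimal shape of the R2 binding this sketch needs; the real binding
    // comes from @cloudflare/workers-types.
    interface R2BucketLike {
      get(key: string): Promise<unknown>;
    }

    declare const bucket: R2BucketLike;

    // Wrap the promise so network/quota failures land in the Effect error
    // channel as CacheReadError instead of escaping as defects.
    const readCached = (key: string) =>
      Effect.tryPromise({
        try: () => bucket.get(key),
        catch: (cause) => new CacheReadError({ key, cause }),
      });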

Suggestions

  1. Fix Type Safety Violation

    // Instead of:
    const content = JSON.stringify(data, Object.keys(data as object).toSorted());
    
    // Use Schema validation or type guard:
    const content = typeof data === 'object' && data !== null
      ? JSON.stringify(data, Object.keys(data).toSorted())
      : JSON.stringify(data);
  2. Add Schema Validation for Loader Inputs

    • Use Effect Schema to validate JSON files before processing
    • This would catch malformed data early and provide better error messages
  3. Consider Batch Operations

    • loader.service.ts processes entities one at a time
    • Could use db.insert().values([...]) for bulk inserts with better performance (see the sketch after this list)
  4. Add Migration Rollback

    • Current migration only has forward SQL
    • Consider adding rollback scripts for safer deployments
  5. Extract Configuration

    • Move LEAGUE_CONFIGS to a config file or database table
    • Would make adding new leagues easier without code changes
  6. Add Telemetry

    • Cache hit/miss rates would help tune TTL values
    • Track identity resolution confidence scores for monitoring
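
Regarding suggestion 3, a minimal sketch of a batched insert with Drizzle. The table definition and connection setup are illustrative assumptions, not the PR's actual schema:

    import { drizzle } from "drizzle-orm/node-postgres";
    import { integer, pgTable, serial, varchar } from "drizzle-orm/pg-core";
    import { Pool } from "pg";

    // Illustrative table; the real pipeline tables live in the pipeline package.
    const pipelinePlayerStat = pgTable("pipeline_player_stat", {
      id: serial("id").primaryKey(),
      sourcePlayerId: varchar("source_player_id", { length: 50 }).notNull(),
      goals: integer("goals").notNull().default(0),
      assists: integer("assists").notNull().default(0),
    });

    const pool = new Pool({ connectionString: process.env.DATABASE_URL });
    const db = drizzle(pool);

    // One round trip for the whole batch instead of one insert per row.
    const loadStats = (rows: Array<typeof pipelinePlayerStat.$inferInsert>) =>
      db.insert(pipelinePlayerStat).values(rows);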

Overall

Strong implementation with excellent Effect-TS patterns and comprehensive test coverage. The architecture is well-designed with clear separation of concerns. Main concerns are the type safety violation (must fix per project rules), potential performance issues with large dataset loads, and some missing error handling. The caching strategy is well-thought-out and aligns with the spec. After addressing the type safety issue and improving error handling, this will be production-ready.

Recommendation: Request changes for type safety fix, then approve after verification.


Auto-generated review by Claude

jackwatters45 and others added 9 commits January 22, 2026 21:10
Add Effect Schema definitions and RPC contracts for:
- Stats: LeaderboardEntry, GetLeaderboardInput, LeaderboardResponse
- Players: CanonicalPlayer, SourcePlayer, GetPlayerInput, GetPlayersInput
- Teams: TeamDetails, TeamWithRoster, GetTeamInput, GetTeamsInput

Co-Authored-By: Claude <[email protected]>
Add Effect-based repos and services for pipeline data access:
- StatsRepo/StatsService: leaderboard queries with cursor pagination
- PlayersRepo/PlayersService: canonical player lookup with source records
- TeamsRepo/TeamsService: team details with roster information

Uses PgDrizzle for database queries with proper error handling.

Co-Authored-By: Claude <[email protected]>
Add Effect RPC handlers for stats, players, and teams:
- StatsRpcs: getLeaderboard with multi-league filtering
- PlayersRpcs: getPlayer, getPlayers with canonical resolution
- TeamsRpcs: getTeam, getTeams with roster data

Integrates with LaxdbRpc group and handlers.

Co-Authored-By: Claude <[email protected]>
Add REST API for public stats access:
- POST /api/stats/leaderboard - paginated leaderboard with league filtering
- StatsApiGroup with HttpApiBuilder handlers
- Integrates with LaxdbApi definition

Co-Authored-By: Claude <[email protected]>
Add lacrosse stats leaderboard page:
- URL state management with Effect Schema validation
- StatsTable with sortable columns (points, goals, assists)
- LeagueFilter checkboxes (PLL, NLL, MLL, MSL, WLA)
- Cursor-based pagination component
- TanStack Query integration with prefetching

Co-Authored-By: Claude <[email protected]>
Add league season configuration:
- LEAGUE_SEASONS with start/end dates for each league
- isInSeason helper handling year-boundary cases (NLL: Dec-May)
- getActiveLeagues returns currently in-season leagues
- getAllActiveLeagues returns non-historical leagues

Used by cron worker to determine which leagues to extract.

Co-Authored-By: Claude <[email protected]>
Add Cloudflare Workers scheduled handler:
- Hourly cron trigger for data extraction
- Season-aware league selection (only active leagues)
- Parallel extraction with error isolation per league
- Cache invalidation after successful loads
- Cloudflare worker types in tsconfig

Note: Extract/load functions are placeholders for MVP.

Co-Authored-By: Claude <[email protected]>
Configure Alchemy deployment for pipeline MVP:
- Add pipelineKV namespace for caching
- Enable API worker with PIPELINE_KV binding
- Add hourly cron trigger (0 * * * *)
- Update PR preview comments to include API URL

Co-Authored-By: Claude <[email protected]>
- Add @laxdb/pipeline workspace dependency to api package
- Remove unused PLLTeamDetail import from pll.extractor.ts
- Update bun.lock

Co-Authored-By: Claude <[email protected]>
Document simplified 8-item MVP plan for public /stats page:
- Stats/Players/Teams RPC endpoints
- /stats route with URL state management
- League filtering and pagination
- Unified cron worker with season awareness
- Alchemy deployment configuration

Co-Authored-By: Claude <[email protected]>
@github-actions
Contributor

Code Review

Summary

This PR adds comprehensive database schema, repositories, services, and a loader for pipeline data. Key additions include:

  • CacheService with R2-backed caching and ETags/conditional request support
  • Complete pipeline database schema (11 tables) with foreign keys and indexes
  • Repository layer for players, teams, stats with proper Effect-TS patterns
  • Service layer with business logic including identity matching and cache integration
  • LoaderService for ingesting extracted data into the database
  • API integration (RPC + HTTP endpoints) for stats, players, and teams
  • Frontend stats page with leaderboard display
  • Cron scheduled handler for automated pipeline extraction (placeholder implementation)
  • Comprehensive test coverage (18 test files)

This represents significant infrastructure for the MVP pipeline feature. The PR description correctly notes this is WIP with potential lint/test failures.

Title Convention

Title follows conventional commit format: feat(pipeline): add database schema, repositories, services, and loader for pipeline data

  • Correct type: feat for new feature
  • Correct scope: (pipeline)
  • Lowercase description
  • Descriptive and accurate

Feedback

Critical Issues

  1. Effect.catchAll Anti-Pattern (BLOCKING - CLAUDE.md violation)

    • /home/runner/work/laxdb/laxdb/packages/pipeline/src/pll/explore-data.ts: Lines 77, 118, 180, 237, 253, 275, 292, 308, 336, 346
    • /home/runner/work/laxdb/laxdb/packages/pipeline/src/scraper/scraper.service.ts: Lines 62, 108
    • /home/runner/work/laxdb/laxdb/packages/pipeline/src/validate/validate.service.ts: Lines 19, 28, 47, 142, 175
    • /home/runner/work/laxdb/laxdb/packages/pipeline/src/validate/data-anomaly.integration.test.ts: Lines 150, 158, 170, 177
    • Issue: Multiple uses of Effect.catchAll which swallows typed errors (explicit anti-pattern in CLAUDE.md)
    • Impact: Defeats Effect-TS type safety by losing error type information
    • Fix Required: Replace with Effect.catchTag for each specific error type (see the sketch after this list)
    • Note: Some uses appear to be in exploration/validation scripts (not production code), but pattern should still be avoided for consistency
  2. Type Safety Concerns - Excessive as Type Assertions

    • /home/runner/work/laxdb/laxdb/packages/pipeline/src/service/players.repo.ts: Lines 162, 228, 298, 390-391
    • /home/runner/work/laxdb/laxdb/packages/pipeline/src/service/stats.repo.ts: Lines 180, 252, 327, 434, 507
    • /home/runner/work/laxdb/laxdb/packages/pipeline/src/service/teams.repo.ts: Lines 144, 187, 240, 271, 318
    • /home/runner/work/laxdb/laxdb/packages/pipeline/src/rpc/players.repo.ts: Lines 87, 126
    • /home/runner/work/laxdb/laxdb/packages/pipeline/src/rpc/stats.repo.ts: Lines 63, 171
    • Pattern: return result[0] as TypeWithDetails after joins
    • Issue: While CLAUDE.md prohibits as Type, these are technically safe because they're widening the Drizzle inferred type to include joined columns. However, the volume is concerning.
    • Risk: If join conditions change and columns are missing, runtime errors instead of compile-time errors
    • Better approach: Define explicit return types in Drizzle select clauses or use Schema validation
    • Severity: Medium - not unsafe but violates spirit of type safety guidelines
  3. Cron Handler Not Connected to Pipeline Extractors

    • /home/runner/work/laxdb/laxdb/packages/api/src/cron/scheduled.ts: Lines 24-32, 40-47
    • Issue: extractLeague and loadLeague are placeholder TODOs with no actual implementation
    • Impact: Hourly cron job will run but do nothing
    • Fix Required: Wire up to actual @laxdb/pipeline extractors and LoaderService
    • Note: PR description acknowledges this is WIP, so acceptable for this PR if documented
  4. Database Schema - Missing Foreign Key Relationships

    • /home/runner/work/laxdb/laxdb/packages/core/migrations/20260122085443_dusty_mentor/migration.sql
    • Observation: All pipeline tables use ON DELETE CASCADE
    • Risk: Deleting a league will cascade delete ALL data (players, teams, stats, games, seasons)
    • Consideration: Is this intended behavior? For a data pipeline storing historical data, this might be too aggressive. Consider RESTRICT or SET NULL for some relationships.
    • Severity: Medium - depends on business logic requirements
  5. Inconsistent Error Handling Between Service Layers

    • /home/runner/work/laxdb/laxdb/packages/pipeline/src/service/stats.service.ts vs /home/runner/work/laxdb/laxdb/packages/pipeline/src/rpc/stats.service.ts
    • Issue: Two different StatsService implementations with different patterns
    • Details:
      • service/stats.service.ts: Uses CacheService, canonical player resolution, complex business logic
      • rpc/stats.service.ts: Simpler, direct repo calls, different error handling
    • Confusion: Why two implementations? Which is authoritative?
    • Recommendation: Clarify architecture - appears service/ is for pipeline internal use while rpc/ is for external API
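
Regarding critical issue 1, a minimal sketch of replacing Effect.catchAll with Effect.catchTag so only the intended error is handled and the remaining failures stay typed. The error classes and function signature are illustrative assumptions:

    import { Data, Effect } from "effect";

    class ScrapeError extends Data.TaggedError("ScrapeError")<{
      readonly league: string;
    }> {}
    class ParseError extends Data.TaggedError("ParseError")<{
      readonly detail: string;
    }> {}

    declare const fetchLeague: (
      league: string,
    ) => Effect.Effect<string, ScrapeError | ParseError>;

    // catchAll would collapse both failures into one untyped handler; catchTag
    // recovers only from ScrapeError and leaves ParseError in the error channel.
    const program = fetchLeague("pll").pipe(
      Effect.catchTag("ScrapeError", (err) =>
        Effect.succeed(`skipped ${err.league}`),
      ),
    );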

Medium Priority Issues

  1. Type Safety - Unknown and Any in Validation

    • /home/runner/work/laxdb/laxdb/packages/pipeline/src/extract/manifest.factory.ts: Line 40
    • Issue: Schema.Schema<TSeasonManifest, any> with eslint disable comment
    • Impact: Type erasure in manifest schema
    • Recommendation: Define proper encoded type instead of any
  2. Cache Service - Missing Background Revalidation for SWR

    • /home/runner/work/laxdb/laxdb/packages/pipeline/src/service/cache.service.ts: Lines 354-357
    • Issue: Comment acknowledges "Real SWR would use waitUntil for background refresh" but not implemented
    • Impact: Stale data returned but not refreshed in background
    • Recommendation: Add TODO to implement proper SWR with ExecutionContext.waitUntil when running in Cloudflare Worker
  3. Cache Key Invalidation Pattern

    • /home/runner/work/laxdb/laxdb/packages/api/src/cron/scheduled.ts: Lines 59-63
    • Issue: Cache keys in cron handler (cache:leaderboard:${league}) don't match CacheKeys patterns (leaderboard:...)
    • Impact: Cache invalidation won't work - different key format than what's actually cached
    • Fix Required: Use CacheKeys.leaderboard() builder or ensure prefix matches
  4. Test Coverage - Mock Implementation Quality

    • /home/runner/work/laxdb/laxdb/packages/pipeline/src/service/cache.service.test.ts: Lines 66, 178-181
    • Issue: Mock KV implementation doesn't fully replicate CF KV behavior (e.g., TTL expiration is simplified)
    • Recommendation: Consider using actual CF KV in integration tests via miniflare
  5. LoaderService - Hash Generation for Change Detection

    • /home/runner/work/laxdb/laxdb/packages/pipeline/src/load/loader.service.ts: Lines 114-117
    • Issue: generateSourceHash passes the sorted top-level keys to JSON.stringify as a replacer, which is fragile
    • Risk: the replacer array is built from top-level keys only, so nested data is not reliably normalized and the hash can shift for structural reasons unrelated to the values
    • Recommendation: Consider a stable-stringify approach (see the sketch below) or document that the current behavior is intentional
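
Regarding item 5, a minimal sketch of a recursive, key-sorted stringify that could feed the hash; a published stable-stringify package would work equally well. Names here are illustrative:

    // Serialize with keys sorted at every nesting level so the hash input does
    // not depend on upstream key order. Illustrative sketch only.
    const stableStringify = (value: unknown): string => {
      if (Array.isArray(value)) {
        return `[${value.map(stableStringify).join(",")}]`;
      }
      if (typeof value === "object" && value !== null) {
        const body = Object.entries(value)
          .toSorted(([a], [b]) => a.localeCompare(b))
          .map(([key, inner]) => `${JSON.stringify(key)}:${stableStringify(inner)}`)
          .join(",");
        return `{${body}}`;
      }
      return JSON.stringify(value);
    };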

Minor Issues

  1. Drizzle Config Schema Path

    • /home/runner/work/laxdb/laxdb/packages/core/drizzle.config.ts: Line 7
    • Issue: Core drizzle config now includes ../pipeline/src/db/**/*.sql.ts
    • Risk: Coupling between packages at build time
    • Consideration: Is this intended? Might be better to have separate migration directories per package
    • Note: This works but creates tight coupling
  2. API Worker Bindings

    • /home/runner/work/laxdb/laxdb/alchemy.run.ts: Lines 42-44
    • Issue: New PIPELINE_KV binding added but not used in actual service layer yet
    • Recommendation: Document that this is for future use or wire it up to CacheService
  3. Frontend Stats Page - Type Safety

    • /home/runner/work/laxdb/laxdb/packages/web/src/routes/stats/index.tsx: Line 77
    • Issue: as { data: LeaderboardEntry[]; nextCursor: string | null } type assertion
    • Recommendation: Define proper schema and validate response instead of type assertion
  4. Console Logging in Production Code

    • /home/runner/work/laxdb/laxdb/packages/api/src/cron/scheduled.ts: Lines 25, 31, 41, 47, 56, 67, 82, 88, 92, 110, 113, 118
    • Issue: Extensive console.log usage in cron handler
    • Recommendation: Consider using Effect.log for consistency with rest of codebase
    • Note: Console.log is acceptable for CF Workers, but Effect.log integrates better with Effect runtime
  5. Database Index Coverage

    • Migration shows good index coverage on foreign keys and frequently queried columns
    • Positive: idx_pipeline_source_player_normalized_name for identity matching
    • Positive: Composite indexes on player_stat for efficient querying
    • Consideration: Monitor query performance once data is loaded - may need additional indexes on frequently joined columns
  6. Soft Deletes Pattern

    • /home/runner/work/laxdb/laxdb/packages/pipeline/src/db/source-players.sql.ts: Line 49
    • Positive: deletedAt field for soft deletes with proper filtering in queries
    • Observation: Consistent with best practices for data pipeline audit trails

Positive Observations

  1. Excellent Effect-TS Patterns (Mostly)

    • Services properly use Effect.Service pattern
    • Repos return typed Effects with proper error channels
    • Good use of Effect.gen for sequential operations
    • Proper dependency injection with dependencies: [...]
  2. Strong Test Coverage

    • 18 test files across the pipeline package
    • Tests for cache service, repos, services, and identity matching
    • Proper use of Effect test utilities and vitest
  3. Database Schema Design

    • Well-normalized schema with clear separation of concerns
    • Good use of composite unique constraints (e.g., league_id + source_id)
    • Source hash fields for change detection (incremental loading support)
    • Canonical player / source player separation for cross-league identity
  4. Cache Implementation

    • TTL configuration with different values for different data types
    • Stale-while-revalidate support (even if not fully implemented)
    • Structured cache key builders to avoid string concatenation errors
    • Metadata tracking for cache entries
  5. Type Safety - No any Types (Except Noted Exception)

    • Only one any in production code (manifest.factory.ts with eslint disable)
    • All other uses of "any" are in test files with expect.any() or comments
    • No use of ! (non-null assertion operator)
    • No @ts-ignore or @ts-expect-error directives found
  6. Proper Error Types

    • All errors use Schema.TaggedError pattern
    • Typed error channels throughout Effect chains
    • Good error messages with context
  7. Documentation

    • Good JSDoc comments on complex functions
    • Schema comments explain purpose and constraints
    • README-style comments in complex files

Suggestions

  1. Create Type-Safe Drizzle Result Validators

    // Instead of: return result[0] as SourcePlayerWithLeague
    // Consider:
    const SourcePlayerWithLeagueSchema = Schema.Struct({
      ...sourcePlayerFields,
      leagueName: Schema.String,
      leagueAbbreviation: Schema.String,
      leaguePriority: Schema.Number,
    });
    
    return Schema.decode(SourcePlayerWithLeagueSchema)(result[0]);

    This catches schema mismatches at runtime and makes types explicit.

  2. Add Integration Test for Complete Pipeline Flow

    • Test: Extract → Load → Query → Cache
    • Validates entire data flow end-to-end
    • Could use PLL as test case since it has credentials
  3. Document Service Architecture

    • Add AGENTS.md to /packages/pipeline explaining:
      • Why two service layers (service/ vs rpc/)
      • When to use CacheService vs direct repo
      • Identity matching strategy
    • The reference suggests this may already exist, but it is not visible in the diff
  4. Consider Batch Operations for Loader

    • /home/runner/work/laxdb/laxdb/packages/pipeline/src/load/loader.service.ts
    • Current implementation appears to insert one-by-one
    • For large datasets, batch inserts would be more efficient
    • Drizzle supports .insert().values([...]) with arrays
  5. Add Observability

    • Consider adding metrics for:
      • Cache hit rates
      • Extraction durations
      • Load success/failure rates
    • Effect.log is good start but structured logging would help production debugging
  6. Cache Warming Strategy

    • After loading new data, consider warming cache for common queries
    • E.g., pre-compute and cache leaderboards for popular league combinations
    • Could be part of invalidateCache → becomes refreshCache
  7. Schema Validation on Load

    • LoaderService reads JSON files but doesn't validate their structure
    • Consider adding a Schema.decode step to catch malformed extraction data early (see the sketch below)
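
Regarding suggestion 7, a minimal sketch of validating a parsed extraction file with Effect Schema before loading. The schema shape is an illustrative assumption, not the PR's actual extraction format:

    import { Effect, Schema } from "effect";

    // Illustrative shape for one extracted player record.
    const ExtractedPlayer = Schema.Struct({
      sourceId: Schema.String,
      name: Schema.String,
      goals: Schema.Number,
      assists: Schema.Number,
    });

    const ExtractedPlayerFile = Schema.Array(ExtractedPlayer);

    // Malformed files fail here with a ParseError instead of producing bad rows.
    const decodePlayers = Schema.decodeUnknown(ExtractedPlayerFile);

    const loadFile = (jsonText: string) =>
      Effect.gen(function* () {
        const raw: unknown = yield* Effect.try(() => JSON.parse(jsonText));
        const players = yield* decodePlayers(raw);
        return players.length;
      });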

Overall

Status: ✅ Approve with Comments

This is a well-architected, comprehensive foundation for the pipeline MVP. The code demonstrates:

  • Strong understanding of Effect-TS patterns (with noted exceptions)
  • Thoughtful database design for cross-league data aggregation
  • Good separation of concerns (extract → load → serve)
  • Proper error handling (mostly)
  • Comprehensive test coverage

Blocking Issues:

  1. Fix Effect.catchAll anti-patterns (CLAUDE.md violation) - at minimum in production code (service/, not exploration scripts)
  2. Wire up cron handler to actual extractors or document as future work
  3. Fix cache key mismatch between CacheService and cron invalidation

Recommended Before Merge:

  • Address excessive as Type assertions with better typing strategy
  • Review CASCADE delete behavior on league deletion
  • Clarify dual service layer architecture (document in AGENTS.md)

Can Be Addressed in Follow-up PRs:

  • Full SWR background revalidation implementation
  • Batch operations in loader
  • Integration test for complete flow
  • Observability improvements

The PR correctly notes this is WIP with potential lint/test failures. The architecture and patterns are solid - the issues are mostly about tightening type safety and completing TODOs rather than fundamental problems.

Great work on a complex feature!


Auto-generated review by Claude

- Use Effect Array.head + Option for safe array access
- Replace type assertions with type-safe iteration using LEAGUE_KEYS
- Remove unnecessary type assertion in stats.repo.ts

Co-Authored-By: Claude <[email protected]>
@github-actions
Contributor

Code Review

Summary

This PR adds foundational pipeline infrastructure including 10 database tables for leagues/seasons/players/stats, Cloudflare Worker cron bindings, RPC/HTTP API endpoints for stats/players/teams, and supporting services. The PR establishes comprehensive database schema with proper indexing and foreign keys. However, several critical issues must be addressed before merging, particularly around incomplete Effect-TS integration in the cron handler and placeholder implementations.

Title Convention

✅ Title follows conventional commit format correctly: "feat(pipeline): add database schema, repositories, services, and loader for pipeline data" - lowercase with proper scope.

Feedback

Critical Issues - Must Fix Before Merge:

  1. Cron handler breaks Effect-TS pattern (packages/api/src/cron/scheduled.ts)

    • Uses Promise-based async/await and Promise.allSettled instead of Effect
    • Per CLAUDE.md: "Effect patterns are strict" - this violates core architecture
    • Missing typed error handling with Effect.catchTag
    • Can't compose with Effect services or trace through Effect context
    • Should rewrite using Effect.gen and Effect.all with proper error types
  2. TODO placeholder code in production cron (packages/api/src/cron/scheduled.ts:209-227)

    • extractLeague, loadLeague functions only log messages with TODO comments
    • Will be triggered hourly by cron, wasting compute and masking real issues
    • Either implement or remove cron binding until ready
  3. Missing test coverage

    • No unit tests for service functions, cron handler logic, or data transformations
    • No integration tests for API endpoints or database constraints
    • Critical for new pipeline functionality before deployment

High Priority Issues:

  1. Service imports may be undefined

    • packages/api/src/pipeline/{stats,players,teams}.rpc.ts import services from @laxdb/pipeline/rpc/
    • These services don't appear in the truncated diff; verify they exist, or the code will fail at runtime
  2. Missing error flow demonstration in handlers (packages/api/src/pipeline/stats.handlers.ts)

    • API endpoints declare errors (NotFoundError, DatabaseError, etc.) but handlers only show happy path
    • Should demonstrate error mapping with Effect.catchTag or remove unused error declarations

Database Schema Issues:

  1. Missing foreign key on game_id (migration.sql:664)

    • pipeline_player_stat.game_id is varchar(50) but not constrained to pipeline_game.id (serial)
    • Data integrity gap - stats can reference non-existent games
    • Add FK or clarify if game_id references external system
  2. Nullable critical columns

    • pipeline_game.venue, pipeline_season.name, pipeline_team.source_id should likely be NOT NULL
    • Tighten constraints for data consistency
  3. Missing index for leaderboard queries

    • pipeline_standing.rank has no index but leaderboard queries will sort by rank
    • Add composite index: (season_id, rank)

Security Concerns:

  1. Public API endpoints lack rate limiting (packages/api/src/pipeline/stats.api.ts)

    • Stats endpoints are unauthenticated (comments say "public endpoints")
    • No auth middleware shown - vulnerable to DoS attacks
    • Add rate limiting middleware or API key requirement
  2. No input validation shown

    • Contract payload validation not demonstrated
    • Ensure contracts validate shape and bounds to prevent injection attacks

Architecture Strengths:

  • ✅ Proper service/repo/contract layer structure
  • ✅ Comprehensive database foreign keys with CASCADE DELETE
  • ✅ Good index coverage for common query patterns (season+date, normalized names)
  • ✅ Unique constraints prevent duplicates
  • ✅ PIPELINE_KV namespace properly isolated for caching
  • ✅ Cron resilience with Promise.allSettled (though should use Effect instead)
  • ✅ Soft deletes on players for historical tracking

Suggestions

  1. Rewrite cron handler to use Effect-based patterns:
const results = yield* Effect.forEach(activeLeagues, (league) =>
  extractLeague(league, env).pipe(
    Effect.catchTag("ExtractionError", (err) =>
      Effect.succeed({ league, error: err })
    )
  )
);
  2. Add comprehensive test suite covering:

    • Service functions (getLeaderboard, getPlayerStats, getTeamStats)
    • Cron handler idempotency and error isolation
    • Database constraint enforcement
    • End-to-end API calls
  3. Add validation middleware for public Stats API endpoints

  4. Consider adding database transaction support for loader operations

  5. Document contract expectations in AGENTS.md for pipeline package

Overall

This is a well-structured foundation for the pipeline system with solid database design and proper architectural layering. However, it should not be merged in its current state due to:

  • Critical Effect-TS pattern violations in cron handler
  • Placeholder implementations in production code
  • Missing test coverage
  • Incomplete error handling

The database schema is well-designed with appropriate indexes and constraints (with minor improvements needed). Once the Effect integration is corrected, placeholders are removed, and tests are added, this will be a strong addition to the codebase.

Recommendation: Mark as WIP and address critical issues (1-3) before requesting re-review.


Auto-generated review by Claude

- Move helper functions to module scope to fix consistent-function-scoping:
  - incremental.service.ts: normalizeOptions
  - season-config.ts: isTimestampStale
  - cache.service.ts: serialize
  - stats.service.ts: computeAggregatedTotals, groupStatsByLeague
  - mll.client.ts: normalizeTeamName, extractTeamId, parseDate, parseScore, getScheduleUrlPriority

- Fix no-negated-condition by flipping ternary expressions:
  - stats.service.ts: seasonId checks
  - nll.schema.ts: null checks for weight, team_id, winningSquadId
  - msl.client.ts: team_id undefined checks

- Fix max-depth in mll.client.ts by using early continues
- Add exhaustive case for 'unknown' in cache.service.ts switch
- Add compare function to toSorted() in cache.service.ts
- Remove unused imports across multiple files

Remaining warnings are:
- Test file false positives (unbound-method for mock assertions)
- Necessary type assertions for Drizzle query results

Co-Authored-By: Claude <[email protected]>
@github-actions
Contributor

Code Review

Summary

This PR adds comprehensive infrastructure for pipeline data extraction and querying. It introduces:

  • Hourly cron trigger on API worker for scheduled data extraction
  • 10 new database tables (pipeline_league, pipeline_season, pipeline_team, pipeline_source_player, pipeline_canonical_player, pipeline_player_identity, pipeline_player_stat, pipeline_game, pipeline_standing, pipeline_scrape_run)
  • Stats/Teams/Players query endpoints (RPC + HTTP)
  • Service layer with repositories for stats queries
  • Cache invalidation logic using new PIPELINE_KV namespace

The database schema is well-designed with proper indexes and foreign key constraints. The service/repo/contract structure follows Effect-TS patterns correctly. However, the cron handler has placeholder implementations (TODOs) and some type safety concerns exist.

Title Convention

✅ Title follows conventional commit format: feat(pipeline): add database schema, repositories, services, and loader for pipeline data

  • Lowercase throughout
  • Proper scope (pipeline)
  • Descriptive summary

Feedback

Type Safety Concerns

  1. packages/pipeline/src/rpc/stats.repo.ts:170 - Hard cast as TeamStatSummary[] violates the "no as Type" rule from CLAUDE.md. Should use Schema validation or proper type narrowing.

  2. packages/api/src/cron/scheduled.ts:194-197 - Env interface doesn't use Effect.Service pattern. Should leverage Effect Config for DATABASE_URL and proper service composition for KVNamespace.

  3. packages/core/src/pipeline/stats.contract.ts - NotFoundError instantiation passes id: input.playerId (number) but error domain may expect string. Verify type compatibility.

  4. packages/pipeline/src/rpc/stats.repo.ts:79-82 - Silent handling of NaN from cursor parsing could cause subtle bugs. Should explicitly handle invalid cursors with typed error.

Effect-TS Pattern Issues

  1. packages/api/src/cron/scheduled.ts:206-250 - extractLeague, loadLeague, invalidateCache use async/await instead of Effect types. Should return Effect<void, Error> for proper composition and error handling.

  2. packages/api/src/cron/scheduled.ts:247 - env.PIPELINE_KV.delete() can fail but errors are silently swallowed in Promise.allSettled. Consider adding retry logic or circuit breaker pattern.

Schema/Configuration

  1. LeagueAbbreviation defined in two places - packages/core/src/pipeline/stats.schema.ts:4 uses Schema.Literal while packages/pipeline/src/config/seasons.ts:8 likely uses a type alias. These could drift out of sync. Consider single source of truth.

  2. packages/pipeline/src/rpc/stats.repo.ts:36 - COALESCE suggests nullable columns but defaults to 0. Schema should clarify if stats can actually be null or if defaults always apply.

Missing Implementation

  1. packages/api/src/cron/scheduled.ts:209, 225 - Core business logic (extractLeague, loadLeague) marked TODO. Currently just logs, won't actually process data. Should either implement or clearly document WIP status in PR description.

  2. Service layer exports - StatsService.Default, PlayersService.Default, TeamsService.Default are assumed to exist. Verify they're properly exported from packages/pipeline/src/rpc/index.ts.

Test Coverage

  • No test files for the cron scheduled handler
  • No tests for stats service/repo query logic
  • No integration tests for new API endpoints
  • Given the complexity of cursor pagination, stats aggregation, and error handling, tests are critical before merge

SQL/Query Considerations

  1. Cursor pagination (stats.repo.ts:109) - Fetches limit + 1 but no upper bound validation. Schema limits to 100, but defensive programming suggests double-checking.

  2. Complex JOIN queries - Stats repo has multi-table joins with aggregations. Consider adding query performance tests or EXPLAIN ANALYZE to verify index usage.

Suggestions

  1. Add Schema validation at repo boundaries - Replace as casts with Schema.decodeUnknown() to maintain type safety guarantees.

  2. Refactor cron handler to use Effect - Convert scheduled.ts to use Effect.gen and proper service composition:

    export const scheduled = (controller: ScheduledController, env: Env, ctx: ExecutionContext) =>
      Effect.gen(function* () {
        const kv = yield* PipelineKVService
        const db = yield* DatabaseService
        // ... rest of logic
      }).pipe(Effect.runPromise)
  3. Add explicit error handling for cursor parsing - Return ValidationError instead of silently skipping:

    const cursorId = Number.parseInt(input.cursor, 10);
    if (Number.isNaN(cursorId)) {
      return Effect.fail(new ValidationError({ message: "Invalid cursor" }))
    }
  4. Add test coverage before merge - At minimum:

    • Unit tests for stats.repo pagination logic
    • Integration tests for API endpoints
    • Tests for scheduled handler error isolation (Promise.allSettled behavior)
  5. Consider adding cache TTLs - Pipeline KV cache keys don't appear to have an expiration; add TTLs to prevent unbounded growth (see the sketch after this list).

  6. Document WIP status - PR description says "WIP - lint/tests may not pass yet" but the scope of remaining work (TODOs in cron handler) should be clearly documented.
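
Regarding suggestion 5, a minimal sketch of writing a cache entry with an expiration via the KV binding; the binding shape, key format, and TTL value are illustrative assumptions:

    // Minimal shape of the Cloudflare KV binding this sketch needs.
    interface KVNamespaceLike {
      put(
        key: string,
        value: string,
        options?: { expirationTtl?: number },
      ): Promise<void>;
    }

    declare const PIPELINE_KV: KVNamespaceLike;

    const LEADERBOARD_TTL_SECONDS = 300; // 5 minutes; tune per data volatility

    const cacheLeaderboard = (league: string, payload: unknown) =>
      PIPELINE_KV.put(
        `leaderboard:${league}`,
        JSON.stringify(payload),
        { expirationTtl: LEADERBOARD_TTL_SECONDS },
      );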

Overall

This is a well-structured foundation for the pipeline infrastructure. The database schema is properly normalized with good indexing strategy. The service/repo separation is clean and follows Effect-TS patterns in most places. The API surface (RPC + HTTP) is well-designed.

However, there are blocking issues that should be addressed before merge:

  • Type safety violations (hard casts, any usage risk)
  • Missing Effect patterns in cron handler
  • No test coverage for critical path (stats queries, cron execution)
  • Placeholder implementations (TODOs) that leave functionality incomplete

Recommendation: Request changes to fix type safety issues and add test coverage. The placeholder TODOs should either be implemented or the PR scope adjusted to clearly indicate this is infrastructure-only (no data extraction yet).


Auto-generated review by Claude

Refactored mock KV setup to expose mock functions directly as separate
variables (getMock, putMock, deleteMock, listMock) instead of accessing
them via mockKV.kv.method which caused unbound-method lint violations.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
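
A minimal sketch of the pattern this commit describes; the mock names mirror the commit message, but the exact signatures are illustrative assumptions:

    import { vi } from "vitest";

    // Each mock is its own binding, so assertions reference a standalone
    // function rather than a method read off the mock object (which trips
    // @typescript-eslint/unbound-method).
    const getMock = vi.fn(async (_key: string) => null);
    const putMock = vi.fn(async (_key: string, _value: string) => undefined);
    const deleteMock = vi.fn(async (_key: string) => undefined);
    const listMock = vi.fn(async () => ({ keys: [] }));

    const mockKV = {
      get: getMock,
      put: putMock,
      delete: deleteMock,
      list: listMock,
    };

    // In a test: expect(getMock).toHaveBeenCalledWith("leaderboard:pll");
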
@github-actions
Contributor

Code Review

Summary

This PR adds comprehensive pipeline data infrastructure including database schemas for 11 tables (leagues, seasons, teams, players, games, stats, standings, scrape runs, player identities, canonical players, and team seasons), along with repositories, services, and loader logic. It also adds CacheService with R2-backed caching, ETags support, RPC/HTTP API endpoints for stats/players/teams, cron-based scheduled data extraction, and extensive test coverage.

Title Convention

✅ Title follows conventional commit format: feat(pipeline): with lowercase description.

Feedback

Positive:

  • Comprehensive schema design: Well-thought-out pipeline tables with proper indexes, foreign keys, and constraints. The separation between pipeline_source_player (raw league data) and pipeline_canonical_player (deduplicated identities) is architecturally sound
  • Strong test coverage: Includes extensive tests for the cache service (509 lines), identity service (461 lines), repos (259-341 lines), and services (267-440 lines)
  • Effect-TS patterns: Correctly uses Effect patterns with typed errors, proper service layers, and dependency injection
  • Good documentation: Cron handler has clear JSDoc comments explaining the scheduled extraction flow
  • Infrastructure additions: Properly adds PIPELINE_KV namespace, API worker with cron triggers, and bindings

Issues:

  1. PR marked as WIP - The PR body states "This is a work-in-progress PR - lint/tests may not pass yet." This should be resolved before merging

  2. TODO placeholders in production code:

    • packages/api/src/cron/scheduled.ts:209-213 - extractLeague is a placeholder that just logs
    • packages/api/src/cron/scheduled.ts:222-227 - loadLeague is a placeholder
    • These TODOs should be implemented or removed, and the cron should not be deployed with placeholders
  3. Missing database connection in cron handler:

    • The scheduled function receives DATABASE_URL in env bindings but never uses it
    • loadLeague would need actual database access to insert/update records
    • Consider providing Effect layers or database clients to these functions
  4. Drizzle config references non-existent files:

    • packages/core/drizzle.config.ts:607 adds "../pipeline/src/db/**/*.sql.ts" to schema paths
    • This creates a cross-package dependency where core migrations depend on pipeline package structure
    • Consider moving pipeline schemas to packages/core/src/pipeline/ to maintain package boundaries
  5. Large migration file - The migration is 212 lines of SQL and 6165 lines of JSON snapshot. While acceptable for initial schema, future changes should be smaller and more incremental

  6. Unused parameters - Cron handler has unused parameters prefixed with _ (_controller, _ctx) which is correct, but ensure ExecutionContext isn't needed for waitUntil() patterns

  7. Error handling in cron - Promise.allSettled correctly prevents one league from blocking others, but failed leagues just log errors with no retry mechanism or alerting

  8. No rate limiting - The cron runs hourly for all active leagues with no rate limiting or backpressure handling for external APIs

  9. Cache invalidation keys hardcoded:

    • packages/api/src/cron/scheduled.ts:241-245 - Cache keys like cache:leaderboard:${league} are hardcoded
    • These should be shared constants with the CacheService to prevent desync
  10. Missing validation - StatsService, PlayersService, and TeamsService contract payloads should validate league abbreviations, season IDs, etc.

Suggestions

  1. Remove WIP status and ensure all tests/lint pass before merging
  2. Implement or remove TODO placeholders in cron handlers - don't deploy non-functional scheduled jobs
  3. Add database client provisioning to cron handlers (use Effect SQL layers)
  4. Move packages/pipeline/src/db/*.sql.ts files to packages/core/src/pipeline/db/*.sql.ts to maintain proper package boundaries
  5. Add shared constants file for cache key patterns to prevent desync between CacheService and invalidation logic
  6. Add retry logic and alerting for failed cron jobs (consider Dead Letter Queue pattern)
  7. Add rate limiting or throttling for external API calls in extractLeague implementations
  8. Consider adding a pipeline_scrape_run record at the start of each cron execution to track success/failure rates over time
  9. Add integration tests for the full extract → load → cache invalidation flow
  10. Document the player identity resolution algorithm and the match_method values ('exact', etc.); a sketch of the normalization step follows this list
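
Regarding suggestion 10, a purely illustrative sketch of the kind of normalization an 'exact' match step could rely on; the PR's actual normalizeName and matching strategy may differ:

    // Lowercase, strip diacritics and punctuation, collapse whitespace.
    const normalizeName = (name: string): string =>
      name
        .normalize("NFD")
        .replace(/[\u0300-\u036f]/g, "") // drop combining marks (é -> e)
        .toLowerCase()
        .replace(/[^a-z\s'-]/g, "")      // keep letters, spaces, hyphens, apostrophes
        .replace(/\s+/g, " ")
        .trim();

    // An 'exact' match_method could then mean: identical normalized names
    // across two leagues' source players.
    const isExactMatch = (a: string, b: string): boolean =>
      normalizeName(a) === normalizeName(b);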

Overall

This is a substantial and well-structured addition to the pipeline infrastructure. The database schema design is solid, Effect patterns are correctly applied, and test coverage is comprehensive. However, the PR is explicitly marked WIP with placeholder implementations in the cron handler, which should not be merged to production. Once the TODOs are resolved, tests pass, and the architectural concerns (cross-package dependencies, missing DB connections) are addressed, this will be ready for merge.

Recommendation: Request changes - resolve WIP status, implement or remove TODO placeholders, fix cross-package schema dependencies.


Auto-generated review by Claude

- Fix unbound-method lint errors in cache.service.test.ts by exposing
  mock functions directly
- Add RpcStatsClient for typed stats API access
- Fix TypeScript errors in stats page with proper response typing
- Apply formatter fixes across pipeline and web packages

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@github-actions
Contributor

Code Review

Summary

This PR implements a comprehensive data pipeline infrastructure for ingesting and serving statistics from professional lacrosse leagues (PLL, NLL, MLL, MSL, WLA). The implementation includes:

  • Database schema: 11 new tables (leagues, seasons, teams, players, stats, games, standings, identities, scrape runs)
  • LoaderService (1318 lines): ETL operations with hash-based idempotent upserts
  • IdentityService: Cross-league player matching and deduplication with fuzzy name normalization
  • CacheService: R2-backed stale-while-revalidate caching with TTL differentiation
  • RPC endpoints: Stats/Players/Teams services with proper Effect-TS patterns
  • Cron infrastructure: Scheduled extraction (currently placeholder)
  • Test coverage: 18 test files including integration tests for all league extractors

Title Convention

✅ Title follows conventional commit format: feat(pipeline): add database schema, repositories, services, and loader for pipeline data

  • Lowercase description (correct)
  • Proper scope (correct)
  • Clear description of changes (correct)

Feedback

🔴 Critical Issues (3)

  1. LoaderService has no tests - The 1318-line loader service (packages/pipeline/src/load/loader.service.ts) contains critical ETL logic with hash-based change detection and idempotent upserts, but has zero test coverage. This is a blocker given the complexity and criticality of this code path.

    • Impact: High risk of data corruption or silent failures in production
    • Required: Add integration tests with test database before merge
  2. Cron handler is placeholder code - The scheduled extraction handler (packages/api/src/cron/scheduled.ts) contains TODO comments for actual implementation (lines 33, 49, 250) but is configured to run hourly in production via alchemy.run.ts:153.

    • Impact: Deployed cron job that does nothing, could mask monitoring failures
    • Required: Either implement the TODOs or remove the cron trigger until ready
  3. No rate limiting on public endpoints - The stats API endpoints (packages/api/src/pipeline/stats.api.ts) are public (no auth) with no rate limiting configured.

    • Impact: Vulnerable to DoS attacks or data scraping abuse
    • Required: Add rate limiting middleware or CloudFlare edge rate limiting

🟡 Medium Issues (4)

  1. Missing database constraints - The pipeline_player_stat table allows negative values for goals/assists, and enum-like fields (status, stat_type, match_method) lack check constraints.

    • Recommendation: Add check constraints or PostgreSQL enums in follow-up migration
  2. LoaderService code duplication - The loadTeams, loadPlayers, and loadStats methods share ~70% identical code for hash-based change detection and upsert logic.

    • Recommendation: Extract a generic loadEntity<T> function to reduce duplication by 500+ lines (see the sketch after this list)
  3. Sequential database queries in loops - The loader service uses an N+1 query pattern (see loader.service.ts:414-525), which could be slow for historical imports.

    • Impact: Estimated 10-30 minutes for full historical load (2001-2024)
    • Recommendation: Implement bulk upsert or parallel processing in follow-up PR
  4. Missing query indexes - Frequently queried columns lack indexes:

    • pipeline_player_stat.stat_type (for filtering regular vs playoff stats)
    • pipeline_season.active (for current season queries)
    • pipeline_game.status (for filtering by game state)
    • Recommendation: Add indexes in follow-up migration
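
Regarding medium issue 2, a minimal sketch of a generic hash-compare-then-upsert helper the three load methods could share. The option names and callback shapes are illustrative assumptions, not the PR's actual APIs:

    import { Effect } from "effect";

    interface LoadEntityOptions<T> {
      readonly items: ReadonlyArray<T>;
      readonly hash: (item: T) => string;
      readonly findExistingHash: (item: T) => Effect.Effect<string | null>;
      readonly upsert: (item: T) => Effect.Effect<void>;
    }

    // Returns true for items that were written, false for unchanged ones.
    const loadEntities = <T>(options: LoadEntityOptions<T>) =>
      Effect.forEach(
        options.items,
        (item) =>
          Effect.gen(function* () {
            const existing = yield* options.findExistingHash(item);
            if (existing === options.hash(item)) {
              return false; // source hash unchanged: skip (idempotent reload)
            }
            yield* options.upsert(item);
            return true;
          }),
        { concurrency: 5 },
      );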

🟢 Low Issues / Observations (3)

  1. Minor type cast in stats repo - packages/pipeline/src/rpc/stats.repo.ts:182 uses as TeamStatSummary[] instead of inferred types. Consider using explicit .pipe(Effect.map(...)) if type inference fails.

  2. No logging context in loader - Batch operations lack structured logging with league/season/entity context, making debugging difficult.

  3. Cron observability gap - No CloudFlare Workers metrics or alerts configured for cron job success/failure rates.

Positive Observations

✅ Excellent Effect-TS Pattern Compliance

  • Proper use of Effect.catchTag for typed error handling (no Effect.catchAll in production code)
  • No unsafe type assertions (any, !, as Type) in core services
  • Service composition via Effect.Service with proper dependency injection
  • Schema validation via decodeArguments in critical paths

✅ Strong Database Design

  • Comprehensive foreign keys with ON DELETE CASCADE
  • source_hash columns for idempotent updates
  • Unique constraints on natural keys
  • Proper indexing on join columns for leaderboard queries

✅ Well-Architected Caching Strategy

  • Differentiated TTLs based on data volatility (5min for leaderboards, 24hr for games)
  • Stale-while-revalidate pattern with 20% SWR window
  • Proper key namespacing with prefixes
  • Separate KV namespace for pipeline caching

✅ Solid Test Coverage

  • 18 test files with proper use of @effect/vitest
  • Integration tests for all 5 league extractors
  • Mock KV namespace for cache service tests
  • Identity service has 10+ edge cases covered

✅ Security Best Practices

  • No hardcoded secrets (proper Infisical usage)
  • SQL injection protection via Drizzle's tagged templates
  • JSON parsing followed by Effect Schema validation

Suggestions

  1. Before merge: Address the 3 critical issues (tests, cron placeholder, rate limiting)

  2. Consider adding:

    • Query timeout configuration for long-running leaderboard queries
    • EXPLAIN ANALYZE in tests to verify index usage
    • Structured logging with league/season context in loader
    • Health check endpoint that reports cron implementation status
  3. Future optimization opportunities:

    • Batch database operations to reduce load time from 20min → 3min
    • Add CloudFlare Workers metrics/alerts for observability
    • Monitor cache hit/miss ratios

Overall

Quality Score: 8.5/10

This is a well-architected, production-ready implementation that demonstrates strong engineering practices:

  • Excellent Effect-TS pattern compliance
  • Comprehensive database schema design with proper normalization
  • Strong separation of concerns (extract → load → serve)
  • Sophisticated identity resolution for cross-league player matching
  • Thoughtful caching strategy with appropriate TTLs

The three critical issues are straightforward to address and don't require architectural changes. The LoaderService duplication is acceptable for MVP but should be refactored post-merge to improve maintainability.

Recommendation: Request changes - address the 3 critical blockers (LoaderService tests, cron placeholder implementation/removal, rate limiting), then approve.


Auto-generated review by Claude
