
ScalaCast Large-Scale Scalability Improvement Plan #47

@koke1997


Executive Summary

This plan addresses critical scalability bottlenecks in ScalaCast to support large-scale video streaming deployment using open-source technologies. The current architecture has fundamental limitations that prevent scaling beyond a few hundred concurrent users.

Current Architecture Issues Identified

1. Centralized Server Bottlenecks

  • Blocking I/O with traditional ServerSocket
  • Sequential client request processing
  • Single point of failure
  • No horizontal scaling capability
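The cost of the blocking accept loop is easy to quantify. The sketch below contrasts total queue-drain time under sequential handling versus a worker pool; the per-request timings are hypothetical, not measurements from ScalaCast:

```scala
// With a blocking ServerSocket accept loop, each client waits for every
// request ahead of it, so total latency grows linearly with queue depth.
def sequentialServiceTimeMs(clients: Int, perRequestMs: Int): Int =
  clients * perRequestMs

// With a pool of workers handling requests concurrently, latency grows
// with clients / workers instead.
def concurrentServiceTimeMs(clients: Int, perRequestMs: Int, workers: Int): Int =
  math.ceil(clients.toDouble / workers).toInt * perRequestMs
```

At 200 queued clients and 50 ms per request, the blocking loop needs 10 seconds to drain the queue; 20 concurrent workers drain it in half a second.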

2. Video Processing Limitations

  • Entire video files loaded into memory
  • Fixed 1MB chunk size regardless of network conditions
  • No caching mechanism for video chunks
  • Synchronous blocking operations

3. Peer Discovery Problems

  • Hardcoded static peer lists
  • No dynamic peer discovery
  • Missing load balancing across peers
  • Potential broadcast storm issues

4. WebSocket Connection Issues

  • Global mutable connection storage
  • No connection limits or cleanup
  • Memory leaks from stale connections
  • Sequential broadcasting bottlenecks
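A bounded, self-cleaning registry is one way to address the unbounded global storage and stale-connection leaks; the sketch below is illustrative (the class name and eviction policy are not existing ScalaCast code):

```scala
import scala.collection.concurrent.TrieMap

// Caps concurrent connections and evicts entries idle longer than a cutoff,
// replacing a global mutable map that only ever grows.
final class ConnectionRegistry(maxConnections: Int) {
  private val connections = TrieMap.empty[String, Long] // connection id -> last-seen epoch ms

  // Returns false when the registry is full and the id is new.
  def register(id: String, now: Long): Boolean =
    if (connections.size >= maxConnections && !connections.contains(id)) false
    else { connections.put(id, now); true }

  // Removes connections idle longer than idleMs; returns how many were evicted.
  def evictStale(now: Long, idleMs: Long): Int = {
    val stale = connections.collect { case (id, seen) if now - seen > idleMs => id }
    stale.foreach(connections.remove)
    stale.size
  }

  def size: Int = connections.size
}
```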

Phase 1: Core Infrastructure Improvements (Weeks 1-4)

1.1 Replace Blocking I/O with Akka Streams

Technology: Akka HTTP + Akka Streams

// Replace current blocking server with:
val route = pathPrefix("api" / "v1") {
  path("video-stream" / Segment) { streamId =>
    get {
      complete(videoStreamingService.getVideoStream(streamId))
    }
  }
}

Benefits:

  • Handle 10,000+ concurrent connections
  • Non-blocking reactive streams
  • Built-in backpressure handling

1.2 Implement Reactive Video Chunking

Technology: Akka Streams + FS2

// FileIO.fromPath already emits fixed-size chunks, so no regrouping pass is needed
def chunkVideoStream(videoPath: String, chunkSize: Int): Source[ByteString, NotUsed] =
  FileIO.fromPath(Paths.get(videoPath), chunkSize = chunkSize)
    .mapMaterializedValue(_ => NotUsed) // fromPath materializes a Future[IOResult]

Benefits:

  • Stream processing without loading full video
  • Adaptive chunk sizing based on network conditions
  • Memory-efficient processing
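Adaptive chunk sizing can be as simple as targeting a fixed transfer time per chunk. The helper below is a sketch; the bounds (64 KiB to 4 MiB) and the target-time approach are illustrative, not an existing ScalaCast API:

```scala
val MinChunk = 64 * 1024        // floor: avoid per-chunk overhead dominating
val MaxChunk = 4 * 1024 * 1024  // ceiling: keep memory per stream bounded

// Pick a chunk size so each chunk takes roughly targetChunkMillis to transfer
// at the client's measured bandwidth, clamped to [MinChunk, MaxChunk].
def adaptiveChunkSize(bandwidthBytesPerSec: Long, targetChunkMillis: Long): Int = {
  val ideal = bandwidthBytesPerSec * targetChunkMillis / 1000
  math.max(MinChunk, math.min(MaxChunk, ideal)).toInt
}
```

A 1 MB/s client with a 500 ms target gets 500 KB chunks; very slow or very fast clients are clamped to the floor or ceiling.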

1.3 Add Redis Caching Layer

Technology: Redis + Lettuce (async Redis client)

class VideoChunkCache @Inject()(redis: RedisAsyncCommands[String, Array[Byte]]) {
  def getChunk(videoId: String, chunkIndex: Int): Future[Option[Array[Byte]]] =
    // Lettuce returns a RedisFuture (a CompletionStage); convert via scala.jdk.FutureConverters
    redis.get(s"video:$videoId:chunk:$chunkIndex").toCompletableFuture.asScala
      .map(Option(_)) // GET returns null on a cache miss
}

Benefits:

  • Cache frequently accessed video chunks
  • Reduce disk I/O by 80-90%
  • Distributed caching across multiple instances
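The cache-aside flow pairs naturally with the Redis layer above. The sketch below abstracts the store behind a trait so the read path is testable; an in-memory map stands in for Redis, and all names are illustrative:

```scala
import scala.collection.mutable

// Minimal chunk-store abstraction; a Lettuce-backed class would implement
// this in production.
trait ChunkCache {
  def get(key: String): Option[Array[Byte]]
  def put(key: String, value: Array[Byte]): Unit
}

final class InMemoryChunkCache extends ChunkCache {
  private val store = mutable.Map.empty[String, Array[Byte]]
  def get(key: String): Option[Array[Byte]] = store.get(key)
  def put(key: String, value: Array[Byte]): Unit = store.update(key, value)
}

// Cache-aside: consult the cache first; on a miss, hit the disk once and
// populate the cache for subsequent readers.
def readChunk(cache: ChunkCache, videoId: String, idx: Int)(loadFromDisk: => Array[Byte]): Array[Byte] = {
  val key = s"video:$videoId:chunk:$idx"
  cache.get(key).getOrElse {
    val data = loadFromDisk
    cache.put(key, data)
    data
  }
}
```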

Phase 2: Distributed Architecture (Weeks 5-8)

2.1 Implement Service Discovery with Consul

Technology: Consul + Akka Discovery

val discovery = Discovery(system).discovery
discovery.lookup("video-streaming-service", resolveTimeout = 5.seconds)

Benefits:

  • Dynamic peer discovery
  • Health checking
  • Automatic failover

2.2 Add Load Balancing with HAProxy

Configuration:

frontend video_frontend
    bind *:443 ssl crt /etc/haproxy/certs/scalacast.pem  # cert path illustrative
    default_backend video_servers

backend video_servers
    balance roundrobin
    server video1 10.0.1.10:8080 check
    server video2 10.0.1.11:8080 check
    server video3 10.0.1.12:8080 check

Benefits:

  • Distribute load across multiple instances
  • SSL termination
  • Connection pooling

2.3 Message Queue Integration with Apache Kafka

Technology: Kafka + Akka Kafka Connector

val producerSettings =
  ProducerSettings(system, new StringSerializer, new StringSerializer)

Source
  .tick(30.seconds, 30.seconds, PeerHeartbeat())
  // serializeHeartbeat is an app-defined encoder (illustrative)
  .map(event => new ProducerRecord[String, String]("peer-discovery", serializeHeartbeat(event)))
  .runWith(Producer.plainSink(producerSettings))

Benefits:

  • Decouple peer communication
  • Reliable message delivery
  • Horizontal scaling of message processing

Phase 3: Performance Optimizations (Weeks 9-12)

3.1 Connection Management with HikariCP

Technology: HikariCP + PostgreSQL

val config = new HikariConfig()
config.setJdbcUrl("jdbc:postgresql://db:5432/scalacast") // URL illustrative
config.setMaximumPoolSize(50)
config.setConnectionTimeout(30000) // 30 s
config.setIdleTimeout(600000)      // 10 min
val dataSource = new HikariDataSource(config)

Benefits:

  • Efficient database connection pooling
  • Automatic connection health checking
  • Optimized for high-throughput applications

3.2 Circuit Breaker Pattern with Akka

Technology: Akka Circuit Breaker

val breaker = new CircuitBreaker(
  scheduler = system.scheduler,
  maxFailures = 5,
  callTimeout = 10.seconds,
  resetTimeout = 1.minute
)

def sendDataWithCircuitBreaker(peer: String, data: String): Future[Unit] = {
  breaker.withCircuitBreaker(sendDataToPeer(peer, data))
}

Benefits:

  • Prevent cascade failures
  • Automatic recovery
  • Improved system resilience

3.3 Video Transcoding with FFmpeg Integration

Technology: FFmpeg + Cats Effect

def transcodeVideo(input: String, output: String, quality: VideoQuality): IO[Unit] = {
  val ffmpegCmd = Seq(
    "ffmpeg", "-i", input,
    "-c:v", "libx264",
    "-preset", "fast",
    "-crf", quality.crf.toString,
    output
  )
  // scala.sys.process has no Cats Effect integration; wrap the blocking call explicitly
  IO.blocking(Process(ffmpegCmd).!).flatMap {
    case 0    => IO.unit
    case code => IO.raiseError(new RuntimeException(s"ffmpeg exited with code $code"))
  }
}

Benefits:

  • Multiple bitrate streaming
  • Adaptive quality based on client bandwidth
  • Industry-standard video processing
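Multiple-bitrate streaming means running one transcode per rendition. The sketch below builds the per-rendition ffmpeg command lines; the quality ladder and output-naming scheme are illustrative, not ScalaCast conventions:

```scala
// A rendition pairs a target height with an x264 CRF quality level.
final case class Rendition(name: String, height: Int, crf: Int)

// Illustrative ladder: lower heights tolerate higher CRF (more compression).
val ladder = List(
  Rendition("480p", 480, 28),
  Rendition("720p", 720, 23),
  Rendition("1080p", 1080, 20)
)

// Build one ffmpeg invocation per rendition; scale=-2:<h> keeps the width even,
// as libx264 requires.
def transcodeCommand(input: String, r: Rendition): Seq[String] =
  Seq("ffmpeg", "-i", input,
      "-c:v", "libx264", "-preset", "fast",
      "-vf", s"scale=-2:${r.height}",
      "-crf", r.crf.toString,
      s"${input.stripSuffix(".mp4")}_${r.name}.mp4")
```

Each command can then be handed to the `transcodeVideo`-style IO wrapper above and run in parallel across renditions.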

Phase 4: Monitoring & Observability (Weeks 13-16)

4.1 Metrics with Prometheus + Grafana

Technology: Kamon + Prometheus

val videoStreamCounter = Kamon.counter("video.streams.active").withoutTags()
val chunkRequestsHistogram = Kamon.histogram("video.chunk.request.duration").withoutTags()

// In streaming logic:
videoStreamCounter.increment()
chunkRequestsHistogram.record(processingTime)

Benefits:

  • Real-time performance metrics
  • Visual dashboards
  • Alerting on anomalies

4.2 Distributed Tracing with Jaeger

Technology: OpenTracing + Jaeger

implicit val tracer: Tracer = GlobalTracer.get()

def streamVideo(videoId: String)(implicit parent: Span): Future[VideoStream] = {
  val childSpan = tracer.buildSpan("video-processing").asChildOf(parent).start()
  processVideo(videoId) // app-defined processing logic
    .andThen { case _ => childSpan.finish() } // close the span only when the future completes
}

Benefits:

  • Track requests across microservices
  • Identify performance bottlenecks
  • Debug distributed system issues

4.3 Centralized Logging with ELK Stack

Technology: Logback + Elasticsearch + Kibana

import net.logstash.logback.argument.StructuredArguments.kv

val logger = LoggerFactory.getLogger(this.getClass)
logger.info("Video stream started {} {} {}",
  kv("videoId", videoId),
  kv("clientId", clientId),
  kv("bandwidth", estimatedBandwidth)
)

Benefits:

  • Centralized log aggregation
  • Real-time log search and analysis
  • Error tracking and alerting

Phase 5: Advanced Features (Weeks 17-20)

5.1 CDN Integration with MinIO

Technology: MinIO (S3-compatible object storage)

val minioClient = MinioClient.builder()
  .endpoint("http://localhost:9000")
  .credentials("accessKey", "secretKey")
  .build()

def uploadVideoChunk(videoId: String, chunkIndex: Int, data: Array[Byte]): Future[Unit] = {
  Future {
    val objectName = s"videos/$videoId/chunk_$chunkIndex.mp4"
    minioClient.putObject(
      PutObjectArgs.builder()
        .bucket("video-chunks")
        .`object`(objectName)
        .stream(new ByteArrayInputStream(data), data.length, -1)
        .build()
    )
  }
}

Benefits:

  • Distributed video storage
  • Global content delivery
  • Reduced server bandwidth usage

5.2 Auto-scaling with Kubernetes

Configuration: deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: scalacast-video-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scalacast-video
  template:
    metadata:
      labels:
        app: scalacast-video
    spec:
      containers:
      - name: video-service
        image: scalacast/video-service:latest
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: scalacast-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: scalacast-video-service
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Benefits:

  • Automatic scaling based on load
  • Zero-downtime deployments
  • Container orchestration

Implementation Timeline

| Phase   | Duration | Key Deliverables                                        | Expected Capacity Improvement |
|---------|----------|---------------------------------------------------------|-------------------------------|
| Phase 1 | 4 weeks  | Non-blocking I/O, video streaming, caching              | 10x (100 → 1,000 users)       |
| Phase 2 | 4 weeks  | Service discovery, load balancing, message queues       | 10x (1,000 → 10,000 users)    |
| Phase 3 | 4 weeks  | Connection pooling, circuit breakers, video transcoding | 5x (10,000 → 50,000 users)    |
| Phase 4 | 4 weeks  | Monitoring, logging, tracing                            | Operational excellence        |
| Phase 5 | 4 weeks  | CDN, auto-scaling                                       | 10x (50,000 → 500,000 users)  |

Resource Requirements

Development Environment

  • Docker Compose setup for local development
  • Testcontainers for integration testing
  • JMeter for load testing

Production Infrastructure (Open Source)

  • Kubernetes cluster (3+ nodes)
  • PostgreSQL (primary database)
  • Redis (caching layer)
  • Apache Kafka (message streaming)
  • MinIO (object storage)
  • Consul (service discovery)
  • HAProxy (load balancing)
  • Prometheus + Grafana (monitoring)
  • ELK Stack (logging)
  • Jaeger (distributed tracing)

Expected Performance Improvements

Before Implementation

  • Concurrent Users: 100-200
  • Throughput: 50 Mbps
  • Latency: 500ms+ average
  • Availability: 95% (single point of failure)

After Implementation

  • Concurrent Users: 500,000+
  • Throughput: 10+ Gbps
  • Latency: 50ms average
  • Availability: 99.9% (distributed redundancy)

Risk Mitigation

Technical Risks

  1. Migration Complexity: Implement incremental migration with feature flags
  2. Data Consistency: Use distributed transactions with Saga pattern
  3. Network Partitions: Implement eventual consistency with CRDT
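To make the CRDT suggestion concrete, the sketch below implements a grow-only counter (G-Counter), the simplest state-based CRDT: each node increments only its own slot, and merge takes element-wise maxima, so replicas converge after a partition heals regardless of merge order:

```scala
// State-based G-Counter: per-node counts; merge is commutative, associative,
// and idempotent, which is what guarantees convergence under partitions.
final case class GCounter(counts: Map[String, Long]) {
  def increment(node: String, by: Long = 1): GCounter =
    GCounter(counts.updated(node, counts.getOrElse(node, 0L) + by))

  def merge(other: GCounter): GCounter =
    GCounter((counts.keySet ++ other.counts.keySet).map { k =>
      k -> math.max(counts.getOrElse(k, 0L), other.counts.getOrElse(k, 0L))
    }.toMap)

  // The observed value is the sum over all nodes' slots.
  def value: Long = counts.values.sum
}
```

A counter like this could track, say, per-node stream counts that must survive partitions; richer CRDTs (sets, maps) follow the same merge discipline.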

Operational Risks

  1. Learning Curve: Provide team training on new technologies
  2. Monitoring Gaps: Implement comprehensive observability from day one
  3. Deployment Issues: Use blue-green deployments with rollback capability

Success Metrics

Performance KPIs

  • Response Time: P95 < 100ms
  • Throughput: > 1M requests/minute
  • Error Rate: < 0.1%
  • Uptime: > 99.9%

Business KPIs

  • Concurrent Stream Capacity: 500,000+ users
  • Global Latency: < 50ms worldwide
  • Cost per Stream: < $0.01/hour
  • Time to Scale: < 30 seconds

Next Steps

  1. Week 1: Set up development environment with Docker Compose
  2. Week 2: Implement Akka HTTP replacement for blocking I/O
  3. Week 3: Add Redis caching layer for video chunks
  4. Week 4: Implement reactive video streaming with Akka Streams

This plan provides a comprehensive roadmap to transform ScalaCast from a proof-of-concept into a production-ready, large-scale video streaming platform using entirely open-source technologies.
