
Kappa architecture

David Liu edited this page Nov 17, 2024 · 1 revision
  • Unifies stream and batch processing by turning batch into stream (a batch is treated as a bounded stream)
    • Log is the data
    • The complete opposite of the Delta architecture?
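  • For example, a sync job can first load the full data of a database into the data warehouse, then automatically switch to incremental mode and read the binlog via CDC, so full and incremental synchronization run in one pipeline
  • AWS Aurora migration follows a similar idea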
  • Proposed by Jay Kreps (Confluent) as a simplification of the Lambda architecture: it drops Lambda's separate batch layer
  • Treats both real-time and batch processing as stream processing
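The full-then-incremental sync described above can be sketched as follows. This is a simplified, in-memory illustration rather than how any particular CDC tool works: the `ChangeEvent` type, the `sync` function, and the assumption that the snapshot is consistent with a known binlog position are all hypothetical. Production CDC tools also have to obtain that consistent snapshot, ideally without locking the source tables, which this sketch skips.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical row-level change event, loosely modeled on a binlog entry."""
    position: int                 # log position; assumed to increase monotonically
    op: str                       # "insert", "update" or "delete"
    key: int                      # primary key of the affected row
    value: Optional[str] = None   # new row value (None for deletes)


def apply_event(table: Dict[int, str], event: ChangeEvent) -> None:
    """Apply one change event to a key-value copy of the table."""
    if event.op == "delete":
        table.pop(event.key, None)
    else:
        # Inserts and updates are both handled as upserts.
        table[event.key] = event.value


def sync(snapshot: Dict[int, str], snapshot_position: int,
         binlog: List[ChangeEvent]) -> Dict[int, str]:
    """Full load followed by incremental CDC, assuming `snapshot` reflects
    the table exactly as of `snapshot_position` in the binlog."""
    # Phase 1 (full): copy the snapshot into the warehouse table.
    warehouse = dict(snapshot)
    # Phase 2 (incremental): apply only changes committed after the snapshot.
    for event in binlog:
        if event.position > snapshot_position:
            apply_event(warehouse, event)
    return warehouse


binlog = [
    ChangeEvent(1, "insert", 1, "a"),
    ChangeEvent(2, "insert", 2, "b"),
    ChangeEvent(3, "update", 1, "a2"),
    ChangeEvent(4, "delete", 2),
    ChangeEvent(5, "insert", 3, "c"),
]
snapshot = {1: "a", 2: "b"}  # table state as of binlog position 2

print(sync(snapshot, 2, binlog))  # {1: 'a2', 3: 'c'}
print(sync({}, 0, binlog))        # replaying the whole log from the start
```

The second call shows the "log is the data" idea: when the binlog is retained from the beginning, replaying it from position 0 rebuilds the same table that the snapshot-plus-tail approach produces.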

Components

Stream Processing Layer

  • Ingests all data as an immutable log of events
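The immutable log is what makes reprocessing possible: historical ("batch") data is the log read from offset 0, and live data is the log read from the last processed offset. Below is a minimal in-memory sketch; the names `EventLog`, `StreamJob`, `count_v1` and `count_v2` are hypothetical, and a real deployment would use an event streaming platform such as Kafka and a stream processor such as Flink instead.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]
State = Dict[str, int]


class EventLog:
    """In-memory, append-only stand-in for one partition of an event
    streaming platform (an assumption for illustration)."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        self._events.append(dict(event))  # store a copy; stored events are never modified
        return len(self._events) - 1      # offset assigned to the new event

    def read_from(self, offset: int) -> List[Event]:
        return [dict(e) for e in self._events[offset:]]


class StreamJob:
    """Consumes the log from its own offset and folds events into state."""

    def __init__(self, log: EventLog, step: Callable[[State, Event], State]) -> None:
        self.log = log
        self.step = step
        self.offset = 0  # a new job starts at offset 0, i.e. it replays all history
        self.state: State = {}

    def poll(self) -> State:
        for event in self.log.read_from(self.offset):
            self.state = self.step(self.state, event)
            self.offset += 1
        return self.state


def count_v1(state: State, event: Event) -> State:
    state[event["user"]] = state.get(event["user"], 0) + 1
    return state


def count_v2(state: State, event: Event) -> State:
    user = event["user"].lower()  # new logic: case-insensitive user names
    state[user] = state.get(user, 0) + 1
    return state


log = EventLog()
for user in ["alice", "bob", "Alice"]:
    log.append({"user": user})

v1 = StreamJob(log, count_v1)
print(v1.poll())  # {'alice': 1, 'bob': 1, 'Alice': 1}

# Reprocessing: start a second job with the new logic from offset 0 ...
v2 = StreamJob(log, count_v2)
print(v2.poll())  # {'alice': 2, 'bob': 1}

# ... and both jobs keep consuming new events as they arrive.
log.append({"user": "bob"})
print(v1.poll(), v2.poll())
```

Changing the processing logic therefore means starting a new job from the beginning of the log and switching consumers to its output once it has caught up, instead of maintaining a separate batch code path.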

Drawbacks

Costly infrastructure with scalability issues:

  • Retaining large volumes of historical data in an event streaming platform can be costly.
  • Solutions
    • Offload older data to a data lake on your cloud provider's object storage (e.g. AWS S3 or Google Cloud Storage).
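The data lake mitigation can be sketched as a two-tier log: the newest events stay in the streaming platform, older ones are offloaded in segments to cheap object storage, and replay still sees one continuous log. Everything here is hypothetical: `TieredLog`, its parameters and the segment key layout are illustrative, and the object store is simulated with a dict rather than a real S3 or GCS client.

```python
import json
from typing import Dict, List


class TieredLog:
    """Hypothetical two-tier log: a small, expensive hot tier plus a cheap
    object-store tier for older events (simulated here with a dict)."""

    def __init__(self, hot_capacity: int, segment_size: int) -> None:
        if not 0 < segment_size <= hot_capacity:
            raise ValueError("segment_size must be between 1 and hot_capacity")
        self.hot_capacity = hot_capacity
        self.segment_size = segment_size
        self.hot: List[dict] = []
        self.hot_start = 0  # offset of the oldest event still in the hot tier
        self.object_store: Dict[str, bytes] = {}

    def append(self, event: dict) -> None:
        self.hot.append(event)
        # Offload the oldest full segment whenever the hot tier is over capacity.
        while len(self.hot) > self.hot_capacity:
            self._offload_segment()

    def _offload_segment(self) -> None:
        segment = self.hot[: self.segment_size]
        self.hot = self.hot[self.segment_size:]
        # Zero-padded start offsets make lexicographic key order equal offset order.
        key = f"segments/{self.hot_start:020d}.json"
        self.object_store[key] = json.dumps(segment).encode("utf-8")
        self.hot_start += len(segment)

    def replay(self) -> List[dict]:
        """Read the full history: cold segments in offset order, then the hot tier."""
        events: List[dict] = []
        for key in sorted(self.object_store):
            events.extend(json.loads(self.object_store[key]))
        events.extend(self.hot)
        return events


log = TieredLog(hot_capacity=3, segment_size=2)
for n in range(7):
    log.append({"n": n})

print(len(log.hot), sorted(log.object_store))
print([e["n"] for e in log.replay()])  # [0, 1, 2, 3, 4, 5, 6]
```

Reads from the cold tier are typically slower, so a full replay takes longer; the saving comes from keeping only recent data on the streaming platform itself.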

Vendors

  • Apache Flink