Home
David Liu edited this page Jan 21, 2025 · 9 revisions
Welcome to the data-integration wiki!
- Streaming format: for sending an arbitrary-length sequence of record batches, much like an iterator that lazily yields its items.
  - Pros
    - Lowest latency: data can be consumed as it arrives.
  - Cons
    - The format must be processed from start to end and does not support random access.
    - Computing statistics requires windowing or a cursor, as in stream analytics.
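The streaming trade-off above can be sketched with a Python generator. This is only an illustration: `record_stream` and `tumbling_window_means` are hypothetical names, and a tumbling (non-overlapping) window is just one of several windowing strategies used in stream analytics.

```python
from typing import Iterable, Iterator, List


def record_stream(n: int) -> Iterator[int]:
    """Hypothetical source: lazily yields one record at a time."""
    for i in range(n):
        yield i  # nothing is produced until the consumer asks for it


def tumbling_window_means(stream: Iterable[int], size: int) -> Iterator[float]:
    """Yield the mean of each consecutive, non-overlapping window."""
    window: List[int] = []
    for value in stream:  # strictly start to end; no random access
        window.append(value)
        if len(window) == size:
            yield sum(window) / size
            window.clear()
    if window:  # flush the final, partial window
        yield sum(window) / len(window)


if __name__ == "__main__":
    for mean in tumbling_window_means(record_stream(7), 3):
        print(mean)
```

Because the consumer only ever sees the current window, memory use stays bounded even when the stream never ends, but a statistic over "all records" is not available without reading everything.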
- Batch format: for serializing a fixed number of record batches. Supports random access.
  - Also known as the file format.
  - Pros
    - Very useful when used with memory maps.
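Random access over a memory-mapped file might look like the sketch below. It assumes a made-up layout in which every record is a single little-endian int64, which is far simpler than any real batch or file format; `write_records` and `read_record` are hypothetical helpers.

```python
import mmap
import os
import struct
import tempfile

# Assumption: fixed-size records, one little-endian int64 each.
RECORD = struct.Struct("<q")


def write_records(path, values):
    """Serialize a fixed number of records to a file."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def read_record(path, index):
    """Read record `index` directly, without scanning earlier records."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = len(mm) // RECORD.size
            if not 0 <= index < count:
                raise IndexError(index)
            # Fixed-size records make the byte offset a simple multiplication.
            (value,) = RECORD.unpack_from(mm, index * RECORD.size)
            return value


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "records.bin")
    write_records(path, [10, 20, 30, 40])
    print(read_record(path, 2))  # 30
```

With a memory map, the operating system pages in only the parts of the file that are actually touched, which is why a format with known offsets pairs well with it.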
- Data integration vs. ETL
  - Data integration = ETL + data warehouse
- HevoData point of view
  - ETL is a specific type of data integration
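As a rough illustration of the "ETL + data warehouse" view, the sketch below runs a tiny extract-transform-load pipeline into an in-memory SQLite database. SQLite stands in for a warehouse purely for demonstration; the sample data, the `sales` table, and all function names are assumptions, not part of any particular tool.

```python
import csv
import io
import sqlite3

# Made-up sample input with inconsistent spacing and casing.
RAW_CSV = "name,amount\n alice ,10\nBOB,5\nalice,7\n"


def extract(text):
    """Extract: parse raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names and convert amounts to integers."""
    return [(r["name"].strip().lower(), int(r["amount"])) for r in rows]


def load(conn, rows):
    """Load: write the cleaned rows into the stand-in warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(conn, text):
    load(conn, transform(extract(text)))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(conn, RAW_CSV)
    query = "SELECT name, SUM(amount) FROM sales GROUP BY name ORDER BY name"
    print(conn.execute(query).fetchall())
```

The pipeline functions are the ETL part; the table that analysts later query is the warehouse part, and together they make up one small data-integration flow.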
aka. Data munging, Data
- It means data transformation.
- It is closely aligned with ETL.