Skip to content

Direct Query against Iceberg source tables #1933

@mbroecheler

Description

@mbroecheler

Suppose I have a shared mutation table that another SQRL job writes to:

/*+engine(iceberg) */
CREATE TABLE SinkTable (
    userid INT NOT NULL,
    total_tokens BIGINT NOT NULL,
    window_end TIMESTAMP_LTZ(3),
    window_time TIMESTAMP_LTZ(3)
);

and now I want to build an application that queries that table directly in iceberg with Duckdb (those are my only enabled database engines):

TokensByUser(userid INT):= SELECT * FROM SinkTable where userid = :userid;

Currently, that still requires a flink engine which essentially copies the table over into a new intermediate iceberg table that DuckDB then queries.
That is a bit wasteful. We could query the underlying table directly and cut Flink out entirely (and leave out as an engine).

This would require:

  • supporting the planner without a configured "stream" engine.
  • allowing "shortcuts" in the dag when planning database tables. If the table to be materialized is a source table and the source is in the same engine as the sink and the engine supports such sharing (only table formats support this), then we can use the source directly. The key is figuring out how to convert that to an EngineCreateTable.
  • support producing deployment assets without stream engine.

On the other hand, I'm wondering if this is generally a useful feature. If this is for one-time querying, you can do that much easier and cheaper outside of DataSQRL (e.g. Athena if tables are registered in Glue). If this is for building an application, you want to make sure that iceberg table is optimally partitioned and sorted for that query workload which often means that you need to copy the table and lay it out differently. Hence, this feature may not provide actual value.

Just putting this hear as food for thought and to collect feedback.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    Status

    Backlog

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions