Direct Query against Iceberg source tables

Suppose I have a shared mutation table that another SQRL job writes to:
```
/*+engine(iceberg) */
CREATE TABLE SinkTable (
    userid INT NOT NULL,
    total_tokens BIGINT NOT NULL,
    window_end TIMESTAMP_LTZ(3),
    window_time TIMESTAMP_LTZ(3)
);
```
and now I want to build an application that queries that table directly in iceberg with Duckdb (those are my only enabled database engines):
```
TokensByUser(userid INT):= SELECT * FROM SinkTable where userid = :userid;
```
Currently, that still requires a flink engine which essentially copies the table over into a new intermediate iceberg table that DuckDB then queries.
That is a bit wasteful. We could query the underlying table directly and cut Flink out entirely (and leave out as an engine).

This would require:
- supporting the planner without a configured "stream" engine.
- allowing "shortcuts" in the dag when planning database tables. If the table to be materialized is a source table and the source is in the same engine as the sink and the engine supports such sharing (only table formats support this), then we can use the source directly. The key is figuring out how to convert that to an `EngineCreateTable`.
- support producing deployment assets without stream engine.

On the other hand, I'm wondering if this is generally a useful feature. If this is for one-time querying, you can do that much easier and cheaper outside of DataSQRL (e.g. Athena if tables are registered in Glue). If this is for building an application, you want to make sure that iceberg table is optimally partitioned and sorted for that query workload which often means that you need to copy the table and lay it out differently. Hence, this feature may not provide actual value.

Just putting this hear as food for thought and to collect feedback.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Direct Query against Iceberg source tables #1933

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Direct Query against Iceberg source tables #1933

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions