Currently, the index either loads the whole file from a stream or mmaps the index file and relies on the OS to page in the relevant pieces. Either way, the whole index file must be available locally, which does not work well in data lake environments where files live on remote storage.
I was wondering whether the index file structure is suitable for partial loading, so that traversing the index requests only the relevant parts of the file through a user-provided interface (similar to how, e.g., individual row groups and column chunks are fetched during a Parquet file scan)?
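To make the question more concrete, here is a rough sketch of the kind of user-provided interface I have in mind. Everything in it is hypothetical: `RangeReader`, `InMemoryReader`, `fetch_neighbors`, and the toy file layout (an 8-byte node count followed by fixed-size neighbor records) are made up for illustration and do not reflect the actual index format or any existing API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical interface (not an existing API): the index would ask for
// byte ranges instead of assuming the whole file is local.
struct RangeReader {
    virtual ~RangeReader() = default;
    // Copy `length` bytes starting at `offset` into `dst`.
    virtual void read_at(std::uint64_t offset, std::size_t length, void* dst) = 0;
};

// Stand-in for remote storage (S3, HDFS, ...), backed by a byte buffer.
// It counts requests and bytes so the cost of a traversal is visible.
class InMemoryReader final : public RangeReader {
  public:
    explicit InMemoryReader(std::vector<std::uint8_t> bytes) : bytes_(std::move(bytes)) {}

    void read_at(std::uint64_t offset, std::size_t length, void* dst) override {
        if (offset > bytes_.size() || length > bytes_.size() - offset)
            throw std::out_of_range("read past end of file");
        std::memcpy(dst, bytes_.data() + offset, length);
        ++requests_;
        bytes_fetched_ += length;
    }

    std::size_t requests() const { return requests_; }
    std::size_t bytes_fetched() const { return bytes_fetched_; }

  private:
    std::vector<std::uint8_t> bytes_;
    std::size_t requests_ = 0;
    std::size_t bytes_fetched_ = 0;
};

// Toy layout, assumed for this sketch only:
// [uint64 node_count][node 0: kDegree x uint32][node 1: kDegree x uint32]...
constexpr std::size_t kDegree = 4;
constexpr std::size_t kHeaderSize = sizeof(std::uint64_t);
constexpr std::size_t kRecordSize = kDegree * sizeof(std::uint32_t);

// Build a toy file where node i links to i+1, i+2, ... (mod node_count).
std::vector<std::uint8_t> make_toy_file(std::uint64_t node_count) {
    std::vector<std::uint8_t> file(kHeaderSize + node_count * kRecordSize);
    std::memcpy(file.data(), &node_count, kHeaderSize);
    for (std::uint64_t i = 0; i < node_count; ++i)
        for (std::size_t j = 0; j < kDegree; ++j) {
            std::uint32_t neighbor = static_cast<std::uint32_t>((i + j + 1) % node_count);
            std::memcpy(file.data() + kHeaderSize + i * kRecordSize + j * sizeof(neighbor),
                        &neighbor, sizeof(neighbor));
        }
    return file;
}

// Fetch one node's neighbor list with a single ranged read.
std::vector<std::uint32_t> fetch_neighbors(RangeReader& reader, std::uint64_t node) {
    std::vector<std::uint32_t> neighbors(kDegree);
    reader.read_at(kHeaderSize + node * kRecordSize, kRecordSize, neighbors.data());
    return neighbors;
}
```

With a layout like this, visiting one node costs one ranged request of `kRecordSize` bytes rather than a download of the whole file. Whether the real index layout allows node offsets to be computed or looked up this cheaply is exactly what I am asking about.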
Can you contribute to the implementation?
I can contribute
Is your feature request specific to a certain interface?
C++ implementation
Contact Details
No response
Is there an existing issue for this?
I have searched the existing issues
Code of Conduct
I agree to follow this project's Code of Conduct
Several teams have already adapted the core C++ engine to keep the storage elsewhere. @agoncharuk, do you have a specific project in mind, or a more specific set of constraints?
Yes, I saw both the ClickHouse and DuckDB integrations, which totally make sense!
I am currently investigating a possible Trino integration, specifically for Hive/Iceberg deployments where data resides on S3/HDFS and workers never keep the dataset locally. Fetching the whole index file will certainly work; however, I was wondering whether a partial index read is possible, at least in theory.