QoL: Destructive schema sync after manual column dropping #2909
Open: anuunchin wants to merge 7 commits into devel from feat/1153-drop-column-sync
Commits
- b351b6d Initial impl of sync_schema_destructively (anuunchin)
- 40a4e17 Formalising dlt schema sync (anuunchin)
- f70077e Unnecessary inheritance removed, functions moved (anuunchin)
- c0d028a Duplicate function removed, dummy implements empty update_from_stored… (anuunchin)
- ceb5f04 sync_schema deprecated, storage initialization check (anuunchin)
- 327c25e Unnecessary abstract class impls removed, no table reflection exception (anuunchin)
- d2ad80e Better docstrings, var names (anuunchin)
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,10 +23,13 @@ | |
from dlt.common.metrics import LoadJobMetrics | ||
from dlt.common.schema.exceptions import TableNotFound | ||
from dlt.common.schema.typing import ( | ||
C_DLT_ID, | ||
C_DLT_LOAD_ID, | ||
C_DLT_LOADS_TABLE_LOAD_ID, | ||
TTableFormat, | ||
TTableSchemaColumns, | ||
TSchemaDrop, | ||
TPartialTableSchema, | ||
) | ||
from dlt.common.storages.exceptions import ( | ||
CurrentLoadPackageStateNotAvailable, | ||
|
@@ -60,6 +63,7 @@ | |
StorageSchemaInfo, | ||
StateInfo, | ||
LoadJob, | ||
WithTableReflection, | ||
) | ||
from dlt.common.destination.exceptions import ( | ||
DestinationUndefinedEntity, | ||
|
@@ -279,6 +283,7 @@ class FilesystemClient( | |
WithStagingDataset, | ||
WithStateSync, | ||
SupportsOpenTables, | ||
WithTableReflection, | ||
): | ||
fs_client: AbstractFileSystem | ||
# a path (without the scheme) to a location in the bucket where dataset is present | ||
|
@@ -468,7 +473,12 @@ def drop_tables(self, *tables: str, delete_schema: bool = True) -> None: | |
def get_storage_tables( | ||
self, table_names: Iterable[str] | ||
) -> Iterable[Tuple[str, TTableSchemaColumns]]: | ||
"""Yields tables that have files in storage, returns columns from current schema""" | ||
"""Yield (table_name, column_schemas) pairs for tables that have files in storage. | ||
|
||
For Delta and Iceberg tables, the columns present in the actual table metadata | ||
are returned. For tables using regular file formats, the column schemas come from the | ||
dlt schema instead, since their real schema cannot be reflected directly. | ||
""" | ||
for table_name in table_names: | ||
table_dir = self.get_table_dir(table_name) | ||
if ( | ||
|
@@ -478,7 +488,34 @@ def get_storage_tables( | |
and len(self.list_table_files(table_name)) > 0 | ||
): | ||
if table_name in self.schema.tables: | ||
yield (table_name, self.schema.get_table_columns(table_name)) | ||
# If it's an open table, yield only the columns that actually exist | ||
if self.is_open_table("iceberg", table_name): | ||
from dlt.common.libs.pyiceberg import ( | ||
get_table_columns as get_iceberg_table_columns, | ||
) | ||
|
||
iceberg_table = self.load_open_table("iceberg", table_name) | ||
col_schemas = get_iceberg_table_columns(iceberg_table) | ||
yield (table_name, col_schemas) | ||
|
||
elif self.is_open_table("delta", table_name): | ||
from dlt.common.libs.deltalake import ( | ||
get_table_columns as get_delta_table_columns, | ||
) | ||
|
||
delta_table = self.load_open_table("delta", table_name) | ||
col_schemas = get_delta_table_columns(delta_table) | ||
yield (table_name, col_schemas) | ||
|
||
else: | ||
logger.warning( | ||
f"Table '{table_name}' does not use a table format and does not support" | ||
" true schema reflection. Returning column schemas from the dlt" | ||
" schema, which may be stale if the underlying files were manually" | ||
" modified." | ||
) | ||
yield (table_name, self.schema.get_table_columns(table_name)) | ||
|
||
Review comment on lines +511 to +518:
Just realized that for parquet files we can also just use pyarrow and read actual metadata 👀, but I still don't think people drop columns in parquet files...
||
else: | ||
yield (table_name, {"_column": {}}) | ||
else: | ||
|
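The `(table_name, column_schemas)` pairs yielded by `get_storage_tables` are what make it possible to detect a manually dropped column: any column still listed in the dlt schema but missing from the reflected columns is a candidate for removal from the schema. The sketch below illustrates that comparison only. It is not the implementation in this PR: `find_dropped_columns` and the `Columns` alias are hypothetical names, and column schemas are modeled as plain dicts keyed by column name, which is an assumption about the shape of `TTableSchemaColumns`.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Simplified stand-in (assumption) for dlt's TTableSchemaColumns:
# maps a column name to its column schema.
Columns = Dict[str, Dict[str, Any]]


def find_dropped_columns(
    schema_tables: Dict[str, Columns],
    storage_tables: Iterable[Tuple[str, Columns]],
) -> Dict[str, List[str]]:
    """Hypothetical helper: per table, list the columns that the stored
    schema still knows about but that are missing from the reflected
    storage columns."""
    dropped: Dict[str, List[str]] = {}
    for table_name, storage_columns in storage_tables:
        schema_columns = schema_tables.get(table_name)
        if schema_columns is None:
            # Table is unknown to the schema, so there is nothing to compare.
            continue
        missing = [name for name in schema_columns if name not in storage_columns]
        if missing:
            dropped[table_name] = missing
    return dropped


# Columns as recorded in the (simplified) dlt schema.
schema_tables = {
    "items": {
        "id": {"data_type": "bigint"},
        "name": {"data_type": "text"},
        "price": {"data_type": "double"},
    },
}
# Pretend "price" was dropped manually in the destination table.
storage_tables = [
    ("items", {"id": {"data_type": "bigint"}, "name": {"data_type": "text"}}),
]

dropped = find_dropped_columns(schema_tables, storage_tables)
print(dropped)  # {'items': ['price']}
```

For tables in plain file formats, where the diff falls back to the dlt schema, this comparison would never report a dropped column, because both sides come from the same source. That is consistent with the warning the diff logs for those tables.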