
Conversation

zhangjun0x01 commented Sep 17, 2025

When we run a query that does not select the equality delete fields, it throws the exception "Equality deletes need the relevant columns to be selected". This PR fixes that issue.

Steps:

  1. Add the equality delete fields to the projection columns when the query SQL does not contain the relevant columns.
  2. Execute ApplyEqualityDeletes.
  3. Remove the equality delete fields from the output data chunk before returning the query result (a toy model of this flow follows below).
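
A minimal, self-contained sketch of that flow, with std::vector standing in for DuckDB's DataChunk; all names and values here are illustrative, not the extension's actual API:

	#include <cstddef>
	#include <iostream>
	#include <set>
	#include <vector>

	// A "chunk" is a vector of columns; each column is a vector of ints.
	using Column = std::vector<int>;
	using Chunk = std::vector<Column>;

	int main() {
		// Step 1: the query selects 1 column, but the scan also reads the
		// 2 extra columns that the equality deletes need.
		std::size_t selected_count = 1;
		Chunk scanned = {{10, 20, 30},  // the selected column
		                 {1, 2, 3},     // extra: equality delete field "id"
		                 {7, 8, 9}};    // extra: equality delete field "name" (encoded)

		// Step 2: apply the equality deletes; they reference the extra columns.
		std::set<std::size_t> deleted_rows;
		for (std::size_t row = 0; row < scanned[0].size(); row++) {
			if (scanned[1][row] == 1 || scanned[2][row] == 8) { // id = 1 OR name = 'b'
				deleted_rows.insert(row);
			}
		}

		// Step 3: emit only the selected columns, minus the deleted rows.
		Chunk output(selected_count);
		for (std::size_t col = 0; col < selected_count; col++) {
			for (std::size_t row = 0; row < scanned[col].size(); row++) {
				if (!deleted_rows.count(row)) {
					output[col].push_back(scanned[col][row]);
				}
			}
		}
		std::cout << "rows after deletes: " << output[0].size() << "\n"; // prints 1
	}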

The review comments below refer to this hunk:

		equalities.push_back(expression->Copy());
		continue;
	}
	idx_t index = field_id_to_result_id.at(field_id);
Collaborator commented:

I don't understand what's happening here, can you explain and add a clarifying comment?
Especially the fact that there's an if, else if but no else, meaning we could have a situation where an equality delete condition is essentially skipped?

Collaborator commented:

Hmm, I think I understand: you're rewriting the expressions so they match the modifications made to the output_chunk earlier on.

Can't this be done directly when constructing the expressions?
This feels rather error-prone, because any additional expressions added later would also need to be handled here.

In any case, it looks like this is actually exhaustive, so can we change the else if to just an else and turn the condition into an assertion instead?
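
To spell out that suggestion (a hypothetical before/after; the function and condition names are placeholders, not the actual code):

	#include <cassert>

	// Before: if / else if with no else, so an equality delete expression
	// that matches neither condition is silently skipped.
	// After: the two cases are exhaustive, so branch on the first and
	// assert the second instead of testing it.
	void handle(bool is_selected, bool is_appended_delete_column) {
		if (is_selected) {
			// use the index the expression was constructed with
		} else {
			assert(is_appended_delete_column); // exhaustive: nothing can fall through
			// rewrite the index to point at the appended column
		}
	}

	int main() {
		handle(true, false);
		handle(false, true);
	}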

zhangjun0x01 (Author) commented:

Consider the following case: there is a datafile with the following data:

1,'a', 2025-01-01 
2,'b', 2025-01-02

then we add two delete snapshots:

  • Snapshot 1: delete where id = 1
  • Snapshot 2: delete where name = 'b'

Both of these delete snapshots are valid for the datafile. When we execute a query:

	-- without the equality delete fields id and name
	select count(*) from mytable

In the IcebergMultiFileReader::InitializeReader method, we add all equality delete fields (id and name) to the scan columns.

In the function IcebergMultiFileReader::FinalizeChunk, the input_chunk parameter will contain the additional equality delete fields (id and name). We copy these two fields into the output_chunk. At this point, in the output_chunk, the index of the field 'id' should be 1 and the index of the field 'name' should be 2. However, the equality delete filters we obtain here were all constructed with index 1, for both 'id' and 'name'. Therefore, we rewrite the expressions here to be compatible with the logic above.

The reason for this situation is that the delete filters on the 'id' and 'name' fields come from different snapshots (in IcebergMultiFileList::ScanEqualityDeleteFile). At that point, we do not know the order of these two delete snapshots, so we cannot pass the correct index when constructing the BoundReferenceExpression (see the toy model below).
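
A self-contained toy model of that ordering problem (the names are hypothetical stand-ins for the extension's internals):

	#include <cstddef>
	#include <iostream>
	#include <unordered_map>
	#include <vector>

	int main() {
		// Two delete files are scanned in an order we don't control. When each
		// expression is built, it can only assume "my column comes right after
		// the projected columns", so both are built with the same index.
		struct DeleteExpr {
			int field_id;       // Iceberg field id the filter applies to
			std::size_t index;  // column index the expression was built with
		};
		std::vector<DeleteExpr> exprs = {{1, 1},   // filter on "id"
		                                 {2, 1}};  // filter on "name": same index!

		// Later, InitializeReader appends the extra columns in a fixed order,
		// which yields the authoritative mapping: field id -> result column.
		std::unordered_map<int, std::size_t> field_id_to_result_id = {{1, 1}, {2, 2}};

		// The rewrite pass fixes each expression to match the output chunk.
		for (auto &expr : exprs) {
			expr.index = field_id_to_result_id.at(expr.field_id);
		}
		for (const auto &expr : exprs) {
			std::cout << "field " << expr.field_id << " -> column " << expr.index << "\n";
		}
	}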

Tishj (Collaborator) commented Sep 18, 2025

Hmm, I'm not entirely following yet. It's been a while since I implemented this logic, so excuse me while I recollect some thoughts.

In ScanEqualityDeleteFile we create the expressions referencing the index in output_chunk, in order to make them projection-agnostic, so the expressions don't have to be rewritten later.

You're adding to the column_indexes (now new_column_indexes) for all the columns that aren't selected but are required by the equality deletes.

What I'm hearing is that you're losing the context of the order you add these in, which you need in order to replicate it correctly later, here:

	//! Add the extra equality delete fields to output chunk.
	int32_t diff = 0;
	if (executor.expressions.size() != output_chunk.ColumnCount()) {
		diff = executor.expressions.size() - output_chunk.ColumnCount();
		for (int32_t i = diff; i > 0; i--) {
			int32_t index = input_chunk.ColumnCount() - i;
			output_chunk.data.emplace_back(input_chunk.data[index]);
		}
	}
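
To make the arithmetic concrete, a toy version of that loop (assumed column counts, not the extension's code): with 3 expressions and 1 projected column, diff is 2, so the last 2 columns of input_chunk are appended to output_chunk in their original order:

	#include <cassert>
	#include <cstddef>
	#include <vector>

	int main() {
		// Stand-ins: each int labels a column.
		std::vector<int> input_chunk = {100, 200, 300}; // 3 columns scanned
		std::vector<int> output_chunk = {100};          // 1 column projected
		std::size_t expression_count = 3;               // executor.expressions.size()

		// Same arithmetic as the snippet above.
		int diff = static_cast<int>(expression_count - output_chunk.size()); // 2
		for (int i = diff; i > 0; i--) {
			std::size_t index = input_chunk.size() - i; // column 1, then column 2
			output_chunk.push_back(input_chunk[index]);
		}
		assert(output_chunk.size() == 3); // {100, 200, 300}
	}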

Collaborator commented:

I've made a commit to the branch to remove the need to rewrite expressions. Can you take some time to double-check my logic? I left a FIXME in there because I'm not entirely confident that piece of logic will work in all cases.

It assumes a direct relation between new_global_column_ids and the output_chunk sizes, which might break if there is projection pushdown into Parquet? I'm not entirely sure.

Collaborator commented:

For future reference: it's about the connection between the logic in ScanEqualityDeleteFile, InitializeReader and FinalizeChunk. Those now use equality_id_to_result_id to synchronize the mapping that needs to be done in those methods.
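
A rough sketch of that shared-mapping pattern (the map name comes from the comment above; everything else is hypothetical):

	#include <cstddef>
	#include <unordered_map>

	// One mapping, built once, read by every phase that needs it.
	struct EqualityDeleteState {
		// Iceberg equality delete field id -> index in the result chunk.
		std::unordered_map<int, std::size_t> equality_id_to_result_id;
	};

	// InitializeReader: appends the missing columns and records where they land.
	void InitializeReader(EqualityDeleteState &state) {
		state.equality_id_to_result_id[1] = 1; // e.g. field "id"
		state.equality_id_to_result_id[2] = 2; // e.g. field "name"
	}

	// ScanEqualityDeleteFile: builds expressions against the recorded indexes,
	// so no rewrite pass is needed later. FinalizeChunk reads the same map to
	// place (and later drop) the extra columns consistently.
	std::size_t ExpressionIndexFor(const EqualityDeleteState &state, int field_id) {
		return state.equality_id_to_result_id.at(field_id);
	}

	int main() {
		EqualityDeleteState state;
		InitializeReader(state);
		return ExpressionIndexFor(state, 2) == 2 ? 0 : 1;
	}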

zhangjun0x01 (Author) commented:

@Tishj, by the way, how do I configure Iceberg to use relative paths for the test cases in the data directory? I checked Iceberg's configuration and it seems there is no such option. Thanks!

Tishj (Collaborator) commented Sep 18, 2025

I'm not sure I understand what you're asking. Iceberg works with absolute paths; that's a limitation of the format.
We try to hack around that somewhat with allow_moved_paths, which I believe only works with iceberg_scan(...), not with an attached Iceberg catalog.

zhangjun0x01 (Author) commented:

In the data/persistent directory, all test cases use relative paths in v1.metadata.json and xxx.avro; I do not know how to generate the data with relative paths.

For example, the location field uses a relative path, not an absolute path like /data/persistent/equality_deletes/warehouse/mydb/mytable:

{
  "format-version" : 2,
  "table-uuid" : "7c269e29-15d2-48a6-bc83-4919d38e3041",
  "location" : "data/persistent/equality_deletes/warehouse/mydb/mytable",
  "last-sequence-number" : 0,
  "last-updated-ms" : 1746122109432,
  "last-column-id" : 2,
  "current-schema-id" : 0,
xxxxxx

Tishj (Collaborator) commented Sep 18, 2025

@zhangjun0x01 check out some of the scripts, like scripts/persistent/partition_bool.py; those are the scripts used to generate the persistent data.

I imagine it's because the DATA_GENERATION_DIR is a relative path.

zhangjun0x01 (Author) commented:

I tested it: when we create an Iceberg Hadoop catalog with a relative path, the table will use relative paths.

zhangjun0x01 marked this pull request as draft on September 25, 2025.