-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-5769] Ensure partitions created by async indexer are not deleted by regular writers #12662
Conversation
243dfa7
to
899cd0a
Compare
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
Show resolved
Hide resolved
899cd0a
to
0c0d9b4
Compare
thanks for the PR desc, it helps. btw, for bullet 4 in your PR desc
woudn't the ingestion writer also log updates for RLI, since the table config's inflight metadata partitions will say RLI. |
can you create a follow up ticket for doc updates. This def needs to go into some runbook in our docs. |
...di-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
Outdated
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java
Show resolved
Hide resolved
0c0d9b4
to
179fc3b
Compare
Yes, ingestion writer will update logs for RLI as well. No change in design. Updated the PR description. |
179fc3b
to
0f2dd65
Compare
0f2dd65
to
e15ddba
Compare
Change Logs
Ensure partitions created by async indexer are not deleted by regular writers.
HoodieTable
TestHoodieIndexer
Let's understand the interplay between index enable/disable boolean flag and the new config. For illustration prupose, we will assume,
files
and column_statsare enabled by ingestion writer, and
record_index` is created using async indexer.Existing boolean configs:
hoodie.metadata.enable
,hoodie.metadata.index.column.stats.enable
andhoodie.metadata.record.index.enable
.New string config:
hoodie.metadata.index.drop
(this takes the index name/MDT partition name as value)hoodie.metadata.enable
andhoodie.metadata.index.column.stats.enable
set to true. And,files
andcolumn_stats
index are present.hoodie.metadata.enable
andhoodie.metadata.record.index.enable
to true. It checks the available partitions and automatically sets the columns stats config to true as well -hudi/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
Line 182 in c049e9b
hoodie.metadata.index.drop
config, so even though thehoodie.metadata.record.index.enable
is false in ingestion writer config, the record_index is not deleted.hoodie.metadata.record.index.enable=false
but alsohoodie.metadata.index.drop=record_index
.To summarise, there is a boolean flag for enabling.disabling an index, and there is a string config to explicitly specify what index to delete. The interplay between these two decides finally, whether index needs to be disabled/dropped or not.
Impact
Ensure partitions created by async indexer are not deleted by regular writers.
Risk level (write none, low medium or high below)
low
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist