Skip to content

Conversation

kokila-19
Copy link
Contributor

@kokila-19 kokila-19 commented Sep 23, 2025

What changes were proposed in this pull request?

Added Z-order support for Iceberg tables via CREATE TABLE DDL

Why are the changes needed?

To support zorder indexing which will improve data clustering and query performance on Iceberg tables.

Does this PR introduce any user-facing change?

Yes , new syntax support
CREATE TABLE test_zorder (
id int,
text string)
WRITE LOCALLY ZORDER by id, text
STORED BY iceberg
STORED As orc;

How was this patch tested?

qtest

@kokila-19 kokila-19 changed the title HIVE-29133: Support Z-order indexing for Iceberg tables via CREATE TA… HIVE-29133: Support Z-order for Iceberg tables via CREATE TABLE Sep 23, 2025
@zhangbutao zhangbutao requested a review from Copilot September 27, 2025 13:32
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds Z-order support for Iceberg tables via CREATE TABLE DDL syntax, enabling improved data clustering and query performance. The implementation introduces a new syntax "WRITE LOCALLY ZORDER BY" alongside supporting infrastructure for Z-order indexing.

  • Adds new DDL syntax "WRITE LOCALLY ZORDER BY" for creating Iceberg tables with Z-order sorting
  • Implements Z-order functionality through a custom UDF that uses Iceberg's ZOrderByteUtils
  • Extends the parser and analyzer to handle Z-order specifications in CREATE TABLE statements

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
CreateTableAnalyzer.java Adds Z-order AST processing and JSON serialization for CREATE TABLE
ZorderFields.java New data structure for holding Z-order field metadata
ZOrderFieldDesc.java New descriptor class for individual Z-order columns
HiveParser.g Adds grammar rules for ZORDER syntax parsing
GenericUDFIcebergZorder.java New UDF implementing Z-order value calculation using Iceberg utilities
HiveIcebergStorageHandler.java Integrates Z-order sorting into Iceberg write operations
BaseHiveIcebergMetaHook.java Handles Z-order metadata persistence in table properties

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

set hive.optimize.shared.work.merge.ts.schema=true;

-- Validates z-order on CREATE via clause.
CREATE TABLE default.zorder_it_nulls (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we enable the z-order feature for an already created Iceberg table?
Can we use the ALTER TABLE syntax to enable z-order for an already created Iceberg table?

Copy link
Member

@deniskuzZ deniskuzZ Sep 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhangbutao, what are your thoughts on #6095 (comment) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhangbutao Current PR implements end-to-end functionality for ZOrder, including support for CREATE and INSERT.
I will have two more commits coming in for alter and compaction zorder support.
Next PR will focus on adding alter table support for Zorder.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhangbutao, what are your thoughts on #6095 (comment) ?

I am ok with the syntax.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhangbutao Current PR implements end-to-end functionality for ZOrder, including support for CREATE and INSERT. I will have two more commits coming in for alter and compaction zorder support. Next PR will focus on adding alter table support for Zorder.

@kokila-19 That's ok.
BTW, could you please supplement the SQL documentation as well? I don't want such great features to go unused due to a lack of documentation, as it's likely that many users aren't even aware of them.
To be honest, I've even forgotten a lot of the semantics we implemented for Iceberg, haha.
Perhaps it's about time we gradually improve our Hive documentation too. :)
Thanks.

Copy link
Member

@deniskuzZ deniskuzZ Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current PR implements end-to-end functionality for ZOrder, including support for CREATE and INSERT.

@kokila-19, what is the syntax for insert? or insert uses the schema/spec zorder?

INSERT INTO ice_orc
WRITE ORDERED BY ZORDER(id, text)
SELECT * FROM source_table;

Copy link
Contributor Author

@kokila-19 kokila-19 Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deniskuzZ
Insert will use Z-Order spec stored in table properties sort.order and sort.columns which will be provided in INSERT/ALTER command.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhangbutao Certainly, documentation is a must. I've already created a ticket for it: HIVE-29135 and I plan to add the documentation once the feature is complete.

Perhaps it's about time we gradually improve our Hive documentation too. :)

I agree that it's time we improve Hive's documentation, and I'm happy to contribute to that effort.

CREATE TABLE default.zorder_it_nulls (
id int,
text string)
WRITE LOCALLY ZORDER by id, text
Copy link
Member

@deniskuzZ deniskuzZ Sep 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kokila-19, syntax should be the following:

WRITE [LOCALLY] ORDERED BY zorder (id, text)

Copy link
Member

@deniskuzZ deniskuzZ Sep 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • alter cmd
ALTER TABLE ice_orc 
SET WRITE [LOCALLY] ORDERED BY zorder (id, text);
  • compaction cmd:
OPTIMIZE TABLE ice_orc rewrite data 
zorder by (id, text); // optional, if not specified, should use zorder from table spec/schema

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can kill the LOCALLY or make it optional to be consistent with Spark

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using existing locally ordered by syntax implemented in #5541
If i have to remove locally keyword for zorder, I believe we should remove for this as well to be consistent in hive.
Or make LOCALLY optional for both orders

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OPTIMIZE TABLE ice_orc rewrite data 
zorder by (id, text); // optional, if not specified, should use zorder from table spec/schema

Note during implementation:
If the Zorder columns in table spec is different from spec provided in OPTIMIZE TABLE cmd for compaction, this should throw error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure. delta and spark iceberg support that. we can call optimize with zorder and later alter the write zorder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark only supports Z-order in rewrite_data_files (during compaction), it does not have Z-order support for DDL queries. So, they will not have this scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deniskuzZ
i think we can kill the LOCALLY or make it optional to be consistent with Spark
I will have this change in the follow up PR.

…BLE DDL

- Supported WRITE LOCALLY ZORDER BY (col1, col2, ...) syntax in CREATE TABLE to enable Z-order sort.
- Persisted Z-order information in HMS for iceberg tables using sort.order and sort.columns.
- Implemented GenericUDFIcebergZorder UDF (iceberg_zorder) to compute Z-order values and sort data in ascending order.
- Ensured INSERT operations respect Z-order sorting.
- Added qtests.
Copy link

sonarqubecloud bot commented Oct 7, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants