@Weijun-H Weijun-H commented Nov 7, 2025

Which issue does this PR close?

Closes #18530

Rationale for this change

The make_map function has an overly strict null check that cannot distinguish between:

  • NULL map values (entire map is NULL) - should be allowed
  • Null keys within maps - should be rejected

The premature null check at line 66 (if keys.null_count() > 0) rejects ANY null in the keys array, even when it represents a valid NULL map value. This causes failures when directly calling make_map_batch with Arrow arrays containing NULL list elements.
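The difference between the two kinds of null can be sketched in plain Rust, without Arrow. This is only an illustration: `KeyRow`, `strict_check`, and `relaxed_check` are hypothetical names, and nested `Option`s stand in for the list array of keys, where a top-level `None` plays the role of a NULL map.

```rust
/// One row of the keys column (hypothetical model, not the Arrow API):
/// `None` means the whole map is NULL; `Some(keys)` is a map whose
/// individual keys may themselves be null.
type KeyRow = Option<Vec<Option<String>>>;

/// Models the overly strict behaviour: any top-level NULL row is treated
/// as an error, so a NULL map is rejected as if it were a null key.
fn strict_check(rows: &[KeyRow]) -> Result<(), String> {
    if rows.iter().any(|row| row.is_none()) {
        return Err("map key cannot be null".to_string());
    }
    Ok(())
}

/// Models the intended behaviour: NULL maps are skipped, and only null
/// keys inside non-NULL maps are rejected.
fn relaxed_check(rows: &[KeyRow]) -> Result<(), String> {
    // `flatten` drops the `None` rows (NULL maps) and yields the key lists.
    for keys in rows.iter().flatten() {
        if keys.iter().any(|key| key.is_none()) {
            return Err("map key cannot be null".to_string());
        }
    }
    Ok(())
}
```

With rows `[Some(["a"]), None]`, `strict_check` fails while `relaxed_check` succeeds; a row such as `Some(["a", null])` is still rejected by `relaxed_check`.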

What changes are included in this PR?

1. Fixed make_map_batch function

  • Removed premature null check (lines 66-68) that incorrectly rejected NULL map values
  • Added routing logic: when constant evaluation encounters NULL maps (can_evaluate_to_const && keys.null_count() > 0), the call is routed to make_map_array_internal, which handles them correctly
  • Preserved validation: All keys are still validated through validate_map_keys
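The routing rule in the list above can be written as a small decision function. This is a sketch, not the code in map.rs: `BuildPath` and `choose_path` are hypothetical names, and it assumes that non-constant inputs already take the array path, which this description does not state.

```rust
/// Hypothetical label for the two construction paths.
#[derive(Debug, PartialEq)]
enum BuildPath {
    /// Constant evaluation with no NULL maps in the keys array.
    ConstFastPath,
    /// Per-row construction, standing in for `make_map_array_internal`.
    ArrayInternal,
}

/// Picks a path from the two conditions named in the description.
/// Assumption: when `can_evaluate_to_const` is false, the array path
/// is used regardless of the null count.
fn choose_path(can_evaluate_to_const: bool, keys_null_count: usize) -> BuildPath {
    if can_evaluate_to_const && keys_null_count > 0 {
        // Constant input that contains NULL maps: route to the array path.
        BuildPath::ArrayInternal
    } else if can_evaluate_to_const {
        BuildPath::ConstFastPath
    } else {
        BuildPath::ArrayInternal
    }
}
```

Under these assumptions, only constant input without any NULL map stays on the constant path.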

2. Enhanced make_map_array_internal function

  • Preserves original array metadata (length and nulls bitmap) before list_to_arrays() transformation
  • Correctly builds offset buffer: For NULL maps, offset doesn't advance (creates empty range)
  • Handles all-NULL edge case: Creates empty arrays with correct data types when all maps are NULL
  • Restores nulls bitmap: Ensures NULL map values are properly marked in the final MapArray
  • Validates nested nulls: Checks flattened_keys.null_count() > 0 after concatenation to catch null keys within maps
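The offset and validity handling described above can be illustrated with a plain-Rust model. It is a simplified sketch rather than the PR's implementation: `MapParts` and `build_map_parts` are hypothetical names, vectors stand in for Arrow buffers, and null keys are checked while flattening instead of after concatenation.

```rust
/// Hypothetical stand-in for the pieces of a map array (not the Arrow API).
#[derive(Debug, PartialEq)]
struct MapParts {
    /// `offsets[i]..offsets[i + 1]` is the key range of map `i`.
    offsets: Vec<i32>,
    /// `false` marks a NULL map.
    validity: Vec<bool>,
    /// Keys of all non-NULL maps, concatenated in row order.
    flat_keys: Vec<String>,
}

fn build_map_parts(rows: &[Option<Vec<Option<String>>>]) -> Result<MapParts, String> {
    let mut offsets = Vec::with_capacity(rows.len() + 1);
    offsets.push(0i32);
    let mut validity = Vec::with_capacity(rows.len());
    let mut flat_keys = Vec::new();

    for row in rows {
        match row {
            // NULL map: mark it invalid and append nothing, so the next
            // offset repeats the previous one (an empty range).
            None => validity.push(false),
            Some(keys) => {
                validity.push(true);
                for key in keys {
                    match key {
                        Some(k) => flat_keys.push(k.clone()),
                        // A null key inside a non-NULL map is still an error.
                        None => return Err("map key cannot be null".to_string()),
                    }
                }
            }
        }
        offsets.push(flat_keys.len() as i32);
    }

    Ok(MapParts { offsets, validity, flat_keys })
}
```

For rows `[{a, b}, NULL, {c}]` this yields offsets `[0, 2, 2, 3]` and validity `[true, false, true]`. When every row is NULL, the flattened keys are empty and all offsets are 0, which corresponds to the all-NULL edge case listed above.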

Are these changes tested?

Yes, comprehensive tests are included:

Unit tests (map.rs):

  • test_make_map_with_null_maps(): Directly tests NULL map handling at the function level
  • test_make_map_with_null_key_within_map_should_fail(): Verifies null keys are still rejected

Existing tests: All existing map-related tests pass, confirming no regression.

Are there any user-facing changes?

No

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Nov 7, 2025
@Weijun-H Weijun-H changed the title from "feat: Enhance map handling to support NULL map values and ensure unique keys validation" to "feat: Enhance map handling to support NULL map values" Nov 8, 2025
@Weijun-H Weijun-H requested a review from Copilot November 8, 2025 10:52
Copilot AI left a comment

Pull Request Overview

This PR fixes handling of NULL map values (entire maps being NULL, not null keys/values within maps) in DataFusion's map functions. The changes address an issue where NULL map values caused incorrect "map key cannot be null" errors.

Key changes:

  • Refactored make_map_array_internal to properly track and preserve NULL map entries using nulls bitmap
  • Updated validation logic to distinguish between NULL maps and null keys within maps
  • Added comprehensive test coverage for NULL map handling in both memory and Parquet storage

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| datafusion/sqllogictest/test_files/map.slt | Removed NULL values from duplicate key test; added extensive tests for NULL map handling including memory tables, map operations, and Parquet storage |
| datafusion/functions-nested/src/map.rs | Refactored map validation and array construction to handle NULL maps correctly by tracking nulls bitmap, building proper offsets, and handling empty array edge cases |

@Weijun-H Weijun-H force-pushed the 18530-mix-table-for-map branch from c0fa584 to c735a53 Compare November 8, 2025 14:41
@Weijun-H Weijun-H marked this pull request as draft November 8, 2025 14:49
@Weijun-H Weijun-H force-pushed the 18530-mix-table-for-map branch from d69074c to 2f715b0 Compare November 8, 2025 15:36
@Weijun-H Weijun-H force-pushed the 18530-mix-table-for-map branch from 2f715b0 to 87d6f93 Compare November 8, 2025 15:40
@Weijun-H Weijun-H marked this pull request as ready for review November 9, 2025 08:31
Development

Successfully merging this pull request may close these issues.

make_map incorrectly rejects NULL map values due to overly strict null check

1 participant