
[Python] Hive partition columns being forced to dict type #47592

@JasonTam

Description

Describe the bug, including details regarding any error messages, version, and platform.

version: 21.0.0
platform: arm64

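When reading data with:

  • hive-style partitioning
  • integer dataframe columns

the partition columns are inferred as dictionary type with int32 indices (dictionary<values=string, indices=int32>) instead of plain values, as the schema below shows.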

from pyarrow.parquet import read_table

path = "gs://project/data/run_date=2025-09-17/job_id=abc123/0.pq"
# read_table already returns a pyarrow.Table; there is no separate .read() step
table = read_table(path)
table

pyarrow.Table
0: uint32
1: uint32
2: uint32
3: uint32
...
99: uint32
run_date: dictionary<values=string, indices=int32, ordered=0>
job_id: dictionary<values=string, indices=int32, ordered=0>

This causes problems downstream, for example when trying to cast "2025-09-17" to int32.

Component(s)

Python
