[Python] Unhandled type for Arrow to Parquet schema conversion: month_day_nano_interval #36798

Closed
FoxHeather opened this issue Jul 21, 2023 · 6 comments

Comments

@FoxHeather

FoxHeather commented Jul 21, 2023

Describe the bug, including details regarding any error messages, version, and platform.

Running the Python code below, which uses Apache Arrow (pyarrow) to write a Parquet file containing an interval column, fails.
The error is shown below:

pyarrow.lib.ArrowNotImplementedError:
Unhandled type for Arrow to Parquet schema conversion: month_day_nano_interval

import pyarrow as pa
import pyarrow.parquet as pq

# Define Schema
schema = pa.schema([
    ('itv', pa.month_day_nano_interval())
])

itv = pa.array([(13, 25, 1000)], type=pa.month_day_nano_interval())

# Generate Parquet data
batch = pa.RecordBatch.from_arrays(
    [itv], schema=schema
)
table = pa.Table.from_batches([batch])

# Write Parquet file pqtpitvl.parquet
pq.write_table(table, 'pqtpitvl.parquet')

# display error

with ParquetWriter(
     ^^^^^^^^^^^^^^

File "C:\Users\FoxHeather\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyarrow\parquet\core.py", line 966, in __init__
self.writer = _parquet.ParquetWriter(
^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow\_parquet.pyx", line 1748, in pyarrow._parquet.ParquetWriter.__cinit__
File "pyarrow\error.pxi", line 121, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Unhandled type for Arrow to Parquet schema conversion: month_day_nano_interval

Component(s)

Python

@FoxHeather
Author

@emkornfield could you have a look at this error?

@FoxHeather
Author

@westonpace do you have any ideas about this?

@mapleFU
Member

mapleFU commented Jul 28, 2023

It seems the interval type is not handled by FieldToNode:

Status FieldToNode(const std::string& name, const std::shared_ptr<Field>& field,
                   const WriterProperties& properties,
                   const ArrowWriterProperties& arrow_properties, NodePtr* out)

You can:

  1. Cast interval to time64 duration or other supported types, even int64 or extended types
  2. Maybe we can map it to the Parquet INTERVAL logical type (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#interval), but that needs some further development

@FoxHeather
Author

This is the whole error message:

with ParquetWriter(
     ^^^^^^^^^^^^^^

File "C:\Users\FoxHeather\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyarrow\parquet\core.py", line 966, in __init__
self.writer = _parquet.ParquetWriter(
^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow\_parquet.pyx", line 1748, in pyarrow._parquet.ParquetWriter.__cinit__
File "pyarrow\error.pxi", line 121, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError:
Unhandled type for Arrow to Parquet schema conversion: month_day_nano_interval

@mapleFU

@mapleFU
Member

mapleFU commented Jul 28, 2023

PyArrow is just a wrapper; the call eventually ends up here: https://github.com/apache/arrow/blob/main/cpp/src/parquet/arrow/schema.cc#L433-L438

@jorisvandenbossche
Member

Closing as duplicate of #36799

@jorisvandenbossche closed this as not planned on Mar 19, 2024
3 participants