Open
Labels
bug: Something isn't working
Description
Describe the bug
pip recently started installing datafusion version '35.0.0'. Unlike with a previous installation of version '34.0.0', creating an external table from hive-partitioned Parquet data following the [documented instructions](https://arrow.apache.org/datafusion/user-guide/sql/ddl.html) no longer works. The partition columns all show up as columns of the table, but the columns from the Parquet data themselves do not appear.
To Reproduce
# prepare fake data
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd
data = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
table = pa.Table.from_pandas(data)
import os
os.makedirs("fake=0", exist_ok=True)  # safe to re-run if the directory already exists
pq.write_table(table, "./fake=0/data.parquet")
# load into datafusion
import datafusion as df
ctx = df.SessionContext()
ctx.sql("""
CREATE EXTERNAL TABLE data
STORED AS PARQUET
PARTITIONED BY (fake)
LOCATION './*/data.parquet'
""")
The loaded data is missing col1 and col2:
>>> ctx.sql("SELECT * FROM data")
DataFrame()
+------+
| fake |
+------+
| 0    |
| 0    |
+------+
>>> ctx.sql("SELECT table_name, column_name FROM information_schema.columns")
DataFrame()
+------------+-------------+
| table_name | column_name |
+------------+-------------+
| data       | fake        |
+------------+-------------+
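For context, in a hive-partitioned layout the `fake` column comes from the directory name (`fake=0`), while `col1` and `col2` live inside the Parquet file. The sketch below only illustrates that naming convention using the standard library. `hive_partitions` is a hypothetical helper written for this report, not part of DataFusion or pyarrow, and it makes no claim about how DataFusion parses paths internally.

```python
from pathlib import PurePosixPath


def hive_partitions(path: str) -> dict[str, str]:
    """Return the key=value directory segments of a path as a dict."""
    parts = {}
    # Only directory components are inspected; the file name is skipped.
    for segment in PurePosixPath(path).parent.parts:
        key, sep, value = segment.partition("=")
        if sep and key:
            parts[key] = value
    return parts


# The path written by the reproduction script above.
print(hive_partitions("./fake=0/data.parquet"))  # {'fake': '0'}
```

This is why `fake` is still available in 35.0.0 even though the file's own columns are not: the partition value is derived from the path alone.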
Expected behavior
The same steps with DataFusion 34.0.0 produce the following output:
>>> ctx.sql("SELECT * FROM data")
DataFrame()
+------+------+------+
| col1 | col2 | fake |
+------+------+------+
| 1    | 3    | 0    |
| 2    | 4    | 0    |
+------+------+------+
>>> ctx.sql("SELECT table_name, column_name FROM information_schema.columns")
DataFrame()
+------------+-------------+
| table_name | column_name |
+------------+-------------+
| data       | col1        |
| data       | col2        |
| data       | fake        |
+------------+-------------+
Additional context
Operating system: Rocky 8
Python version: 3.10.11
DataFusion version: 35.0.0, recently installed via pip
pyarrow version: 15.0.0
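The versions above can be confirmed from within the same Python environment. This is a minimal sketch using the standard library's `importlib.metadata`. `installed_versions` is a hypothetical helper name, and the package list simply mirrors the libraries imported in the reproduction script.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


for name, version in installed_versions(["datafusion", "pyarrow", "pandas"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Running this in both the 34.0.0 and 35.0.0 environments would make it easier to rule out differences in the other packages.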