Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support complex type in Python Table Engine like Array and Map, Object #251

Open
auxten opened this issue Jul 24, 2024 · 2 comments · Fixed by #250
Open

Support complex type in Python Table Engine like Array and Map, Object #251

auxten opened this issue Jul 24, 2024 · 2 comments · Fixed by #250
Assignees
Labels
Json JSON column type support

Comments

@auxten
Copy link
Member

auxten commented Jul 24, 2024

Code snippet to reproduce:

df = pd.read_parquet(f'{GRAPHRAG_FOLDER}/create_final_community_reports.parquet',
                     columns=["id","community","level","title","summary", "findings","rank","rank_explanation"])

batched_import(statement, df)
df.head(2)

df.dtypes

import chdb

df=df.astype({'id': 'string'})
df2=df[["id","level","rank"]]
chdb.query("FROM Python(df2) SELECT max(id),sum(level),avg(rank)  group by all").show()
df.dtypes

parquet-test-data.zip

@auxten auxten self-assigned this Jul 24, 2024
@auxten
Copy link
Member Author

auxten commented Jul 29, 2024

Data seems contain dict type, debuging

                                      id community  level  ...                                           findings rank                                   rank_explanation
0   e7822326-4da8-4954-afa9-be7f4f5791a5        42      2  ...  [{'explanation': 'Marley's Ghost plays a cruci...  8.0  The impact severity rating is high due to the ...
1   8a5afac1-99ef-4f01-a1b1-f044ce392ff9        43      2  ...  [{'explanation': 'The Ghost, identified at tim...  8.5  The impact severity rating is high due to the ...
2   83fc3bf3-ce93-4feb-8bb1-b278b062c672        44      2  ...  [{'explanation': 'Mr. Scrooge's character unde...  8.5  The high impact severity rating reflects the p...
3   86d8af0d-738c-45b4-a36c-ec9cd25b56dd        45      2  ...  [{'explanation': 'Bob Cratchit is depicted as ...  7.5  The impact severity rating is relatively high ...
4   77882794-545d-4aba-98c7-2f1294a9e8df        46      2  ...  [{'explanation': 'Martha is a central figure i...  7.5  The impact severity rating is moderately high ...
5   829803bc-0928-4825-9c5a-d07284f8309d        41      2  ...  [{'explanation': 'Ebenezer Scrooge's story beg...  8.5  The high impact severity rating reflects the p...
6   834f9502-3bfd-4500-91c5-0cb14a7ae254        20      1  ...  [{'explanation': 'Jacob Marley's appearance to...  7.5  The impact severity rating is high due to the ...
7   1df0b304-686f-4ac0-92ce-5a9244ad36cf        21      1  ...  [{'explanation': 'The Ghost, often interpreted...  8.5  The impact severity rating is high due to The ...
8   24802e92-18e3-4638-90d0-3872159110dd        22      1  ...  [{'explanation': 'The firm 'Scrooge and Marley...  4.0  The impact severity rating is moderate due to ...
9   bd71ffb7-2324-4987-9947-dc3c49809e18        23      1  ...  [{'explanation': 'Scrooge wields considerable ...  4.0  The impact severity rating is moderate, reflec...

@auxten auxten linked a pull request Jul 29, 2024 that will close this issue
@auxten auxten changed the title Crash on query Python(dataframe) Support complex type in Python Table Engine like Array and Map, Object Aug 1, 2024
@auxten
Copy link
Member Author

auxten commented Aug 6, 2024

Currently, ClickHouse will read the column "findings" as Array(Tuple(explanation Nullable(String), summary Nullable(String))) from parquet. If the Semistructured Columns is done. Maybe chDB can use JSON type for this kind of column which is more generic and easy to handle in Python.

So, I will wait for ClickHouse/ClickHouse#66444

@auxten auxten reopened this Aug 8, 2024
@auxten auxten added the Json JSON column type support label Aug 30, 2024
@auxten auxten moved this to Done in chDB 2024 Q4 Sep 29, 2024
@auxten auxten closed this as completed by moving to Done in chDB 2024 Q4 Sep 29, 2024
@auxten auxten removed this from chDB 2024 Q4 Sep 29, 2024
@auxten auxten reopened this Sep 29, 2024
@auxten auxten moved this to Done in chDB 2024 Q4 Sep 29, 2024
@auxten auxten closed this as completed by moving to Done in chDB 2024 Q4 Sep 29, 2024
@auxten auxten closed this as completed by moving to Done in chDB 2024 Q4 Sep 29, 2024
@auxten auxten reopened this Sep 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Json JSON column type support
Projects
Status: Todo
Development

Successfully merging a pull request may close this issue.

1 participant