duckdb.sql(prql.compile()) v.s. df.prql.query()? #156

Open
eitsupi opened this issue Mar 21, 2023 · 6 comments

Comments

@eitsupi
Member

eitsupi commented Mar 21, 2023

They work almost identically for pandas.DataFrame, and the former would also work for polars.DataFrame and pyarrow.Table.

import duckdb
import polars as pl
import prql_python as prql

df = pl.DataFrame({'a': 42})
opts = prql.CompileOptions(target="sql.duckdb")

duckdb.sql(prql.compile("from df", options=opts)).pl()

Probably needs to be mentioned somewhere... (Related to #151)

@eitsupi eitsupi changed the title duckdb::sql(prql.compile()) v.s. df.prql.query()? duckdb.sql(prql.compile()) v.s. df.prql.query()? Mar 21, 2023
@max-sixty
Member

Great. To confirm: do you mean we should mention this as an option in the docs, or that we should use duckdb.sql to do our pandas querying?

@eitsupi
Member Author

eitsupi commented Mar 22, 2023

I intended to update only the documentation for now.

But I think it is worth creating a new function based on duckdb.sql and replacing df.prql.query, since they provide almost the same functionality.
(I don't know what name that function should have... pyprql.duckdb_query?)

@max-sixty
Member

IIUC, the current df.prql.query is based on duckdb (its implementation does return duckdb.query_df(...)).

The accessor offers a method on a DataFrame, which is often more convenient than running duckdb.sql(prql.compile..., even if the functionality is similar.

Does this make sense or am I misunderstanding?

@snth
Member

snth commented Mar 22, 2023

I agree that the df.prql.query is more convenient and we should keep it.

Do we need the .query part? Could we shorten this to just df.prql(...)?

Are there any other members of df.prql?

@eitsupi
Member Author

eitsupi commented Mar 23, 2023

Yes, I agree that methods are sometimes more convenient.
So I intended to only update the documentation at this time.

@eitsupi
Member Author

eitsupi commented Mar 24, 2023

> Could we shorten this to just df.prql(...)?

That does not seem to be allowed.
https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.register_dataframe_accessor.html
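
For context, here is a minimal sketch of how the linked pandas API is typically used. The accessor name `demo`, the class `DemoAccessor`, and the method `row_count` are hypothetical and are not part of pyprql. The API registers a class under a name, and `df.<name>` then returns an instance of that class wrapping the DataFrame, with functionality exposed as methods on that instance (as in `df.prql.query`).

```python
import pandas as pd


# "demo" and DemoAccessor are hypothetical names used only for illustration.
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        # pandas constructs the accessor with the DataFrame it is attached to.
        self._df = pandas_obj

    def row_count(self) -> int:
        """Return the number of rows in the wrapped DataFrame."""
        return len(self._df)


df = pd.DataFrame({"a": [1, 2, 3]})

# df.demo is an accessor instance; its methods hang off that namespace.
accessor = df.demo
n_rows = df.demo.row_count()
```

In this sketch `df.demo` acts as a namespace object, which is why the usual call shape is `df.demo.row_count()` rather than `df.demo(...)`.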
