Is polars or pandas required to work with duckdb in marimo? #3447

sskagemo · 2025-01-15T16:02:58Z

I'm sorry if I have missed the main point of marimo, but I'm trying to understand whether it is possible to read data from a file, for instance JSON, and work with it using SQL in marimo (which I understand is based on duckdb) without also having a polars or pandas dataframe.

I thought I would be able to store the data in an in-memory duckdb database. But when I use SQL to read the data from a file, the result is a polars dataframe.

I'm just curious to find out if I'm doing something wrong, or if this is how it is meant to be.

This is my code:

import marimo as mo

er = mo.sql(
    f"""
    SELECT *
    FROM read_json('https://wiki.mozilla.org/images/f/ff/Example.json.gz', 
    auto_detect=true, compression="gzip", format='newline_delimited');
    """
)
print(type(er))

Output: <class 'polars.dataframe.frame.DataFrame'>

Answered by mscolnick

mscolnick (Maintainer) · 2025-01-20T04:14:54Z

Hi @sskagemo, mo.sql is a special function that uses duckdb under the hood (currently it's duckdb, but other drivers/dialects may be supported in the future). It does return a polars dataframe.

If you want to just use duckdb, you can do that too with:

import duckdb

er = duckdb.sql(
    """
    SELECT *
    FROM read_json('https://wiki.mozilla.org/images/f/ff/Example.json.gz',
    auto_detect=true, compression='gzip', format='newline_delimited');
    """
)
er  # a duckdb.DuckDBPyRelation, not a dataframe

The reason mo.sql returns a dataframe instead of the duckdb relation is so that we have the flexibility to change the underlying driver while still returning a dataframe.

1 reply

sskagemo Jan 20, 2025
Author

Thank you very much for explaining this to me, and also for showing how to do it with only duckdb.
