[18362773160] Introduce `EXPERIMENTAL_POLARS` output format #2769

IvoDD · 2025-11-20T14:00:37Z

Reference Issues/PRs

Monday ref: 18362773160

What does this implement or fix?

Polars output format is just a thin wrapper around the arrow output format. We create the polars dataframe zero-copy from the pyarrow table.
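As a rough illustration of the zero-copy wrapping described above, the sketch below converts a pyarrow table with `polars.from_arrow`. The helper name `arrow_table_to_polars` is hypothetical and not taken from this PR, and whether the PR itself calls `polars.from_arrow` or another constructor is an assumption.

```python
def arrow_table_to_polars(table):
    """Wrap a pyarrow.Table as a polars.DataFrame (hypothetical helper)."""
    # Imported lazily so that polars stays an optional dependency.
    import polars as pl

    # from_arrow reuses the Arrow buffers where it can instead of copying them.
    return pl.from_arrow(table)
```

Calling it with something like `pa.table({"price": [1.5, 2.5]})` should return a `polars.DataFrame` with the same column names. Polars documents this conversion as mostly zero-copy, so some column types may still be cast.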

Also improves some docs.

Adds only a few extra tests for polars because:

- Extensive Arrow testing already covers most Arrow-related logic.
- Parametrizing many tests to work with polars is difficult because polars.DataFrame has no concept of pandas metadata.

Any other comments?

I decided to go through pyarrow, even though we could avoid the pyarrow dependency by using a PyCapsule, because:

- Using a PyCapsule would require rewriting our arrow denormalization (which currently relies on pyarrow APIs).
- Using a PyCapsule would require extra test coverage of the polars output format, and it is harder to parametrize our existing tests because polars doesn't have a concept of pandas metadata.

I also needed to free up some disk space for the conda build. This is done in the conda workflow. A successful run with the workflow from this branch can be seen here.

Checklist

Checklist for code changes...

Have you updated the relevant docstrings, documentation and copyright notice?
Is this contribution tested against all ArcticDB's features?
Do all exceptions introduced raise appropriate error messages?
Are API changes highlighted in the PR description?
Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

alexowens90 · 2025-11-21T15:53:04Z

python/arcticdb/options.py

        return InternalOutputFormat.PANDAS
-    elif output_format.lower() == OutputFormat.EXPERIMENTAL_ARROW.lower():
+    elif output_format.lower() in [OutputFormat.EXPERIMENTAL_ARROW.lower(), OutputFormat.EXPERIMENTAL_POLARS.lower()]:
        if not _PYARROW_AVAILABLE:


I don't think any dependency resolver would let us get into a position where pyarrow is available and polars isn't, but we may as well check just in case
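A minimal sketch of the extra check suggested here, assuming availability is probed with `importlib.util.find_spec`. The `OutputFormat` and `InternalOutputFormat` classes below are simplified stand-ins for the ArcticDB ones, and `resolve_output_format`, its `is_available` parameter, and the error messages are hypothetical names chosen for illustration.

```python
import importlib.util
from enum import Enum


class OutputFormat:
    # Stand-in for the user-facing format names; the real class lives in arcticdb.
    PANDAS = "PANDAS"
    EXPERIMENTAL_ARROW = "EXPERIMENTAL_ARROW"
    EXPERIMENTAL_POLARS = "EXPERIMENTAL_POLARS"


class InternalOutputFormat(Enum):
    # Stand-in for the internal enum, which only knows PANDAS and ARROW.
    PANDAS = 0
    ARROW = 1


def _module_available(name):
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(name) is not None


def resolve_output_format(output_format, is_available=_module_available):
    """Map a user-facing format name to the internal format (hypothetical helper)."""
    fmt = output_format.lower()
    if fmt == OutputFormat.PANDAS.lower():
        return InternalOutputFormat.PANDAS
    if fmt in (OutputFormat.EXPERIMENTAL_ARROW.lower(), OutputFormat.EXPERIMENTAL_POLARS.lower()):
        # Both Arrow-based formats go through pyarrow.
        if not is_available("pyarrow"):
            raise ModuleNotFoundError("pyarrow is required for Arrow-based output formats")
        # Polars additionally needs the polars package itself.
        if fmt == OutputFormat.EXPERIMENTAL_POLARS.lower() and not is_available("polars"):
            raise ModuleNotFoundError("polars is required for the EXPERIMENTAL_POLARS output format")
        return InternalOutputFormat.ARROW
    raise ValueError(f"Unknown output format: {output_format}")
```

Passing `is_available` as a parameter is only there so the missing-dependency branch can be exercised without uninstalling anything.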

alexowens90 · 2025-11-21T15:54:40Z

python/arcticdb/version_store/_store.py

        return self.version_store.get_column_stats_info_version(symbol, version_query).to_map()

-    def _batch_read_keys(self, atom_keys, read_options):
+    def _batch_read_keys(self, atom_keys, read_options, output_format):


@phoebusm has removed this function entirely as part of the recursive normalizers performance improvement PR, so whoever merges second will get a merge conflict.

alexowens90 · 2025-11-21T15:57:31Z

python/arcticdb/version_store/_store.py

        query_builder = copy.deepcopy(query_builder)
        read_queries = self._get_read_queries(len(symbols), date_ranges, row_ranges, columns, query_builder)
-        batch_read_options = self._get_batch_read_options(
+        batch_read_options, output_format = self._get_batch_read_options(


Couldn't we expose the output_format as a read-only property of the batch_read_options C++ object?

IvoDD · 2025-11-24

It is exposed, but both ReadOptions and BatchReadOptions hold only the internal C++ InternalOutputFormat, which in both cases is just ARROW.

Here I need to differentiate between the Python-level OutputFormat values, which differ for PYARROW and POLARS.
I'd like to keep it this way because the C++ layer doesn't need to know whether it's pyarrow or polars. Both are only Python sugar on top of the Arrow C structures.
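The split described in this reply can be sketched roughly as follows. This is a simplified model and not the PR's code: `BatchReadOptionsSketch`, `get_batch_read_options`, and `finalize` are hypothetical names, and the options object is reduced to a single field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchReadOptionsSketch:
    # Stand-in for the C++ options object: it only distinguishes PANDAS from ARROW.
    internal_output_format: str


def get_batch_read_options(output_format):
    """Return (internal options, Python-level format); both are hypothetical shapes."""
    python_format = output_format.upper()
    internal = "PANDAS" if python_format == "PANDAS" else "ARROW"
    return BatchReadOptionsSketch(internal), python_format


def finalize(result, python_format, to_polars):
    # Only the Python layer decides whether to wrap the Arrow result for polars.
    if python_format == "EXPERIMENTAL_POLARS":
        return to_polars(result)
    return result
```

In this model the arrow and polars requests produce identical internal options, which is why the Python-level format has to be returned alongside them.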

IvoDD requested review from alexowens90 and poodlewars as code owners November 20, 2025 14:00

IvoDD force-pushed the polars-output-format branch 2 times, most recently from a91a646 to c5d0d19 Compare November 20, 2025 14:35

IvoDD force-pushed the batch-configurable-strings branch from ccd25c4 to acb1887 Compare November 20, 2025 15:19

IvoDD force-pushed the polars-output-format branch 2 times, most recently from f4fbf82 to d2c32dc Compare November 20, 2025 15:40

Base automatically changed from batch-configurable-strings to master November 21, 2025 08:08

IvoDD force-pushed the polars-output-format branch from d2c32dc to 44b11a2 Compare November 21, 2025 08:12

IvoDD added the patch (Small change, should increase patch version) and no-release-notes (This PR shouldn't be added to release notes.) labels Nov 21, 2025

alexowens90 reviewed Nov 21, 2025

alexowens90 approved these changes Nov 21, 2025

IvoDD force-pushed the polars-output-format branch from 44b11a2 to 3c821ed Compare November 24, 2025 08:56

Polars output format without docs

a5299e0

IvoDD force-pushed the polars-output-format branch from 3c821ed to a5299e0 Compare November 24, 2025 09:25

vasil-pashov approved these changes Nov 24, 2025

IvoDD force-pushed the polars-output-format branch from c0d37b1 to ba7bff2 Compare November 24, 2025 13:49

Free disk space when building with conda

d654d02

IvoDD force-pushed the polars-output-format branch from ba7bff2 to d654d02 Compare November 24, 2025 13:57

IvoDD merged commit 8e6763a into master Nov 24, 2025
185 of 186 checks passed

IvoDD deleted the polars-output-format branch November 24, 2025 15:56
