create built in way to visualize dataframe in Jupyter #132

Merged
universalmind303 merged 1 commit into pola-rs:main from rgbkrk:jupyter.display on Oct 12, 2023

Conversation

@rgbkrk
Contributor

@rgbkrk rgbkrk commented Oct 11, 2023

Hi! 👋🏻

This introduces a basic way to display Polars DataFrames from a deno kernel in a Jupyter Notebook.

image

While I've baked it into display.js, which is on its way to becoming part of Deno itself (denoland/deno#20819), I'd love to get a solid implementation inside of nodejs-polars. There are two media types exported here via the Rich Content Output Symbol, Symbol.for("Jupyter.display"):

  • text/html - a standard static table we're all pretty used to
  • application/vnd.dataresource+json - row-oriented JSON that frontends can use to implement scrollable grids for a table (and additional data visualization).
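
As a rough illustration of the display hook described above, here is a minimal sketch of an object that exposes a method under Symbol.for("Jupyter.display") and returns a bundle keyed by media type. The names makeTable, toHtml, and escapeHtml are hypothetical helpers for this sketch and are not the nodejs-polars implementation; only the text/html entry is shown, and the data resource entry would sit alongside it as a second key.

```typescript
// Hypothetical stand-in for a DataFrame: plain columnar data.
type Cell = string | number | boolean | null;
type Columns = Record<string, Cell[]>;

// Escape characters that would otherwise be parsed as HTML markup.
function escapeHtml(value: unknown): string {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render the columns as a static HTML table, one <tr> per row.
function toHtml(columns: Columns): string {
  const names = Object.keys(columns);
  const height = names.length === 0 ? 0 : columns[names[0]].length;
  const head = names.map((n) => `<th>${escapeHtml(n)}</th>`).join("");
  let body = "";
  for (let i = 0; i < height; i++) {
    const cells = names
      .map((n) => `<td>${escapeHtml(columns[n][i])}</td>`)
      .join("");
    body += `<tr>${cells}</tr>`;
  }
  return `<table><thead><tr>${head}</tr></thead><tbody>${body}</tbody></table>`;
}

// Hypothetical factory: attaches the display method under the well-known symbol.
function makeTable(columns: Columns) {
  return {
    columns,
    [Symbol.for("Jupyter.display")]() {
      // The bundle is keyed by media type; a frontend picks the richest one it supports.
      // "application/vnd.dataresource+json" would be added here as a second key.
      return {
        "text/html": toHtml(columns),
      };
    },
  };
}
```

Escaping matters here because column names and cell values such as "<" or "&" would otherwise be interpreted as markup by the notebook frontend.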

Here's an example of what's possible with the data resource media type:

image

Blog post featuring nodejs-polars: https://blog.jupyter.org/bringing-modern-javascript-to-the-jupyter-notebook-fc998095081e

Once this is in I'll update the blog post screenshots to use the implementation that relies on nodejs-polars.

Background on the Data Resource Schema

Using the schema from https://specs.frictionlessdata.io/schemas/tabular-data-resource.json, we can emit a format that frontends can use for much richer tables.

As an example, in pandas you enable it with:

pd.set_option("display.html.table_schema", True)

and then just type df in cells. Here's some more sample (Python) code for looking at this format:

import pandas as pd

df = pd.read_csv(
    "https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins.csv"
)
# Enable { schema, data } output as JSON
pd.set_option("display.html.table_schema", True)

# Now to demonstrate what's inside
import json

# get_ipython() is only available inside an IPython/Jupyter session
ip = get_ipython()

data, metadata = ip.display_formatter.format(df.head(5))

data_resource = data['application/vnd.dataresource+json']

print(json.dumps(data_resource, indent=2))
{
  "schema": {
    "fields": [
      {
        "name": "index",
        "type": "integer"
      },
      {
        "name": "species",
        "type": "string"
      },
      {
        "name": "island",
        "type": "string"
      },
      {
        "name": "bill_length_mm",
        "type": "number"
      },
      {
        "name": "bill_depth_mm",
        "type": "number"
      },
      {
        "name": "flipper_length_mm",
        "type": "number"
      },
      {
        "name": "body_mass_g",
        "type": "number"
      },
      {
        "name": "sex",
        "type": "string"
      },
      {
        "name": "year",
        "type": "integer"
      }
    ],
    "primaryKey": [
      "index"
    ],
    "pandas_version": "1.4.0"
  },
  "data": [
    {
      "index": 0,
      "species": "Adelie",
      "island": "Torgersen",
      "bill_length_mm": 39.1,
      "bill_depth_mm": 18.7,
      "flipper_length_mm": 181.0,
      "body_mass_g": 3750.0,
      "sex": "male",
      "year": 2007
    },
    {
      "index": 1,
      "species": "Adelie",
      "island": "Torgersen",
      "bill_length_mm": 39.5,
      "bill_depth_mm": 17.4,
      "flipper_length_mm": 186.0,
      "body_mass_g": 3800.0,
      "sex": "female",
      "year": 2007
    },
    {
      "index": 2,
      "species": "Adelie",
      "island": "Torgersen",
      "bill_length_mm": 40.3,
      "bill_depth_mm": 18.0,
      "flipper_length_mm": 195.0,
      "body_mass_g": 3250.0,
      "sex": "female",
      "year": 2007
    },
    {
      "index": 3,
      "species": "Adelie",
      "island": "Torgersen",
      "bill_length_mm": null,
      "bill_depth_mm": null,
      "flipper_length_mm": null,
      "body_mass_g": null,
      "sex": null,
      "year": 2007
    },
    {
      "index": 4,
      "species": "Adelie",
      "island": "Torgersen",
      "bill_length_mm": 36.7,
      "bill_depth_mm": 19.3,
      "flipper_length_mm": 193.0,
      "body_mass_g": 3450.0,
      "sex": "female",
      "year": 2007
    }
  ]
}
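
For readers on the JavaScript side, the { schema, data } shape above could be typed and produced roughly as follows. This is only a sketch under stated assumptions: toDataResource and inferType are hypothetical helpers, the FieldType union lists just the types seen in the pandas output plus "boolean" and "any", and the type is guessed from the values. A real DataFrame would derive field types from its column dtypes instead, which is why pandas reports 181.0 as "number" while value-based inference in JavaScript would call it an integer.

```typescript
// A subset of Table Schema field types; the specification defines more.
type FieldType = "integer" | "number" | "string" | "boolean" | "any";

interface Field {
  name: string;
  type: FieldType;
}

interface DataResource {
  schema: { fields: Field[] };
  data: Record<string, unknown>[];
}

type Cell = string | number | boolean | null;

// Hypothetical helper: guess a field type from the non-null values of a column.
function inferType(values: Cell[]): FieldType {
  const present = values.filter((v) => v !== null);
  if (present.length === 0) return "any";
  if (present.every((v) => typeof v === "boolean")) return "boolean";
  if (present.every((v) => typeof v === "number")) {
    return present.every((v) => Number.isInteger(v)) ? "integer" : "number";
  }
  if (present.every((v) => typeof v === "string")) return "string";
  return "any";
}

// Hypothetical helper: turn columnar data into row-oriented { schema, data }.
function toDataResource(columns: Record<string, Cell[]>): DataResource {
  const names = Object.keys(columns);
  const height = names.length === 0 ? 0 : columns[names[0]].length;
  const fields: Field[] = names.map((name) => ({
    name,
    type: inferType(columns[name]),
  }));
  const data: Record<string, unknown>[] = [];
  for (let i = 0; i < height; i++) {
    const row: Record<string, unknown> = {};
    for (const name of names) row[name] = columns[name][i];
    data.push(row);
  }
  return { schema: { fields }, data };
}
```

Unlike the pandas output, this sketch does not add an index column or a primaryKey entry.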

@rgbkrk
Contributor Author

rgbkrk commented Oct 11, 2023

Note: I could not actually build this branch due to an issue with local builds, as mentioned on Discord.

Hey all! I'm trying to make a PR to nodejs-polars and haven't been able to figure out how to get the toolchain working well:

...
active toolchain
----------------

nightly-2023-07-27-aarch64-apple-darwin (overridden by '/Users/kai/code/src/github.com/pola-rs/nodejs-polars/rust-toolchain')
rustc 1.73.0-nightly (0d95f9132 2023-07-26)


nodejs-polars on  main [$!?] is 📦 v0.8.2 via ⬢ v18.12.0 via 🦀 v1.72.1 took 24s
➜ yarn build && yarn build:ts
...
   Compiling polars-lazy v0.32.0 (https://github.com/pola-rs/polars.git?rev=ec0c91f93fcd1ac355c667d6c3c3f30b257ea0a6#ec0c91f9)
error[E0554]: `#![feature]` may not be used on the stable release channel
  --> /Users/kai/.cargo/registry/src/index.crates.io-6f17d22bba15001f/argminmax-0.6.1/src/lib.rs:72:39
   |
72 | #![cfg_attr(feature = "nightly_simd", feature(stdsimd))]
   |                                       ^^^^^^^^^^^^^^^^

error[E0554]: `#![feature]` may not be used on the stable release channel
  --> /Users/kai/.cargo/registry/src/index.crates.io-6f17d22bba15001f/argminmax-0.6.1/src/lib.rs:73:39
   |
73 | #![cfg_attr(feature = "nightly_simd", feature(avx512_target_feature))]
   |                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0554]: `#![feature]` may not be used on the stable release channel
  --> /Users/kai/.cargo/registry/src/index.crates.io-6f17d22bba15001f/argminmax-0.6.1/src/lib.rs:74:39
   |
74 | #![cfg_attr(feature = "nightly_simd", feature(arm_target_feature))]
   |                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0554]: `#![feature]` may not be used on the stable release channel
  --> /Users/kai/.cargo/registry/src/index.crates.io-6f17d22bba15001f/argminmax-0.6.1/src/lib.rs:72:47
   |
72 | #![cfg_attr(feature = "nightly_simd", feature(stdsimd))]
   |                                               ^^^^^^^

   Compiling enum_dispatch v0.3.12
...

Does anyone else run into this or know how to get to a decent state for a local build on macOS? I'm extremely new to Rust so I might be missing something super simple.

@Bidek56
Collaborator

Bidek56 commented Oct 11, 2023

@rgbkrk please run "yarn run lint:ts:fix" on your PR and push the changes. Thx

@rgbkrk
Contributor Author

rgbkrk commented Oct 11, 2023

> @rgbkrk please run "yarn run lint:ts:fix" on your PR and push the changes. Thx

Done.

@Bidek56
Collaborator

Bidek56 commented Oct 11, 2023

@rgbkrk Do you want to add some documentation around this feature? Thx

@rgbkrk
Contributor Author

rgbkrk commented Oct 11, 2023

Totally. Where do you want it? Other than the README I'm only seeing jsdocs.

@Bidek56
Collaborator

Bidek56 commented Oct 11, 2023

Probably in the Readme below webpack, unless you find a better place. Thx

@rgbkrk
Contributor Author

rgbkrk commented Oct 11, 2023

README is a great start! I'll get that submitted soon.

In a separate PR I'd love to get some user guides together to help people get started with polars in JavaScript / TypeScript.

@rgbkrk
Contributor Author

rgbkrk commented Oct 11, 2023

Done! Let me know what you think.

@Bidek56
Collaborator

Bidek56 commented Oct 11, 2023

LGTM, but I am just wondering 2 things:

  1. Is there a need for a JS/TS DF library in the Jupyter ecosystem since everyone is using Python?
  2. If we are adding Deno to our docs, do we need to add Deno to our CI tests? We are currently using Bun and Node.

@rgbkrk
Contributor Author

rgbkrk commented Oct 11, 2023

> Is there a need for a JS/TS DF library in the Jupyter ecosystem since everyone is using Python?

Isn't... that what this library is?

I use Python every day. There are also a lot of JavaScript developers out there, and you've all made something that provides really solid interop between the JavaScript ecosystem and Arrow underneath. I can use D3 directly with Polars DataFrames to visualize data. In this example polars notebook, I'm able to pass columns directly to D3 to create the scales I need, server side.

image

I'm ready to enable JavaScript developers to use polars.

@Bidek56
Collaborator

Bidek56 commented Oct 11, 2023

I think this library is more for backend service use in Node/Bun/Deno than DS, but that's for the community to decide. :)

Collaborator

@universalmind303 universalmind303 left a comment

Great work @rgbkrk. Really excited to see more data science & analytics tools embracing JS. (_I personally strongly dislike Python syntax_).

Also excited to see that polars works on deno now. I tried using polars with deno a long time ago & it was unsupported.

Just a few small comments on the PR.

Comment thread polars/dataframe.ts Outdated (6 threads)
@rgbkrk rgbkrk force-pushed the jupyter.display branch 3 times, most recently from b7de10f to 1d6df04 Compare October 12, 2023 03:49
Collaborator

@universalmind303 universalmind303 left a comment

Can we add a test for this? Otherwise it looks good to go!

@Bidek56
Collaborator

Bidek56 commented Oct 12, 2023

@universalmind303 Do we need a CI test using Deno now that we are using it?

@rgbkrk
Contributor Author

rgbkrk commented Oct 12, 2023

Ok, the test is in and this is working really nicely. Ready for another look over in a final review.


🦕 I'm not sure how you're going to want to test it with Deno, but it seems like a good follow-up PR.

The way I'm exercising this right now for local hacking is with a mod.ts (for Deno) that contains:

import { createRequire } from "node:module";

// Build a CommonJS-style require() so Deno can load the locally built Node package
const require = createRequire(import.meta.url);

export const pl = require("./bin/index.js");

export default pl;

Then I just import it in a notebook like this:

import pl from "./mod.ts"

let df = new pl.DataFrame({
    fruit: ['Apples', 'Oranges', 'Ooples & Banoonoos'],
    value: [0, 1, 0.5],
    comparability: ["<", ">", "="],
    "< 0.5": [true, false, false]
})

df
image

When this is shipped we'll be able to just use import pl from "npm:nodejs-polars". I only have this mod.ts to make it easy to work with the local node package without linking or import maps.

CC @sigmaSd @bartlomieju

@universalmind303 universalmind303 merged commit 0387b7d into pola-rs:main Oct 12, 2023
@universalmind303
Collaborator

Thanks again @rgbkrk. I'm very excited to see more JS + notebook integrations maturing.

@rgbkrk
Contributor Author

rgbkrk commented Oct 12, 2023

Thank you for the review and the stewardship of polars for JavaScript! I'm excited to see what people can do with this.

@rgbkrk rgbkrk deleted the jupyter.display branch October 20, 2023 21:20
3 participants