Pre-Proposal: Transmit Cell Metadata #68

jflam · 2021-04-22T20:43:25Z

Transmit Cell Metadata

Hello!

We (@willingc and @MSeal) would like to propose a new JEP that describes transmitting cell metadata to kernels. Below is a summary and a link to the full proposal as well.

Thx!
-John, Carol, Matt

Summary

This proposal discusses the transmission of cell metadata with execute message requests and would modify the Jupyter Messaging Protocol. Individual kernels would interpret or ignore this metadata. This enables flexibility in different usage scenarios implemented in various front-end clients.

Motivation

By transmitting cell metadata inline with the execute message request, Jupyter implementations will have a reliable channel to transmit additional metadata to the kernel in a standard way.

Notebook extensions can also use this channel to transmit additional information that was often transmitted using magic commands.

Some use cases which motivated this proposal are:

Route requests automatically to an appropriate kernel via libraries like [allthekernels](https://github.com/minrk/allthekernels) without need for additional metadata within the cell itself
Create or find a conda environment without needing to use magics, like pick
Support polyglot (more than one language/kernel within a single notebook) scenarios, like [sos](https://vatlab.github.io/sos-docs/)
Provide hints to the kernel for localization purposes, like how the ACCEPT_LANGUAGE HTTP header works
Provide hints to the kernel about client capabilities, similar to how hints of a web browser client's capabilities work

A link to the full proposal can be found here


blois · 2021-04-22T21:46:05Z

Clients can already include metadata in execute requests- for example JupyterLab includes cellId and recordTiming.

The scenarios outlined in the full proposal seem to require cooperation between the client and kernel- for example polyglot editors would want to support per-cell languages and the kernel needs to know how to handle the executions. What is preventing polyglot notebooks from passing the language/kernel alongside cellId and recordTiming today?
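For readers unfamiliar with the message layout, here is a minimal sketch of an `execute_request` carrying message-level metadata, written as a plain Python dict. The `cellId` and `recordTiming` keys are the ones named above; the `language` key, the helper name `build_execute_request`, the session id, and the protocol version string are assumptions made for illustration, not something any client is known to send in this form.

```python
import uuid
from datetime import datetime, timezone


def build_execute_request(code, session, message_metadata=None):
    """Build an execute_request message as a plain dict (illustrative helper)."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "username": "",
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",  # assumed protocol version for this sketch
        },
        "parent_header": {},
        # Message-level metadata: free-form, interpreted or ignored by the kernel.
        "metadata": dict(message_metadata or {}),
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


msg = build_execute_request(
    "print('hello')",
    session="demo-session",
    message_metadata={
        "cellId": "cell-1",     # key mentioned in the comment above
        "recordTiming": True,   # key mentioned in the comment above
        "language": "python",   # hypothetical per-cell language hint
    },
)
```

Nothing in the message structure stops a client from adding a key like `language` today; the open question in this thread is whether all clients should do so consistently.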

MSeal · 2021-04-23T05:12:42Z

> Clients can already include metadata in execute requests- for example JupyterLab includes cellId and recordTiming.

The goal is to make such information a part of the standard of minimal information passed so that we get less "This only works in JupyterLab (or nteract, or Colab, etc)". Without it being consistent across the clients you get client fragmentation where particular kernels can't work outside of particular interfaces. If cell metadata is always passed along you have a place where notebook extensions or specialized interfaces can place information to ensure the notebook still executes successfully in other Jupyter interfaces that have the designated kernel installed, without that interface needing to know about the capability.

> The scenarios outlined in the full proposal seem to require cooperation between the client and kernel- for example polyglot editors would want to support per-cell languages and the kernel needs to know how to handle the executions.

If the associated metadata is always passed along, the client only needs to be kernel-aware for generating such metadata the first time. So yes, the polyglot notebook experience would be degraded in an editor that wasn't aware of the kernel-specific metadata consumers, but it'd still function. This also lets tools like papermill execute consistently without kernel-coupled extensions in their execution plan.
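A minimal sketch of that idea, under simplifying assumptions: the client forwards a cell's metadata untouched without interpreting it, and the kernel uses a hint when present and falls back otherwise. The message is reduced to three fields for brevity, and the function names and the `language` key are hypothetical.

```python
import copy


def request_for_cell(cell):
    """Client side: forward the cell's metadata verbatim, without inspecting it."""
    return {
        "msg_type": "execute_request",
        "metadata": copy.deepcopy(cell.get("metadata", {})),
        "content": {"code": cell["source"]},
    }


def choose_language(request, default="python"):
    """Kernel side: honor the hint if present, otherwise use the default."""
    return request["metadata"].get("language", default)


# A cell tagged once by a kernel-aware editor, and an untagged cell.
tagged = {"source": "1 + 1", "metadata": {"language": "r"}}
plain = {"source": "1 + 1", "metadata": {}}

tagged_language = choose_language(request_for_cell(tagged))
plain_language = choose_language(request_for_cell(plain))
```

Here the client never needs to know what `language` means; it only has to avoid dropping it.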

minrk · 2021-04-23T06:23:30Z

I think sending cell metadata makes good sense. My initial inclination was to use the existing message metadata already in the protocol, but your argument against doing that is convincing!

(for using the existing metadata field):

pro: it's an existing metadata field, so a small protocol change to "suggest clients and kernels use it"
con: message-level metadata might more logically be a place for message-independent metadata
con (implementation detail): the ipykernel implementation doesn't include the metadata when wrapping messages into message handlers, so this might be a bit more work to implement. But that's only because there hasn't been relevant info here before. I suspect this pattern is not uncommon across kernels and clients.

Here's a question: does it make sense to include cell metadata in things like inspect / completion requests in a cell (basically any request that's logically associated with a cell)? The allthekernels example makes me think it would. If so, then I might lean toward using the existing message-level metadata field, again. If it does end up making sense to use message metadata, I might put it in a single (or metadata.cell) namespace rather than mapping cell metadata keys directly to message metadata keys to reduce conflicts with possible other sources of message metadata (counter-argument to that: if other message-level metadata were used for something, setting it in cell metadata could be useful! Also maybe dangerous..).

You mention a "client extension" for setting the cell metadata (like the cell tags editor), which I think makes sense. Have you thought about ways of setting cell metadata by the kernel, e.g. in message replies? Are there cases where that would make sense? Fine if not, but I thought it might be appropriate to consider both sides of connecting kernels to cell metadata: reading and writing.

jflam · 2021-04-23T17:07:15Z

> does it make sense to include cell metadata in things like inspect / completion requests in a cell

Yes, I think it does make sense to send it for inspect_request / complete_request messages, particularly for polyglot scenarios as you rightly pointed out. Where I'm not following is why this would favor a message-level metadata field ... could you elaborate more on that?

> Have you thought about ways of setting cell metadata by the kernel, e.g. in message replies?

We haven't thought about that, but thanks for looking at things from the other direction - that's always helpful! When we came up with the ACCEPT_LANGUAGE header example in the motivation section, we were inspired by HTTP. Following that line of thinking, and off the top of my head, one example that comes to mind is a caching directive for results?

minrk · 2021-04-23T20:41:45Z

> Where I'm not following is why this would favor a message-level metadata field ... could you elaborate more on that?

Message-level metadata is something all messages have. Content is the area where all messages are different, and don't share things, though there can be duplication. If it's generic enough that "messages relating to a cell should arrive with the cell's metadata" that seems like a general message metadata pattern to me, rather than separately changing the content spec of each message that should attach cell metadata. Specifying it at the message-metadata level effectively attaches it to all future messages about a cell (debug messages, etc.) and seems like it might be simpler to implement and understand as a protocol-wide principle.

Ultimately, it's an implementation detail, as the same information is getting to the same place at the same time.

> We haven't thought about that,

No problem, and it's AOK for it to be out of scope for this proposal, it just seemed like it might be relevant. I was trying to think of examples like dataflow kernels which handle things like replay and dependency - they could perhaps record their position in the execution graph more easily if they could set metadata keys on completion without the need for an extension.

minrk · 2021-04-28T09:20:17Z

Re-reading, it looks like there might be one more thing to consider. In these two cases:

> Provide hints to the kernel for localization purposes, like how the ACCEPT_LANGUAGE HTTP header works

> Provide hints to the kernel about client capabilities, similar to how hints of a web browser client's capabilities work

It is not cell metadata that logically houses this information, but rather properties of the client. I wouldn't want to encourage cloning an identical 'language' field onto every cell's metadata, for instance. Further, this logically would not be confined to 'cell' messages like execute/inspect/complete, but could also apply to kernel_info, and everything, really. That points me a little more toward encouraging use of the message-level metadata field, where:

message.metadata.cell contains a cell's metadata verbatim if a message can reasonably be associated with a cell (execute, etc.)
message.metadata.client contains possible client environment metadata, such that it might be used for localization, capabilities, etc.
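The two namespaces above could be populated roughly as follows. This is a sketch under stated assumptions, not an agreed design: the helper name `attach_metadata`, the set of message types treated as cell-associated, and the example keys (`tags`, `language`) are all hypothetical.

```python
# Message types assumed here to be "reasonably associated with a cell".
CELL_MSG_TYPES = {"execute_request", "inspect_request", "complete_request"}


def attach_metadata(message, cell_metadata=None, client_metadata=None):
    """Return a copy of `message` with namespaced message-level metadata."""
    metadata = dict(message.get("metadata", {}))
    if client_metadata:
        # Client environment hints apply to every message type.
        metadata["client"] = dict(client_metadata)
    msg_type = message["header"]["msg_type"]
    if cell_metadata is not None and msg_type in CELL_MSG_TYPES:
        # The cell's metadata is copied verbatim under its own key,
        # so it cannot collide with other message-level metadata.
        metadata["cell"] = dict(cell_metadata)
    return {**message, "metadata": metadata}


client_hints = {"language": "fr-CA"}
cell_meta = {"tags": ["plot"]}

execute = attach_metadata(
    {"header": {"msg_type": "execute_request"}, "metadata": {}, "content": {}},
    cell_metadata=cell_meta,
    client_metadata=client_hints,
)
kernel_info = attach_metadata(
    {"header": {"msg_type": "kernel_info_request"}, "metadata": {}, "content": {}},
    cell_metadata=cell_meta,
    client_metadata=client_hints,
)
```

With this split, a `kernel_info_request` still carries the client hints but no `cell` entry, and the localization hint lives once on the client rather than being cloned onto every cell.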

What do you think?

jflam · 2021-04-29T18:09:11Z

Thanks for the additional insights, Min! I'm somewhat embarrassed that I missed that (localization hints != cell metadata) while brainstorming other use cases. As you might have inferred, I was really focused on the polyglot scenarios over these other ones.

That said, I like how you've been able to generalize the idea better, and I think it fits better with message-level metadata.

What do others think?

MSeal · 2021-05-05T23:15:05Z

@minrk Would you be comfortable with being assigned as the shepherd for the JEP PR to follow? I think the PR will draw more eyes to the change and we can promote more engagement on the topic there. Thoughts?

minrk · 2021-05-06T05:57:43Z

Yes, happy to!

MSeal · 2021-05-07T06:09:57Z

Great, thanks Min!

blois mentioned this issue Jun 1, 2021

JEP: Add a dirty state to code cells in notebook format #69

Open

jasongrout linked a pull request Jul 7, 2021 that will close this issue

Proposal: Transmit Cell Metadata #70

Open

psychemedia mentioned this issue Sep 21, 2021

Parse question and answer from cell metadata jmshea/jupyterquiz#2

Closed
