An event system for Jupyter #780

Open
Tracked by #789
afshin opened this issue Apr 7, 2022 · 16 comments

@afshin
Contributor

afshin commented Apr 7, 2022

This is a draft document, please feel free to comment and help.

We have (at least) two concurrent efforts that overlap but are not full implementations of a generic event system for Jupyter in themselves: jupyter-telemetry and jupyterlab-notifications.

A synthesis of these extensions with generic endpoints (i.e., not specifically designed and named for telemetry or notifications) would yield a flexible general-purpose event bus for jupyter-server-based applications.

cc: @andrii-i @3coins

Architecture of events API

REST Endpoints

  • POST /api/events - create new events
  • GET /api/events/schemas - query/list registered schemas
    (maybe -- needs discussion)
  • POST /api/events/schemas - register schemas
    (maybe -- needs discussion)

WebSocket endpoints (WebsocketHandler)

  • /api/events/subscribe - fire hose of all events -- perhaps accept filters? (see open question below)
  • /api/events/subscribe/notification -- subscribe to events of type notification

Open Question: Should the WebSocket handler support making a request for multiple filters to be applied instead of just the one proposed in the URL scheme above?
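
To make the subscribe endpoints and the open question more concrete, here is a minimal in-process sketch, not the proposed server implementation. The `EventBus` class, its method names, and the `"type"` key on events are all assumptions for illustration. Subscribing with no filter models the fire hose, and passing several types models the "multiple filters" option.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Listener = Callable[[dict], None]


class EventBus:
    """Hypothetical stand-in for the proposed subscribe endpoints."""

    def __init__(self) -> None:
        # Each entry pairs a listener with the set of event types it wants.
        # None means "everything" (the fire hose).
        self._subscriptions: List[Tuple[Listener, Optional[frozenset]]] = []

    def subscribe(
        self, listener: Listener, event_types: Optional[Iterable[str]] = None
    ) -> None:
        # Pass an iterable such as ["notification"], not a bare string.
        wanted = None if event_types is None else frozenset(event_types)
        self._subscriptions.append((listener, wanted))

    def emit(self, event: dict) -> int:
        """Deliver the event to matching listeners; return how many received it."""
        delivered = 0
        for listener, wanted in self._subscriptions:
            # Assumption: events carry their type under the "type" key.
            if wanted is None or event.get("type") in wanted:
                listener(event)
                delivered += 1
        return delivered


bus = EventBus()
fire_hose, notifications = [], []
bus.subscribe(fire_hose.append)                          # like /api/events/subscribe
bus.subscribe(notifications.append, ["notification"])    # like .../subscribe/notification
bus.emit({"type": "notification", "data": {"message": "build finished"}})
bus.emit({"type": "kernel", "data": {"status": "idle"}})
```

Accepting a set of types per subscription keeps the single-type URL form as a special case, which is one argument for supporting multiple filters in the handler.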

Depends on jupyter_events package

  • exports the EventLogger object (formerly EventLog in jupyter_telemetry)

Case Study: JupyterLab Notifications

Server-side functionality

  • Subscribes to all notification events that pass through the event bus
  • Adds each notification as a row in a SQLite database on the server with a key for the recipient identity as well as an ID
    • notification events with multiple recipients can be de-normalized here and written as multiple rows
  • REST API

    • GET /api/notifications - retrieve a list of all notifications that authenticated user can see
    • GET /api/notifications/{ID} - retrieve a specific notification
    • DELETE /api/notifications/{ID} - delete a specific notification
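
The server-side storage described above could look roughly like the following sketch, which uses only the standard-library `sqlite3` module. The table layout, the function names, and the `"recipients"` and `"data"` keys on the event are assumptions for illustration, not part of the proposal.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) notifications table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS notifications (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            recipient TEXT NOT NULL,
            payload TEXT NOT NULL
        )
        """
    )
    return conn


def store_notification(conn: sqlite3.Connection, event: dict) -> list:
    """De-normalize: write one row per recipient and return the new row IDs."""
    payload = json.dumps(event["data"])
    ids = []
    for recipient in event["recipients"]:
        cursor = conn.execute(
            "INSERT INTO notifications (recipient, payload) VALUES (?, ?)",
            (recipient, payload),
        )
        ids.append(cursor.lastrowid)
    conn.commit()
    return ids


def list_notifications(conn: sqlite3.Connection, user: str) -> list:
    """Back GET /api/notifications: only rows addressed to this user."""
    rows = conn.execute(
        "SELECT id, payload FROM notifications WHERE recipient = ? ORDER BY id",
        (user,),
    ).fetchall()
    return [{"id": row_id, "data": json.loads(payload)} for row_id, payload in rows]


def delete_notification(conn: sqlite3.Connection, user: str, notification_id: int) -> bool:
    """Back DELETE /api/notifications/{ID}: users can delete only their own rows."""
    cursor = conn.execute(
        "DELETE FROM notifications WHERE id = ? AND recipient = ?",
        (notification_id, user),
    )
    conn.commit()
    return cursor.rowcount == 1
```

Writing one row per recipient means each user can delete their own copy without affecting anyone else's.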

Client-side functionality

  • Subscribe to the /api/events/subscribe/notification WebSocket
    • Throttle its incoming messages at some reasonable rate (on the order of 0.5-1 seconds)
  • Treat incoming messages from the events API as a notifier only -- check the /api/notifications endpoint for the actual list of messages
  • Render the badge and the notification center UI inside JupyterLab/Notebook
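
The "notifier only" pattern with throttling can be sketched as below. A real client would live in the JupyterLab front end; this Python version only illustrates the logic, and `ThrottledRefresher` and its parameters are hypothetical names. Each WebSocket message is treated as a hint to re-fetch the notification list, and re-fetches are limited to one per interval.

```python
import time
from typing import Callable, Optional


class ThrottledRefresher:
    """Run `refresh` at most once per `interval` seconds."""

    def __init__(
        self,
        refresh: Callable[[], None],
        interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._refresh = refresh
        self._interval = interval
        self._clock = clock
        self._last: Optional[float] = None  # time of the last refresh, if any
        self.pending = False  # True if a message arrived while throttled

    def on_message(self, _message: object = None) -> bool:
        """Handle one incoming message; return True if a refresh ran."""
        now = self._clock()
        if self._last is not None and now - self._last < self._interval:
            # Too soon: remember that something arrived, but do not re-fetch.
            self.pending = True
            return False
        self._last = now
        self.pending = False
        self._refresh()  # e.g., a GET on the notifications list endpoint
        return True
```

The message content is ignored on purpose, since the REST endpoint stays the source of truth. A production client would also schedule a trailing refresh when `pending` is set, so the last message in a burst is not missed.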

JupyterLab 4 extension

  • A Token (e.g., INotifications or IEvents) that exposes an IDataConnector for event CRUD and an ISignal for event subscription
  • A visual UI for an event notification center

Jupyter Notebook 7 extension

  • The Token from the JupyterLab extension
  • A version of the JupyterLab UI for notifications
@3coins
Contributor

3coins commented Apr 7, 2022

@afshin
Should we just add this to the server or need a new server extension package? Is anyone assigned to this task?

@afshin
Contributor Author

afshin commented Apr 7, 2022

@3coins, this should be in the core server.

Currently, no one is specifically assigned. I'd like to see the user interface portion of this landing in JupyterLab and I am happy to work on any part of the stack that helps get us there.

I think the work already done in the telemetry space might be further along than the server side of the notifications extension, so grafting those handlers into jupyter-server might be the best way of bringing this into core.

What are you thinking? Let's have a conversation about this with all the people who have interest and bandwidth to work on it.

@3coins
Contributor

3coins commented Apr 7, 2022

@afshin

What are you thinking? Let's have a conversation about this with all the people who have interest and bandwidth to work on it.

Agree, let me know if you want to have an offline discussion including anyone else who wants to work on this; personally, I would like to get some experience on the server side, but happy to work on any part of the stack. Is there an expected time frame to get these changes done?

@afshin
Contributor Author

afshin commented Apr 7, 2022

I think Zach is rounding up interested folks (including you) for a conversation.

We are targeting jupyter-server v2 and jupyterlab v4 (so late June, early July).

@rahul26goyal

@afshin / @Zsailer :
please include me in any meeting that might happen related to this. I am interested in learning more about this area and contributing any way I can.

@3coins
Contributor

3coins commented May 5, 2022

As discussed in the server meeting on 5/5/2022, here is an initial list of tasks for the event notification system. This list is by no means final, feel free to add comments or feedback.

  1. Event Bus - #820

    • A central event bus to relay events
    • /api/events/subscribe - Websocket for subscribing to events
    • A default handler for consuming events
  2. REST API Endpoints

    • POST /api/events - REST API to create new events
    • GET /api/events/schemas - REST API to query/list registered schemas (Optional)
  3. Event buffer

    • A queue/buffer to store undelivered event messages
  4. JupyterLab 4 Event Client (jupyterlab-events)

    • Reuse jupyterlab-telemetry repo, either rename or copy to jupyterlab-events
    • Remove server endpoints, any redundant server code
    • Update client handlers to use the REST API endpoints
    • Add WebSocket handler to enable subscription to events
  5. Add Default events in server

    • Add default events (e.g., contents handler and kernel events) in Jupyter Server
  6. JupyterLab 4 Updates

    • Add jupyterlab-events as dependency inside JupyterLab
    • Subscribe to default events
  7. Event Notification UI (JupyterLab)

    • UI updates for event notification
  8. Jupyter Notebook 7 Updates

    • Add jupyterlab-events as dependency
    • Subscribe to default events
    • Can we reuse event notification UI from JupyterLab?
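
For task 3 (the event buffer), one possible shape is a bounded queue that holds events while no consumer is connected and flushes them in order once one attaches. This is a sketch under assumptions: `BufferedChannel`, its methods, and the drop-oldest policy are illustrative choices, not something decided in the meeting.

```python
from collections import deque
from typing import Callable, Optional


class BufferedChannel:
    """Hold undelivered events until a consumer connects."""

    def __init__(self, maxlen: int = 1000) -> None:
        # Assumption: once the buffer is full, the oldest events are dropped.
        self._buffer: deque = deque(maxlen=maxlen)
        self._send: Optional[Callable[[dict], None]] = None

    def publish(self, event: dict) -> None:
        if self._send is None:
            self._buffer.append(event)  # nobody is listening yet
        else:
            self._send(event)

    def connect(self, send: Callable[[dict], None]) -> int:
        """Attach a consumer, flush buffered events in arrival order, return the count."""
        self._send = send
        flushed = 0
        while self._buffer:
            send(self._buffer.popleft())
            flushed += 1
        return flushed

    def disconnect(self) -> None:
        self._send = None
```

A bounded buffer keeps memory use predictable on a long-running server, at the cost of losing the oldest events if no client connects for a long time.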

@afshin
Contributor Author

afshin commented May 13, 2022

Here is a document we can collaboratively edit so that the front-matter of this issue can have a canonical version that we edit once it is ready: https://hackmd.io/q4Rkq2BaS1SIXvyzt8j1yA

@davidbrochart
Contributor

Since the event system is a new service that we are just starting to develop, how about making it as backend-agnostic as possible? By that I mean that most of the logic should be usable in both jupyter-server and jupyverse.
But it is currently very tied to jupyter-server, Tornado and traitlets, which we don't want to depend on in jupyverse.

@Zsailer
Member

Zsailer commented May 31, 2022

Thanks for bringing this up, @davidbrochart! I think we're going to see this question/conversation come up multiple times moving forward as we continue pushing Jupyter Server forward, while trying to bring jupyverse to the front.

Let me start by saying—technically, the event system is backend agnostic. We just defined a REST + websocket API for posting/subscribing to events. These are schema/protocol driven. Jupyverse can/should create an implementation of this API. I don't think there is anything tied specifically to Tornado here. Any server implementation will always have to write some server-library-specific code to make it work. Consider, if we started this in jupyverse, how would we port it to jupyter_server? We would have to re-implement the handlers in Tornado and drop the FastAPI specific logic.

That said, under the hood, we depend on jupyter_telemetry (hopefully, switching to jupyter_events soon) and you are correct—jupyter_telemetry/events depends on traitlets.

That's because we needed the Event System API to be configurable. I don't see a way around using traitlets for this without switching to some other backwards compatible, backend-agnostic, config-based library. For example, it looks to me that jupyverse/FPS is implementing its own (non-backend agnostic) configuration system, fps.config. While I believe FPS offers a much cleaner way to handle config, it's not backwards compatible with Jupyter Server. This might be a place we can improve.

Unfortunately, at this time, I don't see a single solution that would work for both. And while I see jupyverse as our future (it's awesome!), I don't think we should block jupyter_server from making advancements using the older dependencies at this point in time.

Do you have ideas on how to reconcile this?

@davidbrochart
Contributor

You're right Zach, jupyverse also has implemented specific logic for configuration, and I guess depending on FastAPI makes it kind of specific to this framework too.
I'm thinking about some low-level logic (functions, classes...) that would be called from either a Tornado handler or a FastAPI router, with all configuration already resolved at this point, and passed as generic arguments.

@Zsailer
Member

Zsailer commented May 31, 2022

"backwards compatible, backend-agnostic, config-based library"

To me, this is the "holy grail".

We could probably get pretty close by

  1. writing logic that translates traitlets config into a pydantic BaseModel.
  2. handling traits/fields that "observe" other traits/fields.

@davidbrochart
Contributor

I meant something simpler: for example, this GET handler calls this get method. If we can keep the logic of the get method in a separate package, that's a great step towards backend agnosticism.
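
As an illustration of that idea, the core of a `POST /api/events` handler could be a plain function that knows nothing about Tornado or FastAPI. Everything here is a hypothetical sketch: `create_event`, the `schema_id` and `data` body keys, and the status codes are assumptions, and all configuration is assumed to be resolved before the call.

```python
from typing import Callable, Dict, Set, Tuple


def create_event(
    body: dict,
    known_schemas: Set[str],
    emit: Callable[[str, dict], None],
) -> Tuple[int, Dict[str, str]]:
    """Validate a parsed request body and emit it.

    Returns (HTTP status, response payload) so that any web framework
    can translate the result into its own response type.
    """
    schema_id = body.get("schema_id")
    data = body.get("data")
    if not isinstance(schema_id, str) or not isinstance(data, dict):
        return 400, {"error": "body needs 'schema_id' (str) and 'data' (object)"}
    if schema_id not in known_schemas:
        return 404, {"error": f"unknown schema: {schema_id}"}
    emit(schema_id, data)
    return 204, {}


# A Tornado handler's post() or a FastAPI route would only parse the request,
# call create_event(...), and copy the status and payload into its response.
emitted = []
status, payload = create_event(
    {"schema_id": "example.org/notification", "data": {"message": "hi"}},
    known_schemas={"example.org/notification"},
    emit=lambda schema_id, data: emitted.append((schema_id, data)),
)
```

With this split, each backend keeps a thin adapter and the validation and emit logic is tested once, outside any server.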

@JasonWeill

Is an event intended to notify the user visually? If so, will we distinguish between read and unread notifications, high-priority and low-priority notifications, etc.? I'm also curious about whether notifications might be transmitted via other means, such as e-mail or SMS.

@afshin
Contributor Author

afshin commented Jun 15, 2022

@jweill-aws the "case study" above is about notifications and the idea is that it becomes an extension's job to manage its state. In the case of notifications, the extension will write events it cares about from the event bus into a SQL database and it will be the job of the client to call DELETE to remove those items from the database (i.e., make them "read").

@Zsailer
Member

Zsailer commented Jul 22, 2022

In jupyter/jupyter_events#2, we've been discussing the handling of sensitive data in the event system. I'm confident that these are already "solved problems" in other systems, so I need some help gathering information about how to do it properly here.

In jupyter/jupyter_events#2, I added a required field to every schema, "redactionPolicies", that describes the sensitivity of every event property. The event logger can be configured to redact properties that fall under sensitive policies from all events. This data is redacted before the event is ever emitted, which provides a simple way to ensure that sensitive data is never persisted.
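
A rough sketch of redaction before emit, under assumptions: the policy labels, the `redact` function, the `"<redacted>"` placeholder, and the rule that undeclared properties count as sensitive are all hypothetical and not taken from jupyter_events.

```python
from typing import Dict, List, Set

REDACTED = "<redacted>"  # placeholder value; an assumption for this sketch


def redact(
    data: dict,
    property_policies: Dict[str, List[str]],
    redacted_policies: Set[str],
) -> dict:
    """Return a copy of `data` with sensitive property values replaced."""
    clean = {}
    for name, value in data.items():
        policies = property_policies.get(name)
        # Assumption: a property with no declared policy is treated as sensitive.
        if policies is None or redacted_policies & set(policies):
            clean[name] = REDACTED
        else:
            clean[name] = value
    return clean


# Hypothetical per-property policies, as a schema might declare them.
policies = {
    "path": ["user-identifiable-information"],
    "action": ["unrestricted"],
}
event = {"path": "/home/alice/analysis.ipynb", "action": "save"}
safe = redact(event, policies, {"user-identifiable-information"})
```

Because the copy is built before anything is emitted, the original values never reach a handler or a log file.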

On the other hand, if a client (e.g., JupyterLab) builds features that depend on the event system, and these features depend on receiving all of the data, redacted events break them. This makes the event system unusable for these features when launching in data-conscious (i.e., most) environments.

To make the event system useful, we need a secure way to handle sensitive data in transit, specifically when moving between Jupyter Server and its clients. Today, the event bus added in #820 shuttles raw events to the client across the websocket. Any authenticated websocket client can connect to this websocket and "see" all event data, which obviously isn't a secure approach.

This is where I need some help. What are some known patterns for handling sensitive data in transit from server to client? If we encrypt the data in the server, how do we securely decrypt it in something like JupyterLab?

@Zsailer
Member

Zsailer commented Sep 1, 2022

The basic "plumbing" for Jupyter server's event system landed here: #862

We've started logging some events from the contents here: #954
