Add redactionPolicies field to Jupyter Event schemas #2
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
.. \_using-events: | ||
|
||
# Using Jupyter Events in Jupyter applications | ||
|
||
Most people will use `jupyter_events` to log event data from Jupyter | ||
applications (e.g. JupyterLab, Jupyter Server, JupyterHub). | ||
|
||
In this case, you'll be able to record events provided by schemas within | ||
those applications. To start, you'll need to configure each | ||
application's `EventLogger` object. | ||
|
||
This usually means two things: | ||
|
||
1. Define a set of `logging` handlers (from Python's standard library) | ||
to tell Jupyter Events where to send your event data | ||
(e.g. file, remote storage, etc.) | ||
2. List the redacted policies so that sensitive data is removed from events. | ||
|
||
Here is an example of a Jupyter configuration file, e.g. `jupyter_config.d`, | ||
that demonstrates how to configure the `EventLogger`. | ||
|
||
```python | ||
from logging import FileHandler | ||
|
||
# Log events to a local file on disk. | ||
handler = FileHandler('events.txt') | ||
|
||
# Explicitly list the types of events | ||
# to record and what properties or what categories | ||
# of data to begin collecting. | ||
c.EventLogger.handlers = [handler] | ||
c.EventLogger.redacted_policies = ["user-identifiable-information", "user-identifier"] | ||
``` |
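To make the role of the `logging` handler more concrete, here is a minimal sketch that uses only Python's standard library and does not call `jupyter_events` at all. It assumes events reach the handler as one JSON object per line; the logger name `example.events` and the event payload are hypothetical, and an in-memory buffer stands in for `events.txt`.

```python
import io
import json
import logging

# A StreamHandler writing to an in-memory buffer stands in for the
# FileHandler('events.txt') used in the configuration example.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(message)s"))

# Hypothetical logger name; propagate=False keeps records out of the root logger.
log = logging.getLogger("example.events")
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(handler)

# Hypothetical event payload, serialized as one JSON object per line.
event = {"schema_id": "event.jupyter.org/example-event", "thing": "a value"}
log.info(json.dumps(event))

# Read the first recorded line back to confirm what the handler received.
recorded = json.loads(buffer.getvalue().splitlines()[0])
```

Swapping the `StreamHandler` for a `FileHandler` or any other handler from `logging.handlers` changes only the destination, not the record itself.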
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import piplite\n", | ||
"\n", | ||
"piplite.install(\"jupyter_events\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from jupyter_events.logger import EventLogger\n" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"language_info": { | ||
"name": "python" | ||
}, | ||
"orig_nbformat": 4 | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# Jupyter Events Demo | ||
|
||
```{retrolite} demo-notebook.ipynb | ||
--- | ||
width: 100% | ||
height: 600px | ||
--- | ||
``` |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,5 @@ | ||||||
# Redacting Sensitive Data | ||||||
|
||||||
Jupyter Events might include sensitive data, specifically personally identifiable information (PII). To reduce | ||||||
the risk of capturing unwanted PII, Jupyter Events requires _every_ registered event to explicitly list its | ||||||
`redactionPolicies`. Data labeled with a redacted policy is removed from an event by Jupyter Events **before** the event is emitted. Schemas that list properties without an explicit `redactionPolicies` list will fail validation. |
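The redaction step described here can be sketched as follows. This is an illustration under stated assumptions, not the `jupyter_events` implementation: the `redact` helper and the `property_policies` mapping are hypothetical names, and the sketch assumes a property is dropped when any of its labels appears in the redacted set.

```python
def redact(event, property_policies, redacted_policies):
    """Return a copy of `event` without properties labeled with a redacted policy."""
    redacted = set(redacted_policies)
    return {
        name: value
        for name, value in event.items()
        # Drop the property if any of its labels is in the redacted set.
        if not redacted.intersection(property_policies.get(name, []))
    }


# Hypothetical mapping from property name to its redactionPolicies labels.
property_policies = {
    "thing": ["category.jupyter.org/unrestricted"],
    "user": ["category.jupyter.org/user-identifier"],
}

event = {"thing": "a value", "user": "alice"}
clean = redact(event, property_policies, ["category.jupyter.org/user-identifier"])
```

The helper returns a new dictionary and leaves the original event unchanged, so the unredacted data never has to be mutated in place.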
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,58 @@ | ||||||
# Writing a schema for Jupyter Events | ||||||
|
||||||
Jupyter Event Schemas must be valid [JSON schema](https://json-schema.org/) and can be written in valid | ||||||
YAML or JSON. Every schema is validated against the Jupyter Events "meta"-JSON schema, [here](). | ||||||
|
||||||
At a minimum, a valid Jupyter Event schema must have the following keys: | ||||||
|
||||||
- `$id` : a URI to identify (and possibly locate) the schema. | ||||||
- `version` : the schema version. | ||||||
- `redactionPolicies`: a list of labels representing the personal data sensitivity of this event. The main logger can be configured to redact any events or event properties that might contain sensitive information. Set this value to `"unrestricted"` if emitting that this event happen does not reveal any person data. | ||||||
Review comment: The last phrase doesn't parse for me. Take the suggested change with a grain of salt...
- `properties` : attributes of the event being emitted. | ||||||
|
||||||
Each property should have the following attributes: | ||||||
|
||||||
- `title` : name of the property | ||||||
- `redactionPolicies`: a list of labels representing the personal data sensitivity of this property. This field will be redacted from the emitted event if the policy is not allowed. | ||||||
|
||||||
- `required`: list of required properties. | ||||||
|
||||||
Here is a minimal example of a valid JSON schema for an event. | ||||||
|
||||||
```yaml | ||||||
$id: event.jupyter.org/example-event | ||||||
version: 1 | ||||||
title: My Event | ||||||
description: | | ||||||
  All events must have a name property | ||||||
type: object | ||||||
redactionPolicies: | ||||||
  - category.jupyter.org/unrestricted | ||||||
Review comment: What is the set of valid labels? This URI format seems to imply they can be supplied by the "event provider".
properties: | ||||||
  thing: | ||||||
    title: Thing | ||||||
    redactionPolicies: | ||||||
      - category.jupyter.org/unrestricted | ||||||
    description: A random thing. | ||||||
  user: | ||||||
    title: User name | ||||||
    redactionPolicies: | ||||||
      - category.jupyter.org/user-identifier | ||||||
    description: Name of user who initiated event | ||||||
required: | ||||||
  - thing | ||||||
  - user | ||||||
``` | ||||||
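As a rough illustration of the key requirements listed in this file, the sketch below checks a schema that has already been loaded into a Python dict. It is not the Jupyter Events meta-schema: `find_problems` and the two key tuples are hypothetical, and the dict is modeled on the example event using the `redactionPolicies` key named in the key list.

```python
# Keys the documentation lists as required (assumed here, not read from the meta-schema).
REQUIRED_SCHEMA_KEYS = ("$id", "version", "redactionPolicies", "properties")
REQUIRED_PROPERTY_KEYS = ("title", "redactionPolicies")


def find_problems(schema):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [
        f"missing schema key: {key}"
        for key in REQUIRED_SCHEMA_KEYS
        if key not in schema
    ]
    for name, prop in schema.get("properties", {}).items():
        for key in REQUIRED_PROPERTY_KEYS:
            if key not in prop:
                problems.append(f"property {name!r} is missing {key}")
    return problems


schema = {
    "$id": "event.jupyter.org/example-event",
    "version": 1,
    "redactionPolicies": ["category.jupyter.org/unrestricted"],
    "properties": {
        "thing": {
            "title": "Thing",
            "redactionPolicies": ["category.jupyter.org/unrestricted"],
        },
        "user": {
            "title": "User name",
            "redactionPolicies": ["category.jupyter.org/user-identifier"],
        },
    },
    "required": ["thing", "user"],
}

problems = find_problems(schema)
```

A check like this only covers the presence of keys; full validation would still go through a JSON Schema validator.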
|
||||||
## Redaction Policies | ||||||
|
||||||
Each property can be labelled with a `redactionPolicies` field. This makes it easier to | ||||||
filter properties based on a category. We recommend that schema authors use valid | ||||||
URIs for these labels, e.g. something like `category.jupyter.org/unrestricted`. | ||||||
|
||||||
Below is a list of common category labels that Jupyter Events recommends using: | ||||||
|
||||||
- `category.jupyter.org/unrestricted` | ||||||
- `category.jupyter.org/user-identifier` | ||||||
- `category.jupyter.org/user-identifiable-information` | ||||||
- `category.jupyter.org/action-timestamp` | ||||||
Review comment: Hmm. If different event providers have their own definition of what is PII (including their own labels, but even perhaps not), how does an Operator:
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,2 @@ | ||
sphinx_rtd_theme | ||
myst_parser | ||
pydata_sphinx_theme |
Review comment: This comment doesn't seem to fit the code below it, probably due to the switch from categories to redaction policies and the fact that the default behavior has also changed.