Add redactionPolicies field to Jupyter Event schemas #2

Merged
merged 22 commits into from
Aug 11, 2022
1 change: 1 addition & 0 deletions .github/workflows/python-tests.yml
@@ -30,6 +30,7 @@ jobs:
       - name: Install the Python dependencies
         run: |
           pip install -e ".[test]" codecov
+          pip list
       - name: Run the tests
         if: ${{ !startsWith(matrix.python-version, 'pypy') && !startsWith(matrix.os, 'windows') }}
         run: |
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -31,7 +31,7 @@ repos:
     hooks:
       - id: mypy
         exclude: examples/simple/setup.py
-        additional_dependencies: [types-requests]
+        additional_dependencies: [types-requests, types-PyYAML]

   - repo: https://github.com/pre-commit/mirrors-prettier
     rev: v2.6.2
10 changes: 8 additions & 2 deletions docs/conf.py
@@ -29,11 +29,14 @@
 # Add any Sphinx extension module names here, as strings. They can be
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
-extensions: List = []
+extensions: List = ["myst_parser", "jupyterlite_sphinx"]

 # Add any paths that contain templates here, relative to this directory.
 templates_path = ["_templates"]

+source_suffix = [".rst", ".md"]
+
+
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
 # This pattern also affects html_static_path and html_extra_path.
@@ -45,10 +48,13 @@
 # The theme to use for HTML and HTML Help pages. See the documentation for
 # a list of builtin themes.
 #
-html_theme = "sphinx_rtd_theme"
+html_theme = "pydata_sphinx_theme"

 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
 html_static_path = ["_static"]
 master_doc = "index"
+
+# Configure jupyterlite to import jupyter_events package
+jupyterlite_contents = ["pages/demo-notebook.ipynb"]
2 changes: 2 additions & 0 deletions docs/index.rst
@@ -42,6 +42,8 @@ Jupyter's Events library can be installed from PyPI.
    pages/configure
    pages/application
    pages/schemas
+   pages/redaction_policies
+   pages/demo

 Indices and tables
 ------------------
7 changes: 3 additions & 4 deletions docs/pages/application.rst
@@ -30,22 +30,21 @@ EventLogger has two configurable traits:

 - ``handlers``: a list of Python's logging handlers that
   handle the recording of incoming events.
-- ``allowed_schemas``: a dictionary of options for each schema
-  describing what data should be collected.
+- ``redacted_policies``: a list of `redactionPolicies` that will be removed from all emitted events.

 Next, you'll need to register event schemas for your application.
 You can register schemas using the ``register_schema_file``
 (JSON or YAML format) or ``register_schema`` methods.


 Once you have an instance of ``EventLogger`` and your registered
-schemas, you can use the ``record_event`` method to log that event.
+schemas, you can use the ``emit`` method to log that event.

 .. code-block:: python

     # Record an example event.
     event = {'name': 'example event'}
-    self.eventlogger.record_event(
+    self.eventlogger.emit(
         schema_id='url.to.event.schema',
         version=1,
         event=event
33 changes: 33 additions & 0 deletions docs/pages/configure.md
@@ -0,0 +1,33 @@
.. \_using-events:

# Using Jupyter Events in Jupyter applications

Most people will use `jupyter_events` to log events data from Jupyter
applications, (e.g. JupyterLab, Jupyter Server, JupyterHub, etc).

In this case, you'll be able to record events provided by schemas within
those applications. To start, you'll need to configure each
application's `EventLogger` object.

This usually means two things:

1. Define a set of `logging` handlers (from Python's standard library)
to tell Jupyter Events where to send your event data
(e.g. file, remote storage, etc.)
2. List redacted policies to remove sensitive data from any events.

Here is an example of a Jupyter configuration file, e.g. `jupyter_config.d`,
that demonstrates how to configure an eventlog.

```python
from logging import FileHandler

# Log events to a local file on disk.
handler = FileHandler('events.txt')

# Explicitly list the types of events
# to record and what properties or what categories
# of data to begin collecting.
Review comment (Member):
This comment doesn't seem to fit the code below it, probably due to the switch from categories to redaction policies and the fact the default behavior has also changed.

c.EventLogger.handlers = [handler]
c.EventLogger.redacted_policies = ["user-identifiable-information", "user-identifier"]
```
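
As a rough illustration of item 1 in the list above, the sketch below shows how a standard-library `logging` handler receives a serialized event. It uses only the `logging`, `json`, and `io` modules. The logger name `demo.events` and the event payload are made up for this example, and nothing here calls `jupyter_events` itself.

```python
import io
import json
import logging

# An in-memory stream stands in for a file or remote sink.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))

# Hypothetical logger name, used only for this sketch.
log = logging.getLogger("demo.events")
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(handler)

# Serialize a made-up event as one JSON line and hand it to the handler.
log.info(json.dumps({"name": "example event"}))

# Read the line back to confirm what the handler wrote.
recorded = json.loads(stream.getvalue().strip())
```

Swapping `StreamHandler` for `FileHandler('events.txt')` would send the same line to disk, which is the shape of handler the configuration example passes to `c.EventLogger.handlers`.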
42 changes: 0 additions & 42 deletions docs/pages/configure.rst

This file was deleted.

32 changes: 32 additions & 0 deletions docs/pages/demo-notebook.ipynb
@@ -0,0 +1,32 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import piplite\n",
"\n",
"piplite.install(\"jupyter_events\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from jupyter_events.logger import EventLogger\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
8 changes: 8 additions & 0 deletions docs/pages/demo.md
@@ -0,0 +1,8 @@
# Jupyter Events Demo

```{retrolite} demo-notebook.ipynb
---
width: 100%
height: 600px
---
```
5 changes: 5 additions & 0 deletions docs/pages/redaction_policies.md
@@ -0,0 +1,5 @@
# Redacting Sensitive Data

Jupyter Events might possible include sensitive data, specifically personally identifiable information (PII). To reduce
Review comment (Member):

Suggested change
-Jupyter Events might possible include sensitive data, specifically personally identifiable information (PII). To reduce
+Jupyter Events might possibly include sensitive data, specifically personally identifiable information (PII). To reduce

the risk of capturing unwanted PII, Jupyter Events requires _every_ registered event to explicitly list its
`redactionPolicies`. Data labeled with a redacted policy is removed from an event by Jupyter Events **before** the event is emitted. Schemas that list properties without an explicit `redactionPolicies` list will fail validation.
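
The redaction step described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: `redact_event` and its arguments are hypothetical names, and the rule applied (drop a property when any of its labels appears in the logger's redacted set) follows the wording proposed in the review of `schemas.md`. Properties with no labels are kept here only because schema validation is assumed to have rejected them earlier.

```python
from typing import Any, Dict, Iterable, List


def redact_event(
    event: Dict[str, Any],
    property_policies: Dict[str, List[str]],
    redacted_policies: Iterable[str],
) -> Dict[str, Any]:
    """Return a copy of ``event`` without properties whose labels are redacted."""
    redacted = set(redacted_policies)
    return {
        key: value
        for key, value in event.items()
        # Keep a property only if none of its labels is in the redacted set.
        if not redacted.intersection(property_policies.get(key, []))
    }


# Labels per property, mirroring the example schema's two properties.
policies = {
    "thing": ["category.jupyter.org/unrestricted"],
    "user": ["category.jupyter.org/user-identifier"],
}
event = {"thing": "notebook opened", "user": "jovyan"}
clean = redact_event(event, policies, ["category.jupyter.org/user-identifier"])
```

With `category.jupyter.org/user-identifier` redacted, `clean` keeps `thing` and drops `user`, while the original `event` dictionary is left unchanged.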
58 changes: 58 additions & 0 deletions docs/pages/schemas.md
@@ -0,0 +1,58 @@
# Writing a schema for Jupyter Events

Jupyter Event Schemas must be valid [JSON schema](https://json-schema.org/) and can be written in valid
YAML or JSON. Every schema is validated against Jupyter Event's "meta"-JSON schema, [here]().

At a minimum, valid Jupyter Event schema requires have the following keys:
Review comment (Member):

Suggested change
-At a minimum, valid Jupyter Event schema requires have the following keys:
+At a minimum, valid Jupyter Event schema requires the following keys:


- `$id` : a URI to identify (and possibly locate) the schema.
- `version` : the schema version.
- `redactionPolicies`: a list of labels representing the personal data sensitivity of this event. The main logger can be configured to redact any events or event properties that might contain sensitive information. Set this value to `"unrestricted"` if emitting that this event happen does not reveal any person data.
Review comment (Member):

The last phrase doesn't parse for me. Take the suggested change with a grain of salt...

Suggested change
-- `redactionPolicies`: a list of labels representing the personal data sensitivity of this event. The main logger can be configured to redact any events or event properties that might contain sensitive information. Set this value to `"unrestricted"` if emitting that this event happen does not reveal any person data.
+- `redactionPolicies`: a list of labels representing the personal data sensitivity of this event. The main logger can be configured to redact any events or event properties that might contain sensitive information. Set this value to `"unrestricted"` if emitting this event does not reveal any personal data.

- `properties` : attributes of the event being emitted.

Each property should have the following attributes:

- `title` : name of the property
- `redactionPolicies`: a list of labels representing the personal data sensitivity of this property. This field will be redacted from the emitted event if the policy is not allowed.
Review comment (Member):

Suggested change
-- `redactionPolicies`: a list of labels representing the personal data sensitivity of this property. This field will be redacted from the emitted event if the policy is not allowed.
+- `redactionPolicies`: a list of labels representing the personal data sensitivity of this property. This field will be redacted from the emitted event if any of its `redactionPolicies` labels are listed in the event logger's `redactedPolicies` set.


- `required`: list of required properties.

Here is a minimal example of a valid JSON schema for an event.

```yaml
$id: event.jupyter.org/example-event
version: 1
title: My Event
description: |
  All events must have a name property
type: object
redactionPolicy:
Review comment (Member):

Suggested change
-redactionPolicy:
+redactionPolicies:

- category.jupyter.org/unrestricted
Review comment (Member):

What is the set of valid labels? This URI format seems to imply they can be supplied by the "event provider".

properties:
  thing:
    title: Thing
    redactionPolicies:
      - category.jupyter.org/unrestricted
    description: A random thing.
  user:
    title: User name
    redactionPolicies:
      - category.jupyter.org/user-identifier
    description: Name of user who initiated event
required:
  - thing
  - user
```
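
The key requirements listed above can be checked with a few lines of plain Python. The sketch below is only an illustration: `find_schema_problems` and `REQUIRED_KEYS` are hypothetical names, the schema is written as a dictionary instead of being loaded from YAML, and this is not how Jupyter Events validates schemas against its meta-schema. The example deliberately misspells `redactionPolicies` on one property to show the kind of mistake such a check would surface.

```python
from typing import Any, Dict, List

# Top-level keys the documentation lists as the minimum for a schema.
REQUIRED_KEYS = ("$id", "version", "redactionPolicies", "properties")


def find_schema_problems(schema: Dict[str, Any]) -> List[str]:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in schema]
    for name, prop in schema.get("properties", {}).items():
        # Every property is expected to carry a title and a list of labels.
        for attr in ("title", "redactionPolicies"):
            if attr not in prop:
                problems.append(f"property {name!r} is missing {attr}")
    return problems


schema = {
    "$id": "event.jupyter.org/example-event",
    "version": 1,
    "redactionPolicies": ["category.jupyter.org/unrestricted"],
    "properties": {
        # Misspelled on purpose: "redactionPolicy" instead of "redactionPolicies".
        "thing": {
            "title": "Thing",
            "redactionPolicy": ["category.jupyter.org/unrestricted"],
        },
    },
}
problems = find_schema_problems(schema)
```

Here `problems` contains a single entry reporting that the `thing` property has no `redactionPolicies` key.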

## Redaction Policies

Each property can be labelled with a `redactionPolicies` field. This makes it easier to
filter properties based on a category. We recommend that schema authors use valid
Review comment (Member):

?

Suggested change
-filter properties based on a category. We recommend that schema authors use valid
+filter out properties based on a redaction policy. We recommend that schema authors use valid

URIs for these labels, e.g. something like `category.jupyter.org/unrestricted`.

Below is a list of common category labels that Jupyter Events recommends using:

- `category.jupyter.org/unrestricted`
- `category.jupyter.org/user-identifier`
- `category.jupyter.org/user-identifiable-information`
- `category.jupyter.org/action-timestamp`
Review comment (Member):
Hmm. If different event providers have their own definition of what is PII (including their own labels, but even perhaps not), how does an Operator:
a) determine what set of labels to add to the redactedPolicies property on the event logger?
b) change a given property's redaction criteria because they happen to deem the current settings inadequate?

64 changes: 0 additions & 64 deletions docs/pages/schemas.rst

This file was deleted.

3 changes: 2 additions & 1 deletion docs/requirements.txt
@@ -1 +1,2 @@
-sphinx_rtd_theme
+myst_parser
+pydata_sphinx_theme