Add redactionPolicies field to Jupyter Event schemas #3

Closed
wants to merge 14 commits into from
1 change: 1 addition & 0 deletions .github/workflows/python-tests.yml
@@ -30,6 +30,7 @@ jobs:
- name: Install the Python dependencies
run: |
pip install -e ".[test]" codecov
pip list
- name: Run the tests
if: ${{ !startsWith(matrix.python-version, 'pypy') && !startsWith(matrix.os, 'windows') }}
run: |
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -31,7 +31,7 @@ repos:
hooks:
- id: mypy
exclude: examples/simple/setup.py
additional_dependencies: [types-requests]
additional_dependencies: [types-requests, types-PyYAML]

- repo: https://github.com/pre-commit/mirrors-prettier
rev: v2.6.2
16 changes: 7 additions & 9 deletions README.md
@@ -29,10 +29,6 @@ logger = EventLogger(
handlers=[
logging.FileHandler('events.log')
],
# List schemas of events that should be recorded.
allowed_schemas=[
'uri.to.event.schema'
]
)
```

@@ -49,14 +45,16 @@ Event schemas must be registered with the `EventLogger` for events to be recorde
"title": "My Event",
"description": "All events must have a name property.",
"type": "object",
"redactionPolicies": ["unrestricted"],
"properties": {
"name": {
"title": "Name",
"event_name": {
"title": "Event Name",
"description": "Name of event",
"type": "string"
"type": "string",
"redactionPolicies": ["unrestricted"]
}
},
"required": ["name"],
"required": ["event_name"],
"version": 1
}
```
@@ -79,7 +77,7 @@ Events are recorded using the `record_event` method. This method validates the e

```python
# Record an example event.
event = {'name': 'example event'}
event = {'event_name': 'example event'}
logger.record_event(
schema_id='url.to.event.schema',
version=1,
10 changes: 8 additions & 2 deletions docs/conf.py
@@ -29,11 +29,14 @@
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions: List = []
extensions: List = ["myst_parser", "jupyterlite_sphinx"]

# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]

source_suffix = [".rst", ".md"]


# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
@@ -45,10 +48,13 @@
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"
html_theme = "pydata_sphinx_theme"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
master_doc = "index"

# Configure jupyterlite to import jupyter_events package
jupyterlite_contents = ["pages/demo-notebook.ipynb"]
2 changes: 2 additions & 0 deletions docs/index.rst
@@ -42,6 +42,8 @@ Jupyter's Events library can be installed from PyPI.
pages/configure
pages/application
pages/schemas
pages/redaction_policies
pages/demo

Indices and tables
------------------
7 changes: 3 additions & 4 deletions docs/pages/application.rst
@@ -30,22 +30,21 @@ EventLogger has two configurable traits:

- ``handlers``: a list of Python's logging handlers that
handle the recording of incoming events.
- ``allowed_schemas``: a dictionary of options for each schema
describing what data should be collected.
- ``redacted_policies``: a list of ``redactionPolicies`` labels; data carrying any of these labels is removed from all emitted events.

Next, you'll need to register event schemas for your application.
You can register schemas using the ``register_schema_file``
(JSON or YAML format) or ``register_schema`` methods.


Once you have an instance of ``EventLogger`` and your registered
schemas, you can use the ``record_event`` method to log that event.
schemas, you can use the ``emit`` method to log that event.

.. code-block:: python

# Record an example event.
event = {'name': 'example event'}
self.eventlogger.record_event(
self.eventlogger.emit(
schema_id='url.to.event.schema',
version=1,
event=event
33 changes: 33 additions & 0 deletions docs/pages/configure.md
@@ -0,0 +1,33 @@
(using-events)=

# Using Jupyter Events in Jupyter applications

Most people will use `jupyter_events` to log event data from Jupyter
applications (e.g. JupyterLab, Jupyter Server, JupyterHub).

In this case, you'll be able to record events provided by schemas within
those applications. To start, you'll need to configure each
application's `EventLogger` object.

This usually means two things:

1. Define a set of `logging` handlers (from Python's standard library)
to tell Jupyter Events where to send your event data
(e.g. file, remote storage, etc.)
2. List the redaction policies whose labeled data should be removed from every event.

Here is an example of a Jupyter configuration file, e.g. `jupyter_config.d`,
that demonstrates how to configure an eventlog.

```python
from logging import FileHandler

# Log events to a local file on disk.
handler = FileHandler('events.txt')

# Send events to the handler above and redact any
# properties labeled with the listed policies.
c.EventLogger.handlers = [handler]
c.EventLogger.redacted_policies = ["user-identifiable-information", "user-identifier"]
```
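
The handlers in step 1 are ordinary `logging.Handler` objects, so anything that follows the standard-library handler interface can be listed. The sketch below uses only Python's `logging` module and does not call the `jupyter_events` API; the `ListHandler` class and the `example.events` logger name are made up for illustration.

```python
import json
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted records in memory."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        # format() falls back to the plain message when no formatter is set.
        self.records.append(self.format(record))


handler = ListHandler()
log = logging.getLogger("example.events")
log.setLevel(logging.INFO)
log.propagate = False  # keep the example's output out of the root logger
log.addHandler(handler)

# Serialize an event as one JSON line, similar to what a file handler would store.
log.info(json.dumps({"thing": "opened a notebook"}))
print(handler.records)
```

A handler like this would be listed in `c.EventLogger.handlers` in the same way as the `FileHandler` above.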
42 changes: 0 additions & 42 deletions docs/pages/configure.rst

This file was deleted.

32 changes: 32 additions & 0 deletions docs/pages/demo-notebook.ipynb
@@ -0,0 +1,32 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import piplite\n",
"\n",
"piplite.install(\"jupyter_events\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from jupyter_events.logger import EventLogger\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
8 changes: 8 additions & 0 deletions docs/pages/demo.md
@@ -0,0 +1,8 @@
# Jupyter Events Demo

```{retrolite} demo-notebook.ipynb
---
width: 100%
height: 600px
---
```
5 changes: 5 additions & 0 deletions docs/pages/redaction_policies.md
@@ -0,0 +1,5 @@
# Redacting Sensitive Data

Jupyter Events might include sensitive data, specifically personally identifiable information (PII). To reduce
the risk of capturing unwanted PII, Jupyter Events requires _every_ registered event to explicitly list its
`redactionPolicies`. Data labeled with a redacted policy is removed from an event by Jupyter Events **before** the event is emitted. Schemas that list properties without an explicit `redactionPolicies` list will fail validation.
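
The redaction step can be sketched in plain Python. This is a minimal illustration and not the library's implementation: `redact_event` and its arguments are hypothetical names, and the sketch assumes each schema property carries a `redactionPolicies` list as described above.

```python
from typing import Any, Dict, Iterable


def redact_event(
    event: Dict[str, Any],
    schema: Dict[str, Any],
    redacted_policies: Iterable[str],
) -> Dict[str, Any]:
    """Return a copy of ``event`` without properties labeled with a redacted policy.

    Hypothetical helper for illustration only; not part of jupyter_events.
    """
    redacted = set(redacted_policies)
    properties = schema.get("properties", {})
    kept = {}
    for name, value in event.items():
        policies = set(properties.get(name, {}).get("redactionPolicies", []))
        # Drop the property if any of its labels is configured as redacted.
        if policies & redacted:
            continue
        kept[name] = value
    return kept


schema = {
    "redactionPolicies": ["unrestricted"],
    "properties": {
        "thing": {"redactionPolicies": ["unrestricted"]},
        "user": {"redactionPolicies": ["user-identifier"]},
    },
}
event = {"thing": "opened a notebook", "user": "alice"}
clean = redact_event(event, schema, redacted_policies=["user-identifier"])
print(clean)  # {'thing': 'opened a notebook'}
```

The sketch returns a new dictionary rather than mutating the input, so the original event stays available to the caller.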
58 changes: 58 additions & 0 deletions docs/pages/schemas.md
@@ -0,0 +1,58 @@
# Writing a schema for Jupyter Events

Jupyter Event Schemas must be valid [JSON schema](https://json-schema.org/) and can be written in valid
YAML or JSON. Every schema is validated against Jupyter Event's "meta"-JSON schema, [here]().

At a minimum, a valid Jupyter Event schema must have the following keys:

- `$id` : a URI to identify (and possibly locate) the schema.
- `version` : the schema version.
- `redactionPolicies`: a list of labels representing the personal data sensitivity of this event. The main logger can be configured to redact any events or event properties that might contain sensitive information. Set this value to `"unrestricted"` if emitting the fact that this event happened does not reveal any personal data.
- `properties` : attributes of the event being emitted.

Each property should have the following attributes:

- `title` : name of the property
- `redactionPolicies`: a list of labels representing the personal data sensitivity of this property. This field will be redacted from the emitted event if the policy is not allowed.

- `required`: list of required properties.

Here is a minimal example of a valid JSON schema for an event.

```yaml
$id: event.jupyter.org/example-event
version: 1
title: My Event
description: |
  All events must have a name property
type: object
redactionPolicies:
  - category.jupyter.org/unrestricted
properties:
  thing:
    title: Thing
    redactionPolicies:
      - category.jupyter.org/unrestricted
    description: A random thing.
  user:
    title: User name
    redactionPolicies:
      - category.jupyter.org/user-identifier
    description: Name of user who initiated event
required:
  - thing
  - user
```
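
To illustrate the requirement that the event and each of its properties declare a label list, here is a small checker written in plain Python. It is a sketch under assumptions: `missing_redaction_policies` is a hypothetical helper, the key is assumed to be spelled `redactionPolicies`, and the real validation is done against the meta-schema rather than by code like this.

```python
from typing import Any, Dict, List


def missing_redaction_policies(schema: Dict[str, Any]) -> List[str]:
    """List the places in ``schema`` that lack a non-empty ``redactionPolicies`` list.

    Hypothetical helper for illustration only; not part of jupyter_events.
    """
    missing = []
    # The event itself needs a label list.
    if not schema.get("redactionPolicies"):
        missing.append("<event>")
    # So does every declared property.
    for name, spec in schema.get("properties", {}).items():
        if not spec.get("redactionPolicies"):
            missing.append(f"properties.{name}")
    return missing


schema = {
    "$id": "event.jupyter.org/example-event",
    "version": 1,
    "redactionPolicies": ["category.jupyter.org/unrestricted"],
    "properties": {
        "thing": {
            "title": "Thing",
            "redactionPolicies": ["category.jupyter.org/unrestricted"],
        },
        "user": {"title": "User name"},  # deliberately missing its label list
    },
}
print(missing_redaction_policies(schema))  # ['properties.user']
```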

## Redaction Policies

Each property can be labelled with a `redactionPolicies` field. This makes it easier to
filter properties based on a category. We recommend that schema authors use valid
URIs for these labels, e.g. something like `category.jupyter.org/unrestricted`.

Below is a list of common category labels that Jupyter Events recommends using:

- `category.jupyter.org/unrestricted`
- `category.jupyter.org/user-identifier`
- `category.jupyter.org/user-identifiable-information`
- `category.jupyter.org/action-timestamp`
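
As a rough illustration of filtering by category, the sketch below groups a schema's property names by their labels. `properties_by_category` is a hypothetical helper and the schema dictionary is a made-up example that uses the recommended labels; nothing here is part of the `jupyter_events` API.

```python
from collections import defaultdict
from typing import Any, Dict, List


def properties_by_category(schema: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each category label to the property names that carry it."""
    grouped = defaultdict(list)
    for name, spec in schema.get("properties", {}).items():
        for label in spec.get("redactionPolicies", []):
            grouped[label].append(name)
    return dict(grouped)


schema = {
    "properties": {
        "thing": {"redactionPolicies": ["category.jupyter.org/unrestricted"]},
        "user": {"redactionPolicies": ["category.jupyter.org/user-identifier"]},
        "when": {"redactionPolicies": ["category.jupyter.org/action-timestamp"]},
    }
}
grouped = properties_by_category(schema)
print(grouped["category.jupyter.org/user-identifier"])  # ['user']
```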
64 changes: 0 additions & 64 deletions docs/pages/schemas.rst

This file was deleted.

3 changes: 2 additions & 1 deletion docs/requirements.txt
@@ -1 +1,2 @@
sphinx_rtd_theme
myst_parser
pydata_sphinx_theme