Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle category-filtered fields #61

Open
kiendang opened this issue May 3, 2021 · 3 comments
Open

Handle category-filtered fields #61

kiendang opened this issue May 3, 2021 · 3 comments
Milestone

Comments

@kiendang
Copy link
Member

kiendang commented May 3, 2021

This was originally mentioned in #59 and briefly discussed during jupyter-server meeting jupyter-server/team-compass#4 (comment).

In summary, there are 2 decisions to make:

  1. How exactly do we "filter" the fields? 2 options being discussed are
  • Set the property to null, equivalent to removing the data and keep the key
  • Delete the property, i.e. removing both the key and the data
  1. Do we inform users of the fields being filtered?

How to filter properties

Currently fields that are filtered out due to not having all of their categories allowed are set to null instead of being deleted. The rationale was to preserve the "shape" of the data.

During the meeting, it was proposed by some to just delete the fields. The "shape" of the data is not totally preserved in the case of nested fields where the parent properties are filtered. For example

user:
  email: [email protected]
  name: someone

if the parent property user is to be filtered, the whole property would become

user: null

where the user.email and user.name keys are not preserved.

Listing filtered fields

It was suggested in #59 that we have another metadata field in the event to list out all the properties being filtered, e.g.

"__masked__": [
  ["user", "email"],
  ["user", "id"]
]

This would be useful if we decide to set the filtered properties to null. This helps differentiate whether a property is null because its value is actually null or because it is hidden due to categories.

This would alse be useful if we go with the second approach of deleting the field altogether too. Since there are json schemas that allow additional properties. This would inform whether the missing properties are not there due to being filtered by categories or not included in the original data.

@Zsailer
Copy link
Member

Zsailer commented Jun 1, 2021

  1. How exactly do we "filter" the fields? 2 options being discussed are
  • Set the property to null, equivalent to removing the data and keep the key
  • Delete the property, i.e. removing both the key and the data
  1. Do we inform users of the fields being filtered?
  1. I don’t think it matters either way, because as you mentioned, preserving the data shape isn’t really a valid reason to keep the fields around. My preference, then, would be to drop the fields and add them to a masked field.

  2. Adding a masked field seems like the right way to go here, no matter what we decide in (1). We should be explicit about what fields were masked, even in the case where we drop fields. This makes it easier for consumers of the emitted data to know exactly what happened to the data (and build logic around that).

This brings up another question. What about non-required properties? Do we need a missing field in the event capsule in the spirit of being explicit? Right now, it’s implied that missing, optional properties were not supplied by the event source, rather than being dropped by the filter. This can be inferred+verified by the checking the masked field, of course, so missing would be redundant but explicit.

@kiendang
Copy link
Member Author

kiendang commented Jun 2, 2021

  1. I don’t think it matters either way, because as you mentioned, preserving the data shape isn’t really a valid reason to keep the fields around. My preference, then, would be to drop the fields and add them to a masked field.

  2. Adding a masked field seems like the right way to go here, no matter what we decide in (1). We should be explicit about what fields were masked, even in the case where we drop fields. This makes it easier for consumers of the emitted data to know exactly what happened to the data (and build logic around that).

This is what I think too.

This brings up another question. What about non-required properties? Do we need a missing field in the event capsule in the spirit of being explicit? Right now, it’s implied that missing, optional properties were not supplied by the event source, rather than being dropped by the filter. This can be inferred+verified by the checking the masked field, of course, so missing would be redundant but explicit.

I think it's better to do this post-hoc so we don't add more computation when recording events. For now we can provide a code example in the doc on how to extract missing, not filtered fields from the emitted events.

I would prefer the recording event logic to remain lean and only include things we have to do and can only do during event recording. To me this falls under operations/processing you can do on the emitted events after receiving them. There might be dedicated tooling for this in the future, by us and third parties, that arises from usage of telemetry.

@Zsailer
Copy link
Member

Zsailer commented Jun 3, 2021

Sounds good. We can always offer tooling (i.e. some helpful functions) inside jupyter/telemetry (in a later PR) to help process emitted events.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants