[PROPOSAL] External masks library #240

Open
adrienaury opened this issue Jun 19, 2023 · 4 comments

adrienaury commented Jun 19, 2023

Definitions

A masking definition has three parts:

  • the generator: describes the process that generates a new value
  • the coherence context: describes the level of coherence expected for the new value (consistency with other current values or with previous values)
  • the location: where the value will be written in the JSON data

The generator is usually defined by the mask part of the masking.yml, except for the "hash" and "hashInUri" masks, which also contain a coherence element.

The coherence is usually defined by properties added to the mask: seed, cache, or the hash part of the "hash" and "hashInUri" masks.

The location is defined by the selector part.

Only the generator part needs to be stored in a masking library. When the generator is applied in a given context, we choose where to apply it (selector) and how to handle consistency (cache, seed, hash, and which source field is used).

Note: we can allow coherence information in some dedicated masks.
Note: we can allow selector information when a generator outputs multiple fields.

Examples

This generator :

- randomChoiceInUri: "pimo://nameFR"

It can be used in different contexts:

# synthesize new data :
- selector:
    jsonpath: "name1"
  masks:
    - add: ""
    - randomChoiceInUri: "pimo://nameFR"

# synthesize new data consistently with another field:
- selector:
    jsonpath: "name2"
  masks:
    - add: ""
    - randomChoiceInUri: "pimo://nameFR"
  seed:
    field: "id"

# pseudonymize consistently with another field:
- selector:
    jsonpath: "name3"
  mask:
    randomChoiceInUri: "pimo://nameFR"
  seed:
    field: "id"

...
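
To make the seed-based coherence in the examples above concrete, here is a minimal Go sketch. It is not PIMO's actual implementation: the `seededChoice` function, the FNV hash, and the sample `names` list (standing in for the content of "pimo://nameFR") are assumptions made for illustration. It only shows why seeding from the `id` field yields the same name for the same `id`.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// names is a hypothetical stand-in for the values behind "pimo://nameFR".
var names = []string{"Martin", "Bernard", "Dubois", "Thomas", "Robert"}

// seededChoice hashes the value of the seed field and uses the hash to seed
// a random source, so equal seed values always select the same choice.
func seededChoice(seedValue string, choices []string) string {
	h := fnv.New64a()
	h.Write([]byte(seedValue)) // Write on a hash.Hash never returns an error
	rng := rand.New(rand.NewSource(int64(h.Sum64())))
	return choices[rng.Intn(len(choices))]
}

func main() {
	// The same id is mapped to the same name on every call.
	fmt.Println(seededChoice("42", names) == seededChoice("42", names)) // true
}
```

The generator itself (pick a name from a list) stays the same in all three contexts; only the source of randomness changes.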

How to define a mask library

The library should expose a variety of data types:

  • how to generate a French family name (locale fr_FR)
  • how to generate a French SIRET
  • how to generate a birth date
  • etc.

This can be done by storing one file per data type, containing the list of masks to apply.

filename: person_name_fr_FR.yml

version: "1"
masking:
- selector:
    jsonpath: "."
  mask:
    randomChoiceInUri: "pimo://nameFR"

It's similar to a normal masking, except for the "." jsonpath, which writes to the current location in the JSON stream (where the mask is applied).

Some generators can take parameters:

filename: nir.yml

masking:
  - selector:
      jsonpath: "gender"         # if present, gender is used as a parameter
    masks:
      - add: true                # add the parameter if it is not present
      - randomChoice: [1, 2]
    preserve: "value"            # preserve the parameter value if present
# other parameters ...
  - selector:
      jsonpath: "nir"
    masks:
      - add: true  # in this example, the result is created in a new subfield
      - template: '{{if eq .gender "M" }}1{{else}}2{{end}}{{.birth_date | substr 8 10}}{{.birth_date | substr 3 5}}{{.department_code | printf "%02d"}}{{.city_code | printf "%03d"}}{{.order | printf "%03d"}}'
      - template: '{{ sub 97 (mod (int64 .nir_start)  97)}}'
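
The last template computes the NIR control key as 97 minus the remainder of the first 13 digits divided by 97. As a worked example, the same arithmetic can be sketched in Go. The `nirKey` function is a hypothetical helper, not part of PIMO, and it assumes a purely numeric 13-digit prefix.

```go
package main

import (
	"fmt"
	"strconv"
)

// nirKey returns the two-digit control key for the first 13 digits of a NIR:
// 97 minus (prefix mod 97), zero-padded to two digits.
func nirKey(nirStart string) (string, error) {
	n, err := strconv.ParseInt(nirStart, 10, 64)
	if err != nil {
		return "", fmt.Errorf("invalid NIR prefix %q: %w", nirStart, err)
	}
	return fmt.Sprintf("%02d", 97-n%97), nil
}

func main() {
	key, err := nirKey("2550814168025")
	if err != nil {
		panic(err)
	}
	fmt.Println(key) // 38, because 2550814168025 mod 97 = 59
}
```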

How to use masks library

The library can be a folder, a git repository, a website, ...

A new property needs to be created in the masking.yml to load the library:

version: "1"
libraries:
- "http://domain.org/mylibrary"
- "pimo://internal-library"
- "https+git://github.com/repo/[email protected]"
- "file://mylocalibrary"
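
A loader for this list would probably dispatch on the URI scheme. The following Go sketch shows one way to do that with the standard `net/url` package. The `libraryKind` function and the kind names it returns ("web", "git", "local", "embedded") are assumptions made for illustration; nothing here is an existing PIMO API.

```go
package main

import (
	"fmt"
	"net/url"
)

// libraryKind reports how a library URI would be loaded, based on its scheme.
func libraryKind(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("invalid library URI %q: %w", raw, err)
	}
	switch u.Scheme {
	case "http", "https":
		return "web", nil
	case "https+git":
		return "git", nil
	case "file":
		return "local", nil
	case "pimo":
		return "embedded", nil
	default:
		return "", fmt.Errorf("unsupported library scheme %q in %q", u.Scheme, raw)
	}
}

func main() {
	uris := []string{
		"http://domain.org/mylibrary",
		"pimo://internal-library",
		"file://mylocalibrary",
	}
	for _, raw := range uris {
		kind, err := libraryKind(raw)
		if err != nil {
			panic(err)
		}
		fmt.Println(raw, "->", kind)
	}
}
```

Rejecting unknown schemes early gives a clear error at load time rather than a failure when a generator is first used.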

A mask from the library can then be used via a new type of mask:

- selector:
    jsonpath: "nir"
  mask:
    generate:
      using: "nir" # name of the yaml file in the library

Passing parameters : option 1

- selector:
    jsonpath: "nir"
  mask:
    generate:
      using: "nir" # name of the yaml file in the library
      with:
        gender: "M"

or, if we want to use an existing field as parameter

- selector:
    jsonpath: "nir"
  mask:
    generate:
      using: "nir"
      with:
        gender: { from: "gender" }
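
The two forms of option 1 (a literal value, or a reference to an existing field) could be resolved before the generator runs. Here is a minimal Go sketch under that assumption. The `Param` type and the `resolveParams` function are hypothetical names, and the document is modelled as a flat map for simplicity.

```go
package main

import "fmt"

// Param is one entry of the proposed "with" block: either a literal value,
// or a reference to an existing field (the { from: "..." } form).
type Param struct {
	Value any
	From  string
}

// resolveParams turns the "with" block into concrete values, reading
// referenced fields from the current document.
func resolveParams(with map[string]Param, doc map[string]any) (map[string]any, error) {
	out := make(map[string]any, len(with))
	for name, p := range with {
		if p.From == "" {
			out[name] = p.Value // literal form, e.g. gender: "M"
			continue
		}
		v, ok := doc[p.From]
		if !ok {
			return nil, fmt.Errorf("parameter %q: field %q not found", name, p.From)
		}
		out[name] = v
	}
	return out, nil
}

func main() {
	doc := map[string]any{"gender": "F"}
	params, err := resolveParams(map[string]Param{"gender": {From: "gender"}}, doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(params["gender"]) // F
}
```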

Passing parameters : option 2

# precreate a param with a value
- selector:
    jsonpath: "gender"
  mask:
    constant: "M"
# call mask on the current document (selector: ".")
- selector:
    jsonpath: "."
  mask:
    generate:
      using: "nir" # name of the yaml file in the library

youen commented Jun 20, 2023

Some suggestions:

In this context, "generator" is a list, so I suggest using the plural form:

generators:
-
- 

Or is it the generator (singular) that is defined by a list of masks?

Make the git support explicit in the URL scheme:

version: "1"
load:
- "http://domain.org/mylibrary"
- "https+git://github.com/repo/[email protected]"
- "file://mylocalibrary"

youen commented Jun 20, 2023

To embed the generator in a binary and expose it using the "pimo://" scheme, consider the following example:

version: "1"
load:
- "http://domain.org/mylibrary"
- "https+git://github.com/repo/[email protected]"
- "file://mylocalibrary"
- "pimo://embedded_generator"

This way, you can include the generator within the pimo binary and access it using the "pimo://" scheme.

adrienaury commented

> Or is it the generator (singular) that is defined by a list of masks?

Yes, the generator is defined by the whole list.

adrienaury commented Jun 20, 2023

Note: first post updated

A generator could also be defined like this:

filename: nir.yml

masking:
  - selector:
      jsonpath: "gender"         # if present, gender is used as a parameter
    masks:
      - add: true                # add the parameter if it is not present
      - randomChoice: [1, 2]
    preserve: "value"            # preserve the parameter value if present
# other parameters ...
  - selector:
      jsonpath: "nir"
    masks:
      - add: true  # in this example, the result is created in a new subfield
      - template: '{{if eq .gender "M" }}1{{else}}2{{end}}{{.birth_date | substr 8 10}}{{.birth_date | substr 3 5}}{{.department_code | printf "%02d"}}{{.city_code | printf "%03d"}}{{.order | printf "%03d"}}'
      - template: '{{ sub 97 (mod (int64 .nir_start)  97)}}'

This is a normal masking definition, except for the preserve: "value" option, which does not exist yet.

Calling the generator:

- selector:
    jsonpath: "nir"
  mask:
    generate:
      using: "nir"
      with:
        gender:  # this field is of type MaskType
          - constant: 2

MaskType :

type MaskType struct {

This way, a generator can use other generators, for example:

person.yml

version: "1"
masking:
  - selector:
      jsonpath: "first_name"
    masks:  # "masks" (plural) because a list of masks follows
      - add: true
      - generate:
          using: "first_name_fr_FR"
  - selector:
      jsonpath: "last_name"
    masks:
      - add: true
      - generate:
          using: "last_name_fr_FR"
  - selector:
      jsonpath: "." # generate in the current document
    masks:
      - add: true
      - generate:
          using: "nir"

person-with-coherence.yml

version: "1"
masking:
  - selector:
      jsonpath: "."
    masks:  # "masks" (plural) because a list of masks follows
      - add: true
      - generate:
          using: "person"
    seed: "."
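
To illustrate how generators could call other generators, and what applying one at "." might mean, here is a small Go sketch. Everything in it is an assumption made for illustration: the `Generator` and `Library` types, the `Generate` method, and the fixed names returned by the two leaf generators. In this sketch, a generator applied at "." must produce an object whose fields are merged into the current document, while any other selector receives the generated value directly.

```go
package main

import "fmt"

// Generator produces a value; it receives the library so it can call other generators.
type Generator func(lib Library) (any, error)

// Library maps a generator name (the YAML file name in the proposal) to its generator.
type Library map[string]Generator

// Generate runs the generator named by using and writes its result at selector.
func (lib Library) Generate(doc map[string]any, selector, using string) error {
	gen, ok := lib[using]
	if !ok {
		return fmt.Errorf("generator %q not found in library", using)
	}
	value, err := gen(lib)
	if err != nil {
		return err
	}
	if selector != "." {
		doc[selector] = value
		return nil
	}
	fields, ok := value.(map[string]any)
	if !ok {
		return fmt.Errorf("generator %q must produce an object to be applied at \".\"", using)
	}
	for k, v := range fields {
		doc[k] = v // merge into the current document
	}
	return nil
}

// newLibrary builds a toy library; the leaf generators return fixed values.
func newLibrary() Library {
	return Library{
		"first_name_fr_FR": func(Library) (any, error) { return "Marie", nil },
		"last_name_fr_FR":  func(Library) (any, error) { return "Martin", nil },
		"person": func(lib Library) (any, error) {
			doc := map[string]any{}
			calls := map[string]string{"first_name": "first_name_fr_FR", "last_name": "last_name_fr_FR"}
			for selector, using := range calls {
				if err := lib.Generate(doc, selector, using); err != nil {
					return nil, err
				}
			}
			return doc, nil
		},
	}
}

func main() {
	doc := map[string]any{"id": 1}
	if err := newLibrary().Generate(doc, ".", "person"); err != nil {
		panic(err)
	}
	fmt.Println(doc["first_name"], doc["last_name"]) // Marie Martin
}
```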
