Update with_sensitive to remove PII #81

Open
ibevers opened this issue Oct 3, 2024 · 3 comments

Comments

ibevers commented Oct 3, 2024

There should be options to remove either a user-specified list of files or all files of a certain type (e.g. all free speech files).

ibevers commented Oct 9, 2024

Design approaches:

  1. Exclude files during the conversion process.
  • memory-efficient
  • disk-efficient
  • requires reconversion later if someone changes their mind
  2. Create a second BIDS tree without PII.
  • disk-inefficient
  3. Exclude in data bundling.
  • excluded files still remain in the non-bundled data
  • memory-efficient
  • somewhat disk-inefficient

ibevers commented Oct 9, 2024

Going to go with approach 3 (exclude in data bundling).
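
A minimal sketch of what excluding at bundling time could look like, not the project's actual code. The function name `bundle_without_excluded`, the `is_excluded` callback, and the file names in the usage note are all hypothetical, and the sketch assumes the bundle directory lives outside the BIDS tree. It copies every file except the excluded ones and leaves the source tree untouched, which is why the excluded files remain in the non-bundled data.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def bundle_without_excluded(
    bids_root: Path,
    bundle_dir: Path,
    is_excluded: Callable[[Path], bool],
) -> List[Path]:
    """Copy files from bids_root into bundle_dir, skipping excluded ones.

    is_excluded receives each path relative to bids_root (hypothetical
    interface). Returns the relative paths that were copied.
    """
    copied = []
    for src in sorted(bids_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(bids_root)
        if is_excluded(rel):
            continue  # the file stays in the BIDS tree, just not in the bundle
        dest = bundle_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(rel)
    return copied
```

For example, `bundle_without_excluded(root, out, lambda rel: "free-speech" in rel.name)` would bundle everything except files whose name contains `free-speech`.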

ibevers commented Oct 9, 2024

Interface options:

  1. with_sensitive parameter:
  • Requires a separate parameter for the specific files or classes of files to remove.
  • Simple.
  2. Path to a file with a list of UUIDs or files to exclude:
  • Annoying if you want to exclude a class of files.
  3. Path to a file with a list of patterns to exclude:
  • Class-based and individual file-based control.
  • Might exclude more than intended.

Option 3 requires some care, but it offers the best balance of simplicity and customization.
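
A rough sketch of the pattern-file idea, under some assumptions that are not settled in this issue: one shell-style glob per line, `#` for comments, and matching against paths relative to the BIDS root. The helper names `parse_patterns` and `is_excluded` and the example file names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def parse_patterns(text: str) -> List[str]:
    """Return the non-empty, non-comment lines of an exclusion file."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_excluded(rel_path: str, patterns: Iterable[str]) -> bool:
    """True if rel_path matches any glob pattern (case-sensitive).

    Note that fnmatch-style `*` also matches `/`, so a pattern can
    match across directory levels.
    """
    return any(fnmatchcase(rel_path, pattern) for pattern in patterns)


exclusion_file = """\
# exclude a whole class of files
*free-speech*
# exclude one specific file
sub-01/audio/vowel.wav
"""
patterns = parse_patterns(exclusion_file)
```

With these patterns, `sub-01/audio/free-speech.wav` and `sub-01/audio/vowel.wav` are excluded while `sub-02/audio/vowel.wav` is kept. The "might exclude more than intended" risk shows up with a looser pattern: `*speech*` would also match a file such as `sub-01/speech-rate.tsv`.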
