Add extended reporting #11

Open
8 of 12 tasks
ivbeg opened this issue Aug 5, 2022 · 0 comments
Labels
enhancement New feature or request

ivbeg commented Aug 5, 2022

Right now the report includes only the field name, data type, tags, semantic type ID, and registry URL.
Sometimes additional information is required, and much of it is already collected during the matching process.

Consider adding the following data (already collected) to the report:

  • number of unique values
  • share of unique values
  • minimum length
  • maximum length
  • average length
  • minimum value
  • maximum value
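
For illustration, the statistics listed above could be derived from a column sample roughly as follows. This is a minimal sketch, not the project's existing collection code: the function name `column_stats` and the result keys are hypothetical, and it assumes values are compared as strings and that `None` marks a missing value.

```python
from statistics import mean


def column_stats(values):
    """Compute extended report statistics for one field.

    Hypothetical helper: the name and the result keys are assumptions,
    not part of the existing report format.
    """
    # Assumption: None marks a missing value and is ignored.
    as_text = [str(v) for v in values if v is not None]
    if not as_text:
        return {
            "n_unique": 0, "share_unique": 0.0,
            "min_len": 0, "max_len": 0, "avg_len": 0.0,
            "min_value": None, "max_value": None,
        }
    lengths = [len(s) for s in as_text]
    unique = set(as_text)
    return {
        "n_unique": len(unique),
        "share_unique": len(unique) / len(as_text),
        "min_len": min(lengths),
        "max_len": max(lengths),
        "avg_len": mean(lengths),
        # Assumption: min/max are lexicographic, since values are strings here.
        "min_value": min(as_text),
        "max_value": max(as_text),
    }
```

Typed columns (numbers, dates) would probably need min/max computed on the parsed values rather than on their string form.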

Consider also collecting and adding the following information:

  • has alphas
  • has digits
  • has special chars
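
A rough sketch of how these three flags might be collected in a single pass over the values. The function name `char_class_flags` and the flag keys are hypothetical, and "special" is assumed here to mean any character that is neither a letter, a digit, nor whitespace; the issue does not define it.

```python
def char_class_flags(values):
    """Report which character classes occur anywhere in the sample.

    Hypothetical helper; the definition of "special" is an assumption.
    """
    flags = {"has_alphas": False, "has_digits": False, "has_special": False}
    for value in values:
        if value is None:
            continue
        for ch in str(value):
            if ch.isalpha():
                flags["has_alphas"] = True
            elif ch.isdigit():
                flags["has_digits"] = True
            elif not ch.isspace():
                # Assumption: whitespace is not counted as a special character.
                flags["has_special"] = True
        if all(flags.values()):
            break  # every flag is already set, no need to scan further
    return flags
```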

If possible, add the following:

  • reconstructed regexp - a regular expression reconstructed from a data sample
  • named entities - named entities extracted by a named entity detection tool such as Microsoft Presidio, Slovnet, or others
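
One possible, deliberately naive approach to the reconstructed regexp: map each character to a class, collapse runs, and join the distinct shapes seen in the sample. This is only a sketch under stated assumptions. The function names are hypothetical, only ASCII letters and digits are generalized, every other character is kept as a literal, and no attempt is made to merge similar shapes or to cap the number of alternatives.

```python
import re
import string


def _char_class(ch):
    """Map one character to a regex fragment (ASCII-only generalization)."""
    if ch in string.digits:
        return r"\d"
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    return re.escape(ch)  # everything else is kept as a literal


def value_pattern(value):
    """Run-length encode the character classes of a single value."""
    runs = []
    for ch in value:
        cls = _char_class(ch)
        if runs and runs[-1][0] == cls:
            runs[-1][1] += 1
        else:
            runs.append([cls, 1])
    return "".join(c if n == 1 else f"{c}{{{n}}}" for c, n in runs)


def reconstruct_regexp(sample):
    """Build one anchored regex covering every value in the sample."""
    patterns = sorted({value_pattern(str(v)) for v in sample if v is not None})
    if not patterns:
        return None
    body = patterns[0] if len(patterns) == 1 else "(?:" + "|".join(patterns) + ")"
    return "^" + body + "$"
```

By construction the result matches every value in the sample; on heterogeneous data it can grow into a long alternation, so a real implementation would likely need a limit or a smarter generalization step.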