Add analysis of schema structure decomposition of field keys and subtypes #12

Open
ivbeg opened this issue Aug 6, 2022 · 0 comments
Labels
enhancement New feature or request

ivbeg commented Aug 6, 2022

Flat tabular datasets (CSV files, database tables), and sometimes objects with nested objects, often include fields that could be grouped.

For example, the CSV file Zaara_D.csv
includes the following fields: title, text, date, place, placeURL, placeLocation, placeType, reviewScore, avgScore

Here the prefix 'place' acts as a subtype identifier. The fields that share it could be decomposed as
place:

  • Name
  • Location
  • URL
  • Type

The suffix 'Score' identifies the value type: a number, either integer or float.

Most data tables use a case change (as in placeType) or the "_" symbol as the divider between name parts. The '-' symbol is also used, but very rarely.
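The decomposition described above could be sketched roughly as follows. This is only an illustration: the function names `split_field_name`, `group_by_prefix` and `group_by_suffix` and the `min_size` threshold are assumptions made for this example, not part of the existing code base.

```python
import re
from collections import defaultdict

# Tokens are acronym runs (URL), capitalized or lowercase words, or digit runs.
# "_" and "-" match none of the alternatives, so they act as dividers.
_TOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_field_name(name):
    """Split a field name into lowercase tokens on case change, "_" or "-"."""
    return [token.lower() for token in _TOKEN_RE.findall(name)]


def group_by_prefix(field_names, min_size=2):
    """Group fields by their first token; keep groups with at least min_size fields."""
    groups = defaultdict(list)
    for name in field_names:
        tokens = split_field_name(name)
        if tokens:
            groups[tokens[0]].append(name)
    return {key: names for key, names in groups.items() if len(names) >= min_size}


def group_by_suffix(field_names, min_size=2):
    """Group multi-token fields by their last token (a candidate value type marker)."""
    groups = defaultdict(list)
    for name in field_names:
        tokens = split_field_name(name)
        if len(tokens) > 1:  # a single-token name has no separate suffix
            groups[tokens[-1]].append(name)
    return {key: names for key, names in groups.items() if len(names) >= min_size}


fields = ["title", "text", "date", "place", "placeURL", "placeLocation",
          "placeType", "reviewScore", "avgScore"]
print(group_by_prefix(fields))
print(group_by_suffix(fields))
```

On the Zaara_D.csv field list this yields one prefix group, 'place', and one suffix group, 'score'. A real implementation would probably need extra rules, for example to avoid grouping unrelated fields that only share a generic first word.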

Detection of field groups and decomposition of field names could help with:

  • additional rules to detect semantic data types
  • automatic context identification

Add group detection to the final report as a field_group property.
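One possible shape for this, as a sketch only: the report is assumed here to be a list of per-field dicts with a "field" key, which may not match the actual report format, and `annotate_report` is a hypothetical helper name.

```python
def annotate_report(report, groups):
    """Return a copy of the report with a field_group value added to every entry.

    report: list of dicts, each with a "field" key (assumed format).
    groups: mapping of group name to the list of field names in that group.
    """
    field_to_group = {
        name: group for group, names in groups.items() for name in names
    }
    return [
        {**entry, "field_group": field_to_group.get(entry["field"])}
        for entry in report
    ]


# Hypothetical input: detected groups and a minimal report.
groups = {"place": ["place", "placeURL", "placeLocation", "placeType"]}
report = [{"field": "title"}, {"field": "placeURL"}, {"field": "placeType"}]

for entry in annotate_report(report, groups):
    print(entry)
```

Fields that belong to no group get field_group set to None, so every entry keeps the same set of keys.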
