Decide on supported input formats #1

Open
daniel-thom opened this issue Sep 13, 2024 · 0 comments

daniel-thom commented Sep 13, 2024

Database vs User Schemas

We plan to always store data in the database in the same format:

  • One or more timestamp columns, usually of datetime type.
  • One or more columns that designate unique time arrays. This can be an integer ID or a set of dimension columns.
  • One value column. Pivoted tables are not supported internally, but can be passed as input data.

Database format

timestamp            id  value
2020-01-01 01:00:00  1   1.0
2020-01-01 02:00:00  1   2.0
2020-01-01 03:00:00  1   3.0
2020-01-01 01:00:00  2   11.0
2020-01-01 02:00:00  2   12.0
2020-01-01 03:00:00  2   13.0
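
To make the layout concrete, here is a minimal sketch that builds rows in this format from a plain array of values plus a start time and a frequency, generating the timestamps along the way. The function name `rows_from_array` and its parameters are hypothetical and only illustrate the idea; they are not an agreed API.

```python
from datetime import datetime, timedelta


def rows_from_array(values, start, frequency, array_id):
    """Return (timestamp, id, value) rows for one time array.

    Hypothetical helper: timestamps are generated from `start` and
    `frequency`, and every row carries the same `array_id`.
    """
    return [
        (start + i * frequency, array_id, float(value))
        for i, value in enumerate(values)
    ]


start = datetime(2020, 1, 1, 1)
hourly = timedelta(hours=1)

# Two time arrays stacked in one long table, distinguished by id.
rows = rows_from_array([1.0, 2.0, 3.0], start, hourly, array_id=1)
rows += rows_from_array([11.0, 12.0, 13.0], start, hourly, array_id=2)

for timestamp, array_id, value in rows:
    print(timestamp, array_id, value)
```

Each time array contributes one row per timestamp, so adding another array only adds rows and never changes the set of columns.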

Input Formats

  1. Create a view from data stored in files matching the format above. No copying required.
  2. Create a table from data stored in files matching the format above.
  3. An array of floats with optional timestamps and IDs. If timestamps are omitted, a start time and frequency are required. Add rows to an existing table or create a new one. Auto-generate IDs and timestamps as necessary.
  4. Create a table from files that store data with a pivoted dimension. Unpivot the data for final storage. Timestamps are optional. If they exist in Parquet files, they must have a timestamp type. If they exist in CSV files, a str-format is required unless they match the default ISO format. Example input:
timestamp            device1  device2
2020-01-01 01:00:00  1.0      11.0
2020-01-01 02:00:00  2.0      12.0
2020-01-01 03:00:00  3.0      13.0

That would be converted into:

timestamp            device_name  value
2020-01-01 01:00:00  device1      1.0
2020-01-01 02:00:00  device1      2.0
2020-01-01 03:00:00  device1      3.0
2020-01-01 01:00:00  device2      11.0
2020-01-01 02:00:00  device2      12.0
2020-01-01 03:00:00  device2      13.0
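
The unpivot step could look roughly like the sketch below, which uses only the Python standard library. It assumes comma-separated input, and the function `unpivot` with its parameters (`time_column`, `name_column`, `time_format`) is hypothetical. Timestamps are parsed as ISO format unless an explicit format string is given, mirroring the CSV rule in item 4.

```python
import csv
import io
from datetime import datetime

# Assumed CSV rendering of the pivoted example (one column per device).
PIVOTED_CSV = """\
timestamp,device1,device2
2020-01-01 01:00:00,1.0,11.0
2020-01-01 02:00:00,2.0,12.0
2020-01-01 03:00:00,3.0,13.0
"""


def unpivot(csv_text, time_column="timestamp", name_column="device_name",
            time_format=None):
    """Turn one-column-per-device rows into (timestamp, name, value) records."""
    # Without an explicit format string, fall back to ISO parsing.
    if time_format is None:
        parse = datetime.fromisoformat
    else:
        def parse(text):
            return datetime.strptime(text, time_format)

    reader = csv.DictReader(io.StringIO(csv_text))
    # Every column other than the timestamp is treated as a pivoted dimension value.
    value_columns = [name for name in reader.fieldnames if name != time_column]
    records = list(reader)

    # Emit one device at a time so each time array stays contiguous.
    return [
        {
            time_column: parse(record[time_column]),
            name_column: column,
            "value": float(record[column]),
        }
        for column in value_columns
        for record in records
    ]


long_rows = unpivot(PIVOTED_CSV)
for row in long_rows:
    print(row["timestamp"], row["device_name"], row["value"])
```

For CSV files whose timestamps are not ISO formatted, a caller would pass something like `time_format="%m/%d/%Y %H:%M"`.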

Required file formats

  • Parquet
  • Arrow
  • CSV

TBD

  • Units