An unintended feature of our method for reading data arrays from CSVs is that multiple data variables can be stored in extra columns in a single file.
E.g. population and GVA might share a region dimension and be defined over the same timesteps, so a CSV with timestep,region,pop,gva as a header could be read to load a pop data array or a gva data array.
The smif prepare-convert command reads all data arrays associated with a model run and writes them to parquet, one by one. When a CSV file contains more than one data array, the corresponding parquet file is written once per data array, and ends up containing only the last data array to be read and written.
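To make the failure mode concrete, here is a minimal sketch using only the standard library. It is not smif's implementation: `read_data_array`, `convert_one_by_one`, the sample data and the `data.parquet` path are all hypothetical, and a plain dict stands in for the parquet files on disk.

```python
import csv
import io

# Hypothetical CSV holding two data variables that share the same dimensions.
CSV_TEXT = """timestep,region,pop,gva
2020,north,100,1.5
2020,south,200,2.5
"""


def read_data_array(csv_text, variable, dims=("timestep", "region")):
    """Read one variable's column plus the dimension columns, ignoring the rest."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{key: row[key] for key in (*dims, variable)} for row in reader]


def convert_one_by_one(csv_text, variables, out_path):
    """Write each data array in turn to the same output path (assumed behaviour)."""
    store = {}  # maps output path -> rows written; simulates files on disk
    for variable in variables:
        # Each write replaces whatever was stored at out_path before.
        store[out_path] = read_data_array(csv_text, variable)
    return store


store = convert_one_by_one(CSV_TEXT, ["pop", "gva"], "data.parquet")
columns = sorted(store["data.parquet"][0])
print(columns)  # the pop column is gone; only the last array written survives
```

Both variables can be read from the CSV, but because they map to the same output path, the second write discards the first.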
Approaches:
Maintain the unintended feature and allow it in parquet too: convert would need to be aware of all files that contain multiple data arrays, and to recombine them before writing.
Avoid the unintended feature: all data in any smif user's projects would need to be cleaned so that each dataset sits in its own file.
smif csv2parquet is a simpler and less flexible workaround (see f951de5) that sets up a usable binary data store from CSV. Sticking with this for now.
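For comparison, the first approach (recombining before writing) could look roughly like the sketch below. It is an illustration under assumptions, not a proposal for smif's API: `plan_writes`, `combine`, the file names and the sample data are hypothetical, and the actual parquet write is left out.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV holding two data variables that share the same dimensions.
CSV_TEXT = """timestep,region,pop,gva
2020,north,100,1.5
2020,south,200,2.5
"""


def plan_writes(data_arrays):
    """Group variable names by source file so each file is written only once."""
    by_file = defaultdict(list)
    for variable, source in data_arrays:
        if variable not in by_file[source]:
            by_file[source].append(variable)
    return dict(by_file)


def combine(csv_text, variables, dims=("timestep", "region")):
    """Keep the dimension columns plus every variable that shares this file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{key: row[key] for key in (*dims, *variables)} for row in reader]


# (variable, source file) pairs, as a model run might declare them (assumed).
plan = plan_writes([
    ("pop", "pop_gva.csv"),
    ("gva", "pop_gva.csv"),
    ("energy", "energy.csv"),
])
combined = combine(CSV_TEXT, plan["pop_gva.csv"])
print(plan)
print(sorted(combined[0]))
```

The cost is that the convert step has to know about every data array up front so it can group them by file, rather than handling them one at a time.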