Conversion process phase: csv-ify
(this should be renamed to "Conversion process phase: tweak source data")
NOTE: Changing cell delimiters can be done with enhancement parameters and should NOT be done manually. See Conversion process phase: create conversion trigger.
The conversion cockpit's manual/ directory should be used to store the results of any manual tweaks of original source/ data. If no tweaks need to be made, the data can be converted directly from source/.
As discussed in Conversion process phase: retrieve, a conversion cockpit's source/ directory holds an unmodified copy of the data that you received from the source (e.g. rpi-edu-lebot, whitehouse-gov). If you used pcurl.sh, then you also captured the provenance justifying the file on disk by citing the authoritative URL from which it came. You SHOULD NOT modify any of the files you get from your source. If you need to change one, make a modified copy in manual/ instead.
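A simple example of such a tweak is converting a pipe-delimited file into CSV. The command below also doubles any embedded double quotes so that the output remains valid CSV; for a tab-delimited file, put a literal tab character in place of the `|` in the third expression:
bash-3.2$ sed -e 's/"/""/g' -e 's/^/"/' -e 's/|/","/g' -e 's/$/"/' source/some.tsv > manual/some.tsv.csv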
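The sed one-liner hard-codes the pipe delimiter. Below is a minimal awk sketch that takes the delimiter as an argument, so the same helper can redelimit tab- or pipe-delimited files. The function name `redelimit_to_csv` is hypothetical (it is not part of the conversion tooling), and it assumes no field contains the delimiter itself or an embedded newline.

```bash
# Hypothetical helper: redelimit a file into CSV, quoting every field.
# Assumes fields never contain the delimiter or an embedded newline.
# Usage: redelimit_to_csv DELIM INPUT OUTPUT
redelimit_to_csv() {
  local delim="$1" input="$2" output="$3"
  awk -v FS="$delim" -v OFS=',' '
    {
      for (i = 1; i <= NF; i++) {
        gsub(/"/, "\"\"", $i)   # double embedded quotes, as CSV requires
        $i = "\"" $i "\""       # wrap the field in double quotes
      }
      print                     # record is rebuilt with OFS between fields
    }' "$input" > "$output"
}
```

For example, `redelimit_to_csv $'\t' source/some.tsv manual/some.tsv.csv` for tabs, or `redelimit_to_csv '|' source/some.tsv manual/some.tsv.csv` for pipes.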
Note that we are making a new file in manual/ that parallels the original file in source/. When naming the new file, we follow a convention of appending the new file extension (.csv) to the entire name of the original file (some.tsv). This lets one trace lineage from file names alone, without the overhead of metadata. However, the metadata is still useful, so we can generate it using:
bash-3.2$ justify.sh source/some.tsv manual/some.tsv.csv redelimit
This creates manual/some.tsv.csv.pml.ttl and records that manual/some.tsv.csv came from source/some.tsv using a method known as redelimit.
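The naming convention and the justify.sh step can be combined in a small wrapper. This is a minimal sketch under stated assumptions: the function names `manual_name_for` and `tweak_pipe_file` are hypothetical, the input is assumed to be pipe-delimited, and justify.sh is called only with the three arguments shown above (and skipped with a warning if it is not on the PATH).

```bash
# Hypothetical: map source/<name> to manual/<name>.csv per the convention.
manual_name_for() {
  printf 'manual/%s.csv\n' "$(basename "$1")"
}

# Hypothetical: redelimit a pipe-delimited source file into manual/ and,
# if justify.sh is available, record its lineage with the "redelimit" method.
tweak_pipe_file() {
  local src="$1" dst
  dst="$(manual_name_for "$src")"
  mkdir -p manual
  sed -e 's/"/""/g' -e 's/^/"/' -e 's/|/","/g' -e 's/$/"/' "$src" > "$dst"
  if command -v justify.sh >/dev/null 2>&1; then
    justify.sh "$src" "$dst" redelimit
  else
    echo "justify.sh not found; skipping provenance for $dst" >&2
  fi
}
```

Run it from the conversion cockpit, e.g. `tweak_pipe_file source/some.tsv`.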
If the file that you obtained is already in CSV format, you do NOT need to duplicate it into manual/; you can create the conversion trigger directly from the files in source/. See the next phase, Conversion process phase: create conversion trigger, for more.
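To decide which files need a tweaked copy, a quick survey of source/ can help. This minimal sketch judges "already CSV" only by a `.csv` extension, which is a rough heuristic, so inspect the files before relying on it; the function name `plan_tweaks` is hypothetical.

```bash
# Hypothetical: report which files in a source/ directory can be converted
# directly and which would need a manual/<name>.csv copy.
plan_tweaks() {
  local dir="${1:-source}" f name
  for f in "$dir"/*; do
    [ -f "$f" ] || continue          # skip subdirectories and empty globs
    name="$(basename "$f")"
    case "$name" in
      *.csv|*.CSV) echo "direct: $f" ;;
      *)           echo "tweak:  $f -> manual/$name.csv" ;;
    esac
  done
}
```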
- Conversion process phase: create conversion trigger
- Conversion process phase: pull conversion trigger
- ... (rinse and repeat; flavor to taste) ...
- Conversion process phase: tweak enhancement parameters
- Conversion process phase: pull conversion trigger
- Conversion process phase: publish