
Conversion process phase: csv ify

timrdf edited this page Feb 19, 2011 · 41 revisions

[up](Conversion process phases)

What's first?

source/ vs. manual/

As discussed in Conversion process phase: retrieve, a conversion cockpit's source/ directory holds an unmodified copy of the data that you received from the source (e.g. rpi-edu-lebot, whitehouse-gov). If you used pcurl.sh, then you also captured the provenance that justifies the file on disk by citing the authoritative URL from which it came. You SHOULD NOT modify any of the files you get from your source. If you need to change one, make a copy in manual/ and modify the copy.
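As a minimal sketch of that copy-then-edit habit: the scratch directory and the file name example.txt below are made up for illustration, and a real cockpit would already have its source/ directory populated by the retrieve phase.

```shell
# Sketch only: a scratch directory stands in for a real conversion cockpit.
cd "$(mktemp -d)"
mkdir -p source manual
printf 'a|b\n' > source/example.txt   # hypothetical file as received from the source

# Copy first, then change only the copy in manual/.
cp source/example.txt manual/example.txt
printf 'c|d\n' >> manual/example.txt  # stand-in for whatever manual fix is needed

# source/ still holds the file exactly as it was retrieved.
cat source/example.txt
```

Keeping the untouched original next to the edited copy means the two can always be compared later, for instance with diff.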

How do I csv-ify?

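A simple example is converting a pipe-delimited file into CSV by wrapping every field in double quotes and replacing each pipe with a comma (for a tab-delimited file, substitute a tab character for the | in the sed expression):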

bash-3.2$ sed -e 's/"/""/g' -e 's/^/"/' -e 's/|/","/g' -e 's/$/"/' source/some.tsv > manual/some.tsv.csv

Here, by convention, we append the new file extension (.csv) to the entire file name of the original file (some.tsv). This lets you trace a file's lineage from file names alone, without any metadata overhead. The metadata is still useful, though, so we can generate it using:

bash-3.2$ justify.sh source/some.tsv manual/some.tsv.csv redelimit

This creates manual/some.tsv.csv.pml.ttl and records that manual/some.tsv.csv came from source/some.tsv using a method known as redelimit.
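For tab-delimited input, the same redelimit step can be sketched with awk. This is only an illustration under stated assumptions: redelimit_tsv is a hypothetical helper, not part of csv2rdf4lod, and the sample data is invented. It quotes every field, doubles any embedded double quotes, and follows the naming convention of appending .csv to the whole original file name.

```shell
# Hypothetical helper: convert a tab-delimited file under source/ to CSV in manual/.
redelimit_tsv() {
  src=$1
  out="manual/$(basename "$src").csv"   # append .csv to the entire original name
  mkdir -p manual
  awk -F'\t' -v OFS=',' '{
    for (i = 1; i <= NF; i++) {
      gsub(/"/, "\"\"", $i)             # double any embedded double quotes
      $i = "\"" $i "\""                 # wrap the field in double quotes
    }
    print
  }' "$src" > "$out"
  echo "$out"                           # report the file that was written
}

# Usage in a scratch directory with invented sample data.
cd "$(mktemp -d)" && mkdir -p source
printf 'name\tnote\nAda\tsaid "hi"\n' > source/some.tsv
redelimit_tsv source/some.tsv
```

After a conversion like this, justify.sh can be run on the two files exactly as shown above to record where the CSV came from.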

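The commands above use paths relative to the conversion cockpit, so run them from there, for example:

bash-3.2$ cd ~/Desktop/source/rpi-edu-lebot/exercise-jogging-statistics/version/2011-Jan-24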

What's next?
