Skip to content
Joel Natividad edited this page Oct 8, 2021 · 18 revisions

Welcome to the qsv wiki!

Cookbook

Please feel free to add qsv recipes to the Cookbook!

using qsv, ckanapi, jq and xargs.

  • get a CSV of datasets/users/groups/orgs in a CKAN instance
$ ckanapi -r https://demo.ckan.org dump datasets --all | qsv jsonl > datasets.csv
$ ckanapi -r https://demo.ckan.org dump users --all | qsv jsonl > users.csv
$ ckanapi -r https://demo.ckan.org dump groups --all | qsv jsonl > groups.csv
$ ckanapi -r https://demo.ckan.org dump organizations --all | qsv jsonl > organizations.csv
  • get a CSV of resources for a given dataset
$ ckanapi -r https://catalog.data.gov action package_show \
id=low-altitude-aerial-imagery-obtained-with-unmanned-aerial-systems-uas-flights-over-black-beach \
| jq -c '.resources[]' \
| qsv jsonl \
> resources.csv
$ ckanapi -r https://data.cnra.ca.gov action package_show id="wellstar-oil-and-gas-wells1" > wellstar-oil-and-gas-wells.json
$ cat wellstar-oil-and-gas-wells.json \
| jq -c '.resources[] | select(.name=="CSV") | .url' \
| xargs -L 1 wget -O wellstar.csv
$ qsv stats --everything wellstar.csv > wellstar-stats.csv

Date Normalization/Enrichment

$ qsv lua map Quarter -x -f getquarter.lua nyc311samp.csv > result-qtr.csv

Note: Change the column name on line 7 of getquarter.lua to adapt it for your use.
Also, you need to have the date.lua module in the same directory.

  • Partition 311 files by Quarter. Create the files in the nyc311byqtr directory
$ qsv partition Quarter nyc311byqtr result-qtr.csv 
  • convert "Created Date" to ISO-8601 format
$ qsv apply datefmt "Created Date" nyc311samp.csv > result-iso8601.csv
  • format "Created Date" using the format string ""%a %b %e %T %Y %z"
$ qsv apply datefmt "Created Date" --formatstr ""%a %b %e %T %Y %z" nycs11samp.csv > result-datefmt.csv
  • create a "Created Year" column from "Created Date"
$ qsv apply datefmt "Created Date" --formatstr "%Y" --new-column "Created Year" nyc311samp.csv > result-year.csv
$ qsv lua map TAT -x -f turnaroundtime.lua nyc311samp.csv > result-tat.csv
  • What is the average turnaround time in Brooklyn for 311 calls?
$ qsv search --select City --ignore-case brooklyn nyc311samp.csv \
| qsv lua map TAT -x -f turnaroundtime.lua \
| tee brooklyn-311-details.csv \
| qsv stats --everything > result-brooklyn311-stats.csv

NOTE: The tee command reads from stdin and writes to both stdout and one or more files at the same time. We do this so we can create the brooklyn-311-details.csv file and pipe the same data to the qsv stats command.

Geocoding

  • using the "Location" column, geocode the nearest city in a new column named "City"
$ qsv apply geocode Location --new-column City nyc311samp.csv > result-geocoded.csv

Note: The bundled static geocoder uses the reverse-geocoder crate. It uses the geonames cities database, and geocodes to the closest city.

  • geocode the county in a new column
$ qsv apply geocode Location --new-column County --formatstr county nyc311samp.csv > result-county.csv
Clone this wiki locally