-
Notifications
You must be signed in to change notification settings - Fork 74
Home
Joel Natividad edited this page Oct 8, 2021
·
18 revisions
Welcome to the qsv wiki!
Please feel free to add qsv recipes to the Cookbook!
using qsv, ckanapi, jq and xargs.
- get a CSV of datasets/users/groups/orgs in a CKAN instance
$ ckanapi -r https://demo.ckan.org dump datasets --all | qsv jsonl > datasets.csv
$ ckanapi -r https://demo.ckan.org dump users --all | qsv jsonl > users.csv
$ ckanapi -r https://demo.ckan.org dump groups --all | qsv jsonl > groups.csv
$ ckanapi -r https://demo.ckan.org dump organizations --all | qsv jsonl > organizations.csv
- get a CSV of resources for a given dataset
$ ckanapi -r https://catalog.data.gov action package_show \
id=low-altitude-aerial-imagery-obtained-with-unmanned-aerial-systems-uas-flights-over-black-beach \
| jq -c '.resources[]' \
| qsv jsonl \
> resources.csv
- get the latest version of a CSV resource for
wellstar-oil-and-gas-wells1
on https://data.cnra.ca.gov, run stats on it
$ ckanapi -r https://data.cnra.ca.gov action package_show id="wellstar-oil-and-gas-wells1" > wellstar-oil-and-gas-wells.json
$ cat wellstar-oil-and-gas-wells.json \
| jq -c '.resources[] | select(.name=="CSV") | .url' \
| xargs -L 1 wget -O wellstar.csv
$ qsv stats --everything wellstar.csv > wellstar-stats.csv
- add a column with Quarter information calculated from "Created Date"
Ingredients:- nyc311samp.csv
- getquarter.lua
-
date.lua
module - can be downloaded fromsrc
at https://github.com/Tieske/date.
$ qsv lua map Quarter -x -f getquarter.lua nyc311samp.csv > result-qtr.csv
Note: Change the column name on line 7 of
getquarter.lua
to adapt it for your use.
Also, you need to have thedate.lua
module in the same directory.
- Partition 311 files by Quarter. Create the files in the
nyc311byqtr
directory
$ qsv partition Quarter nyc311byqtr result-qtr.csv
- convert "Created Date" to ISO-8601 format
$ qsv apply datefmt "Created Date" nyc311samp.csv > result-iso8601.csv
- format "Created Date" using the format string ""%a %b %e %T %Y %z"
$ qsv apply datefmt "Created Date" --formatstr ""%a %b %e %T %Y %z" nycs11samp.csv > result-datefmt.csv
- create a "Created Year" column from "Created Date"
$ qsv apply datefmt "Created Date" --formatstr "%Y" --new-column "Created Year" nyc311samp.csv > result-year.csv
- compute Turnaround Time with a new column named "TAT"
Ingredient:
$ qsv lua map TAT -x -f turnaroundtime.lua nyc311samp.csv > result-tat.csv
- What is the average turnaround time in Brooklyn for 311 calls?
$ qsv search --select City --ignore-case brooklyn nyc311samp.csv \
| qsv lua map TAT -x -f turnaroundtime.lua \
| tee brooklyn-311-details.csv \
| qsv stats --everything > result-brooklyn311-stats.csv
NOTE: The
tee
command reads from stdin and writes to both stdout and one or more files at the same time. We do this so we can create thebrooklyn-311-details.csv
file and pipe the same data to theqsv stats
command.
- using the "Location" column, geocode the nearest city in a new column named "City"
$ qsv apply geocode Location --new-column City nyc311samp.csv > result-geocoded.csv
Note: The bundled static geocoder uses the
reverse-geocoder
crate. It uses the geonames cities database, and geocodes to the closest city.
- geocode the county in a new column
$ qsv apply geocode Location --new-column County --formatstr county nyc311samp.csv > result-county.csv