Create scraper for pulling new rows out of data.dc.gov #200

Open
mkalish opened this issue Feb 22, 2017 · 12 comments

Comments

@mkalish
Member

mkalish commented Feb 22, 2017

The dataset is too big to be proxied and will not be queryable. Currently, the data has been pushed through 1/1/2017, but a scraper needs to be written to do this regularly.

Tasks:

  • Write a scraper that pulls data off the DC data site, limited to rows newer than the last date recorded in data.codefordc.org
  • Push that data to the data portal
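A minimal sketch of the first task, assuming the source layer is an ArcGIS REST endpoint that accepts a `where` clause with a `DATE 'YYYY-MM-DD'` literal. The layer URL and the `TRANSACTIONDATE` field name are hypothetical placeholders, and the exact date-literal syntax can vary with the backing datastore.

```javascript
// Build a query URL that asks an ArcGIS layer only for rows newer than
// the last date already imported into the data portal.
// ASSUMPTION: layerUrl and dateField are placeholders, not the real dataset.
function buildIncrementalQueryUrl(layerUrl, dateField, lastImported) {
  // Keep only the YYYY-MM-DD part of the ISO timestamp (UTC).
  const day = lastImported.toISOString().slice(0, 10);
  const params = new URLSearchParams({
    where: `${dateField} > DATE '${day}'`,
    outFields: '*',
    f: 'json',
  });
  // Strip trailing slashes so the path is always <layer>/query.
  return `${layerUrl.replace(/\/+$/, '')}/query?${params}`;
}

// Example with hypothetical values:
const exampleUrl = buildIncrementalQueryUrl(
  'https://example.com/arcgis/rest/services/Hypothetical/MapServer/0',
  'TRANSACTIONDATE',
  new Date('2017-01-01T00:00:00Z')
);
console.log(exampleUrl);
```

Filtering on the server keeps each run small, which matters here because the full dataset is too big to proxy.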

Upload idea using existing tools: from esri to geojson to csv to data portal
https://www.npmjs.com/package/esri-dump
https://www.npmjs.com/package/json2csv
https://www.npmjs.com/package/ckan

Get the last imported data in the data portal
Check whether esri data exists beyond the last imported data
If it does, attempt to get a dump from that starting point to the end
Since the data in CKAN is currently CSV, convert the dump to CSV
Upload the data to the data portal

This script could then run to sync data portal info with esri.
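The steps above could be sketched roughly as follows. This is not the project's implementation: the three callbacks (`getLastImported`, `fetchFeaturesSince`, `uploadCsv`) are hypothetical stand-ins for the portal lookup, the esri dump, and the upload, and the CSV conversion is hand-rolled for illustration rather than taken from json2csv.

```javascript
// Quote a CSV field only when it contains a comma, a quote, or a line break.
function escapeCsv(value) {
  if (value === null || value === undefined) return '';
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Convert GeoJSON features to CSV text, using the first feature's
// property names as the column list (geometry is dropped).
function featuresToCsv(features) {
  if (features.length === 0) return '';
  const fields = Object.keys(features[0].properties);
  const header = fields.map(escapeCsv).join(',');
  const rows = features.map((feature) =>
    fields.map((name) => escapeCsv(feature.properties[name])).join(',')
  );
  return [header, ...rows].join('\n');
}

// One sync run: look up the last import, fetch anything newer,
// convert it, and upload it. Returns the number of rows uploaded.
// ASSUMPTION: all three callbacks are supplied by the caller.
async function syncNewRows({ getLastImported, fetchFeaturesSince, uploadCsv }) {
  const lastImported = await getLastImported();
  const features = await fetchFeaturesSince(lastImported);
  if (features.length === 0) return 0; // nothing new, skip the upload
  await uploadCsv(featuresToCsv(features));
  return features.length;
}
```

Passing the three steps in as callbacks keeps the flow testable without touching either the esri service or the data portal.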

@romoy
Collaborator

romoy commented Mar 6, 2017

This seems to be a prerequisite, so researching this one instead.

@romoy
Collaborator

romoy commented Mar 7, 2017

Added ocf-expenditures.geojson to http://data.codefordc.org/dataset/dc-campaign-expenditures-ocf

@romoy
Collaborator

romoy commented Mar 7, 2017

Attempted to add ocf-contributions, but it failed with a 413 response; will try to split the file and upload.
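A 413 response means the request body was larger than the server accepts, so one way to split the file is by approximate serialized size. The sketch below assumes the file is a GeoJSON FeatureCollection; the size limit is a caller-supplied guess, since the portal's actual limit is not stated in this thread.

```javascript
// Split a GeoJSON FeatureCollection into smaller collections whose
// serialized features add up to roughly maxBytes each.
// The estimate ignores the small wrapper around each collection, and a
// single feature larger than maxBytes still gets a chunk of its own.
function splitBySize(collection, maxBytes) {
  const chunks = [];
  let current = [];
  let currentBytes = 0;
  for (const feature of collection.features) {
    // +1 accounts for the comma separating features in the output.
    const size = Buffer.byteLength(JSON.stringify(feature), 'utf8') + 1;
    if (current.length > 0 && currentBytes + size > maxBytes) {
      chunks.push({ type: 'FeatureCollection', features: current });
      current = [];
      currentBytes = 0;
    }
    current.push(feature);
    currentBytes += size;
  }
  if (current.length > 0) {
    chunks.push({ type: 'FeatureCollection', features: current });
  }
  return chunks;
}
```

Each resulting chunk can then be uploaded as its own file or request.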

@mkalish
Member Author

mkalish commented Mar 7, 2017

Looking good. I would take a look at the datastore API, which can handle pushing a lot of rows more gracefully.
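As a rough illustration of that suggestion, rows can be sent to CKAN's `datastore_upsert` action in batches instead of as one large file. The sketch assumes the target resource already exists in the DataStore and that the portal accepts an API key in the `Authorization` header; the portal URL, resource id, and batch size are placeholders to be supplied by the caller.

```javascript
// Build one datastore_upsert request per batch of records.
// method: 'insert' appends rows without requiring a unique key.
function buildDatastoreRequests(portalUrl, resourceId, records, batchSize = 1000) {
  const url = `${portalUrl.replace(/\/+$/, '')}/api/3/action/datastore_upsert`;
  const requests = [];
  for (let i = 0; i < records.length; i += batchSize) {
    requests.push({
      url,
      body: {
        resource_id: resourceId,
        method: 'insert',
        records: records.slice(i, i + batchSize),
      },
    });
  }
  return requests;
}

// Send the batches one at a time, stopping at the first failure.
// ASSUMPTION: a global fetch is available (Node 18+) and apiKey is valid.
async function pushBatches(requests, apiKey) {
  for (const { url, body } of requests) {
    const response = await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: apiKey },
      body: JSON.stringify(body),
    });
    if (!response.ok) {
      throw new Error(`datastore_upsert failed with status ${response.status}`);
    }
  }
}
```

Sending batches sequentially keeps each request small enough to avoid the 413 seen earlier, at the cost of a slower upload.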

@romoy
Collaborator

romoy commented Apr 28, 2017

@romoy
Collaborator

romoy commented May 1, 2017

Upload idea using existing tools from esri to geojson to csv to data portal
https://www.npmjs.com/package/esri-dump
https://www.npmjs.com/package/json2csv
https://www.npmjs.com/package/ckan

@romoy
Collaborator

romoy commented May 4, 2017

@mkalish
Member Author

mkalish commented May 4, 2017

Does that have an API?

@romoy
Collaborator

romoy commented May 23, 2017

@romoy
Collaborator

romoy commented May 23, 2017

@romoy
Collaborator

romoy commented May 23, 2017

@mkalish mkalish self-assigned this Jan 26, 2018