Full ARCOS data is incomplete #10
Comments
@unoriginaluid thanks for posting and highlighting this. Could you please post some more information about the issue? For example, what county returns 'full' data via the web API but not via the wrapper? There are known issues with some data being so large that it will not work with the wrapper.
There are a few counties that return complete data via the web API. As a note, we're focusing on Florida in our work, so I can only speak to FL counties. I was able to pull the "full" data for the following counties using the web API (as a procedural point, I used the county_data_drug query on the web API to pull data for these):
Using the county_raw() wrapper on Clay and Duval, I was only able to get 2006-2012. Re: the point of a broken function, drug_county_raw() doesn't work at all.
@unoriginaluid this is the same issue raised here. The 2013 and 2014 data were not part of the original 2006-2012 data dump, so it is likely that the API has not yet been comprehensively updated to access them. The issue is on the radar!
Thanks for following up on this, Jeff. I greatly appreciate the assistance.
Alright, I've updated the API and R package so large files should no longer time out. I am currently running scripts to update the data that these functions pull from, and will then replace it on our server so we have everything through 2014. It should take about a week to run and swap everything out.
Amazing, thanks so much!
Hi, I have recently been working with the full ARCOS dataset (downloaded from this link https://wpinvestigative.github.io/arcos/#download-the-raw-data) as well. However, I cannot find the year in this data, and it only shows 42 columns. I am curious whether this is because I only opened the first few thousand rows, or whether there is another raw dataset that provides information such as year, county, and drug name. Would you mind pointing me to the full dataset? Thanks for your time and help!
Date is inferred from the column
Got your email - responding soon!
@jeffcsauer Thanks for your quick response and helpful reply!
@andrewbtran Is it possible for you to post the file size of the FULL ARCOS data set? I would like to make sure that we are using the correct data set. I am having issues with verifying the size. Also, do you know if there are any updates in the courts that they will be releasing any more years soon? Or does a motion have to be filed for them to do so?
file has been updated to include 2013 and 2014 https://d2ty8gaf6rmowa.cloudfront.net/dea-pain-pill-database/bulk/arcos_all.tsv.gz
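For anyone who wants to confirm which columns and years a downloaded copy of that file actually contains, here is a minimal Python sketch that streams the gzipped TSV instead of loading it all into memory. It assumes the file has already been downloaded locally and that its first line is a tab-separated header. The function names are made up for illustration and are not part of the arcos R or Python packages. The date column name and its format are not assumed here; read them off the header first.

```python
import csv
import gzip
from collections import Counter
from itertools import islice


def read_header(path):
    """Return the column names from the first line of a gzipped TSV."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return next(reader)


def tally_column(path, column, key=lambda value: value, limit=None):
    """Stream the file and count rows per key(value) for one column.

    `key` turns a raw cell into the thing to count (for example, the
    year part of a date string). `limit` caps the number of rows read;
    None means read the whole file.
    """
    counts = Counter()
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in islice(reader, limit):
            counts[key(row[column])] += 1
    return counts
```

A possible way to use it: call `read_header("arcos_all.tsv.gz")` to list the columns, pick the date column, then call `tally_column` with a `key` that extracts the year from that column's format. Leaving `limit=None` matters here, since a sample of only the first few thousand rows may cover a single year.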
Not a direct issue with the R or Python APIs, but the full ARCOS dataset is incomplete. Both the links on the WaPo landing page and this repo only contain data for the dates 2006-2012. The API functions also, while documented, do not necessarily return what may be expected. It seems some of the county queries will only return TAB data between 2006-2012.
Using the web API, it is possible to pull county data by drug for the period 2006-2014. I have not been able to do this with either the R or Python API. It also seems the wrapper for the county drug query is broken.