Full ARCOS data is incomplete #10

Open
jr-free opened this issue Dec 19, 2020 · 11 comments

Comments

@jr-free

jr-free commented Dec 19, 2020

Not a direct issue with the R or Python APIs, but the full ARCOS dataset is incomplete. Both the links on the WaPo landing page and this repo only contain data for the dates 2006-2012. The API functions also, while documented, do not necessarily return what may be expected. It seems some of the county queries will only return TAB data between 2006-2012.

Using the web API, it is possible to pull county data by drug for the period 2006-2014. I have not been able to do this with either the R or Python API. It also seems the wrapper for the county drug query is broken.

@jeffcsauer

@unoriginaluid thanks for posting and highlighting this. Could you please post some more information about the issue?

For example, what county returns 'full' data via the web API but not via the wrapper?

There are known issues where some of the data is so large that it will not work with the wrapper.

@jr-free
Author

jr-free commented Dec 20, 2020

There are a few counties that return complete data via the web API. As a note, we're focusing on Florida in our work, so I can only speak to FL counties. I was able to pull the "full" data for the following counties using the web API (as a procedural point, I used the county_data_drug query on the web API to pull data for these):

'Clay', 'Duval', 'Baker', 'Saint Johns', 'Flagler', 'Putnam', 'Columbia', 'Bradford',
'Union', 'Lake', 'Seminole', 'Marion', 'Alachua', 'Gilchrist', 'Nassau'

Using the county_raw() wrapper on Clay and Duval, I was only able to get 2006-2012.

Re: the point of a broken function, drug_county_raw() doesn't work at all.

@jeffcsauer

jeffcsauer commented Dec 20, 2020

@unoriginaluid this is the same issue raised here.

The 2013 and 2014 data was not part of the original 2006-2012 data dump, and so it is likely that the API has not been comprehensively updated to access this data quite yet. The issue is on the radar!

@jr-free
Author

jr-free commented Dec 20, 2020

Thanks for following up on this, Jeff. I greatly appreciate the assistance.

@andrewbtran
Collaborator

andrewbtran commented Dec 30, 2020

Alright, I've updated the API and R package so large files should no longer time out. I'm currently running scripts to update the data that these functions pull from and replace it on our server, so we can have everything through 2014. It should take about a week to run and swap everything out.

@jeffcsauer

Amazing, thanks so much!

@MLSun-A

MLSun-A commented Apr 2, 2021

Not a direct issue with the R or Python APIs, but the full ARCOS dataset is incomplete. Both the links on the WaPo landing page and this repo only contain data for the dates 2006-2012. The API functions also, while documented, do not necessarily return what may be expected. It seems some of the county queries will only return TAB data between 2006-2012.

Using the web API, it is possible to pull county data by drug for the period 2006-2014. I have not been able to do this with either the R or Python API. It also seems the wrapper for the county drug query is broken.

Hi, I have recently been working with the full ARCOS dataset (downloaded from this link https://wpinvestigative.github.io/arcos/#download-the-raw-data) as well. However, I cannot find any year information in this data, and it only shows 42 columns. I am wondering whether this is because I only opened the first few thousand rows, or whether there is another raw dataset that provides information such as year, county, and drug name. Would you mind pointing me to the full dataset?

Thanks for your time and help!
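
One way to check this without loading the whole file is to read only the header row. The set of columns does not depend on how many rows are opened, so if the header has no explicit year column, the year has to come from another field. A minimal sketch in Python with pandas, assuming the download is a tab-separated file (optionally gzip-compressed); `list_columns` is a hypothetical helper name, not part of the arcos packages:

```python
import pandas as pd

def list_columns(path):
    """Return the column names of a tab-separated file without reading its rows."""
    # nrows=0 parses only the header line; compression is inferred
    # from the file extension (for example .gz).
    header = pd.read_csv(path, sep="\t", nrows=0)
    return list(header.columns)
```

Calling `list_columns` on the downloaded file returns the full list of column names, which can then be checked for a date-like field.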

@jeffcsauer

jeffcsauer commented Apr 2, 2021

@MLSun-A

Date is inferred from the column TRANSACTION_DATE. For example, you could create year by:

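# After loading your data or subset of data into a dataframe called temp:
library(stringr)  # provides str_sub()
# The last four characters of TRANSACTION_DATE are the year
temp$Year <- as.numeric(str_sub(temp$TRANSACTION_DATE, -4, -1))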

Got your email - responding soon!
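
For anyone working in Python rather than R, the same idea can be sketched with pandas. This assumes, as the R snippet does, that the last four characters of TRANSACTION_DATE are the year; `add_year` is a hypothetical helper name used only for illustration:

```python
import pandas as pd

def add_year(df):
    """Return a copy of df with a numeric Year column taken from TRANSACTION_DATE."""
    out = df.copy()
    # Cast to string first: if the column was parsed as an integer, a leading
    # zero in the month is dropped, but the last four characters are still the year.
    out["Year"] = out["TRANSACTION_DATE"].astype(str).str[-4:].astype(int)
    return out
```

The function leaves the input dataframe unchanged and returns a copy, so it can be applied to a subset without side effects.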

@MLSun-A

MLSun-A commented Apr 2, 2021

@jeffcsauer Thanks for your quick response and helpful reply!
Appreciate your help.

@accessarcos

accessarcos commented Oct 26, 2021

@andrewbtran Is it possible for you to post the file size of the FULL ARCOS data set? I would like to make sure that we are using the correct data set. I am having issues with verifying the size. Also, do you know if there are any updates in the courts that they will be releasing any more years soon? Or does a motion have to be filed for them to do so?
I know that many have moved on (between COVID, etc.), but this data is vital for so many studies that are being done. I work with jr-free and have spoken with Jeff Sauer. We're all academics. Thanks! ~ Mischa

@andrewbtran
Collaborator

The file has been updated to include 2013 and 2014: https://d2ty8gaf6rmowa.cloudfront.net/dea-pain-pill-database/bulk/arcos_all.tsv.gz
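
To confirm that a downloaded copy covers 2006 through 2014, one option is to stream the file in chunks and tally rows per year instead of loading it all into memory. A rough sketch with pandas, assuming the file is tab-separated with a TRANSACTION_DATE column whose last four characters are the year (as described earlier in this thread); `rows_per_year` and the default chunk size are illustrative choices, not part of the arcos packages:

```python
from collections import Counter

import pandas as pd

def rows_per_year(path, chunksize=1_000_000):
    """Count rows per year in a (possibly gzipped) tab-separated ARCOS file."""
    counts = Counter()
    # Read only the date column, as text, one chunk at a time to bound memory use.
    reader = pd.read_csv(path, sep="\t", usecols=["TRANSACTION_DATE"],
                         dtype=str, chunksize=chunksize)
    for chunk in reader:
        years = chunk["TRANSACTION_DATE"].dropna().str[-4:]
        counts.update(years.value_counts().to_dict())
    # Keys are four-character year strings, sorted ascending.
    return dict(sorted(counts.items()))
```

If the keys of the result run from "2006" through "2014", the copy likely includes the added years; a result that stops at "2012" would suggest the older dump.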


5 participants