Correctly parse issue field for pub_covid_hosp_state_timeseries
#220
covidcast_hosp_state_timeseries
pub_covid_hosp_state_timeseries
) {
  cli::cli_warn(
    c(
      "Not all return columns are specified as expected epidata fields",
Adding missing specs for public endpoints in another PR
wait, so #223 specifies the headers of every endpoint? That seems likely to require quite a bit of attention to keep up to date. I don't quite get the motivation.
Most endpoint fields were already fully specified. #223 adds some that weren't included. @lcbrooks or @dshemetov any context on why we specify headers?
FWIW headers shouldn't change often/fast. Most endpoints aren't adding new fields. Many aren't being updated anymore and at least covidcast uses the same header for every signal.
It just worries me as yet another place where we're duplicating state. I hadn't caught that it was already pretty thoroughly covered though; in for a penny, in for a pound I guess
It's something we inherited from Sam's original development effort. I am guessing he ran into issues where he couldn't rely on automatic type inference from the JSON output and went in for hard coding all data types.
It's definitely a maintenance burden. We could try parsing without it and see where the issues come up, if they still do. Another possibility is generating these type annotations directly from the SQL schemas.
I agree that duplicating state is undesirable and adds maintenance load, but relying on auto-classing isn't robust (and there will inevitably be some fields that aren't parsed correctly that we will have to manually specify -- better to specify everything in that case).
> generating these type annotations directly from the SQL schemas

That would be ideal. What approach are you imagining, @dshemetov? Would this be parsing the server code or can we query the DB directly for schemas?
lgtm, couple small suggestions to tighten up the type expectations in create_epidata_call
Closes #202
The main issue here was that the spec for `pub_covid_hosp_state_timeseries` listed `issue` twice. The first time the `issue` field was parsed, it was correctly converted from int (20230101) to date (2023-01-01). The second time the value was visited, the format was wrong (dashes had been added, YYYY-MM-DD) and it couldn't be parsed, since we expect a YYYYMMDD format.

The date-parsing code now skips the parsing step if the input value is already a date.
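For illustration, the skip-if-already-a-date behaviour might look roughly like the sketch below. The helper name `parse_api_date_sketch` is hypothetical and this is not the package's actual code; it only assumes the API returns dates as YYYYMMDD integers or strings.

```r
# Hypothetical helper, for illustration only (not the package's real function).
parse_api_date_sketch <- function(value) {
  # If the value was already converted (e.g. the field appears twice in the
  # spec and is visited a second time), return it unchanged.
  if (inherits(value, "Date")) {
    return(value)
  }
  # Otherwise assume the API's YYYYMMDD format, e.g. 20230101.
  as.Date(as.character(value), format = "%Y%m%d")
}

first_pass <- parse_api_date_sketch(20230101L)   # integer -> Date
second_pass <- parse_api_date_sketch(first_pass) # Date -> same Date, no re-parse
```

Without the `inherits()` guard, the second call would try to read "2023-01-01" as YYYYMMDD and fail to parse, which matches the failure described above.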
I've added a couple of general warnings and errors to prevent other parsing errors from happening. They are aimed at developers and don't depend on user input. Correctly specifying epidata field info in the code will suppress them.
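A rough sketch of what such developer-facing checks could look like is given here. The function name `check_fields_sketch` and the duplicate-name error are assumptions made for illustration; the PR itself uses `cli::cli_warn`, while this sketch uses base R `warning()` and `stop()` to stay dependency-free.

```r
# Hypothetical check, for illustration only (not the package's real code).
check_fields_sketch <- function(data, expected_fields) {
  # Assumed check: a field listed twice in the spec is a developer error.
  dupes <- unique(expected_fields[duplicated(expected_fields)])
  if (length(dupes) > 0) {
    stop("Duplicate field names in spec: ", paste(dupes, collapse = ", "))
  }
  # Warn when the response has columns the spec does not describe.
  unspecified <- setdiff(names(data), expected_fields)
  if (length(unspecified) > 0) {
    warning(
      "Not all return columns are specified as expected epidata fields: ",
      paste(unspecified, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(unspecified)
}

response <- data.frame(state = "PA", issue = 20230101L)
# Fully specified fields: no warning, nothing reported as unspecified.
ok <- check_fields_sketch(response, c("state", "issue"))
```

Both conditions depend only on how the spec is written, not on user input, so a complete and duplicate-free spec keeps the check silent.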