Correctly parse issue field for pub_covid_hosp_state_timeseries #220

Merged
merged 10 commits into from
Nov 30, 2023

Conversation

nmdefries
Contributor

@nmdefries nmdefries commented Nov 27, 2023

Closes #202

The main issue here was that the spec for pub_covid_hosp_state_timeseries listed issue twice. The first time the issue field was parsed, it was correctly converted from int (20230101) to date (2023-01-01). The second time the value was visited, it had already been reformatted with dashes (YYYY-MM-DD), so it couldn't be parsed, since we expect a YYYYMMDD format.

The date-parsing code now skips the parsing step if the input value is already a date.
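A minimal sketch of that guard, assuming a hypothetical helper named `parse_issue_date` (not the package's actual function): a value that is already a `Date` is returned unchanged, so visiting the same field twice is harmless.

```r
# Hypothetical helper for illustration only; the name and signature are
# assumptions, not the package's real date-parsing code.
parse_issue_date <- function(value) {
  if (inherits(value, "Date")) {
    # Already parsed on an earlier visit: skip the parsing step.
    return(value)
  }
  # Otherwise expect a YYYYMMDD value such as 20230101.
  as.Date(as.character(value), format = "%Y%m%d")
}

once <- parse_issue_date(20230101L)  # first visit: int to Date
twice <- parse_issue_date(once)      # second visit: passed through unchanged

# Without the guard, the second visit would amount to this call, which
# cannot match the dashed string against the YYYYMMDD format.
unguarded <- as.Date(format(once), format = "%Y%m%d")
```

Checking the class rather than the string format keeps the function idempotent without having to recognise several date layouts.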

I've added a couple of general warnings and errors to catch other parsing problems early. They are aimed at developers and don't depend on user input. Correctly specifying epidata field info in the code will suppress them.
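As a rough illustration of what such a developer-facing check could look like, here is a sketch that compares returned column names against the specified field names. The function name `check_fields_specified` and its arguments are assumptions for this example, and it uses base `warning()` rather than `cli::cli_warn()` to stay dependency-free.

```r
# Hypothetical check; names are illustrative and not the package's API.
check_fields_specified <- function(data, field_names) {
  # Columns present in the response but absent from the field spec.
  unspecified <- setdiff(names(data), field_names)
  if (length(unspecified) > 0) {
    warning(
      "Not all return columns are specified as expected epidata fields: ",
      paste(unspecified, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(unspecified)
}

# Made-up response with one column that the spec does not mention.
response <- data.frame(state = "PA", issue = 20230101L, new_field = 1)
missing_spec <- suppressWarnings(
  check_fields_specified(response, c("state", "issue"))
)
```

Because the check depends only on the response columns and the spec written in code, completing the spec is enough to silence it.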

Base automatically changed from ndefries/hosp-state-take_as_of to dev November 28, 2023 15:01
@nmdefries nmdefries changed the title from "Correctly parse issue field for covidcast_hosp_state_timeseries" to "Correctly parse issue field for pub_covid_hosp_state_timeseries" Nov 28, 2023
) {
cli::cli_warn(
c(
"Not all return columns are specified as expected epidata fields",
Contributor Author

@nmdefries nmdefries Nov 28, 2023

Adding missing specs for public endpoints in another PR

Contributor

wait, so #223 specifies the headers of every endpoint? That seems likely to require quite a bit of attention to keep up to date. I don't quite get the motivation.

Contributor Author

Most endpoint fields were already fully specified. #223 adds some that weren't included. @lcbrooks or @dshemetov any context on why we specify headers?

FWIW headers shouldn't change often/fast. Most endpoints aren't adding new fields. Many aren't being updated anymore and at least covidcast uses the same header for every signal.

Contributor

It just worries me as yet another place where we're duplicating state. I hadn't caught that it was already pretty thoroughly covered though; in for a penny, in for a pound I guess

Contributor

@dshemetov dshemetov Nov 29, 2023

It's something we inherited from Sam's original development effort. I am guessing he ran into issues where he couldn't rely on automatic type inference from the JSON output and went in for hard coding all data types.

It's definitely a maintenance burden. We could try parsing without it and see where the issues come up, if they still do. Another possibility is generating these type annotations directly from the SQL schemas.
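One way the schema-driven idea could look, purely as a sketch: the schema table, the SQL types, the type labels, and the mapping below are all made up for illustration and do not reflect the real database or the package's field spec format. It also shows a limitation, since an INT column that stores a YYYYMMDD date still needs a manual override.

```r
# Hypothetical schema description; column names and SQL types are invented.
schema <- data.frame(
  column = c("state", "issue", "beds"),
  sql_type = c("CHAR(2)", "INT(11)", "DOUBLE"),
  stringsAsFactors = FALSE
)

# Assumed mapping from SQL base type to a field type label.
sql_to_field_type <- c(
  CHAR = "text", VARCHAR = "text", INT = "int", DOUBLE = "float", DATE = "date"
)

# The SQL type alone cannot say that an INT holds a date, so keep overrides.
overrides <- c(issue = "date")

# Strip any length suffix such as "(2)" before looking up the base type.
base_type <- toupper(sub("\\(.*$", "", schema$sql_type))
field_type <- unname(sql_to_field_type[base_type])
names(field_type) <- schema$column
field_type[names(overrides)] <- overrides
```

Under these assumptions the hand-maintained state shrinks to the override list instead of a full spec per endpoint.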

Contributor Author

I agree that duplicating state is undesirable and adds maintenance load, but relying on auto-classing isn't robust (and there will inevitably be some fields that aren't parsed correctly that we will have to manually specify -- better to specify everything in that case).
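To make the auto-classing concern concrete, here is a small sketch using base R's `type.convert()` as a stand-in for automatic inference; the package's actual inference path may differ, and the `raw` record and `spec` list are made up for this example. A YYYYMMDD value is inferred as an integer, while an explicit per-field parser yields a `Date`.

```r
# Made-up raw record, as strings, standing in for one row of API output.
raw <- list(issue = "20230101", state = "PA")

# Automatic inference: the issue value looks like a number, so it becomes
# an integer rather than a date.
inferred <- lapply(raw, type.convert, as.is = TRUE)

# Explicit spec (hypothetical structure): one parser per field name.
spec <- list(
  issue = function(x) as.Date(as.character(x), format = "%Y%m%d"),
  state = as.character
)

# Apply each field's parser to the matching raw value.
typed <- Map(function(parse, value) parse(value), spec, raw[names(spec)])
```

The inferred result is a valid integer, so nothing fails loudly; the wrong type only surfaces later when the column is used as a date.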

> generating these type annotations directly from the SQL schemas

That would be ideal. What approach are you imagining @dshemetov ? Would this be parsing the server code or can we query the DB directly for schemas?

@nmdefries nmdefries marked this pull request as ready for review November 28, 2023 17:17
Contributor

@dshemetov dshemetov left a comment

lgtm, couple small suggestions to tighten up the type expectations in create_epidata_call

R/epidatacall.R (outdated, resolved)
R/model.R (outdated, resolved)
@nmdefries nmdefries requested a review from dshemetov November 29, 2023 17:18
@nmdefries nmdefries merged commit 78c381f into dev Nov 30, 2023
10 checks passed
@nmdefries nmdefries deleted the ndefries/parse-date-cols branch November 30, 2023 14:27
3 participants