Correctly parse issue field for `pub_covid_hosp_state_timeseries` #220

nmdefries · 2023-11-27T22:08:42Z

Closes #202

The main issue here was that the spec for `pub_covid_hosp_state_timeseries` listed the `issue` field twice. The first time the `issue` field was parsed, it was correctly converted from int (20230101) to date (2023-01-01). The second time the field was visited, the value was already a date (YYYY-MM-DD, with dashes), so it couldn't be parsed, since the parser expects a YYYYMMDD format.

The date-parsing code now skips the parsing step if the input value is already a date.
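As a rough illustration of the skip described above, here is a minimal sketch in base R. The helper name `parse_yyyymmdd` is hypothetical and is not the package's actual function; it only shows the idea of passing through values that are already dates.

```r
# Hypothetical helper for illustration; not the package's actual parser.
# Converts YYYYMMDD integers or strings to Date and passes Date input through.
parse_yyyymmdd <- function(value) {
  # Already a Date (e.g. the field was visited earlier): nothing to do.
  if (inherits(value, "Date")) {
    return(value)
  }
  as.Date(as.character(value), format = "%Y%m%d")
}

once <- parse_yyyymmdd(20230101L)
twice <- parse_yyyymmdd(once)

# Without the guard, the second pass would try to read a dashed string
# with the YYYYMMDD format, which does not match.
unguarded <- as.Date(as.character(once), format = "%Y%m%d")
```

With the guard in place, parsing the same field a second time is a no-op, so a duplicated spec no longer breaks the conversion.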

I've added a couple of general warnings and errors to catch other parsing problems early. They are aimed at developers and don't depend on user input. Correctly specifying epidata field info in the code will suppress them.
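A minimal sketch of what such developer-facing checks could look like, using base R classed conditions. The function names and condition class names below are made up for illustration and are not the ones used in this PR, which signals its warnings through `cli`.

```r
# Hypothetical validators; names and condition classes are illustrative only.
check_no_duplicate_fields <- function(spec_names) {
  dups <- unique(spec_names[duplicated(spec_names)])
  if (length(dups) > 0) {
    stop(structure(
      list(
        message = paste("Duplicate field specs:", paste(dups, collapse = ", ")),
        call = NULL
      ),
      class = c("example_duplicate_fields_error", "error", "condition")
    ))
  }
  invisible(spec_names)
}

warn_unspecified_fields <- function(returned_names, spec_names) {
  # Compare names rather than counts: a user may request only a subset of fields.
  unspecified <- setdiff(returned_names, spec_names)
  if (length(unspecified) > 0) {
    warning(structure(
      list(
        message = paste("Fields without a spec:", paste(unspecified, collapse = ", ")),
        call = NULL
      ),
      class = c("example_unspecified_fields_warning", "warning", "condition")
    ))
  }
  invisible(unspecified)
}

# A spec that lists `issue` twice triggers the classed error.
dup_message <- tryCatch(
  check_no_duplicate_fields(c("state", "issue", "date", "issue")),
  example_duplicate_fields_error = function(e) conditionMessage(e)
)

# A returned column with no matching spec triggers the classed warning.
warn_message <- tryCatch(
  warn_unspecified_fields(c("state", "issue", "new_col"), c("state", "issue", "date")),
  example_unspecified_fields_warning = function(w) conditionMessage(w)
)
```

Giving the conditions their own classes lets tests catch exactly these cases instead of matching on message text.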

nmdefries · 2023-11-28T17:17:04Z

R/model.R

+  ) {
+    cli::cli_warn(
+      c(
+        "Not all return columns are specified as expected epidata fields",


Adding missing specs for public endpoints in another PR

wait, so #223 specifies the headers of every endpoint? That seems likely to require quite a bit of attention to keep up to date. I don't quite get the motivation.

Most endpoint fields were already fully specified. #223 adds some that weren't included. @lcbrooks or @dshemetov any context on why we specify headers?

FWIW headers shouldn't change often/fast. Most endpoints aren't adding new fields. Many aren't being updated anymore and at least covidcast uses the same header for every signal.

It just worries me as yet another place where we're duplicating state. I hadn't caught that it was already pretty thoroughly covered though; in for a penny, in for a pound I guess

It's something we inherited from Sam's original development effort. I am guessing he ran into issues where he couldn't rely on automatic type inference from the JSON output and went in for hard coding all data types.

It's definitely a maintenance burden. We could try parsing without it and see where the issues come up, if they still do. Another possibility is generating these type annotations directly from the SQL schemas.

I agree that duplicating state is undesirable and adds maintenance load, but relying on auto-classing isn't robust (and there will inevitably be some fields that aren't parsed correctly that we will have to manually specify -- better to specify everything in that case).

> generating these type annotations directly from the SQL schemas

That would be ideal. What approach are you imagining @dshemetov ? Would this be parsing the server code or can we query the DB directly for schemas?
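To make the auto-classing concern above concrete, here is a small sketch with made-up data. The column names and values are assumptions chosen for illustration, and `utils::type.convert` stands in for whatever inference the client would otherwise rely on.

```r
# Made-up response data; every column arrives as character.
raw <- data.frame(
  geo_value = c("01001", "06037"),   # codes with leading zeros
  issue = c("20230101", "20230102"), # dates encoded as YYYYMMDD
  stringsAsFactors = FALSE
)

# Automatic inference treats both columns as integers.
inferred <- utils::type.convert(raw, as.is = TRUE)

# An explicit spec keeps the codes as strings and parses the dates.
explicit <- raw
explicit$issue <- as.Date(raw$issue, format = "%Y%m%d")
```

Here the inferred `geo_value` column loses its leading zeros and `issue` stays an integer, while the explicitly specified version keeps the intended types.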

dshemetov

lgtm, couple small suggestions to tighten up the type expectations in create_epidata_call

R/epidatacall.R

R/model.R

dsweber2 · 2023-11-28T23:55:37Z

R/model.R

wait, so #223 specifies the headers of every endpoint? That seems likely to require quite a bit of attention to keep up to date. I don't quite get the motivation.

nmdefries added 3 commits November 27, 2023 16:48

remove duplicate spec for issue field in hosp state timeseries

737e2cb

error if epidata meta has duplicates

a43ea6f

don't try to convert date fields to date again; warn if meta and return values have diff number of fields

83d7813

Base automatically changed from ndefries/hosp-state-take_as_of to dev November 28, 2023 15:01

nmdefries changed the title ~~Correctly parse issue field for covidcast_hosp_state_timeseries~~ Correctly parse issue field for pub_covid_hosp_state_timeseries Nov 28, 2023

nmdefries added 6 commits November 28, 2023 10:32

check diff of expected and actual field names, since user can request subset

2d4f1d4

provide error and warning class names

0f6b003

test create_epidata_call success and failures

c093c11

test parse_data_frame

a716b27

linting

073ac5b

list unspecified fields in warning message

4228e79

nmdefries marked this pull request as ready for review November 28, 2023 17:17

nmdefries requested review from dshemetov, brookslogan and dsweber2 as code owners November 28, 2023 17:17

dshemetov approved these changes Nov 29, 2023

verify that field specs are all EpidataFieldInfo objs

8e4fb6c

nmdefries requested a review from dshemetov November 29, 2023 17:18

dsweber2 approved these changes Nov 29, 2023

nmdefries merged commit 78c381f into dev Nov 30, 2023
10 checks passed

nmdefries deleted the ndefries/parse-date-cols branch November 30, 2023 14:27

nmdefries mentioned this pull request Dec 7, 2023

Set endpoint fetch types using the SQL schemas directly? #229

Open
