Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: refactor medication tables to use modern denormalization #228

Merged
merged 1 commit into from
Apr 29, 2024

Conversation

mikix
Copy link
Contributor

@mikix mikix commented Apr 29, 2024

core__medication:

  • Template cleanup to use standard denorm tables rather than custom extraction code.
  • If contained resources provide actual codes, we pull them out now (previously we only looked at the contained Concept.text)
  • No longer pull Concept.text from contained Medications, until we decide on an approach for Concept.text across the board)

core__medicationrequest:

  • No longer ignores rows without dosageInstruction.text

General CodeableConcept denormalization:

  • Add userSelected boolean field to all denorm tables
  • Allow the Python code to request extra fields to include along with the coding information (used for capturing contained.id fields)

Misc:

  • Depend on cumulus-fhir-support 1.1 for its contained schema support

Fixes #218

Checklist

  • Consider if documentation (like in docs/) needs to be updated
  • Consider if tests should be added
  • Update template repo if there are changes to study configuration

@mikix mikix force-pushed the mikix/medreq-dosage branch from a18350c to b8a2521 Compare April 29, 2024 14:07
Comment on lines +54 to +55
Note: this test was written against the found behavior at the time.
It's not clear this is how we *want* this table to behave.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Callout for this test and the one below - I'm encoding current behavior in these tests, but it's not clear to me what our vision for these tables is. I filed #227 for that discussion.

Comment on lines 72 to 74
u.coding.code,
{#- Prefer coding.display, but fall back to concept.text #}
COALESCE(u.coding.display, s.{{ field_alias }}.text) AS display,
u.coding.system AS code_system,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to talk about this a bit - my first blush reaction is I'd rather introduce text as a separate column that force this coalesce to happen here. There may be cases where a study would want to ignore raw text in favor of well coded items (and we :could: get a code without a display, though it's unlikely), or explicitly look at user entered text in combination with a field coded a specific way. This might also eliminate the need for the extra union on line 111.

But I may not be thinking of something else - pushback welcome

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Goals of display/text

Well, let's get alignment on the point of these labels. In my head, it's mostly just so we have something to show in the dashboard as presentation.

On a clinical level, I think we want studies to look at codes & systems (where possible). Trying to parse display or text will always be fraught across EHRs etc.

There is a big loophole in that "where possible", granted. But those values will be always be so fuzzy, I'm having a hard time imagining a case where the study knows the good stuff is in text but not display.

Scenarios
What's your vibes on each of these?

  1. Display but no text: easy, we want to use display
  2. Display and text: in my head, display will always be more relevant to the code in question
  3. No display but text: I think using text in the CUBE is more friendly than cumulus__none and probably what the EHR intended consumers to do (you say this is unlikely, and I'm sure we could get stats from Cerner at least, but I don't put anything past EHRs)
  4. No codings at all, but text: the "Cerner contained Medication" scenario and honestly, plenty of other places - I figured we'd want these to show up in the CUBE and be available for studies to attempt to parse if they want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we want to eventually expose text alongside display, that is arguably a separate question from using text as a fallback for display today - like, in 3 and 4 above - I'd rather have a fallback than nothing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question: will 4 flood CUBEs with lots of one-off random texts? I dunno how EHRs populate those kind of uncoded fields. Like, if it's usually a dropdown (no problem then) or user-entered text (which seems like it could be a problem).

So in that case, maybe we don't want the fallback?

Copy link
Contributor Author

@mikix mikix Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(In my perfect future world, we'd never examine display or text and just always get a human label from vocab or similar, but that doesn't fix 4 anyway)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. easy, agree
  2. agree
  3. I agree with this outside of core.

inside of core, I think it's more nuanced. We are still trying to do a bit of 'here's the representation of your data as we found it' presentation here - we've backed out coalescing to 'None' in favor of 'cumulus_none' for this reason. This feels like going back to that same well of massaging data in core rather than extracting & re-organizing it. But I think we let a study author basically decide which approach makes sense to them.

4 I think you're right in that 4 will probably generate a wall of junk in cubes, and more likely cause them to get dropped on the floor more than anything else. But I would leave them in for the same detail reasons in 3.

Re: just getting these from a know good source - we might be close with the file upload stuff i'm working on, so maybe we can make this problem go away before too long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are still trying to do a bit of 'here's the representation of your data as we found it' presentation here

There is a bit of tension here between this usage of core and the fact that core itself also generates counts tables - like for its own counts table, it probably needs an opinion on this kind of coalescing that it doesn't need to be a source of truth for studies.

So I'm hearing "let's not do this coalescing for now" from you - as well as a "maybe we'll add text in the future? who knows"

I can easily take out the scenario 3 coalesce() - that was a whole cloth invention of this PR.

But scenario 4 was used in for contained Cerner Medications. (I think Cerner also does that in other places, like Device.type etc) Do we want to leave it in place for that one Medication use case, or take a stance that we don't do that, and if you want that data, wait for us to do the text thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And by "Do we want to leave it in place for that one Medication use case" - I mean, only do it for that one code path, and not do it for general codeable concepts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bit of tension here between this usage of core and the fact that core itself also generates counts tables - like for its own counts table, it probably needs an opinion on this kind of coalescing that it doesn't need to be a source of truth for studies.

I have been thinking of core counts like this:

  • If you're generating core counts, it's the first time you're looking at the data through whatever version of the library you're running, for a given set of ETL extracts
  • There are two uses for core tables - one is for cross-site population statistics, the other is as a 'what does this data even look like' check
    • The latter makes preserving nulls from the source system as is valuable in counts; coalescing them dilutes that value

So yeah, I'm in favor of removing number 3.

The cerner contained thing I don't think we can make a call on changing without at least Jamie in the loop - since he's out this week, I'd propose leaving it as is and making a spike ticket so we remember to discuss it?

Copy link
Contributor Author

@mikix mikix Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked a bit more in slack on this. I'll remove both 3 and 4 from this PR and this means this PR removes a tiny feature from the core__medication table, which is that text-only contained Medications aren't represented anymore (contained Medications with codes are now represented, but weren't before - though we don't know of an EHR that makes those yet).

They were only used by Cerner, were not in any CUBEs exported by the core study, and weren't really usable (just text, no code).

Until we figure out how to approach text, this felt like the sanest/most-consistent path.

Further text thinking issue filed here: #229

@mikix mikix force-pushed the mikix/medreq-dosage branch 2 times, most recently from 274a0cb to a4257a5 Compare April 29, 2024 18:45
core__medication:
- Template cleanup to use standard denorm tables rather than custom
  extraction code.
- If contained resources provide actual codes, we pull them out now
  (previously we only looked at the contained Concept.text)
- No longer pull Concept.text from contained Medications, until we
  decide on an approach for Concept.text across the board)

core__medicationrequest:
- No longer ignores rows without dosageInstruction.text

General CodeableConcept denormalization:
- Add userSelected boolean field to all denorm tables
- Allow the Python code to request extra fields to include along with
  the coding information (used for capturing contained.id fields)

Misc:
- Depend on cumulus-fhir-support 1.1 for its contained schema support
@mikix mikix force-pushed the mikix/medreq-dosage branch from a4257a5 to 64f171c Compare April 29, 2024 18:56
@mikix mikix merged commit ca83935 into main Apr 29, 2024
3 checks passed
@mikix mikix deleted the mikix/medreq-dosage branch April 29, 2024 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

core__medicationrequest ignores rows without dosage instructions
2 participants