core: refactor medication tables to use modern denormalization #228

mikix · 2024-04-29T14:07:08Z

core__medication:

Template cleanup to use standard denorm tables rather than custom extraction code.
If contained resources provide actual codes, we pull them out now (previously we only looked at the contained Concept.text)
No longer pull Concept.text from contained Medications, until we decide on an approach for Concept.text across the board)

core__medicationrequest:

No longer ignores rows without dosageInstruction.text

General CodeableConcept denormalization:

Add userSelected boolean field to all denorm tables
Allow the Python code to request extra fields to include along with the coding information (used for capturing contained.id fields)

Misc:

Depend on cumulus-fhir-support 1.1 for its contained schema support

Fixes #218

Checklist

Consider if documentation (like in docs/) needs to be updated
Consider if tests should be added
Update template repo if there are changes to study configuration

mikix · 2024-04-29T14:13:24Z

tests/core/test_core_meds.py

+    Note: this test was written against the found behavior at the time.
+    It's not clear this is how we *want* this table to behave.


Callout for this test and the one below - I'm encoding current behavior in these tests, but it's not clear to me what our vision for these tables is. I filed #227 for that discussion.

dogversioning · 2024-04-29T14:32:46Z

cumulus_library/template_sql/codeable_concept_denormalize.sql.jinja

+            u.coding.code,
+            {#- Prefer coding.display, but fall back to concept.text #}
+            COALESCE(u.coding.display, s.{{ field_alias }}.text) AS display,
+            u.coding.system AS code_system,


I want to talk about this a bit - my first blush reaction is I'd rather introduce text as a separate column that force this coalesce to happen here. There may be cases where a study would want to ignore raw text in favor of well coded items (and we :could: get a code without a display, though it's unlikely), or explicitly look at user entered text in combination with a field coded a specific way. This might also eliminate the need for the extra union on line 111.

But I may not be thinking of something else - pushback welcome

Goals of display/text

Well, let's get alignment on the point of these labels. In my head, it's mostly just so we have something to show in the dashboard as presentation.

On a clinical level, I think we want studies to look at codes & systems (where possible). Trying to parse display or text will always be fraught across EHRs etc.

There is a big loophole in that "where possible", granted. But those values will be always be so fuzzy, I'm having a hard time imagining a case where the study knows the good stuff is in text but not display.

Scenarios
What's your vibes on each of these?

Display but no text: easy, we want to use display

Display and text: in my head, display will always be more relevant to the code in question

No display but text: I think using text in the CUBE is more friendly than cumulus__none and probably what the EHR intended consumers to do (you say this is unlikely, and I'm sure we could get stats from Cerner at least, but I don't put anything past EHRs)

No codings at all, but text: the "Cerner contained Medication" scenario and honestly, plenty of other places - I figured we'd want these to show up in the CUBE and be available for studies to attempt to parse if they want.

Even if we want to eventually expose text alongside display, that is arguably a separate question from using text as a fallback for display today - like, in 3 and 4 above - I'd rather have a fallback than nothing.

Another question: will 4 flood CUBEs with lots of one-off random texts? I dunno how EHRs populate those kind of uncoded fields. Like, if it's usually a dropdown (no problem then) or user-entered text (which seems like it could be a problem).

So in that case, maybe we don't want the fallback?

(In my perfect future world, we'd never examine display or text and just always get a human label from vocab or similar, but that doesn't fix 4 anyway)

easy, agree

agree

I agree with this outside of core.

inside of core, I think it's more nuanced. We are still trying to do a bit of 'here's the representation of your data as we found it' presentation here - we've backed out coalescing to 'None' in favor of 'cumulus_none' for this reason. This feels like going back to that same well of massaging data in core rather than extracting & re-organizing it. But I think we let a study author basically decide which approach makes sense to them.

4 I think you're right in that 4 will probably generate a wall of junk in cubes, and more likely cause them to get dropped on the floor more than anything else. But I would leave them in for the same detail reasons in 3.

Re: just getting these from a know good source - we might be close with the file upload stuff i'm working on, so maybe we can make this problem go away before too long.

We are still trying to do a bit of 'here's the representation of your data as we found it' presentation here

There is a bit of tension here between this usage of core and the fact that core itself also generates counts tables - like for its own counts table, it probably needs an opinion on this kind of coalescing that it doesn't need to be a source of truth for studies.

So I'm hearing "let's not do this coalescing for now" from you - as well as a "maybe we'll add text in the future? who knows"

I can easily take out the scenario 3 coalesce() - that was a whole cloth invention of this PR.

But scenario 4 was used in for contained Cerner Medications. (I think Cerner also does that in other places, like Device.type etc) Do we want to leave it in place for that one Medication use case, or take a stance that we don't do that, and if you want that data, wait for us to do the text thing?

And by "Do we want to leave it in place for that one Medication use case" - I mean, only do it for that one code path, and not do it for general codeable concepts.

There is a bit of tension here between this usage of core and the fact that core itself also generates counts tables - like for its own counts table, it probably needs an opinion on this kind of coalescing that it doesn't need to be a source of truth for studies.

I have been thinking of core counts like this:

If you're generating core counts, it's the first time you're looking at the data through whatever version of the library you're running, for a given set of ETL extracts

There are two uses for core tables - one is for cross-site population statistics, the other is as a 'what does this data even look like' check

The latter makes preserving nulls from the source system as is valuable in counts; coalescing them dilutes that value

So yeah, I'm in favor of removing number 3.

The cerner contained thing I don't think we can make a call on changing without at least Jamie in the loop - since he's out this week, I'd propose leaving it as is and making a spike ticket so we remember to discuss it?

Talked a bit more in slack on this. I'll remove both 3 and 4 from this PR and this means this PR removes a tiny feature from the core__medication table, which is that text-only contained Medications aren't represented anymore (contained Medications with codes are now represented, but weren't before - though we don't know of an EHR that makes those yet).

They were only used by Cerner, were not in any CUBEs exported by the core study, and weren't really usable (just text, no code).

Until we figure out how to approach text, this felt like the sanest/most-consistent path.

Further text thinking issue filed here: #229

core__medication: - Template cleanup to use standard denorm tables rather than custom extraction code. - If contained resources provide actual codes, we pull them out now (previously we only looked at the contained Concept.text) - No longer pull Concept.text from contained Medications, until we decide on an approach for Concept.text across the board) core__medicationrequest: - No longer ignores rows without dosageInstruction.text General CodeableConcept denormalization: - Add userSelected boolean field to all denorm tables - Allow the Python code to request extra fields to include along with the coding information (used for capturing contained.id fields) Misc: - Depend on cumulus-fhir-support 1.1 for its contained schema support

mikix force-pushed the mikix/medreq-dosage branch from a18350c to b8a2521 Compare April 29, 2024 14:07

mikix commented Apr 29, 2024

View reviewed changes

dogversioning approved these changes Apr 29, 2024

View reviewed changes

mikix force-pushed the mikix/medreq-dosage branch 2 times, most recently from 274a0cb to a4257a5 Compare April 29, 2024 18:45

mikix force-pushed the mikix/medreq-dosage branch from a4257a5 to 64f171c Compare April 29, 2024 18:56

mikix merged commit ca83935 into main Apr 29, 2024
3 checks passed

mikix deleted the mikix/medreq-dosage branch April 29, 2024 19:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

core: refactor medication tables to use modern denormalization #228

core: refactor medication tables to use modern denormalization #228

mikix commented Apr 29, 2024 •

edited

Loading

mikix Apr 29, 2024

dogversioning Apr 29, 2024

mikix Apr 29, 2024

mikix Apr 29, 2024

mikix Apr 29, 2024

mikix Apr 29, 2024 •

edited

Loading

dogversioning Apr 29, 2024

mikix Apr 29, 2024

mikix Apr 29, 2024

dogversioning Apr 29, 2024

mikix Apr 29, 2024 •

edited

Loading

		Note: this test was written against the found behavior at the time.
		It's not clear this is how we want this table to behave.

core: refactor medication tables to use modern denormalization #228

core: refactor medication tables to use modern denormalization #228

Conversation

mikix commented Apr 29, 2024 • edited Loading

Checklist

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mikix Apr 29, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mikix Apr 29, 2024 • edited Loading

Choose a reason for hiding this comment

mikix commented Apr 29, 2024 •

edited

Loading

mikix Apr 29, 2024 •

edited

Loading

mikix Apr 29, 2024 •

edited

Loading