-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: refactor medication tables to use modern denormalization #228
Conversation
a18350c
to
b8a2521
Compare
Note: this test was written against the found behavior at the time. | ||
It's not clear this is how we *want* this table to behave. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Callout for this test and the one below - I'm encoding current behavior in these tests, but it's not clear to me what our vision for these tables is. I filed #227 for that discussion.
u.coding.code, | ||
{#- Prefer coding.display, but fall back to concept.text #} | ||
COALESCE(u.coding.display, s.{{ field_alias }}.text) AS display, | ||
u.coding.system AS code_system, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to talk about this a bit - my first blush reaction is I'd rather introduce text as a separate column that force this coalesce to happen here. There may be cases where a study would want to ignore raw text in favor of well coded items (and we :could: get a code without a display, though it's unlikely), or explicitly look at user entered text in combination with a field coded a specific way. This might also eliminate the need for the extra union on line 111.
But I may not be thinking of something else - pushback welcome
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Goals of display/text
Well, let's get alignment on the point of these labels. In my head, it's mostly just so we have something to show in the dashboard as presentation.
On a clinical level, I think we want studies to look at codes & systems (where possible). Trying to parse display
or text
will always be fraught across EHRs etc.
There is a big loophole in that "where possible", granted. But those values will be always be so fuzzy, I'm having a hard time imagining a case where the study knows the good stuff is in text
but not display
.
Scenarios
What's your vibes on each of these?
- Display but no text: easy, we want to use display
- Display and text: in my head, display will always be more relevant to the code in question
- No display but text: I think using
text
in the CUBE is more friendly thancumulus__none
and probably what the EHR intended consumers to do (you say this is unlikely, and I'm sure we could get stats from Cerner at least, but I don't put anything past EHRs) - No codings at all, but text: the "Cerner contained Medication" scenario and honestly, plenty of other places - I figured we'd want these to show up in the CUBE and be available for studies to attempt to parse if they want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if we want to eventually expose text
alongside display
, that is arguably a separate question from using text
as a fallback for display
today - like, in 3 and 4 above - I'd rather have a fallback than nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question: will 4 flood CUBEs with lots of one-off random texts? I dunno how EHRs populate those kind of uncoded fields. Like, if it's usually a dropdown (no problem then) or user-entered text (which seems like it could be a problem).
So in that case, maybe we don't want the fallback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(In my perfect future world, we'd never examine display
or text
and just always get a human label from vocab
or similar, but that doesn't fix 4 anyway)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- easy, agree
- agree
- I agree with this outside of core.
inside of core, I think it's more nuanced. We are still trying to do a bit of 'here's the representation of your data as we found it' presentation here - we've backed out coalescing to 'None'
in favor of 'cumulus_none'
for this reason. This feels like going back to that same well of massaging data in core rather than extracting & re-organizing it. But I think we let a study author basically decide which approach makes sense to them.
4 I think you're right in that 4 will probably generate a wall of junk in cubes, and more likely cause them to get dropped on the floor more than anything else. But I would leave them in for the same detail reasons in 3.
Re: just getting these from a know good source - we might be close with the file upload stuff i'm working on, so maybe we can make this problem go away before too long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are still trying to do a bit of 'here's the representation of your data as we found it' presentation here
There is a bit of tension here between this usage of core
and the fact that core
itself also generates counts tables - like for its own counts table, it probably needs an opinion on this kind of coalescing that it doesn't need to be a source of truth for studies.
So I'm hearing "let's not do this coalescing for now" from you - as well as a "maybe we'll add text in the future? who knows"
I can easily take out the scenario 3 coalesce() - that was a whole cloth invention of this PR.
But scenario 4 was used in for contained Cerner Medications. (I think Cerner also does that in other places, like Device.type etc) Do we want to leave it in place for that one Medication use case, or take a stance that we don't do that, and if you want that data, wait for us to do the text
thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And by "Do we want to leave it in place for that one Medication use case" - I mean, only do it for that one code path, and not do it for general codeable concepts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a bit of tension here between this usage of core and the fact that core itself also generates counts tables - like for its own counts table, it probably needs an opinion on this kind of coalescing that it doesn't need to be a source of truth for studies.
I have been thinking of core counts like this:
- If you're generating core counts, it's the first time you're looking at the data through whatever version of the library you're running, for a given set of ETL extracts
- There are two uses for core tables - one is for cross-site population statistics, the other is as a 'what does this data even look like' check
- The latter makes preserving nulls from the source system as is valuable in counts; coalescing them dilutes that value
So yeah, I'm in favor of removing number 3.
The cerner contained thing I don't think we can make a call on changing without at least Jamie in the loop - since he's out this week, I'd propose leaving it as is and making a spike ticket so we remember to discuss it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked a bit more in slack on this. I'll remove both 3 and 4 from this PR and this means this PR removes a tiny feature from the core__medication
table, which is that text-only contained Medications aren't represented anymore (contained Medications with codes are now represented, but weren't before - though we don't know of an EHR that makes those yet).
They were only used by Cerner, were not in any CUBEs exported by the core
study, and weren't really usable (just text, no code).
Until we figure out how to approach text
, this felt like the sanest/most-consistent path.
Further text thinking issue filed here: #229
274a0cb
to
a4257a5
Compare
core__medication: - Template cleanup to use standard denorm tables rather than custom extraction code. - If contained resources provide actual codes, we pull them out now (previously we only looked at the contained Concept.text) - No longer pull Concept.text from contained Medications, until we decide on an approach for Concept.text across the board) core__medicationrequest: - No longer ignores rows without dosageInstruction.text General CodeableConcept denormalization: - Add userSelected boolean field to all denorm tables - Allow the Python code to request extra fields to include along with the coding information (used for capturing contained.id fields) Misc: - Depend on cumulus-fhir-support 1.1 for its contained schema support
a4257a5
to
64f171c
Compare
core__medication:
core__medicationrequest:
General CodeableConcept denormalization:
Misc:
Fixes #218
Checklist
docs/
) needs to be updated