oac fetcher / mapper #418
base: main
This mapper uses models of thinking from the dpla-ingestion codebase. We would like to strip out all use of `exists`, `getprop`, `iterify`, and `get_vals`. These functions accommodate unknown data structures, where we would like to enforce known data structures. Mapping should be a process of moving from one expected data structure to our normalized, well-defined data structure. The gymnastics done in these functions obfuscate where data came from and where it goes.
I know that I had previously written an OAC fetcher/mapper pair, and this work all seems to build on what I had previously written. I do think it might be worthwhile to start from scratch and to write more specific mapping functions.
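To make the contrast concrete, here is a minimal sketch, not code from this PR. The record shape (`title` as a list of `{"text": ...}` nodes, `identifier` as a string) and the names `getprop_like`, `map_title`, and `map_record` are all hypothetical, chosen only to illustrate a tolerant accessor next to a specific mapping function.

```python
from typing import Any, Dict, List


def getprop_like(record: Any, path: str) -> Any:
    """Hypothetical tolerant accessor: accepts any shape, returns None on a miss."""
    current = record
    for key in path.split("/"):
        if not isinstance(current, dict) or key not in current:
            return None  # the caller cannot tell which level was missing
        current = current[key]
    return current


def map_title(record: Dict[str, Any]) -> List[str]:
    """Specific mapping: assumes `title` is a list of {"text": ...} nodes.

    An unexpected shape raises immediately instead of being papered over.
    """
    return [node["text"] for node in record["title"]]


def map_record(record: Dict[str, Any]) -> Dict[str, List[str]]:
    """Move from one expected source structure to one normalized structure."""
    return {
        "title": map_title(record),
        "identifier": [record["identifier"]],
    }
```

With the specific functions, each output field can be traced back to exactly one place in the source record, and a malformed record fails loudly with a `KeyError` rather than silently mapping to `None`.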
Also, we'd love to get rid of the enrichment `select_oac_id`, defined in mapper.py. Could you move the mapping logic done there into this mapper, and make the `select_oac_id` enrichment a no-op? (The function definition still needs to exist, since we cannot remove it from our enrichment chain in the collection registry due to legacy harvester operations.)
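A rough sketch of what a no-op enrichment could look like, under stated assumptions: the `Record` class, the `run_chain` helper, and the convention that enrichments are methods returning the record are hypothetical here, and the real signatures in mapper.py may differ.

```python
from typing import Any, Dict, List


class Record:
    """Hypothetical record wrapper holding already-mapped data."""

    def __init__(self, mapped_data: Dict[str, Any]):
        self.mapped_data = mapped_data

    def select_oac_id(self) -> "Record":
        # No-op: the mapper now does this work. The method stays defined
        # so that enrichment chains which still name it keep resolving.
        return self


def run_chain(record: Record, chain: List[str]) -> Record:
    """Hypothetical chain runner: look up each enrichment by name and apply it."""
    for name in chain:
        record = getattr(record, name)()
    return record
```

Keeping the method as a pass-through means a chain that still lists `select_oac_id` runs without error and leaves the mapped data untouched.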
I hear what you're saying here @amywieliczka. Since the
or would you rather see a I assume that because that the text node extraction logic is ok to be duplicated in several places in order to keep the mapper denormalized and simple. Is that a reasonable assumption? The replacements for those fields that currently use
@amywieliczka I threw some spaghetti at the wall here.
You know, I really don't mind the hyper-locality of implementing "include" inside each of these functions. It makes a data stack trace for any single one of them really straightforward, which is a huge boon. I think I'm a-okay with this approach. @barbarahui what do you think? I am still seeing one instance of
I like this approach! Very readable (and traceable, as Amy points out), even if a little redundant. This is a big improvement over what was there before.
This is a combo fetcher / mapper PR.