
OAI: If item identifier has special characters, temp metadata filename doesn't match filegetter #508

Open
bondjimbond opened this issue Mar 11, 2020 · 4 comments

@bondjimbond
Collaborator

I'm doing an OAI migration, and running into problems in src/fetchers/Oaipmh.php.

The Fetcher assumes that the $identifier and $record_key are the same value, but they aren't necessarily.

If the item identifier contains special characters such as colons (e.g. oai:thisvancouver.vpl.ca:islandora_1910), MIK encodes it differently in different contexts.

When writing the temporary metadata files, the fetcher URL-encodes the identifier: https://github.com/MarcusBarnes/mik/blob/master/src/fetchers/Oaipmh.php#L80-L82

Resulting filename: oai%3Athisvancouver.vpl.ca%3Aislandora_1910.metadata.

But the $record_key that is used everywhere else in the code looks like this: oai_thisvancouver.vpl.ca_islandora_1910.

So you end up with problems like this:

```
ErrorException.ERROR: ErrorException {"message":"file_get_contents(/Volumes/Arca/tmp/oaitest_temp/oai_thisvancouver.vpl.ca_islandora_410.metadata): failed to open stream: No such file or directory","code":{"record_key":"oai_thisvancouver.vpl.ca_islandora_1910","raw_metadata_path":"/Volumes/Arca/tmp/oaitest_temp/oai_thisvancouver.vpl.ca_islandora_1910.metadata","dom":"[object] (DOMDocument: {})"},"severity":2,"file":"/Users/brandon/sfuvault/mik/src/filegetters/OaipmhModsXpath.php","line":56} []
```

This happens because the filegetter looks for `$record_key.metadata`, while the file was actually written as the URL-encoded `$identifier.metadata`, so it can't find the file.
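The mismatch can be reproduced in isolation. This is a minimal sketch, assuming the record key is produced by replacing each `:` with `_` (the thread hasn't yet located where `$record_key` is defined, so that derivation is an assumption inferred from the log), while the fetcher builds the filename with `urlencode()`:

```php
<?php
// Identifier as it appears in the OAI-PMH record header.
$identifier = 'oai:thisvancouver.vpl.ca:islandora_1910';

// Fetcher side: the temp filename is built from the URL-encoded identifier,
// so each ':' becomes '%3A'.
$metadataFilename = urlencode($identifier) . '.metadata';

// Filegetter side (ASSUMPTION): the record key replaces ':' with '_'.
$recordKey = str_replace(':', '_', $identifier);
$expectedFilename = $recordKey . '.metadata';

echo $metadataFilename, PHP_EOL; // oai%3Athisvancouver.vpl.ca%3Aislandora_1910.metadata
echo $expectedFilename, PHP_EOL; // oai_thisvancouver.vpl.ca_islandora_1910.metadata
var_dump($metadataFilename === $expectedFilename); // bool(false)
```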

So... how the heck do we fix this?

@bondjimbond
Collaborator Author

Trying to find where $record_key is first defined.

@mjordan
Collaborator

mjordan commented Mar 11, 2020

I've never liked the fact that the OAI identifiers are so ugly and complex. There is a spec for OAI-PMH identifiers that defines them using the pattern `oai-identifier = scheme ":" namespace-identifier ":" local-identifier`. (Note that "namespace" here is not related to Fedora namespaces; it identifies the source OAI repository.) We could, in all places in the MIK OAI code, strip out everything but the "local identifier" part and use that as both the filename and the record key. That would at least give us less rope to hang ourselves with, since the filename/record key would be a lot shorter than it is now.

But there is a problem with this: the OAI identifier spec uses : to separate the OAI-specific bits out from the local identifier... which in the case of Islandora source repos is the PID, which itself contains a :.
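Because the spec's `namespace-identifier` is a dotted domain name and can't itself contain a `:`, splitting on only the first two colons should keep any colons inside the local identifier (such as an Islandora PID) intact. Here is a sketch using PHP's `explode()` with a limit of 3; the function name `parse_oai_identifier` and the PID-style sample identifier are hypothetical:

```php
<?php
/**
 * Hypothetical helper: split an OAI-PMH identifier of the form
 * scheme ":" namespace-identifier ":" local-identifier.
 * Returns null if the string does not have all three parts.
 */
function parse_oai_identifier(string $identifier): ?array
{
    // A limit of 3 stops splitting after the second ':', so any colons
    // in the local identifier (e.g. an Islandora PID) are preserved.
    $parts = explode(':', $identifier, 3);
    if (count($parts) !== 3 || $parts[2] === '') {
        return null;
    }
    [$scheme, $namespace, $local] = $parts;
    return ['scheme' => $scheme, 'namespace' => $namespace, 'local' => $local];
}

print_r(parse_oai_identifier('oai:example.org:islandora:1910'));
```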

Maybe a general way to approach this is to modify MIK to strip out everything except the local identifier part and then to replace any `:` with an underscore. If this is done with a central function, we'd just call that function wherever MIK creates or needs to predict an identifier for an object.
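A minimal sketch of such a central function, assuming MIK would call it both when the fetcher writes the temporary `.metadata` file and when the filegetter predicts its path. The names `mik_oai_record_key` and `mik_oai_metadata_path` are hypothetical, and falling back to the whole identifier when it doesn't follow the OAI scheme is a design choice, not something agreed in this thread:

```php
<?php
/**
 * Hypothetical central helper: reduce an OAI identifier to its local part
 * and replace ':' with '_' so it is usable as both filename and record key.
 */
function mik_oai_record_key(string $identifier): string
{
    $parts = explode(':', $identifier, 3);
    // Fall back to the whole identifier if it doesn't follow the OAI scheme.
    $local = (count($parts) === 3 && $parts[2] !== '') ? $parts[2] : $identifier;
    return str_replace(':', '_', $local);
}

/**
 * Hypothetical path helper shared by the fetcher (write) and the
 * filegetter (read), so both sides compute the same temporary filename.
 */
function mik_oai_metadata_path(string $tempDir, string $identifier): string
{
    return rtrim($tempDir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR
        . mik_oai_record_key($identifier) . '.metadata';
}

echo mik_oai_record_key('oai:thisvancouver.vpl.ca:islandora_1910'), PHP_EOL; // islandora_1910
echo mik_oai_record_key('oai:example.org:islandora:42'), PHP_EOL;            // islandora_42
```

Reducing to the local part assumes a migration harvests from a single repository; if several source repositories could share local identifiers, keeping the namespace (with its `:` replaced) would avoid collisions.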

@bondjimbond
Collaborator Author

That sounds reasonable to me. Where are you thinking of doing this, and what would the function be?

For a quick and dirty patch, I'm thinking the convert-to-underscore would have to happen here: https://github.com/MarcusBarnes/mik/blob/master/src/fetchers/Oaipmh.php#L80-L82

That might just do the job... What do you think?

@bondjimbond
Collaborator Author

OK, I've made a change. In that section:

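```php
// Extract the identifier string from the SimpleXML record header.
$identifier = ($rec->header->identifier);
$identifier = json_decode(json_encode($identifier), 1)[0];
// Replace ':' with '_' so the temp filename matches $record_key.
$identifier = urlencode(str_replace(':', '_', $identifier));
```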

This seems to work; I'm getting files! Unfortunately, the files are not being written to the directories that are created... Weird.
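A quick way to see why the patch lines the filenames up: once every `:` has become `_`, the sample identifier contains only letters, digits, `.` and `_`, which `urlencode()` leaves untouched, so the encoded filename equals the record key. This sketch reuses the identifier from the error log and assumes the record key is the identifier with each `:` replaced by `_`, as the logged `record_key` suggests:

```php
<?php
$identifier = 'oai:thisvancouver.vpl.ca:islandora_1910';

// The patched expression from the fetcher.
$patched = urlencode(str_replace(':', '_', $identifier));

// ASSUMPTION: record key = identifier with ':' replaced by '_'.
$recordKey = str_replace(':', '_', $identifier);

var_dump($patched === $recordKey); // bool(true)

// Caveat: characters that urlencode() does escape (e.g. '/' or spaces)
// would still make the two values diverge.
$odd = 'oai:example.org:collection/item 1';
var_dump(urlencode(str_replace(':', '_', $odd)) === str_replace(':', '_', $odd)); // bool(false)
```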

@bondjimbond bondjimbond mentioned this issue Mar 11, 2020
Labels: None yet · Projects: None yet · Development: No branches or pull requests · 2 participants