We've talked here and there about how to minimize unnecessary data transfers, and discussed the merits and drawbacks of various approaches. In particular, I'm not crazy about using a log to figure out where a file should or shouldn't be--I'd rather ask the source of truth itself!
In this connection, I'm considering an additional endpoint for the Database specification that searches for files by their MD5 checksums specifically, instead of using search queries. This endpoint would accept an array of checksums and return their corresponding file IDs (or `null` in the case that they aren't found).
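To make the proposed contract concrete, here's a rough sketch in Python of what the lookup might look like: an array of checksums goes in, and an array of the same length comes out, with `None` (JSON `null`) wherever a checksum isn't found. Everything named here is hypothetical (the function names, the in-memory `index` standing in for whatever the database actually queries, and the file ID strings), so treat it as an illustration of the request/response shape rather than a design.

```python
import hashlib
import json
from typing import Dict, List, Optional, Sequence


def md5_hex(data: bytes) -> str:
    """Hex-encoded MD5 digest of some bytes (helper used only to build the example)."""
    return hashlib.md5(data).hexdigest()


def find_file_ids(checksums: Sequence[str],
                  index: Dict[str, str]) -> List[Optional[str]]:
    """Map each checksum to its file ID, or None if it isn't known.

    The result has one entry per requested checksum, in request order.
    `index` is a hypothetical stand-in for the database's own lookup and
    is assumed to be keyed by lowercase hex digests.
    """
    return [index.get(checksum.lower()) for checksum in checksums]


# Hypothetical index with made-up file IDs.
index = {
    md5_hex(b"reads"): "JDP:file-1",
    md5_hex(b"assembly"): "JDP:file-2",
}

request = [md5_hex(b"assembly"), md5_hex(b"missing")]
result = find_file_ids(request, index)

# Serialized the way an endpoint might return it; None becomes JSON null.
response_body = json.dumps(result)
```

Keeping the response positional (same length and order as the request) means the caller can pair checksums with IDs without a second lookup, and a `null` entry is unambiguous about which checksum wasn't found.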
Obviously this is a very complicated problem to solve, and the above approach doesn't begin to handle all of the nastiness around files that have been transferred but don't yet have IDs, etc. But I think it would at least give us a solid point of departure. I think I can probably stand up a JDP file checksum search endpoint that uses JAMO.
JAMO does maintain an md5sum field in its records, but it's hard to know how many records have it populated. Also, the field isn't indexed, so queries that filter on it perform pretty terribly. I've had no luck getting results from JAMO queries that reference known md5 checksums. So unless I'm overlooking something, it doesn't look like JAMO can provide this capability as things stand.
The JAMO documentation says that it's possible to ask the team to add another index. That's an option to explore as this becomes more important.
Good news here--Chris Beecroft told me he would add md5 checksums to the set of indexed JAMO fields along with some others requested by the JDP team. So it looks like we'll be able to work with the JDP team to add an endpoint that supports querying files by md5 checksum soon.