Migrate to venomx metadata #81
Comments
@cmungall - what are your thoughts? It seems like it'd make sense to align with venomx, since most (all?) of the metadata is going to be dataset- and embedding-related.
venomx assumes each indexed object has a unique identifier. curategpt doesn't make any assumptions about indexed objects; an object can be any JSON object / Python dict. Some wrappers (e.g. ontology) have a primary key, but others, like the maxoa wrapper, return associations, which don't have a natural primary key. Some options are
But I don't think either of these is ideal. I think it's best if we say the mapping to vx is only supported if the collection declares an identifier field.
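To make the "only supported if the collection declares an identifier field" rule concrete, here is a minimal sketch. It is not CurateGPT or venomx code: `resolve_id`, `NoIdentifierFieldError`, and the `identifier_field` parameter are hypothetical names used only for illustration.

```python
from typing import Any, Mapping, Optional


class NoIdentifierFieldError(ValueError):
    """Raised when a collection declares no identifier field (hypothetical)."""


def resolve_id(obj: Mapping[str, Any], identifier_field: Optional[str]) -> str:
    """Return the object's ID, or refuse if the collection declares none.

    `identifier_field` stands in for whatever the collection metadata
    would declare; it is an assumption, not an existing CurateGPT setting.
    """
    if not identifier_field:
        # No declared field: the mapping is unsupported rather than guessed.
        raise NoIdentifierFieldError(
            "collection declares no identifier field; mapping unsupported"
        )
    value = obj.get(identifier_field)
    if value is None or value == "":
        raise KeyError(f"object has no value for {identifier_field!r}")
    return str(value)
```

With this shape, an ontology-style object resolves cleanly, while an association-style collection with no declared field fails early instead of silently getting an ID.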
Then it actually works well with DuckDB, as this also wants unique ids for each indexed object. I kind of like idea 2.
Just a thought: to begin with, we could also test it by not incorporating the whole venomx model/schema into the metadata, but just adding a field for it. This way we can try it out and roll back easily in any case.
I kind of like 2) also. For collections that have IDs it works fine, and for those that do not have IDs, it doesn't seem like it hurts anything. Maybe we can mint them using a hash function of all the fields so they are deterministic?
(or is that too slow)
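A rough sketch of what deterministic minting could look like, assuming objects are JSON-serializable dicts. The function name `mint_id` and the `autogen` prefix are made up for illustration; nothing here is existing CurateGPT behavior.

```python
import hashlib
import json
from typing import Any, Mapping


def mint_id(obj: Mapping[str, Any], prefix: str = "autogen") -> str:
    """Mint a deterministic ID from all fields of an object.

    Keys are sorted before hashing so that field order does not
    change the result. `prefix` is a hypothetical marker that labels
    the ID as generated rather than taken from a source.
    """
    canonical = json.dumps(
        obj,
        sort_keys=True,          # stable key order, including nested dicts
        separators=(",", ":"),   # no whitespace variation
        ensure_ascii=False,
        default=str,             # fall back to str() for non-JSON values
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Truncated for readability; a shorter digest raises collision risk.
    return f"{prefix}:{digest[:16]}"
```

The same fields always give the same ID, and any change to a field gives a different one. On speed: hashing a small serialized dict is usually cheap next to computing an embedding for it, but that has not been measured here.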
I'm hesitant to include autogenerated identifiers if the process is opaque to users, i.e., if it's just made by CurateGPT for purposes of fitting the metadata model, then it isn't clear whether the ID refers to some original source or to the newly created data (though in this case it will be the latter). It works in the KGs because most edges don't start with IDs, but in this setting there's likely to be a mishmash of different sources with and without IDs, plus the newly generated things.
Per convo with @caufieldjh @iQuxLE et al, change curategpt to use venomx for metadata