Thank you for the very helpful package. I am using it to link clinical trials from two databases, one in the local language and one in English. Linking is done cross-lingually on the title of the trial, the intervention, etc.
Is there an option to precompute the embeddings for one of the databases (the "corpus"), so that they do not need to be recomputed every time one of the linktransformer commands is run? That would save time: only the query trial would need to be embedded, and the best match could then be found among the existing corpus embeddings.
I am thinking of a variant of linktransformer's evaluate_pairs method: a new trial would be loaded into df1 and embedded, while the precomputed embeddings of the corpus dataframe would be loaded as df2.
Thanks in advance!
We don't currently support this, but it can easily be done. I invite you or anyone else to contribute this feature, or I will add it in the next update.
This would just involve adding an argument that accepts a path to an embeddings pickle. The function would load the pickle when it is called and skip embedding any text whose embeddings are already available from it. That can be done with any function.
I answered a similar issue before (on huggingface) here. That might help you do this without tinkering with the package as well.
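For anyone who wants to try this outside the package in the meantime, here is a minimal sketch of the idea described above: cache the corpus embeddings in a pickle, embed only the query, and pick the best match by cosine similarity. None of this is linktransformer's API. The names `load_or_compute_embeddings`, `best_matches`, `embed_fn`, and `cache_path` are hypothetical, and `embed_fn` stands in for whatever call turns a list of strings into a 2-D array of embeddings with your model.

```python
import os
import pickle

import numpy as np


def load_or_compute_embeddings(texts, embed_fn, cache_path):
    """Return embeddings for `texts`, reusing a pickle at `cache_path` if valid.

    `embed_fn` is a hypothetical callable: list of str -> array (n_texts, dim).
    Only load pickle files that you created yourself or otherwise trust.
    """
    texts = list(texts)
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            cached = pickle.load(f)
        # Reuse the cache only if it was built from exactly the same texts.
        if cached["texts"] == texts:
            return cached["embeddings"]
    # No usable cache: embed the texts and store them for the next run.
    embeddings = np.asarray(embed_fn(texts), dtype=np.float32)
    with open(cache_path, "wb") as f:
        pickle.dump({"texts": texts, "embeddings": embeddings}, f)
    return embeddings


def best_matches(query_embeddings, corpus_embeddings, top_k=1):
    """Return (indices, scores) of the top_k corpus rows per query row,
    ranked by cosine similarity."""
    def normalize(x):
        x = np.asarray(x, dtype=np.float64)
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        return x / np.clip(norms, 1e-12, None)  # guard against zero vectors

    scores = normalize(query_embeddings) @ normalize(corpus_embeddings).T
    top_idx = np.argsort(-scores, axis=1)[:, :top_k]
    top_scores = np.take_along_axis(scores, top_idx, axis=1)
    return top_idx, top_scores
```

In use, the corpus texts would go through `load_or_compute_embeddings` once, so later runs only read the pickle. A new trial is then embedded on its own with the same model and passed to `best_matches` together with the cached corpus array. The returned indices point back into the rows of the corpus dataframe, assuming the texts were passed in dataframe order.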