Replies: 2 comments
-
There's code in Tribuo for that (https://github.com/oracle/tribuo/blob/main/Interop/ONNX/src/main/java/org/tribuo/interop/onnx/extractors/BERTFeatureExtractor.java), with a notebook showing how to use it as part of a Tribuo data processing pipeline (https://tribuo.org/learn/4.3/tutorials/document-classification-tribuo-v4.html). If you want to use the embeddings directly, I haven't written a pure ORT tutorial for BERT, but you can see how to use CLIP embeddings in the sd4j example project, which will be pretty similar: https://github.com/oracle/sd4j/blob/main/src/main/java/com/oracle/labs/mlrg/sd4j/TextEmbedder.java. It shows how to use the tokenizer and get the embeddings back out.
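For a rough idea of what sits on either side of the ORT call, here is a minimal sketch using only the JDK. It is not the code from Tribuo or sd4j. The class `EmbeddingSketch`, its methods, and the tiny vocabulary are all hypothetical, and mean pooling is just one common way to turn per-token outputs into a single vector (an assumption, not something the linked projects are confirmed to do this way). A real BERT tokenizer also handles punctuation splitting, Unicode normalization, and a full vocabulary file, all of which this toy version skips.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy pre- and post-processing around a BERT-style model; every name here is hypothetical. */
final class EmbeddingSketch {

    /** Greedy longest-match-first WordPiece split of a single lower-cased word. */
    static List<String> wordPiece(String word, Map<String, Integer> vocab) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < word.length()) {
            int end = word.length();
            String match = null;
            while (end > start) {
                // Pieces that continue a word carry the "##" prefix.
                String candidate = (start == 0 ? "" : "##") + word.substring(start, end);
                if (vocab.containsKey(candidate)) {
                    match = candidate;
                    break;
                }
                end--;
            }
            if (match == null) {
                return List.of("[UNK]"); // no piece fits, so the whole word is unknown
            }
            pieces.add(match);
            start = end;
        }
        return pieces;
    }

    /** Maps whitespace-separated text to token ids wrapped in [CLS] ... [SEP]. */
    static long[] encode(String text, Map<String, Integer> vocab) {
        List<Long> ids = new ArrayList<>();
        ids.add(vocab.get("[CLS]").longValue());
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            for (String piece : wordPiece(word, vocab)) {
                ids.add(vocab.get(piece).longValue());
            }
        }
        ids.add(vocab.get("[SEP]").longValue());
        return ids.stream().mapToLong(Long::longValue).toArray();
    }

    /** Averages the token vectors whose attention mask is non-zero (assumed pooling choice). */
    static float[] meanPool(float[][] tokenEmbeddings, long[] attentionMask) {
        float[] pooled = new float[tokenEmbeddings[0].length];
        int count = 0;
        for (int t = 0; t < tokenEmbeddings.length; t++) {
            if (attentionMask[t] == 0) {
                continue; // skip padding positions
            }
            for (int d = 0; d < pooled.length; d++) {
                pooled[d] += tokenEmbeddings[t][d];
            }
            count++;
        }
        for (int d = 0; d < pooled.length; d++) {
            pooled[d] /= Math.max(count, 1);
        }
        return pooled;
    }

    public static void main(String[] args) {
        // Made-up vocabulary; a real model ships its own vocab file.
        Map<String, Integer> vocab = Map.of(
            "[UNK]", 0, "[CLS]", 1, "[SEP]", 2,
            "embed", 3, "##ding", 4, "##s", 5, "java", 6);
        System.out.println(wordPiece("embeddings", vocab));
        System.out.println(Arrays.toString(encode("Java embeddings", vocab)));
        // Stand-in for the model's per-token output; the last row is padding.
        float[][] fakeOutput = {{1f, 2f}, {3f, 4f}, {9f, 9f}};
        System.out.println(Arrays.toString(meanPool(fakeOutput, new long[] {1, 1, 0})));
    }
}
```

In a real pipeline, the ids from `encode` would be wrapped in an `OnnxTensor` (for example with `OnnxTensor.createTensor(env, new long[][] {ids})`) and passed to `OrtSession.run`. The input names, such as `input_ids` and `attention_mask`, depend on how the model was exported, so check them against your model.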
-
Thanks Adam.
(In reply to Adam Pocock's comment of Mon, May 6, 2024, 8:21 PM.)
-
I am interested in using this library from Java to tokenize text and calculate vector embeddings. Is there a code sample showing how to do this? I saw a sample here that covers tokenization, but it's in C#.