Contrastive Learning on multi-label datasets #2537
Hi, I think a workaround for this is possible in sentence transformers, but let me first ask whether I understand the problem. What you would like to do is create a dataset such that labels [1, 1, 0] and [0, 1, 0] would make the embeddings closer because of the second and third label but further away from each other because of the first label?
Hi, I haven't figured out the best way to do it or its implications, but your suggestion seems interesting and matches my need. My point is to allow class values to be considered individually (as your suggestion does) instead of globally (which is the case in the current implementation, where every value of every class has to match for ST to bring the embeddings closer).
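To make the per-class idea concrete, here is a minimal sketch in plain Python (not part of sentence-transformers; `label_agreement` is a hypothetical helper) that scores a pair of multi-label vectors by how many classes agree. Such a soft score could then serve as a graded similarity target rather than a hard same/different decision:

```python
def label_agreement(a, b):
    """Fraction of classes on which the two multi-label vectors agree.

    Hypothetical helper for illustration: 1.0 means identical labels,
    0.0 means every class disagrees.
    """
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

# [1, 1, 0] vs [0, 1, 0]: classes 2 and 3 agree, class 1 disagrees -> 2/3
print(label_agreement([1, 1, 0], [0, 1, 0]))
```

A score like this would let pairs such as [1, 1, 0] and [0, 1, 0] be pulled partially together instead of being treated as a fully negative pair.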
Here's a somewhat hacky approach which you can use:
The code optimizes two objectives at the same time using two different optimizers, so it's not exactly the same as combining the losses and then backpropagating, although I think that could be done by creating a custom loss function. I hope this helps.
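Since the original snippet is not shown above, the two-optimizer idea can be sketched with plain gradient descent on a single shared parameter (a toy stand-in for the shared embedding model; the targets 2 and 4 are arbitrary assumptions):

```python
# Each objective pulls the shared parameter toward a different target,
# mimicking two losses optimized by two separate optimizers.

def grad_loss_a(w):
    # d/dw of (w - 2)^2: objective A pulls w toward 2
    return 2 * (w - 2)

def grad_loss_b(w):
    # d/dw of (w - 4)^2: objective B pulls w toward 4
    return 2 * (w - 4)

w = 0.0
lr_a, lr_b = 0.05, 0.05  # each "optimizer" keeps its own learning rate
for _ in range(500):
    w -= lr_a * grad_loss_a(w)  # step on objective A
    w -= lr_b * grad_loss_b(w)  # step on objective B

print(w)  # settles near 3, a compromise between the two objectives
```

Because the two steps are applied sequentially rather than from a single summed loss, the result is close to, but not exactly the same as, minimizing the combined loss, which is the distinction the comment above makes.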
Hello,
(Cross posting this between SetFit and sentence-transformers)
We're investigating the possibility of using SetFit for customer service message classification.
Our case is a multi-label case since often the customers have more than one request in each message.
During the training phase of SetFit, the texts and labels are passed to Sentence Transformers' SentenceLabelDataset.
The contrastive examples are created based on the full combination of labels, not on their intersection: e.g. labels [1, 1, 0] and [1, 0, 0] will be pushed apart by contrastive learning, and only pairs of identical [1, 1, 0] labels will be pulled together during the contrastive learning phase.
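As a rough illustration of the difference (hypothetical toy data, not the actual SentenceLabelDataset code), compare pairing by exact label combination with pairing by any shared positive label:

```python
from itertools import combinations

# Toy multi-label samples: (text id, label vector)
samples = [("msg_a", (1, 1, 0)), ("msg_b", (1, 0, 0)), ("msg_c", (1, 1, 0))]

# Combination-based (current behaviour): only identical label tuples
# form a positive pair.
exact_pairs = [(x[0], y[0]) for x, y in combinations(samples, 2)
               if x[1] == y[1]]

# Intersection-based alternative: any shared positive label forms a pair,
# closer to what a one-vs-rest classifier would want.
shared_pairs = [(x[0], y[0]) for x, y in combinations(samples, 2)
                if any(a and b for a, b in zip(x[1], y[1]))]

print(exact_pairs)   # only msg_a and msg_c share the exact tuple (1, 1, 0)
print(shared_pairs)  # all three pairs share the first positive label
```

Under the current behaviour, msg_a and msg_b never form a positive pair even though they share a label; the intersection-based variant would pull them together.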
This can be somewhat counter-productive in SetFit: for example, a "one-vs-rest" classifier would require examples sharing at least one common label to be close to each other.
We were wondering whether this behaviour was chosen deliberately, and why. Do you have experience dealing with this type of data, and have you used a workaround? Would you be interested in a contribution to support this type of use case?
Cheers,