While testing the bge-m3 embedding model, I wanted to see how it behaves in different scenarios. After generating sparse embeddings and storing them in a JSON file, I computed their similarity with the _compute_single_lexical_matching_score method, which is defined in FlagEmbedding/inference/embedder/encoder_only/m3.py.
However, I get a score of only about 0.23 even when comparing identical sparse embeddings.
Here is some output from my terminal (messages translated from German):
Testing sparse similarity computation with converted embeddings... Sparse Similarity Score: 0.23759149310728778 Similarity computation successful!
Sparse 1: {35542: 0.16986805200576782, 443: 0.1528966724872589, 599: 0.0936431884765625, 8647: 0.30713802576065063, 9: 0.04344563186168671, 174379: 0.2834935784339905}
Sparse 2: {35542: 0.16986805200576782, 443: 0.1528966724872589, 599: 0.0936431884765625, 8647: 0.30713802576065063, 9: 0.04344563186168671, 174379: 0.2834935784339905}
Maybe I'm wrong, but wouldn't we need some kind of normalization factor here? Currently only a plain dot product is computed.
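To illustrate what I mean: the score is just a dot product over the token ids shared by the two sparse embeddings, so comparing an embedding with itself sums the squared token weights. A minimal sketch (my own re-implementation for illustration, not the library's code) reproduces exactly the 0.2376 from my terminal output:

```python
def lexical_matching_score(sparse1: dict, sparse2: dict) -> float:
    # Dot product over the token ids present in both sparse embeddings.
    return sum(weight * sparse2[token_id]
               for token_id, weight in sparse1.items()
               if token_id in sparse2)

sparse = {35542: 0.16986805200576782, 443: 0.1528966724872589,
          599: 0.0936431884765625, 8647: 0.30713802576065063,
          9: 0.04344563186168671, 174379: 0.2834935784339905}

# Self-similarity is just the sum of squared weights, which is
# well below 1 here because the individual weights are small.
print(lexical_matching_score(sparse, sparse))  # ≈ 0.2376
```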
Since sparse embeddings are not normalized, the similarity between identical sparse embeddings will generally not equal 1. Normalization is not needed here.
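For anyone who nevertheless wants identical embeddings to score exactly 1 (e.g. for easier thresholding), a cosine-style normalization can be layered on top of the dot product. This is my own sketch, not part of the library:

```python
import math

def cosine_lexical_score(sparse1: dict, sparse2: dict) -> float:
    # Dot product over shared token ids, divided by the L2 norms
    # of the two weight vectors (cosine similarity on sparse dicts).
    dot = sum(w * sparse2[t] for t, w in sparse1.items() if t in sparse2)
    norm1 = math.sqrt(sum(w * w for w in sparse1.values()))
    norm2 = math.sqrt(sum(w * w for w in sparse2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)

sparse = {35542: 0.16986805200576782, 443: 0.1528966724872589,
          599: 0.0936431884765625, 8647: 0.30713802576065063,
          9: 0.04344563186168671, 174379: 0.2834935784339905}

print(cosine_lexical_score(sparse, sparse))  # ≈ 1.0
```

Note that this changes the score's semantics: the raw dot product also rewards larger weights (more confident lexical matches), which the normalized variant throws away.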