Instructions to use intfloat/multilingual-e5-large with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use intfloat/multilingual-e5-large with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-large")
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [3, 3]
```
- Inference
- Notebooks
- Google Colab
- Kaggle
Finetuning: problem with dense similarity scores (0.7 to 1.0)?
Great Sentence Transformer! But finetuning gives me bad results:
I could not manage to get good results when finetuning for sentence similarity with the standard Hugging Face classes.
Loss functions tried: (online) contrastive loss and cosine similarity loss. Contrastive loss uses a margin, usually 0.5, but for your Sentence Transformer this makes no sense, because the similarity scores fall between 0.7 and 1.0, as you wrote and as I observed myself.
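To make the margin problem concrete, here is a minimal NumPy sketch of the contrastive loss in the form sentence-transformers uses (penalize dissimilar pairs whose distance falls inside the margin). The specific similarity values are illustrative assumptions, not measurements from the model; they only mirror the 0.7..1.0 band described above:

```python
import numpy as np

def contrastive_loss(distance, label, margin=0.5):
    # label 1 = similar pair (pull together), label 0 = dissimilar pair
    # (push apart until the distance exceeds the margin)
    if label == 1:
        return distance ** 2
    return max(0.0, margin - distance) ** 2

# If cosine similarities live in roughly 0.7..1.0, cosine distances
# for *dissimilar* pairs are only about 0.0..0.3 -- all inside margin 0.5.
neg_similarities = np.array([0.70, 0.75, 0.80])  # assumed example values
neg_distances = 1.0 - neg_similarities
losses = [contrastive_loss(d, label=0) for d in neg_distances]
print(all(loss > 0 for loss in losses))  # True: every negative is penalized
```

So with a 0.5 margin, every dissimilar pair produces a nonzero loss and the objective keeps pushing already-separated pairs apart, which matches the poor results reported here.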
Evaluators tried: BinaryClassificationEvaluator and EmbeddingSimilarityEvaluator. However, real-world usage shows bad results after finetuning.
Which loss function (and which basic parameters) would you recommend for finetuning on sentence similarity?
In particular, I want to match similar queries.
Thank you for your reply :-)
I recommend using the InfoNCE loss as mentioned in the paper; this loss is not sensitive to the absolute value of the similarity scores.
If you see decreased performance after fine-tuning, try lowering the learning rate, using fewer steps, or using hard negatives.
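The shift-invariance claim can be verified directly: InfoNCE is a softmax cross-entropy over similarity logits, so adding a constant to every similarity leaves the loss unchanged. A minimal NumPy sketch (the similarity values and temperature 0.05 are illustrative assumptions, not from the paper):

```python
import numpy as np

def info_nce(pos_sim, neg_sims, temperature=0.05):
    # InfoNCE: cross-entropy of the positive against in-batch negatives
    logits = np.concatenate([[pos_sim], neg_sims]) / temperature
    logits -= logits.max()  # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])

# Similarities in the compressed 0.7..1.0 band...
loss_high = info_nce(0.95, np.array([0.80, 0.75]))
# ...and the same similarities shifted down by a constant 0.5
loss_shifted = info_nce(0.45, np.array([0.30, 0.25]))

print(np.isclose(loss_high, loss_shifted))  # True: only differences matter
```

Because only the *differences* between the positive and negative similarities enter the loss, the 0.7..1.0 score band is harmless here, unlike with a fixed contrastive margin. In sentence-transformers, MultipleNegativesRankingLoss is the usual in-batch-negatives loss of this form.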