This is a sentence-transformers model fine-tuned from rasyosef/roberta-amharic-text-embedding-medium. It maps sentences and paragraphs to a 512-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 510, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 512, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
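To make the Pooling and Normalize modules concrete, here is a minimal sketch of masked mean pooling followed by L2 normalization. It is plain Python on a toy 3-dimensional input, not the library's implementation; the function names `mean_pool` and `l2_normalize` and the toy values are assumptions made for illustration.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy input: three token vectors, the last one is padding (mask = 0).
tokens = [[1.0, 2.0, 2.0], [3.0, 2.0, 4.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # mean of the two unmasked tokens
sentence_embedding = l2_normalize(pooled)   # unit-length sentence vector
print(pooled, sentence_embedding)
```

In the real model the same two steps run over 512-dimensional token vectors, which is why every output embedding has length 1.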
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("mogesa/Roberta-amharic-news-sentence-transformer")
# Run inference
sentences = [
    'የግሉ ወረት እና አፍሪቃ',
    '« በፊናንሱ ተቋማዊ ተሀድሶ የተነሳ የአፍሪቃ ህብረት ለቀጣዩ በጀቱ 12 ከመቶ ቁጠባ አድርጓል በዚህ አባል ሀገራት ያበረከቱት አስተዋፅኦ ትልቅ ነው',
    'በሱዳን ጉዳይ ጣልቃ በመግባት የነዳጅ የሌሎች የተፈጥሮ ሀብቷን የመቀራመት እድል ሊፈጠር ሰበብ የሚሰጡ ሀገራት መኖራቸው ደግሞ ሁለተኛው ምክንያት ነው',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 512]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
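Because the model ends with a Normalize() module, its embeddings have unit length, so cosine similarity reduces to a plain dot product. The sketch below ranks candidates against a query on that basis. The 2-dimensional unit vectors are hand-made stand-ins for real embeddings, and `rank_by_similarity` is a hypothetical helper, not part of the library.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(x * y for x, y in zip(a, b))

def rank_by_similarity(query, candidates):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [dot(query, candidate) for candidate in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

# Toy unit vectors standing in for model.encode(...) output.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [0.6, 0.8], [0.8, 0.6]]

ranking = rank_by_similarity(query, candidates)
print(ranking)
```

With real embeddings, the same ranking can be read off one row of the similarity matrix, assuming the model uses cosine similarity as its similarity function.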
Columns: sentence_0, sentence_1, and label

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |
| details | | | |
| sentence_0 | sentence_1 | label |
|---|---|---|
| በማእከላዊ ጎንደር ዞን ጠገዴ የታገቱት ስድስት ታዳጊዎች ለምን ተገደሉ | "ቦታው ዘወር ያለ ነበር ኮከራ ቀበሌ የሚባል ድሮም 'የሽፍታ መጠጊያ' ይባላል | 0.33186144 |
| የኢትዮ-ምህዳር ጋዜጣ ዋና አዘጋጅ ታሰረ | ዋና አዘጋጁ በወንጀል ህግ በአንቀፅ 613 “ስማ ማጥፋት የሀሰት ሀሜት” በሚል የተቀመጠውን ተላልፏል በሚል የተከሰሰው | 0.50249875 |
| አምባሳደር ሺን ፤ ኢትዮጵያና ኤርትራ | አምባሳደሩ ቀደም በአለም አቀፍ ፍርድ ቤት በተደረገ ድርድር ውጤት ባድመ የኤርትራ መሆኗን እትዮጵያውያን መቀበል ይኖርባቸዋል ብለዋል | 0.54789203 |
MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
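The two parameters above can be read as follows: each anchor is scored against every positive in the batch with cosine similarity, the scores are multiplied by the scale (20.0), and a cross-entropy loss rewards the matching pair over the in-batch negatives. The following is a minimal pure-Python sketch of that idea on toy vectors. It is an illustration under those assumptions, not the library's code, and `mnr_loss` is a hypothetical name.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])   # pairs match: loss near 0
shuffled = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])  # pairs swapped: large loss
print(aligned, shuffled)
```

A larger scale sharpens the softmax, so well-separated pairs contribute almost no loss while confusable in-batch negatives dominate the gradient.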
Non-default hyperparameters:

- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

| Epoch | Step | Training Loss |
|---|---|---|
| 0.0367 | 500 | 1.2372 |
| 0.0734 | 1000 | 1.0754 |
| 0.1102 | 1500 | 1.0128 |
| 0.1469 | 2000 | 0.9841 |
| 0.1836 | 2500 | 0.944 |
| 0.2203 | 3000 | 0.9168 |
| 0.2571 | 3500 | 0.8863 |
| 0.2938 | 4000 | 0.8685 |
| 0.3305 | 4500 | 0.8575 |
| 0.3672 | 5000 | 0.8637 |
| 0.4039 | 5500 | 0.8353 |
| 0.4407 | 6000 | 0.8147 |
| 0.4774 | 6500 | 0.7913 |
| 0.5141 | 7000 | 0.7751 |
| 0.5508 | 7500 | 0.7719 |
| 0.5875 | 8000 | 0.7605 |
| 0.6243 | 8500 | 0.7206 |
| 0.6610 | 9000 | 0.7219 |
| 0.6977 | 9500 | 0.7302 |
| 0.7344 | 10000 | 0.7307 |
| 0.7712 | 10500 | 0.7019 |
| 0.8079 | 11000 | 0.7127 |
| 0.8446 | 11500 | 0.6693 |
| 0.8813 | 12000 | 0.6934 |
| 0.9180 | 12500 | 0.6721 |
| 0.9548 | 13000 | 0.6657 |
| 0.9915 | 13500 | 0.6696 |
| 1.0282 | 14000 | 0.5583 |
| 1.0649 | 14500 | 0.5335 |
| 1.1016 | 15000 | 0.5234 |
| 1.1384 | 15500 | 0.5192 |
| 1.1751 | 16000 | 0.5317 |
| 1.2118 | 16500 | 0.5325 |
| 1.2485 | 17000 | 0.5201 |
| 1.2853 | 17500 | 0.5096 |
| 1.3220 | 18000 | 0.5001 |
| 1.3587 | 18500 | 0.5015 |
| 1.3954 | 19000 | 0.4862 |
| 1.4321 | 19500 | 0.4901 |
| 1.4689 | 20000 | 0.5168 |
| 1.5056 | 20500 | 0.499 |
| 1.5423 | 21000 | 0.4937 |
| 1.5790 | 21500 | 0.4772 |
| 1.6157 | 22000 | 0.4709 |
| 1.6525 | 22500 | 0.4971 |
| 1.6892 | 23000 | 0.485 |
| 1.7259 | 23500 | 0.4689 |
| 1.7626 | 24000 | 0.4789 |
| 1.7994 | 24500 | 0.4606 |
| 1.8361 | 25000 | 0.4711 |
| 1.8728 | 25500 | 0.4774 |
| 1.9095 | 26000 | 0.4649 |
| 1.9462 | 26500 | 0.4779 |
| 1.9830 | 27000 | 0.4703 |
| 2.0197 | 27500 | 0.4202 |
| 2.0564 | 28000 | 0.389 |
| 2.0931 | 28500 | 0.3824 |
| 2.1298 | 29000 | 0.3682 |
| 2.1666 | 29500 | 0.3764 |
| 2.2033 | 30000 | 0.366 |
| 2.2400 | 30500 | 0.3723 |
| 2.2767 | 31000 | 0.38 |
| 2.3135 | 31500 | 0.3632 |
| 2.3502 | 32000 | 0.3817 |
| 2.3869 | 32500 | 0.3894 |
| 2.4236 | 33000 | 0.3844 |
| 2.4603 | 33500 | 0.3761 |
| 2.4971 | 34000 | 0.3871 |
| 2.5338 | 34500 | 0.3672 |
| 2.5705 | 35000 | 0.3621 |
| 2.6072 | 35500 | 0.3907 |
| 2.6439 | 36000 | 0.3688 |
| 2.6807 | 36500 | 0.3653 |
| 2.7174 | 37000 | 0.3632 |
| 2.7541 | 37500 | 0.3698 |
| 2.7908 | 38000 | 0.3696 |
| 2.8276 | 38500 | 0.3624 |
| 2.8643 | 39000 | 0.3731 |
| 2.9010 | 39500 | 0.3634 |
| 2.9377 | 40000 | 0.3504 |
| 2.9744 | 40500 | 0.3643 |
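The Epoch and Step columns in the training log also give a rough estimate of the training set size, which the card does not state directly. The sketch below divides step by epoch for a few logged rows. The example count assumes a single device and uses the batch size of 16 and gradient_accumulation_steps of 1 listed in the hyperparameters, so treat the result as an approximation rather than a reported figure.

```python
# (epoch, step) pairs copied from the training log.
logged = [(0.0367, 500), (0.9915, 13500), (2.9744, 40500)]

# Each ratio estimates the optimizer steps in one epoch; later rows are less
# affected by the 4-decimal rounding of the epoch column.
estimates = [step / epoch for epoch, step in logged]
steps_per_epoch = round(estimates[-1])

batch_size = 16  # per_device_train_batch_size; single device is an assumption
approx_examples = steps_per_epoch * batch_size

print(steps_per_epoch, approx_examples)
```

Under these assumptions the log implies on the order of 13,600 steps per epoch, or a little under 220,000 training pairs.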
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
rasyosef/roberta-medium-amharic