Russian-Kyrgyz Translation Model (LoResMT 2026)

This model took first place in the Russian-to-Kyrgyz translation track at the LoResMT 2026 Turkic Languages Translation Challenge.

Model Description

  • Base model: mT0-large (pruned from 1.23B to 800M parameters)
  • Languages: Russian (ru) ↔ Kyrgyz (ky)
  • Training: Four-stage curriculum learning on filtered OPUS data and synthetic translations

See our paper for full details: tbd (LoResMT @ EACL 2026)

Performance

Russian → Kyrgyz

Benchmark            chrF++   XCOMET-XXL
FLORES-200 devtest   44.9     80.5
LoResMT 2026 test    49.1     69.7

Kyrgyz → Russian

Benchmark            chrF++   XCOMET-XXL
FLORES-200 devtest   42.4     82.8

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Novokshanov/ru-ky-mt0-loresmt2026")
model = AutoModelForSeq2SeqLM.from_pretrained("Novokshanov/ru-ky-mt0-loresmt2026")

# Available prefixes are "<2ky>" (translate into Kyrgyz) and "<2ru>" (translate into Russian).

text = "<2ky>Привет, как дела?"
inputs = tokenizer(text, max_length=512, truncation=True, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512, num_beams=5)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
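
To translate several sentences at once, or to switch direction, the prefix handling can be wrapped in a small helper. The following is a minimal sketch, not part of the released code: the names `PREFIXES`, `add_prefix`, and `translate_batch` are hypothetical, and it assumes the prefix is attached directly to the text with no separating space, as in the example above.

```python
from typing import List

# Direction prefixes as listed in the usage comment of this model card.
PREFIXES = {"ky": "<2ky>", "ru": "<2ru>"}


def add_prefix(text: str, target_lang: str) -> str:
    """Prepend the target-language prefix (hypothetical helper)."""
    if target_lang not in PREFIXES:
        raise ValueError(f"unsupported target language: {target_lang!r}")
    # Assumption: no space between prefix and text, mirroring the example above.
    return PREFIXES[target_lang] + text


def translate_batch(
    texts: List[str],
    target_lang: str,
    model,
    tokenizer,
    num_beams: int = 5,
    max_length: int = 512,
) -> List[str]:
    """Translate a list of sentences with a single generate() call."""
    prefixed = [add_prefix(t, target_lang) for t in texts]
    # Pad to the longest sentence in the batch; truncate anything over max_length.
    inputs = tokenizer(
        prefixed,
        max_length=max_length,
        truncation=True,
        padding=True,
        return_tensors="pt",
    )
    outputs = model.generate(**inputs, max_length=max_length, num_beams=num_beams)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With `tokenizer` and `model` loaded as above, `translate_batch(["Привет, как дела?"], "ky", model, tokenizer)` returns a list with one Kyrgyz string, and passing `"ru"` as the target translates Kyrgyz input into Russian.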

Citation

tbd

License

CC BY-NC 4.0
