RomanSetu Collection RomanSetu is a collection of models that address the challenge of extending Large Language Models (LLMs) to non-English languages that use non-Latin scripts. • 11 items • Updated Mar 7, 2025 • 5
IndicRagSuite Collection A comprehensive dataset collection for Indic language information retrieval. • 2 items • Updated Mar 2 • 3
IndicLLMSuite Collection The largest collection of pretraining and instruction-finetuning datasets for 22 Indic languages. • 4 items • Updated Nov 5, 2024 • 19
IndicConformer Collection A collection of ASR models for 22 scheduled languages of India • 23 items • Updated Mar 2 • 34
Airavata Evaluation Suite Collection A collection of benchmarks used to evaluate Airavata, a Hindi instruction-tuned model built on top of Sarvam's OpenHathi base model. • 20 items • Updated Mar 2 • 10
IndicXTREME Collection IndicXTREME is a human-supervised benchmark of 9 diverse NLU tasks across 20 languages, featuring 105 evaluation sets in total. • 8 items • Updated Oct 23, 2024 • 2
IndicNLG Collection IndicNLG Benchmark is a dataset collection designed for benchmarking Natural Language Generation (NLG) across 11 Indic languages. • 5 items • Updated Oct 15, 2024 • 5
IndicBERT v2 Collection IndicBERT v2 is a multilingual BERT model pretrained on IndicCorp v2, an Indic monolingual corpus of 20.9 billion tokens, covering 24 constitutionally • 4 items • Updated Oct 15, 2024 • 7
ELAICHI Collection ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams. • 6 items • Updated Oct 24, 2024 • 9
Indic Parler-TTS Collection Collection of Parler-TTS models adapted to Indian languages. • 3 items • Updated Dec 4, 2024 • 11
BhasaAnuvaad Collection A speech translation dataset for 13 Indian languages. • 8 items • Updated Mar 2 • 26
IndicBERT-v3 Collection A collection of state-of-the-art multilingual base encoder language models (270M, 1B, 4B) for Indic languages. • 3 items • Updated Jan 6 • 3
Bhojpuri and Hindi Rural Women ASR Collection This dataset includes ASR data from rural women speaking Hindi and Bhojpuri, supporting inclusive voice recognition. • 2 items • Updated Nov 6, 2025 • 2
Bhili on MahaVISTAAR Collection MT, ASR & TTS models for Bhili (भीली), specifically the Dehvali Bhili dialect, an Indo-Aryan language spoken by the Bhil community in western India. • 5 items • Updated 7 days ago • 1