Upload 4 files
#17
by coreprinciple - opened
Add BGE-M3 (BAAI/bge-m3)
Model: BAAI/bge-m3
Architecture: XLM-RoBERTa (large), 568M parameters
Embedding dimensions: 1024
Max sequence length: 8192 tokens
Languages: 100+
Conversion
Converted using optimum-cli export onnx --model BAAI/bge-m3 --task feature-extraction <output_dir>.
Validated ONNX output against PyTorch: cosine similarity > 0.9999 across English,
French, and Chinese test sentences.
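The parity check described above can be sketched roughly as follows. This is a minimal illustration, not the script used for this PR: the helper names `cosine_similarity` and `check_parity` are hypothetical, and the toy 4-dimensional vectors stand in for the real 1024-dimensional embeddings that would come from the PyTorch model and the exported ONNX model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def check_parity(reference, candidate, threshold=0.9999):
    """Compare paired embeddings and report whether every pair exceeds the threshold.

    reference: embeddings from the original model (one vector per sentence)
    candidate: embeddings from the converted model, in the same sentence order
    Returns (all_passed, scores).
    """
    if len(reference) != len(candidate):
        raise ValueError("both models must embed the same number of sentences")
    scores = [cosine_similarity(r, c) for r, c in zip(reference, candidate)]
    return all(score > threshold for score in scores), scores


if __name__ == "__main__":
    # Placeholder data (assumption): toy vectors, not real BGE-M3 output.
    pytorch_embeddings = [[0.12, -0.40, 0.33, 0.85], [0.50, 0.10, -0.70, 0.20]]
    onnx_embeddings = [[0.12001, -0.39999, 0.33002, 0.84998],
                       [0.49999, 0.10001, -0.70001, 0.20002]]
    passed, scores = check_parity(pytorch_embeddings, onnx_embeddings)
    print("parity check passed:", passed)
    print("per-sentence scores:", scores)
```

Comparing per sentence, rather than averaging, makes it visible if a single language drifts after conversion.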
Local testing
Tested with Typesense 29.0 via Docker:
- Collection creation
- Document indexing with auto-embedding
- Semantic search (English)
- Cross-lingual semantic search (French query → English results)
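For readers who want to repeat these steps, the request bodies might look roughly like the sketch below. It only builds the JSON payloads; sending them to a running Typesense instance (for example, the collection-creation and document-search endpoints) is left out. The collection name `articles`, the `title` field, the model name `ts/bge-m3`, and the helper function names are all assumptions for illustration and are not taken from this PR.

```python
import json


def build_collection_schema(name, model_name):
    """Schema with an auto-embedding field generated from the `title` field."""
    return {
        "name": name,
        "fields": [
            {"name": "title", "type": "string"},
            {
                "name": "embedding",
                "type": "float[]",
                "embed": {
                    "from": ["title"],
                    "model_config": {"model_name": model_name},
                },
            },
        ],
    }


def build_semantic_search_params(query):
    """Search parameters that query the embedding field (semantic search)."""
    return {
        "q": query,
        "query_by": "embedding",
        # Leave the raw vectors out of the response to keep it small.
        "exclude_fields": "embedding",
    }


if __name__ == "__main__":
    # Assumed names: "articles" and "ts/bge-m3" are placeholders.
    schema = build_collection_schema("articles", "ts/bge-m3")
    print(json.dumps(schema, indent=2))

    # Cross-lingual case: a French query against English documents.
    params = build_semantic_search_params("Comment réinitialiser mon mot de passe ?")
    print(json.dumps(params, ensure_ascii=False, indent=2))
```

With a schema like this, documents are indexed with only `title` set and the server fills in `embedding` itself.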
Config
model_type: xlm_roberta
vocab_file_name: sentencepiece.bpe.model
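As a small illustration, the two settings above could be serialized as JSON like this. The sketch assumes the config is stored as a JSON file; the helper name `build_model_config` is hypothetical, and any further keys the repository may require are not listed in this PR description, so they are not shown.

```python
import json


def build_model_config(model_type, vocab_file_name):
    """Return the two config entries listed in this PR as a dict."""
    return {
        "model_type": model_type,
        "vocab_file_name": vocab_file_name,
    }


config = build_model_config("xlm_roberta", "sentencepiece.bpe.model")

if __name__ == "__main__":
    # Print the JSON form; writing it to disk is left to the caller.
    print(json.dumps(config, indent=2))
```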