LFM2.5-350M-Base
LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.
Find more information about LFM2.5-350M in our blog post (www.liquid.ai/blog/lfm2-5-350m-no-size-left-behind).
Model Details
| Model | Parameters | Description |
|---|---|---|
| LFM2.5-350M-Base | 350M | Pre-trained base model for fine-tuning |
| LFM2.5-350M | 350M | General-purpose instruction-tuned model |
LFM2.5-350M is a general-purpose text-only model with the following features:
- Number of parameters: 350M
- Number of layers: 16 (10 double-gated LIV convolution blocks + 6 GQA blocks)
- Training budget: 28T tokens
- Context length: 32,768 tokens
- Vocabulary size: 65,536
- Knowledge cutoff: Mid-2024
- Languages: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, Spanish
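As a back-of-envelope reading of the figures above, the sketch below derives a few quantities from them. The names (`ModelSpec`, `bf16_weight_bytes`, `fits_in_context`) are illustrative and not part of any Liquid AI API. It treats the nominal 350M as an exact parameter count, and the memory estimate assumes 2 bytes per parameter (bfloat16) for the weights only, ignoring activations and caches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Figures copied from the feature list; 350M is treated as exact."""
    parameters: int = 350_000_000
    conv_blocks: int = 10            # double-gated LIV convolution blocks
    gqa_blocks: int = 6              # GQA blocks
    training_tokens: int = 28_000_000_000_000
    context_length: int = 32_768
    vocab_size: int = 65_536

    @property
    def layers(self) -> int:
        return self.conv_blocks + self.gqa_blocks

    @property
    def tokens_per_parameter(self) -> float:
        return self.training_tokens / self.parameters

    def bf16_weight_bytes(self) -> int:
        # Assumption: 2 bytes per parameter, weights only.
        return self.parameters * 2

    def fits_in_context(self, prompt_tokens: int, max_new_tokens: int) -> bool:
        # The prompt plus the requested completion must stay within the window.
        return prompt_tokens + max_new_tokens <= self.context_length


spec = ModelSpec()
print("layers:", spec.layers)
print("training tokens per parameter:", spec.tokens_per_parameter)
print("bf16 weights (GB):", spec.bf16_weight_bytes() / 1e9)
print("32,000-token prompt + 512 new tokens fits:", spec.fits_in_context(32_000, 512))
```

Under these assumptions the training budget works out to roughly 80,000 tokens per parameter, and the weights alone occupy about 0.7 GB in bfloat16.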
This pre-trained checkpoint is recommended only for tasks that require heavy fine-tuning, such as language-specific (e.g., Japanese) or domain-specific (e.g., medical) assistants, training on proprietary data, or experiments with novel post-training approaches.
Inference
LFM2.5 is supported by many inference frameworks. See the Inference documentation for the full list.
| Name | Description | Docs | Notebook |
|---|---|---|---|
| Transformers | Simple inference with direct access to model internals. | Link | Available |
| vLLM | High-throughput production deployments with GPU. | Link | Available |
| llama.cpp | Cross-platform inference with CPU offloading. | Link | Available |
| MLX | Apple's machine learning framework optimized for Apple Silicon. | Link | Not available |
| LM Studio | Desktop application for running LLMs locally. | Link | Not available |
Here's a quick start example with Transformers:

    from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

    model_id = "LiquidAI/LFM2.5-350M-Base"

    # Load the weights in bfloat16; device_map="auto" chooses the device placement.
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        dtype="bfloat16",
        # attn_implementation="flash_attention_2"  <- uncomment on compatible GPU
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Print generated text as it is produced, without echoing the prompt.
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

    # Format the prompt with the tokenizer's chat template and tokenize it.
    prompt = "What is C. elegans?"
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
        tokenize=True,
    ).to(model.device)

    # Sample up to 512 new tokens with a low temperature and a mild repetition penalty.
    output = model.generate(
        input_ids,
        do_sample=True,
        temperature=0.1,
        top_k=50,
        repetition_penalty=1.05,
        max_new_tokens=512,
        streamer=streamer,
    )

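To give a feel for what the sampling settings in the quick start do, here is a toy sketch that applies a repetition penalty, temperature scaling, and top-k filtering to a plain list of logits. It is not the Transformers implementation: the function name `next_token_probs` is hypothetical, and the order of the steps (penalty, then temperature, then top-k) is an assumption about how such pipelines are typically arranged.

```python
import math


def next_token_probs(logits, generated_ids, temperature=0.1, top_k=50,
                     repetition_penalty=1.05):
    """Toy next-token distribution for a list of logits (illustration only)."""
    scores = list(logits)

    # Repetition penalty: lower the score of tokens that were already generated.
    for tok in set(generated_ids):
        s = scores[tok]
        scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty

    # Temperature: values below 1 sharpen the distribution toward the top token.
    scores = [s / temperature for s in scores]

    # Top-k: keep only the k highest scores (ties at the cutoff are kept).
    k = min(top_k, len(scores))
    cutoff = sorted(scores, reverse=True)[k - 1]
    scores = [s if s >= cutoff else -math.inf for s in scores]

    # Softmax over the surviving tokens, shifted by the max for stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Four-token toy vocabulary; token 0 has already been generated once.
probs = next_token_probs([2.0, 1.9, 0.5, -1.0], generated_ids=[0], top_k=2)
print([round(p, 3) for p in probs])
```

With `temperature=0.1` the scores are multiplied by ten before the softmax, so sampling behaves close to greedy decoding unless the top candidates are nearly tied, as they are in this toy input.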
Fine-Tuning
We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.
| Name | Description | Docs | Notebook |
|---|---|---|---|
| CPT (Unsloth) | Continued Pre-Training using Unsloth for text completion. | Link | Available |
| CPT (Unsloth) | Continued Pre-Training using Unsloth for translation. | Link | Available |
| SFT (Unsloth) | Supervised Fine-Tuning with LoRA using Unsloth. | Link | Available |
| SFT (TRL) | Supervised Fine-Tuning with LoRA using TRL. | Link | Available |
| DPO (TRL) | Direct Preference Optimization with LoRA using TRL. | Link | Available |
| GRPO (Unsloth) | Group Relative Policy Optimization (GRPO) with LoRA using Unsloth. | Link | Available |
| GRPO (TRL) | Group Relative Policy Optimization (GRPO) with LoRA using TRL. | Link | Available |
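Most of the recipes above use LoRA, which freezes a weight matrix and trains a low-rank update made of two small matrices instead. The sketch below shows why that is cheap, by counting trainable parameters for a single adapted matrix. The helper name `lora_trainable_params` and the shapes and rank used are hypothetical values chosen for illustration, not the model's real dimensions or a recommended configuration.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical shapes for illustration only; not the model's real dimensions.
d_in, d_out, rank = 1024, 1024, 16

full = d_in * d_out                               # parameters in the frozen matrix
lora = lora_trainable_params(d_in, d_out, rank)   # parameters actually trained

print("frozen matrix parameters:", full)
print("LoRA trainable parameters:", lora)
print("trainable fraction:", lora / full)
```

For these example shapes the adapter trains about 3% as many parameters as the matrix it adapts. The fraction shrinks further as the matrix grows or the rank drops.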
Contact
- Got questions or want to connect? Join our Discord community.
- If you are interested in custom solutions with edge deployment, please contact our sales team.
Citation

    @article{liquidAI2026350M,
      author  = {Liquid AI},
      title   = {LFM2.5-350M: No Size Left Behind},
      journal = {Liquid AI Blog},
      year    = {2026},
      note    = {www.liquid.ai/blog/lfm2-5-350m-no-size-left-behind},
    }

    @article{liquidai2025lfm2,
      title   = {LFM2 Technical Report},
      author  = {Liquid AI},
      journal = {arXiv preprint arXiv:2511.23404},
      year    = {2025}
    }

