Model Card: SS-350M-SQL-Strict
Model Summary
SS-350M-SQL-Strict is a specialized, lightweight LLM fine-tuned for the singular task of Text-to-SQL translation. Built upon the LiquidAI LFM2.5-350M architecture, this model has been engineered to follow a "Strict" output protocol: it generates only raw SQL code, eliminating the conversational filler, Markdown blocks, and explanations typically found in general-purpose models.
Fine-tuned with 4-bit QLoRA and Unsloth optimizations, the model is intended for fast, low-latency SQL generation in edge deployments and other resource-constrained environments.
Model Details
- Developed by: Saad Salman
- Architecture: Liquid Foundation Model (LFM) 2.5
- Parameters: 350 Million
- Quantization: 4-bit (bitsandbytes)
- Fine-tuning Method: QLoRA
- Primary Task: Natural Language to SQL (Strict)
Training Logic & Parameters
The model was trained with a custom pipeline that enforces strict code generation. The key differentiator is Completion-Only Loss masking: prompt tokens (system message, schema, and question) are excluded from the loss, so every gradient update is driven by the SQL completion rather than by reproducing the prompt.
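To make the masking idea concrete, here is a minimal sketch of how completion-only labels are commonly built. It assumes the usual PyTorch / Hugging Face convention that a label of -100 is skipped by the cross-entropy loss. The function name `mask_prompt_labels` and the toy token IDs are hypothetical and are not taken from the author's training pipeline.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch cross-entropy (its default ignore_index)


def mask_prompt_labels(input_ids, prompt_len):
    """Return labels in which prompt tokens are ignored and completion tokens are kept."""
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must be within the sequence length")
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


# Toy example (made-up IDs): 4 prompt tokens followed by 3 completion (SQL) tokens.
input_ids = [101, 7, 8, 9, 42, 43, 44]
labels = mask_prompt_labels(input_ids, prompt_len=4)
print(labels)  # [-100, -100, -100, -100, 42, 43, 44]
```

Only the last three positions contribute to the loss in this toy case, which is the effect the masking is meant to achieve.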
Hyperparameters
| Parameter | Value | Description |
|---|---|---|
| Max Steps | 800 | Optimal convergence point for 350M params |
| Learning Rate | 2e-4 | High enough for rapid logic acquisition |
| Batch Size | 16 | Effective batch size: 4 per device × 4 gradient accumulation steps |
| Rank (r) | 32 | High rank to capture complex SQL logic |
| Alpha | 32 | Scaling factor for LoRA weights |
| Optimizer | AdamW 8-bit | Memory-efficient optimization |
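The quantities implied by the table can be checked with a few lines of arithmetic. This sketch assumes a single training device (which is what 4 × 4 = 16 implies) and the standard LoRA convention in which adapter updates are scaled by alpha / r; the variable names are illustrative only.

```python
# Values copied from the hyperparameter table.
per_device_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 800
lora_rank = 32
lora_alpha = 32

# Effective batch size per optimizer step (single-device assumption).
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

# Number of training sequences processed over the full run (repeats included).
sequences_processed = effective_batch_size * max_steps

# Standard LoRA scaling factor applied to the low-rank update.
lora_scaling = lora_alpha / lora_rank

print(effective_batch_size)  # 16
print(sequences_processed)   # 12800
print(lora_scaling)          # 1.0
```

With Alpha equal to Rank, the scaling factor is 1.0, so the adapter update is applied without extra amplification or damping under this convention.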
Training Curve Analysis
The training loss followed a classic "L-shaped" curve: it started at ~38.1, dropped quickly, and plateaued between 8.0 and 11.0. This plateau suggests the model has converged on the SQL schema-mapping task within the ChatML format.
Prompting Specification (ChatML)
To ensure the "Strict" behavior, you must use the following ChatML format. Failure to use this format may result in hallucinated text.
Template
<|im_start|>system
You are a SQL translation engine. Return ONLY raw SQL. Schema: {YOUR_SCHEMA}<|im_end|>
<|im_start|>user
{YOUR_QUESTION}<|im_end|>
<|im_start|>assistant
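A small helper can fill in this template so the format is never typed by hand. This is a minimal sketch: the function name `build_prompt` is hypothetical, and the trailing newline after the assistant tag is an assumption about how the template ends.

```python
SYSTEM_TEMPLATE = (
    "You are a SQL translation engine. Return ONLY raw SQL. Schema: {schema}"
)


def build_prompt(schema: str, question: str) -> str:
    """Fill the ChatML template with a schema description and a user question."""
    system = SYSTEM_TEMPLATE.format(schema=schema)
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        "<|im_start|>assistant\n"  # assumed trailing newline; generation starts here
    )


prompt = build_prompt(
    schema="Table 'orders' (id, price, status, created_at)",
    question="Find the average price of all 'completed' orders.",
)
print(prompt)
```

Keeping the system message in one constant makes it harder to drift from the wording the model was trained on.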
Example Input
<|im_start|>system
You are a SQL translation engine. Return ONLY raw SQL. Schema: Table 'orders' (id, price, status, created_at)<|im_end|>
<|im_start|>user
Find the average price of all 'completed' orders.<|im_end|>
<|im_start|>assistant
Example Output
SELECT AVG(price) FROM orders WHERE status = 'completed';
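Because the output is raw SQL, it can be executed directly as a sanity check. The sketch below runs the example output against an in-memory SQLite database. The column types and the three sample rows are assumptions made for illustration; the model card only lists column names.

```python
import sqlite3

# In-memory database with the example schema (column types are assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, price REAL, status TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders (price, status, created_at) VALUES (?, ?, ?)",
    [
        (10.0, "completed", "2024-01-01"),
        (30.0, "completed", "2024-01-02"),
        (99.0, "pending", "2024-01-03"),
    ],
)

# The example output from the model card, executed as-is.
generated_sql = "SELECT AVG(price) FROM orders WHERE status = 'completed';"
average_price = conn.execute(generated_sql).fetchone()[0]
print(average_price)  # 20.0
conn.close()
```

Running generated queries against a scratch database like this is one way to catch syntax errors before they reach production data.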
Training Dataset
The model was trained on the Gretel Synthetic SQL dataset. This dataset is designed to cover:
- Complex joins and subqueries.
- Diverse industry domains (Finance, Retail, Tech).
- Correct handling of GROUP BY, ORDER BY, and HAVING clauses.
Technical Limitations
- Schema Size: Best suited for schemas with fewer than 20 tables.
- Dialect: Defaults to standard SQL.
- Reasoning: The model does not "explain" its code; it is a direct translation engine.
How to Use with Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hugging Face Hub repository for this model
model_path = "saadxsalman/SS-350M-SQL-Strict"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# device_map="auto" requires the accelerate package to be installed
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
# Ready for inference!
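Once the model and tokenizer are loaded, generation could look like the following sketch. It assumes greedy decoding and a 128-token limit, neither of which is specified by the model card, and the helper names `load_model` and `generate_sql` are hypothetical. The `prompt` argument is expected to be a ChatML string in the format described under Prompting Specification.

```python
def load_model(model_path="saadxsalman/SS-350M-SQL-Strict"):
    """Load tokenizer and model; downloads the weights on first use."""
    # Imported lazily so that defining these helpers does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
    return tokenizer, model


def generate_sql(model, tokenizer, prompt, max_new_tokens=128):
    """Generate a completion for a ChatML prompt and return only the new text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,  # assumed limit, not from the model card
        do_sample=False,                # greedy decoding (assumption)
    )
    # Drop the prompt tokens so only the generated SQL is decoded.
    prompt_len = inputs["input_ids"].shape[1]
    completion_ids = output_ids[0][prompt_len:]
    return tokenizer.decode(completion_ids, skip_special_tokens=True).strip()


# Example usage (not executed here because it downloads the model):
# tokenizer, model = load_model()
# print(generate_sql(model, tokenizer, prompt))
```

Slicing off the prompt tokens before decoding matters because `generate` returns the prompt and the completion together for decoder-only models.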