Model Card: SS-350M-SQL-Strict

Model Summary

SS-350M-SQL-Strict is a specialized, lightweight LLM fine-tuned for a single task: Text-to-SQL translation. Built on the LiquidAI LFM2.5-350M architecture, it is trained to follow a "Strict" output protocol: it generates only raw SQL, without the conversational filler, Markdown blocks, and explanations typically produced by general-purpose models.

The model was fine-tuned with 4-bit QLoRA and Unsloth optimizations. At 350M parameters, it is small enough for fast, low-latency SQL generation in edge deployments and other resource-constrained environments.


Model Details

  • Developed by: Saad Salman
  • Architecture: Liquid Foundation Model (LFM) 2.5
  • Parameters: 350 Million
  • Quantization: 4-bit (bitsandbytes), used for QLoRA training
  • Fine-tuning Method: QLoRA
  • Primary Task: Natural Language to SQL (Strict)

Training Logic & Parameters

The model was trained with a custom pipeline designed to enforce strict code generation. The key differentiator is Completion-Only Loss masking: the loss is computed only on the assistant's SQL tokens, while the prompt tokens (system message, schema, and user question) are masked out. Every gradient update therefore comes from learning to produce SQL rather than from re-predicting the prompt structure.
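A minimal sketch of completion-only masking, assuming each training example is a flat list of token IDs and that the assistant turn begins at a known marker sequence (in practice, the token IDs of "<|im_start|>assistant\n"). The function names and toy IDs are hypothetical; this is similar in spirit to TRL's `DataCollatorForCompletionOnlyLM`, not the author's actual pipeline code.

```python
from typing import List

# Label value skipped by PyTorch's CrossEntropyLoss (its default ignore_index),
# which Hugging Face causal-LM models rely on for masking.
IGNORE_INDEX = -100


def find_subsequence(seq: List[int], sub: List[int]) -> int:
    """Return the start index of the first occurrence of `sub` in `seq`, or -1."""
    for i in range(len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return -1


def mask_prompt_labels(input_ids: List[int], response_marker: List[int]) -> List[int]:
    """Build labels that mask everything up to and including the assistant
    marker, so only the completion (the SQL) contributes to the loss."""
    start = find_subsequence(input_ids, response_marker)
    if start == -1:
        # Marker missing (e.g. truncated example): mask everything rather
        # than silently training on prompt tokens.
        return [IGNORE_INDEX] * len(input_ids)
    cut = start + len(response_marker)
    return [IGNORE_INDEX] * cut + input_ids[cut:]


# Toy example: IDs 9, 9 stand in for the hypothetical assistant marker.
labels = mask_prompt_labels([1, 5, 6, 7, 9, 9, 42, 43, 44, 2], [9, 9])
print(labels)  # [-100, -100, -100, -100, -100, -100, 42, 43, 44, 2]
```

Masking the whole example when the marker is missing is a conservative choice: it drops that example's signal instead of reintroducing prompt tokens into the loss.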

Hyperparameters

Parameter             | Value        | Description
Max Steps             | 800          | Optimizer steps; the convergence point chosen for this run
Learning Rate         | 2e-4         | Relatively high rate, common for LoRA fine-tuning
Effective Batch Size  | 16           | 4 per device × 4 gradient-accumulation steps
LoRA Rank (r)         | 32           | Higher rank gives the adapters more capacity for complex SQL logic
LoRA Alpha            | 32           | Scaling factor for LoRA updates (alpha / r = 1.0)
Optimizer             | AdamW 8-bit  | Memory-efficient optimizer states
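The quantities implied by the table can be checked with a little arithmetic. This sketch assumes a single-device run (which is what "4 per device × 4 accumulation = 16" implies) and standard LoRA scaling of alpha / r (not the rsLoRA variant); the class name is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    max_steps: int = 800
    per_device_batch_size: int = 4
    grad_accum_steps: int = 4
    num_devices: int = 1  # assumption: single-GPU run
    lora_r: int = 32
    lora_alpha: int = 32

    @property
    def effective_batch_size(self) -> int:
        # Sequences contributing to each optimizer step.
        return self.per_device_batch_size * self.grad_accum_steps * self.num_devices

    @property
    def examples_seen(self) -> int:
        # Total sequences processed over the whole run.
        return self.max_steps * self.effective_batch_size

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update B @ A by alpha / r.
        return self.lora_alpha / self.lora_r


cfg = RunConfig()
print(cfg.effective_batch_size, cfg.examples_seen, cfg.lora_scaling)  # 16 12800 1.0
```

With alpha equal to r, the LoRA update is applied at unit scale, so the learning rate alone governs the adapter step size.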

Training Curve Analysis

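The model showed a classic "L-shaped" convergence curve: training loss started at ~38.1 and plateaued between 8.0 and 11.0. The plateau indicates that the loss had stopped improving for this configuration. Because the prompt is masked out of the loss, this curve reflects how well the model predicts the SQL completions; it does not by itself measure execution accuracy on unseen schemas.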


Prompting Specification (ChatML)

To get the "Strict" behavior, use the following ChatML format exactly. Other prompt formats may cause the model to emit hallucinated, non-SQL text.

Template

<|im_start|>system
You are a SQL translation engine. Return ONLY raw SQL. Schema: {YOUR_SCHEMA}<|im_end|>
<|im_start|>user
{YOUR_QUESTION}<|im_end|>
<|im_start|>assistant
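A small helper can fill the template so the system text and special tokens are always identical to the specification. This is a sketch: `build_prompt` is a hypothetical name, and the newline after the final "assistant" line is assumed from standard ChatML convention.

```python
SYSTEM_PREFIX = "You are a SQL translation engine. Return ONLY raw SQL. Schema: "


def build_prompt(schema: str, question: str) -> str:
    """Fill the ChatML template; the model's SQL is generated right after
    the final 'assistant' line."""
    return (
        "<|im_start|>system\n"
        f"{SYSTEM_PREFIX}{schema}<|im_end|>\n"
        "<|im_start|>user\n"
        f"{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_prompt(
    "Table 'orders' (id, price, status, created_at)",
    "Find the average price of all 'completed' orders.",
)
print(prompt)
```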

Example Input

<|im_start|>system
You are a SQL translation engine. Return ONLY raw SQL. Schema: Table 'orders' (id, price, status, created_at)<|im_end|>
<|im_start|>user
Find the average price of all 'completed' orders.<|im_end|>
<|im_start|>assistant

Example Output

SELECT AVG(price) FROM orders WHERE status = 'completed';
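Since the model returns bare SQL, its output can be passed straight to a database. As a quick sanity check, the sketch below runs the example output against an in-memory SQLite table matching the example schema. The rows are hypothetical test data, and SQLite is only a convenient stand-in: a successful run shows the query parses and executes, not that it is correct for your production database or dialect.

```python
import sqlite3

GENERATED_SQL = "SELECT AVG(price) FROM orders WHERE status = 'completed';"


def run_on_toy_db(sql: str):
    """Execute generated SQL against a throwaway table shaped like the example schema."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, price REAL, status TEXT, created_at TEXT)"
        )
        conn.executemany(
            "INSERT INTO orders (id, price, status, created_at) VALUES (?, ?, ?, ?)",
            [
                (1, 10.0, "completed", "2024-01-01"),  # hypothetical rows
                (2, 20.0, "completed", "2024-01-02"),
                (3, 100.0, "pending", "2024-01-03"),
            ],
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


print(run_on_toy_db(GENERATED_SQL))  # [(15.0,)]
```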

Training Dataset

The model was trained on the Gretel Synthetic SQL dataset. This dataset is designed to cover:

  • Complex joins and subqueries.
  • Diverse industry domains (Finance, Retail, Tech).
  • Correct handling of GROUP BY, ORDER BY, and HAVING clauses.

Technical Limitations

  • Schema Size: Best suited for schemas with fewer than 20 tables.
  • Dialect: Defaults to standard SQL.
  • Reasoning: The model does not "explain" its code; it is a direct translation engine.
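Given the schema-size guidance, one option is to serialize only the tables relevant to a question and warn when the count reaches 20. This sketch renders each table in the "Table 'name' (col, ...)" style of the example prompt; the `serialize_schema` name and the "; " separator between tables are hypothetical choices, since the card only shows a single-table schema.

```python
import warnings
from typing import Dict, List

MAX_RECOMMENDED_TABLES = 20  # from the "fewer than 20 tables" guidance


def serialize_schema(tables: Dict[str, List[str]]) -> str:
    """Render tables as "Table 'name' (col, ...)", joined by a hypothetical "; " separator."""
    if len(tables) >= MAX_RECOMMENDED_TABLES:
        warnings.warn(
            f"{len(tables)} tables supplied; consider passing only the tables "
            "relevant to the question.",
            stacklevel=2,
        )
    return "; ".join(f"Table '{name}' ({', '.join(cols)})" for name, cols in tables.items())


print(serialize_schema({"orders": ["id", "price", "status", "created_at"]}))
# Table 'orders' (id, price, status, created_at)
```

Warning rather than raising keeps the limit advisory, matching the card's "best suited for" wording.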

How to Use with Transformers

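from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "saadxsalman/SS-350M-SQL-Strict"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")  # device_map="auto" needs `accelerate`

# Build the prompt in the ChatML format described above (example schema and question).
prompt = ("<|im_start|>system\nYou are a SQL translation engine. Return ONLY raw SQL. Schema: Table 'orders' (id, price, status, created_at)<|im_end|>\n"
          "<|im_start|>user\nFind the average price of all 'completed' orders.<|im_end|>\n<|im_start|>assistant\n")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)  # greedy decoding
# Decode only the newly generated tokens, dropping special tokens such as <|im_end|>.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))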
Published Weights

  • Format: Safetensors
  • Model size: 0.4B params
  • Tensor type: BF16