Ordis-1.5B-V355-VarGH-GGUF

GGUF Quantized Versions: 7 Formats

1.5B | The World's Strongest Practical 1.5B Model

"Size implies limits; Architecture breaks them."

Website | Full Model (HF Format) | ModelScope


About Ordis

Ordis is the product of dozens of version iterations and 16+ controlled-variable experiments, built to push the absolute performance ceiling of 1.5B parameters.

Not a benchmark-gaming model: Ordis is engineered for real-world deployment, with native anti-hallucination, structured self-correction, causal reasoning, and honest epistemic humility (no safety templates).

For full technical details, benchmarks, demos, and theoretical foundation, see the full model card.


Quantized Versions

Filename | Quant | Bits | Size | Max RAM | Recommendation
ordis-v355-vargh-1.5b-q2-k.gguf | Q2_K | 2 | ~0.7 GB | ~3.2 GB | Experimental only
ordis-v355-vargh-1.5b-q3-k-m.gguf | Q3_K_M | 3 | ~0.8 GB | ~3.3 GB | Low-end devices
ordis-v355-vargh-1.5b-q4-k-m.gguf | Q4_K_M | 4 | ~1.0 GB | ~3.5 GB | Recommended: best balance
ordis-v355-vargh-1.5b-q5-k-m.gguf | Q5_K_M | 5 | ~1.1 GB | ~3.6 GB | Good quality
ordis-v355-vargh-1.5b-q6-k.gguf | Q6_K | 6 | ~1.3 GB | ~3.8 GB | Desktop recommended
ordis-v355-vargh-1.5b-q8-0.gguf | Q8_0 | 8 | ~1.6 GB | ~4.1 GB | Near-lossless
ordis-v355-vargh-1.5b-f16.gguf | F16 | 16 | ~3.1 GB | ~5.6 GB | Full precision reference

Which File Should I Choose?

  • Mobile / embedded: Q3_K_M or Q4_K_M (~1 GB or less)
  • Laptop / general use: Q4_K_M (best quality-to-size ratio)
  • Desktop with spare RAM: Q6_K or Q8_0 (minimal quality loss)
  • Research / evaluation: F16 (no quantization artifacts)
  • Extreme compression: Q2_K (expect noticeable quality degradation)
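
If you only need one quant level, you can download a single file rather than the whole repository. A minimal sketch using the huggingface_hub client (the repo id is taken from the Ollama command below, the filename from the table above):

from huggingface_hub import hf_hub_download

# Fetch only the recommended Q4_K_M file (~1.0 GB) into the local HF cache
model_path = hf_hub_download(
    repo_id="sugiken/Ordis-1.5B-V355-VarGH-GGUF",
    filename="ordis-v355-vargh-1.5b-q4-k-m.gguf",
)
print(model_path)  # pass this path to llama.cpp or llama-cpp-python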

Quick Start

Ollama (Easiest)

ollama run hf.co/sugiken/Ordis-1.5B-V355-VarGH-GGUF:Q4_K_M

llama.cpp CLI

# -ngl 99 offloads all layers to the GPU; -e expands the \n escapes in the prompt
./llama-cli -m ordis-v355-vargh-1.5b-q4-k-m.gguf -e \
  -p "<|im_start|>user\nWhat makes you different?<|im_end|>\n<|im_start|>assistant\n" \
  -ngl 99 -n 512 --temp 0.7 --top-p 0.9

llama-cpp-python

from llama_cpp import Llama

# n_gpu_layers=-1 offloads every layer to the GPU; use 0 for CPU-only inference
llm = Llama(model_path="ordis-v355-vargh-1.5b-q4-k-m.gguf", n_gpu_layers=-1)

# The ChatML template embedded in the GGUF is applied automatically
output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain causal reasoning in economics."}],
    max_tokens=512,
    temperature=0.7
)
print(output["choices"][0]["message"]["content"])
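
For interactive use, the same client can stream tokens as they are generated. A short sketch; stream=True yields OpenAI-style chunks whose delta fields may be empty on the first and last chunk:

for chunk in llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain causal reasoning in economics."}],
    max_tokens=512,
    temperature=0.7,
    stream=True,
):
    # Each chunk carries an incremental delta rather than the full message
    piece = chunk["choices"][0]["delta"].get("content")
    if piece:
        print(piece, end="", flush=True)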

ModelScope SDK

from modelscope import snapshot_download

# Downloads every file in the repo (all quant levels); grab a single file instead if space is tight
model_dir = snapshot_download('sugiken/Ordis-1.5B-V355-VarGH-GGUF')

Git Clone

git clone https://www.modelscope.cn/sugiken/Ordis-1.5B-V355-VarGH-GGUF.git

Prompt Template

Ordis uses standard ChatML format:

<|im_start|>user
Your question here<|im_end|>
<|im_start|>assistant
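
If you assemble prompts by hand (e.g. for llama-cli), here is a minimal helper that builds a single-turn ChatML prompt; the tag strings are exactly those shown above:

def chatml_prompt(user_message: str) -> str:
    # Wrap one user turn in ChatML tags and open the assistant turn
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(chatml_prompt("What makes you different?"))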

Tips:

  • Ask directly β€” no special prompt engineering needed
  • Chinese questions get the strongest responses
  • Ordis automatically uses <think> blocks for reasoning tasks (a snippet for stripping them follows this list)
  • "I don't know" is a feature, not a bug

Benchmarks (Summary)

Standard Benchmarks (lm-eval v0.4.10, 0-shot, A100-80GB)

0-shot means no examples are given, matching the real user experience. All four metrics beat the base model, so no alignment tax was paid.

Benchmark | What It Tests | Ordis 1.5B | Base Qwen | Delta
ARC-Challenge | Science reasoning | 45.22% | 40.27% | +4.95
HellaSwag | Common sense | 68.14% | 66.06% | +2.08
GSM8K (CoT) | Math | 50.80% | 48.07% | +2.73
TruthfulQA MC2 | Truthfulness | 47.73% | 43.47% | +4.26
Average | Overall | 52.97% | 49.47% | +3.50

Custom Evaluations

Benchmark | Score
Custom 60-Q Eval (6 dimensions) | 85.0% (51/60)
124-Point Comprehensive | 86/114 (75.4%), Grade A
CLadder Causal Reasoning | 54.3% (highest at this scale)

60-Question Breakdown:

Dimension | Score
Reasoning | 100%
Common Sense | 100%
Defense Overload | 100%
Anti-Hallucination | 90%
Identity | 60%
IDK Ability | 60%

Key Capabilities

  • Structured Self-Correction (SSC): 5-step error correction protocol (acknowledge, attribute, correct, verify)
  • Confidence-Guided Decisions: high confidence → assert; mid confidence → state the known limitation; low confidence → honest refusal (see the sketch after this list)
  • Cross-Domain Causal Reasoning: transfers causal structures across domains instead of relying on task-specific memorization
  • Native Anti-Hallucination: uncertainty comes from reasoning, not safety templates; no "As an AI..." disclaimers
  • Runtime Metacognition: internal [Snapshot] cognitive monitoring during conversation
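
This behavior lives inside the model, but the policy it describes can be illustrated with a toy sketch (the thresholds and function names here are hypothetical, for illustration only):

def respond(answer: str, confidence: float) -> str:
    # Hypothetical thresholds; Ordis applies this policy internally, not via an API
    if confidence >= 0.8:
        return answer  # high confidence: assert directly
    if confidence >= 0.4:
        return f"{answer} (caveat: this is near a known limitation of mine)"  # mid
    return "I don't know."  # low confidence: honest refusal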

Known Limitations

Limitation | Root Cause
Anti-Gaslighting = 0/4 | Physical law: open-loop systems cannot verify memory (F=0)
Mid-confidence instability | 1.5B capacity ceiling
English identity leakage | Base model prior too strong
Celebrity detail hallucination | Limited parametric memory

Full analysis and honest disclosure in the complete model card.


Model Details

Property | Value
Base Model | Qwen/Qwen2.5-1.5B-Instruct
Parameters | 1.5B
Fine-tuning | LoRA
Context Length | 32K tokens (base model native)
Training Seq Length | 2048 tokens (optimal performance range)
Languages | Chinese (primary), English
Original Format | BF16 SafeTensors
GGUF Converted By | OrdisAI
License | Apache 2.0
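
The 32K context comes from the base model; with llama-cpp-python you opt into it at load time via n_ctx. A sketch, reusing the Q4_K_M file from the table above (note that longer contexts increase memory use beyond the Max RAM figures quoted earlier):

from llama_cpp import Llama

# Request the full 32K context window instead of the library default
llm = Llama(model_path="ordis-v355-vargh-1.5b-q4-k-m.gguf", n_ctx=32768, n_gpu_layers=-1)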

Citation

@misc{ordis2026,
  title={Ordis-1.5B-V355-VarGH: The Summit of Small Models},
  author={Liu, S.},
  year={2026},
  publisher={OrdisAI},
  url={https://www.ordisai.com}
}

Ordis-1.5B-V355-VarGH-GGUF | Apache 2.0 | OrdisAI

Built with honest engineering. Shipped with honest disclosure.
