Ordis-1.5B-V355-VarGH-GGUF

GGUF Quantized Versions: 7 Formats

1.5B | The World's Strongest Practical 1.5B Model

"Size implies limits; Architecture breaks them."

Website | Full Model (HF Format) | ModelScope


About Ordis

Ordis is the product of dozens of version iterations and 16+ controlled-variable experiments, built to push the absolute performance ceiling of 1.5B parameters.

Not a benchmark-gaming model: Ordis is engineered for real-world deployment, with native anti-hallucination, structured self-correction, causal reasoning, and honest epistemic humility (no safety templates).

For full technical details, benchmarks, demos, and theoretical foundation, see the full model card.


Quantized Versions

Filename | Quant | Bits | Size | Max RAM | Recommendation
ordis-v355-vargh-1.5b-q2-k.gguf | Q2_K | 2 | ~0.7 GB | ~3.2 GB | Experimental only
ordis-v355-vargh-1.5b-q3-k-m.gguf | Q3_K_M | 3 | ~0.8 GB | ~3.3 GB | Low-end devices
ordis-v355-vargh-1.5b-q4-k-m.gguf | Q4_K_M | 4 | ~1.0 GB | ~3.5 GB | Recommended: best balance
ordis-v355-vargh-1.5b-q5-k-m.gguf | Q5_K_M | 5 | ~1.1 GB | ~3.6 GB | Good quality
ordis-v355-vargh-1.5b-q6-k.gguf | Q6_K | 6 | ~1.3 GB | ~3.8 GB | Desktop recommended
ordis-v355-vargh-1.5b-q8-0.gguf | Q8_0 | 8 | ~1.6 GB | ~4.1 GB | Near-lossless
ordis-v355-vargh-1.5b-f16.gguf | F16 | 16 | ~3.1 GB | ~5.6 GB | Full precision reference

Which File Should I Choose?

  • Mobile / embedded: Q3_K_M or Q4_K_M (~1 GB or less)
  • Laptop / general use: Q4_K_M (best quality-to-size ratio)
  • Desktop with spare RAM: Q6_K or Q8_0 (minimal quality loss)
  • Research / evaluation: F16 (no quantization artifacts)
  • Extreme compression: Q2_K (expect noticeable quality degradation)
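
If you only need one quant level, you can download a single file rather than the whole repository. A minimal sketch using the huggingface_hub client (the repo id is taken from the Ollama command below, the filename from the table above):

from huggingface_hub import hf_hub_download

# Fetch only the recommended Q4_K_M file (~1.0 GB) into the local HF cache
model_path = hf_hub_download(
    repo_id="sugiken/Ordis-1.5B-V355-VarGH-GGUF",
    filename="ordis-v355-vargh-1.5b-q4-k-m.gguf",
)
print(model_path)  # pass this path to llama.cpp or llama-cpp-python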

Quick Start

Ollama (Easiest)

ollama run hf.co/sugiken/Ordis-1.5B-V355-VarGH-GGUF:Q4_K_M

llama.cpp CLI

# -ngl 99 offloads all layers to the GPU; -e expands the \n escapes in the prompt
./llama-cli -m ordis-v355-vargh-1.5b-q4-k-m.gguf -e \
  -p "<|im_start|>user\nWhat makes you different?<|im_end|>\n<|im_start|>assistant\n" \
  -ngl 99 -n 512 --temp 0.7 --top-p 0.9

llama-cpp-python

from llama_cpp import Llama

# n_gpu_layers=-1 offloads every layer to the GPU; use 0 for CPU-only inference
llm = Llama(model_path="ordis-v355-vargh-1.5b-q4-k-m.gguf", n_gpu_layers=-1)

# The ChatML template embedded in the GGUF is applied automatically
output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain causal reasoning in economics."}],
    max_tokens=512,
    temperature=0.7
)
print(output["choices"][0]["message"]["content"])
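
For interactive use, the same client can stream tokens as they are generated. A short sketch; stream=True yields OpenAI-style chunks whose delta fields may be empty on the first and last chunk:

for chunk in llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain causal reasoning in economics."}],
    max_tokens=512,
    temperature=0.7,
    stream=True,
):
    # Each chunk carries an incremental delta rather than the full message
    piece = chunk["choices"][0]["delta"].get("content")
    if piece:
        print(piece, end="", flush=True)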

ModelScope SDK

from modelscope import snapshot_download

# Downloads every file in the repo (all quant levels); grab a single file instead if space is tight
model_dir = snapshot_download('sugiken/Ordis-1.5B-V355-VarGH-GGUF')

Git Clone

git clone https://www.modelscope.cn/sugiken/Ordis-1.5B-V355-VarGH-GGUF.git

Prompt Template

Ordis uses standard ChatML format:

<|im_start|>user
Your question here<|im_end|>
<|im_start|>assistant
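
If you assemble prompts by hand (e.g. for llama-cli), here is a minimal helper that builds a single-turn ChatML prompt; the tag strings are exactly those shown above:

def chatml_prompt(user_message: str) -> str:
    # Wrap one user turn in ChatML tags and open the assistant turn
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(chatml_prompt("What makes you different?"))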

Tips:

  • Ask directly β€” no special prompt engineering needed
  • Chinese questions get the strongest responses
  • Ordis automatically uses <think> blocks for reasoning tasks (a snippet for stripping them follows this list)
  • "I don't know" is a feature, not a bug

Benchmarks (Summary)

Standard Benchmarks (lm-eval v0.4.10, 0-shot, A100-80GB)

0-shot means no examples are given, matching the real user experience. All four metrics beat the base model, so no alignment tax was paid.

Benchmark | What It Tests | Ordis 1.5B | Base Qwen | Delta
ARC-Challenge | Science reasoning | 45.22% | 40.27% | +4.95
HellaSwag | Common sense | 68.14% | 66.06% | +2.08
GSM8K (CoT) | Math | 50.80% | 48.07% | +2.73
TruthfulQA MC2 | Truthfulness | 47.73% | 43.47% | +4.26
Average | Overall | 52.97% | 49.47% | +3.50

Custom Evaluations

Benchmark | Score
Custom 60-Q Eval (6 dimensions) | 85.0% (51/60)
124-Point Comprehensive | 86/114 (75.4%), Grade A
CLadder Causal Reasoning | 54.3% (highest at this scale)

60-Question Breakdown:

Dimension | Score
Reasoning | 100%
Common Sense | 100%
Defense Overload | 100%
Anti-Hallucination | 90%
Identity | 60%
IDK Ability | 60%

Key Capabilities

  • Structured Self-Correction (SSC): 5-step error correction protocol (acknowledge, attribute, correct, verify)
  • Confidence-Guided Decisions: high confidence → assert; mid confidence → state the known limitation; low confidence → honest refusal (see the sketch after this list)
  • Cross-Domain Causal Reasoning: transfers causal structures across domains instead of relying on task-specific memorization
  • Native Anti-Hallucination: uncertainty comes from reasoning, not safety templates; no "As an AI..." disclaimers
  • Runtime Metacognition: internal [Snapshot] cognitive monitoring during conversation
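
This behavior lives inside the model, but the policy it describes can be illustrated with a toy sketch (the thresholds and function names here are hypothetical, for illustration only):

def respond(answer: str, confidence: float) -> str:
    # Hypothetical thresholds; Ordis applies this policy internally, not via an API
    if confidence >= 0.8:
        return answer  # high confidence: assert directly
    if confidence >= 0.4:
        return f"{answer} (caveat: this is near a known limitation of mine)"  # mid
    return "I don't know."  # low confidence: honest refusal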

Known Limitations

Limitation | Root Cause
Anti-Gaslighting = 0/4 | Physical law: open-loop systems cannot verify memory (F=0)
Mid-confidence instability | 1.5B capacity ceiling
English identity leakage | Base model prior too strong
Celebrity detail hallucination | Limited parametric memory

Full analysis and honest disclosure in the complete model card.


Model Details

Property | Value
Base Model | Qwen/Qwen2.5-1.5B-Instruct
Parameters | 1.5B
Fine-tuning | LoRA
Context Length | 32K tokens (base model native)
Training Seq Length | 2048 tokens (optimal performance range)
Languages | Chinese (primary), English
Original Format | BF16 SafeTensors
GGUF Converted By | OrdisAI
License | Apache 2.0
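
The 32K context comes from the base model; with llama-cpp-python you opt into it at load time via n_ctx. A sketch, reusing the Q4_K_M file from the table above (note that longer contexts increase memory use beyond the Max RAM figures quoted earlier):

from llama_cpp import Llama

# Request the full 32K context window instead of the library default
llm = Llama(model_path="ordis-v355-vargh-1.5b-q4-k-m.gguf", n_ctx=32768, n_gpu_layers=-1)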

Citation

@misc{ordis2026,
  title={Ordis-1.5B-V355-VarGH: The Summit of Small Models},
  author={Liu, S.},
  year={2026},
  publisher={OrdisAI},
  url={https://www.ordisai.com}
}

Ordis-1.5B-V355-VarGH-GGUF | Apache 2.0 | OrdisAI

Built with honest engineering. Shipped with honest disclosure.
