ORCA6 v0.1-rc1

ORCA6 is an orchestration-advisor model project focused on AI tool selection, workflow architecture, RAG design, model-routing tradeoffs, and developer automation strategy.

Current best local adapter:

qwen3_14b_orca_refusal_smoke/

This checkpoint is a guarded local release candidate, not a general-purpose, public-quality model. Use it with the source-packet runtime guard documented in the evaluation report, runbook, and local adapter CLI.

Intended Use

  • Recommend orchestration patterns for AI developer workflows.
  • Compare tools such as n8n, LangGraph, LiteLLM, Qdrant, Langfuse, Promptfoo, MCP servers, local inference stacks, and related infrastructure.
  • Provide architecture tradeoffs, implementation plans, and conservative next steps.

Not Intended For

  • Executing code or tools directly.
  • Legal, medical, financial, or safety-critical decisions.
  • General code-completion benchmarks unrelated to orchestration.

Training Data

  • SFT train rows: 41
  • SFT validation rows: 3
  • DPO preference rows: 30
  • Preference source: auto-graded bootstrap preferences
  • Corpus source: GitHub documentation chunks from the ORCA6 pilot retrieval set.
  • Final SFT source mix: {"graded_preference": 27, "grounded_sft_builder": 8, "refusal_sft_builder": 6}
  • Final answer word count: min=102, max=239, avg=187.05
  • Grounded SFT rows include retrieved-evidence citation examples and refusal hard negatives for empty evidence, unsupported claims, high-risk automation, medical-record access, and certification/compliance claims.

Training Dataset Audit

| Dataset | Rows | Sources | Avg answer words | Issues |
| --- | --- | --- | --- | --- |
| data/training_sft.jsonl | 27 | {"graded_preference": 27} | 202.22 | 0 |
| data/validation_sft.jsonl | 3 | graded_preference | 213.0 | 0 |
| data/training_sft_plus_grounded.jsonl | 35 | {"graded_preference": 27, "grounded_sft_builder": 8} | 198.37 | 0 |
| data/refusal_sft_hard_negatives.jsonl | 6 | {"refusal_sft_builder": 6} | 121.0 | 0 |
| data/training_sft_plus_grounded_refusals.jsonl | 41 | {"graded_preference": 27, "grounded_sft_builder": 8, "refusal_sft_builder": 6} | 187.05 | 0 |
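
The audit columns above (row count, source mix, average answer words) can be recomputed with a short script. This is a minimal sketch, assuming each JSONL row is an object with `source` and `answer` fields; those field names and both function names are hypothetical and may not match the actual ORCA6 schema or tooling.

```python
import json
from collections import Counter
from pathlib import Path


def audit_sft_file(path):
    """Summarize one SFT JSONL file: rows, source mix, average answer words.

    Assumes hypothetical "source" and "answer" keys on every row;
    adjust to the real dataset schema.
    """
    rows = [
        json.loads(line)
        for line in Path(path).read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]
    sources = Counter(row.get("source", "unknown") for row in rows)
    word_counts = [len(row.get("answer", "").split()) for row in rows]
    avg_words = round(sum(word_counts) / len(word_counts), 2) if word_counts else 0.0
    return {"rows": len(rows), "sources": dict(sources), "avg_answer_words": avg_words}


def combined_average(parts):
    """Row-weighted average over (rows, avg_words) pairs."""
    total_rows = sum(rows for rows, _ in parts)
    return round(sum(rows * avg for rows, avg in parts) / total_rows, 2)
```

As a consistency check on the table, `combined_average([(35, 198.37), (6, 121.0)])` returns 187.05, which matches the average reported for the combined 41-row file.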

Models and Infrastructure Used

| Role | Model / System | Notes |
| --- | --- | --- |
| Initial smoke base | Qwen/Qwen2.5-0.5B-Instruct | Small-model SFT smoke path validation. |
| Attempted 14B base | Qwen/Qwen2.5-14B-Instruct | Initial fetch was too slow/stalled; not the release base. |
| Release base | Qwen/Qwen3-14B | Official adapter base and Hugging Face base_model. |
| Local downloaded base path | .cache/orca6-qwen3-14b-download | Local path used by current helper defaults; historical run used /tmp/orca6-qwen3-14b-download. |
| Preference/judge/bootstrap generation | qwen3-coder:30b via Ollama | Used for local answer generation / auto-graded bootstrap preference data. |
| Embedding model | nomic-embed-text:latest via Ollama | Used to embed pilot corpus chunks into Qdrant. |
| Fine-tuning stack | Unsloth + TRL SFTTrainer + PEFT LoRA / QLoRA | 4-bit training profile on local RTX 3090. |

Training Rounds

| Round | Base model | Adapter output | Rows | Validation rows | Profile | Final loss | Eval / pass result |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Small-model smoke | Qwen/Qwen2.5-0.5B-Instruct | qwen_finetuned_v0_smoke2/ | 27 | 3 | smoke SFT | eval_loss=1.762, train_loss=2.260, 14 steps | Pipeline smoke only |
| Qwen3 14B smoke | Qwen/Qwen3-14B | qwen3_14b_orca_smoke/ | 27 | 3 | 14B LoRA smoke | eval_loss=1.264, train_loss=1.537, 14 steps | 8 held-out prompts generated; quality not release-ready |
| Grounded fit-check | Qwen/Qwen3-14B | qwen3_14b_orca_grounded_smoke/ | 35 | 3 | seq_length=512, lora_r=16, lora_alpha=32, lora_dropout=0, grad accum 2 | eval_loss=1.211, train_loss=1.592, 36 steps | Grounded eval 11/12 = 91.7% |
| Refusal fit-check / release adapter | Qwen/Qwen3-14B | qwen3_14b_orca_refusal_smoke/ | 41 | 3 | seq_length=512, lora_r=16, lora_alpha=32, lora_dropout=0, grad accum 2 | eval_loss=1.206, train_loss=1.510, 42 steps | Guarded eval 12/12 = 100.0%; expanded release eval 54/54 = 100.0% |

Earlier 2048-token/r64 and 1024-token/r32 grounded-profile attempts hit CUDA OOM with the longer grounded examples. The release fit-check profile settled on 512 tokens and LoRA rank 16 on the RTX 3090.

Retrieval and Vector Database Stack

The model was trained and evaluated around a source-packet workflow rather than free-form citation generation.

| Component | Setting |
| --- | --- |
| Vector database used for active retrieval | Qdrant |
| Qdrant collection | orca6_pilot |
| Embedding model | nomic-embed-text:latest through Ollama |
| Embedding dimension | 768 |
| Vector distance | Cosine |
| Lexical retrieval | In-process BM25 over data/pilot_orchestration_chunks.jsonl |
| Rank fusion | Reciprocal-rank fusion plus exact-match/domain-cue boosts |
| Vector DBs represented in corpus/tool coverage | Qdrant, Chroma, Weaviate, pgvector |
| Other retrieval/RAG tools represented | LlamaIndex, Ragas, LangGraph, Langfuse, LangSmith, Promptfoo, LiteLLM, Ollama, llama.cpp, vLLM |
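
The rank-fusion setting above combines the Qdrant vector ranking with the BM25 ranking. The following is a minimal sketch of reciprocal-rank fusion with an additive boost, assuming the commonly used smoothing constant k = 60 and a hypothetical `boosts` mapping standing in for the exact-match/domain-cue boosts. The actual ORCA6 constants, boost rules, and chunk IDs are not stated in this card; everything named here is illustrative.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, boosts=None):
    """Fuse ranked lists of chunk IDs with reciprocal-rank fusion.

    rankings: list of ranked ID lists (best first), e.g. [vector_ids, bm25_ids].
    k: smoothing constant; 60 is a common default and an assumption here.
    boosts: optional {chunk_id: extra_score}; hypothetical stand-in for
        exact-match/domain-cue boosts.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings:
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    for chunk_id, bonus in (boosts or {}).items():
        if chunk_id in scores:  # only boost chunks that were actually retrieved
            scores[chunk_id] += bonus
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk IDs, for illustration only.
vector_ids = ["qdrant-hnsw", "litellm-routing", "langgraph-state"]
bm25_ids = ["litellm-routing", "promptfoo-asserts", "qdrant-hnsw"]
fused = rrf_fuse([vector_ids, bm25_ids])
```

A chunk ranked well by both retrievers (here `litellm-routing`) rises above one that only a single retriever ranks first, which is the main reason to fuse by rank rather than by raw score.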

Evaluation

Latest recorded retrieval metrics:

{
  "queries": 20,
  "calibrated_pass_at_1": 0.7,
  "calibrated_pass_at_3": 1.0,
  "calibrated_pass_at_5": 1.0,
  "hit_at_1": 1.0,
  "hit_at_3": 1.0,
  "hit_at_5": 1.0,
  "all_expected_at_3": 0.95,
  "all_expected_at_5": 1.0,
  "all_expected_at_10": 1.0
}
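
The `hit_at_k` and `all_expected_at_k` figures above can be read as per-query checks averaged over the 20 queries. The sketch below follows that reading; the definitions are one plausible interpretation of the metric names, not confirmed by the source, and `calibrated_pass_at_k` is omitted because its calibration rule is not described in this card.

```python
def hit_at_k(ranked_ids, expected_ids, k):
    """True if at least one expected chunk appears in the top k."""
    return any(cid in expected_ids for cid in ranked_ids[:k])


def all_expected_at_k(ranked_ids, expected_ids, k):
    """True if every expected chunk appears in the top k."""
    return set(expected_ids) <= set(ranked_ids[:k])


def summarize(results, ks=(1, 3, 5)):
    """Average per-query checks.

    results: non-empty list of (ranked_ids, expected_ids) pairs.
    """
    summary = {"queries": len(results)}
    for k in ks:
        summary[f"hit_at_{k}"] = sum(hit_at_k(r, e, k) for r, e in results) / len(results)
        summary[f"all_expected_at_{k}"] = sum(
            all_expected_at_k(r, e, k) for r, e in results
        ) / len(results)
    return summary


# Two synthetic queries: one with two expected chunks, one with a single chunk.
example = summarize([(["a", "b", "c"], ["a", "c"]), (["x", "y", "z"], ["y"])])
```

Under this reading, a query with several expected chunks can score a hit at k = 1 while still failing the all-expected check until k is large enough to cover every expected chunk.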

Latest guarded grounded-answer eval:

{
  "outputs": "evals/qwen3_14b_orca_refusal_guarded_eval_outputs.jsonl",
  "total": 12,
  "passed": 12,
  "pass_rate": 1.0,
  "by_type": {
    "source_packet": {
      "passed": 8,
      "total": 8
    },
    "hard_negative": {
      "passed": 4,
      "total": 4
    }
  }
}

Expanded v0.1-rc1 grounded release eval:

{
  "total": 54,
  "passed": 54,
  "pass_rate": 1.0,
  "by_type": {
    "source_packet": {
      "passed": 50,
      "total": 50
    },
    "hard_negative": {
      "passed": 4,
      "total": 4
    }
  },
  "outputs": "evals/qwen3_14b_orca_refusal_release_grounded_outputs.jsonl"
}
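
Summary objects of the shape shown above can be derived from a per-prompt outputs file. This is a minimal sketch, assuming each output record carries `type` and `passed` fields; those field names are hypothetical, since the schema of the JSONL outputs is not shown in this card.

```python
from collections import defaultdict


def summarize_eval(records):
    """Aggregate per-prompt results into total/passed/pass_rate/by_type."""
    by_type = defaultdict(lambda: {"passed": 0, "total": 0})
    for record in records:
        bucket = by_type[record["type"]]
        bucket["total"] += 1
        bucket["passed"] += int(bool(record["passed"]))
    total = sum(b["total"] for b in by_type.values())
    passed = sum(b["passed"] for b in by_type.values())
    return {
        "total": total,
        "passed": passed,
        "pass_rate": round(passed / total, 3) if total else 0.0,
        "by_type": dict(by_type),
    }


# Synthetic records shaped like the 12-prompt guarded eval (8 + 4).
records = [{"type": "source_packet", "passed": True}] * 8 + [
    {"type": "hard_negative", "passed": True}
] * 4
summary = summarize_eval(records)
```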

Evaluation Matrix

| Evaluation | Passed | Total | Pass rate | Notes |
| --- | --- | --- | --- | --- |
| Retrieval calibrated pass@1 | 14 | 20 | 70.0% | Smoke retrieval exact/semantic check |
| Retrieval calibrated pass@3 | 20 | 20 | 100.0% | Re-check matched recorded metrics |
| Retrieval calibrated pass@5 | 20 | 20 | 100.0% | Re-check matched recorded metrics |
| Retrieval all-expected@3 | 19 | 20 | 95.0% | Multi-expected query coverage |
| Retrieval all-expected@5 | 20 | 20 | 100.0% | Multi-expected query coverage |
| Grounded adapter, unguarded | 11 | 12 | 91.7% | Pre-refusal grounded adapter; failed one empty-evidence hard negative |
| Refusal adapter, unguarded | 11 | 12 | 91.7% | Still failed one empty-evidence high-risk citation case |
| Refusal adapter + runtime guard | 12 | 12 | 100.0% | 8/8 source-packet, 4/4 hard-negative |
| Expanded release grounded eval | 54 | 54 | 100.0% | 50/50 source-packet, 4/4 hard-negative |

Unguarded refusal eval passed 11/12. The remaining unguarded failure was an empty-evidence, high-risk payment automation prompt where the model invented a source citation. The current gate therefore requires the runtime source-packet guard.
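
The failure described above, a citation with no retrieved evidence behind it, is the kind of case a post-generation check can catch. The following is a minimal sketch of such a check, assuming citations appear as bracketed IDs like `[S1]`. That citation format, the function name, and the refusal text are all hypothetical; this is not the guard shipped with the local adapter CLI.

```python
import re

CITATION_PATTERN = re.compile(r"\[(S\d+)\]")  # hypothetical citation format

REFUSAL = "I do not have retrieved evidence to support a sourced answer."


def guard_answer(answer, source_packet):
    """Return the answer only if every cited ID exists in the source packet.

    source_packet: dict mapping source IDs (e.g. "S1") to retrieved text.
    An empty packet means any citation in the answer is invented.
    """
    cited = set(CITATION_PATTERN.findall(answer))
    invented = cited - set(source_packet)
    if invented:
        return REFUSAL
    return answer
```

For example, `guard_answer("Automate refunds via the gateway [S3].", {})` returns the refusal text because `S3` was never retrieved, while an answer citing only IDs present in the packet passes through unchanged.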

Artifact Statistics

| Artifact | Path | Size / Count |
| --- | --- | --- |
| Published adapter package | adapter/ on Hugging Face | LoRA adapter, tokenizer, chat template, and config; merged shards/runs/checkpoints excluded from upload |
| Adapter weights | qwen3_14b_orca_refusal_smoke/adapter_model.safetensors | 256,976,504 bytes |
| Local adapter tree | qwen3_14b_orca_refusal_smoke | 27.77 GB including local merged model artifacts under the ignored working tree |
| Local merged model | qwen3_14b_orca_refusal_smoke/merged | 27.52 GB; 6 safetensors shards |
| Local Q8_0 GGUF | release/gguf/orca6-qwen3-14b-refusal-q8_0.gguf | 14.62 GB |
| Release manifest | release/release_manifest.json | 126 tracked release artifacts |
| GitHub release candidate assets | v0.1-rc1 | 109/109 expected assets attached |
| Hugging Face model repo | veroarc/ORCA6 | Adapter, tokenizer, model card, eval reports, release notes |
| Hugging Face feedback Space | veroarc/orca6-feedback | Manual feedback intake UI |
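
As a sanity check on the adapter weight size above, the LoRA parameter count can be estimated from the rank and the base model's layer shapes. The sketch below assumes Qwen3-14B has 40 layers, hidden size 5120, 40 query heads and 8 key/value heads of dimension 128, and MLP size 17408. It also assumes rank-16 LoRA was applied to all seven linear projections and stored in float32. None of these assumptions is stated in this card; verify them against the base model's config.json and the adapter's adapter_config.json.

```python
RANK = 16  # lora_r from the release profile
LAYERS = 40  # assumed Qwen3-14B depth
HIDDEN = 5120  # assumed hidden size
Q_OUT = 40 * 128  # assumed query heads * head dim
KV_OUT = 8 * 128  # assumed key/value heads * head dim
MLP = 17408  # assumed intermediate size

# (in_features, out_features) per assumed LoRA target module.
PROJECTIONS = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(rank, shapes, layers):
    """Each adapted matrix adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * layers


params = lora_params(RANK, PROJECTIONS, LAYERS)
estimated_bytes = params * 4  # float32 storage, assumed
reported_bytes = 256_976_504  # adapter_model.safetensors size from the table
print(params, estimated_bytes, reported_bytes - estimated_bytes)
```

Under these assumptions the estimate is 64,225,280 parameters, or 256,901,120 bytes, roughly 75 KB below the reported file size. A gap of that size would be consistent with file header metadata, though that is not verified here.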

Release artifact checksums are recorded in:

release/release_manifest.json

Limitations

  • The current dataset is small and should be treated as a v0 bootstrap.
  • Auto-graded preferences are useful for pipeline validation but should be replaced or supplemented with human preference labels.
  • Recommendations are only as current as the indexed source corpus.
  • The adapter is not intended for unguarded citation-heavy answering. Use a runtime prompt guard that forbids invented source IDs, URLs, integrations, certifications, guarantees, and high-risk actions without retrieved evidence.
  • The model must not execute tools or approve irreversible actions.

Release Notes

  • Generated: 2026-06-26
  • Version: v0.1-rc1
  • Base model target: Qwen/Qwen3-14B
  • Adapter target: qwen3_14b_orca_refusal_smoke/