Daily Model Scout Report — 2026-04-27

Scouting window: 2026-04-20 → 2026-04-27 (last 7 days), with a few late-March / mid-April items pulled in when they're clearly the headline release of the cycle.

Baselines (3,500-sample hard eval, _overall.weighted_score)

Model                        Score    Role
qwen3-vl-8b-sft+grpo         0.9131   best overall
qwen3-vl-8b-sft-grpo-nvfp4   0.8945   best quantized
qwen3-vl-2b-sft-grpo-v9      0.8948   best small
qwen35-2b-base               0.8437   best Qwen3.5 base

High relevance — benchmark immediately

1. Qwen/Qwen3.6-27B (link)

  • Released: 2026-04-21 · Likes: 899 · Downloads: ~400k
  • Architecture: Dense 27B causal LM with vision encoder, 64 layers, hidden 5120
  • Modalities: text + image + video
  • License: Apache 2.0
  • Context: 262k native, up to ~1M with YaRN
  • Reported scores: VideoMME 87.7, V* 94.7, MMLU-Pro 86.2, GPQA-Diamond 87.8
  • Why it matters: This is the natural successor to Qwen3-VL-8B (our best base). Same family lineage, bigger backbone, fresh post-training. With our existing SFT+GRPO recipe it should land above 0.9131 if the underlying base is stronger than Qwen3-VL-8B-Instruct.
  • Cost note: 27B BF16 ≈ 54 GB weights — fits on a single RTX PRO 6000 98 GB for inference and SFT, but tighter than 8B. FP8 variant below halves it.
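For quick sizing of these candidates, the weight footprint is just params × bytes per param. A minimal back-of-envelope sketch (weights only; it ignores KV cache, activations, and per-block quantization scales):

```python
# Back-of-envelope weight footprint: params x bytes/param (weights only).
def weight_gb(params_billion: float, bits_per_param: int) -> float:
    return params_billion * bits_per_param / 8  # 1e9 params * bytes = GB

for name, params in [("Qwen3.6-27B", 27), ("Qwen3.6-35B-A3B", 35)]:
    for fmt, bits in [("BF16", 16), ("FP8", 8), ("NVFP4", 4)]:
        print(f"{name:16s} {fmt:6s} ~{weight_gb(params, bits):5.1f} GB")
# 27B: BF16 ~54 GB, FP8 ~27 GB; 35B: NVFP4 ~17.5 GB, in line with the
# ~17 GB estimate for the RedHatAI build in item 3.
```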

2. Qwen/Qwen3.6-35B-A3B (link)

  • Released: 2026-04-15 · Likes: 1,448 · Downloads: ~1.35M
  • Architecture: MoE — 35B total / 3B activated per token, 256 experts (8 routed + 1 shared), Gated DeltaNet + Gated Attention hybrid layout
  • Modalities: text + image + video (up to 224k video tokens)
  • License: Apache 2.0
  • Context: 262k native, ~1M with YaRN
  • Reported scores: RealWorldQA 85.3, MMBench-EN-DEV-v1.1 92.8, OmniDocBench 89.9, VideoMMU 83.7
  • Why it matters: Active-param footprint is ~3B, so inference cost is comparable to our 2B SFT model while quality should approach the 27B dense model. Strong document-understanding numbers (OmniDocBench 89.9) are directly relevant to apparel tag/label OCR fields. Thinking mode is on by default; for our 9-field JSON extraction task, force non-thinking mode at eval (see the sketch after this item).
  • Cost note: 35B BF16 ≈ 70 GB — fits, but NVFP4 is the obvious deployment target.
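Assuming Qwen3.6 keeps the Qwen3-style `enable_thinking` switch in its chat template (not yet confirmed from the model card), disabling thinking at eval would look roughly like:

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen3.6-35B-A3B")

messages = [{"role": "user", "content": [
    {"type": "image", "image": "label_photo.jpg"},  # hypothetical input
    {"type": "text", "text": "Extract the 9 label fields as JSON."},
]}]

prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # assumed Qwen3-style flag: no <think> block
)
```

The same flag should be passed in the zero-shot eval harness so latency numbers stay comparable to the non-thinking baselines.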

3. RedHatAI/Qwen3.6-35B-A3B-NVFP4 (link)

  • Released: ~2026-04-15+ · Likes: 106 · Downloads: ~525k
  • Already-quantized NVFP4 build of #2. Matches our existing deployment format (we already run qwen3-vl-8b-sft-grpo-nvfp4 at 0.8945). Should land in the ~17 GB weight range — easy fit.
  • Why it matters: Skips the quantization step we'd otherwise have to redo ourselves. Good first-pass benchmark to see whether the 35B-A3B family is worth investing SFT cycles in.
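A minimal vLLM smoke test for that first pass might look like the following; vLLM normally auto-detects the quantization scheme from the checkpoint's config, so no extra flags should be needed (text-only here for brevity; image inputs would go through vLLM's multimodal path):

```python
from vllm import LLM, SamplingParams

# Pre-quantized NVFP4 checkpoint from item 3; quantization config is read
# from the repo, so this is just a load-and-generate sanity check.
llm = LLM(model="RedHatAI/Qwen3.6-35B-A3B-NVFP4", max_model_len=32768)
out = llm.generate(
    ["List the fields you would extract from a garment care label."],
    SamplingParams(temperature=0.0, max_tokens=128),
)
print(out[0].outputs[0].text)
```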

4. Qwen/Qwen3.6-27B-FP8 and Qwen/Qwen3.6-35B-A3B-FP8

  • Official Qwen FP8 quants released alongside the BF16 weights. Useful as a sanity-check rung between BF16 and our NVFP4 path.
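As a footnote to the ~1M-context claims in items 1 and 2: Qwen3-generation model cards document YaRN as a `rope_scaling` override in `config.json`. Assuming Qwen3.6 keeps the same keys (unverified; for the VL stack they may sit under the text sub-config), the override would look roughly like this sketch:

```python
from transformers import AutoConfig

# Assumed Qwen3-style YaRN keys; factor 4.0 takes the 262k native window
# to ~1M tokens. Repo id from item 1.
cfg = AutoConfig.from_pretrained("Qwen/Qwen3.6-27B")
cfg.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144,
}
cfg.save_pretrained("qwen3.6-27b-yarn-1m")  # point the eval harness here
```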

Medium relevance — worth watching

5. moonshotai/Kimi-K2.6 (link)

  • Released: 2026-04-14 · Likes: 1,093 · Downloads: ~443k
  • Architecture: 1T total / 32B activated, MoonViT 400M vision encoder, 384 experts, 8 active per token
  • License: Modified MIT
  • Reported scores: MMMU-Pro 79.4, MathVision 87.4, SWE-Bench 80.2
  • Why medium, not high: a 1T-parameter model does not fit on 98 GB even at NVFP4 (~500 GB of weights at 4 bits/param). Vision is also somewhat secondary in this release (it is mostly an agentic-coding model). Track for distilled or smaller variants.

6. Community quants of Qwen3.6 (unsloth/*-GGUF, cyankiwi/*-AWQ, lmstudio-community/*)

  • Useful for local CPU/Mac smoke tests but not production candidates given our NVFP4 + RTX PRO 6000 path.

Low relevance / context only

  • tencent/HY-Embodied-0.5-X (2026-04-23, 4B/2B-active VLM): purpose-built for robotics / embodied planning, not general image classification. Skip.
  • kai-os/Carnice-V2-27b (2026-04-25, 32 likes): community uncensored finetune in the Qwen3.6-27B family — irrelevant for our task.
  • Guilherme34/Darwin-36B-Opus-ABLITERATED-HERETIC (2026-04-26): abliterated/distill chain, not a base candidate.
  • nvidia/Qwen3-VL-235B-A22B-Instruct-NVFP4-MLPerf-Inference-Closed-V6.1 (2026-04-07): NVIDIA's own NVFP4 of last-generation Qwen3-VL-235B — useful reference for NVFP4 calibration recipes, not a deployment target (235B too big).
  • No new Florence-3, InternVL4, PaliGemma3, Idefics4, MiniCPM-V-5, DeepSeek-VL3, LLaVA-OneVision-2, SmolVLM-3, Phi-5-Vision, or Molmo-2 releases in the window. The week is dominated by Qwen3.6.
  • No fashion/apparel/garment-specific VLM finetunes of note this week. (Denali-AI/granite4-vision-garment-classifier from 2026-04-03 is our own.)

Recommended next actions

  1. Run zero-shot eval on the 3.5k-sample hard eval set for, in priority order:
    • Qwen/Qwen3.6-27B (BF16 or FP8)
    • RedHatAI/Qwen3.6-35B-A3B-NVFP4 (cheapest first look at 35B-A3B)
    • Qwen/Qwen3.6-35B-A3B (BF16) if NVFP4 looks promising
  2. If zero-shot meets or beats qwen3-vl-8b-instruct-base (0.8751), kick off SFT+GRPO with the standard 9-field recipe and full pipeline (eval on 3.5k → update JSON/wiki → upload to HF with model card + charts); a minimal gate check is sketched after this list.
  3. Force non-thinking mode for Qwen3.6-35B-A3B eval — JSON-extraction tasks don't benefit from `<think>` traces and they'll inflate latency.
  4. Hold on Kimi-K2.6 until a smaller distilled variant lands — not deployable on current hardware.
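A minimal sketch of the go/no-go gate from step 2, assuming one results JSON per candidate with the nested `_overall.weighted_score` layout implied by the baselines table (file paths are hypothetical):

```python
import json

GATE = 0.8751  # qwen3-vl-8b-instruct-base zero-shot score from step 2

for path in ["results/qwen3.6-27b.json",              # hypothetical paths
             "results/qwen3.6-35b-a3b-nvfp4.json"]:
    with open(path) as f:
        score = json.load(f)["_overall"]["weighted_score"]  # assumed layout
    verdict = "kick off SFT+GRPO" if score >= GATE else "hold"
    print(f"{path}: {score:.4f} vs gate {GATE} -> {verdict}")
```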

Compiled by /hf-model-scout · 2026-04-27 · sources: HF Hub image-text-to-text listings sorted by created_at, individual model cards for top candidates.
