Daily Model Scout Report — 2026-04-27

Scouting window: 2026-04-20 → 2026-04-27 (last 7 days), with a few late-March / mid-April items pulled in when they're clearly the headline release of the cycle.

Baselines (3,500-sample hard eval, _overall.weighted_score)

Model                        Score    Role
qwen3-vl-8b-sft+grpo         0.9131   best overall
qwen3-vl-8b-sft-grpo-nvfp4   0.8945   best quantized
qwen3-vl-2b-sft-grpo-v9      0.8948   best small
qwen35-2b-base               0.8437   best Qwen3.5 base

High relevance — benchmark immediately

1. Qwen/Qwen3.6-27B (link)

  • Released: 2026-04-21 · Likes: 899 · Downloads: ~400k
  • Architecture: Dense 27B causal LM with vision encoder, 64 layers, hidden 5120
  • Modalities: text + image + video
  • License: Apache 2.0
  • Context: 262k native, up to ~1M with YaRN
  • Reported scores: VideoMME 87.7, V* 94.7, MMLU-Pro 86.2, GPQA-Diamond 87.8
  • Why it matters: This is the natural successor to Qwen3-VL-8B (our best base). Same family lineage, bigger backbone, fresh post-training. With our existing SFT+GRPO recipe it should land above 0.9131 if the underlying base is stronger than Qwen3-VL-8B-Instruct.
  • Cost note: 27B BF16 ≈ 54 GB weights — fits on a single RTX PRO 6000 98 GB for inference and SFT, but tighter than 8B. FP8 variant below halves it.
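For quick sizing of these candidates, the weight footprint is just params × bytes per param. A minimal back-of-envelope sketch (weights only; it ignores KV cache, activations, and per-block quantization scales):

```python
# Back-of-envelope weight footprint: params x bytes/param (weights only).
def weight_gb(params_billion: float, bits_per_param: int) -> float:
    return params_billion * bits_per_param / 8  # 1e9 params * bytes = GB

for name, params in [("Qwen3.6-27B", 27), ("Qwen3.6-35B-A3B", 35)]:
    for fmt, bits in [("BF16", 16), ("FP8", 8), ("NVFP4", 4)]:
        print(f"{name:16s} {fmt:6s} ~{weight_gb(params, bits):5.1f} GB")
# 27B: BF16 ~54 GB, FP8 ~27 GB; 35B: NVFP4 ~17.5 GB, in line with the
# ~17 GB estimate for the RedHatAI build in item 3.
```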

2. Qwen/Qwen3.6-35B-A3B (link)

  • Released: 2026-04-15 · Likes: 1,448 · Downloads: ~1.35M
  • Architecture: MoE — 35B total / 3B activated per token, 256 experts (8 routed + 1 shared), Gated DeltaNet + Gated Attention hybrid layout
  • Modalities: text + image + video (up to 224k video tokens)
  • License: Apache 2.0
  • Context: 262k native, ~1M with YaRN
  • Reported scores: RealWorldQA 85.3, MMBench-EN-DEV-v1.1 92.8, OmniDocBench 89.9, VideoMMU 83.7
  • Why it matters: Active-param footprint is ~3B, so inference cost is comparable to our 2B SFT model while quality should approach the 27B dense model. Strong document-understanding numbers (OmniDocBench 89.9) are directly relevant to apparel tag/label OCR fields. Thinking mode is on by default; for our 9-field JSON extraction task, force non-thinking mode at eval (see the sketch after this item).
  • Cost note: 35B BF16 ≈ 70 GB — fits, but NVFP4 is the obvious deployment target.
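Assuming Qwen3.6 keeps the Qwen3-style `enable_thinking` switch in its chat template (not yet confirmed from the model card), disabling thinking at eval would look roughly like:

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen3.6-35B-A3B")

messages = [{"role": "user", "content": [
    {"type": "image", "image": "label_photo.jpg"},  # hypothetical input
    {"type": "text", "text": "Extract the 9 label fields as JSON."},
]}]

prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # assumed Qwen3-style flag: no <think> block
)
```

The same flag should be passed in the zero-shot eval harness so latency numbers stay comparable to the non-thinking baselines.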

3. RedHatAI/Qwen3.6-35B-A3B-NVFP4 (link)

  • Released: ~2026-04-15+ · Likes: 106 · Downloads: ~525k
  • Already-quantized NVFP4 build of #2. Matches our existing deployment format (we already run qwen3-vl-8b-sft-grpo-nvfp4 at 0.8945). Should land in the ~17 GB weight range — easy fit.
  • Why it matters: Skips the quantization step we'd otherwise have to redo ourselves. Good first-pass benchmark to see whether the 35B-A3B family is worth investing SFT cycles in.
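A minimal vLLM smoke test for that first pass might look like the following; vLLM normally auto-detects the quantization scheme from the checkpoint's config, so no extra flags should be needed (text-only here for brevity; image inputs would go through vLLM's multimodal path):

```python
from vllm import LLM, SamplingParams

# Pre-quantized NVFP4 checkpoint from item 3; quantization config is read
# from the repo, so this is just a load-and-generate sanity check.
llm = LLM(model="RedHatAI/Qwen3.6-35B-A3B-NVFP4", max_model_len=32768)
out = llm.generate(
    ["List the fields you would extract from a garment care label."],
    SamplingParams(temperature=0.0, max_tokens=128),
)
print(out[0].outputs[0].text)
```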

4. Qwen/Qwen3.6-27B-FP8 and Qwen/Qwen3.6-35B-A3B-FP8

  • Official Qwen FP8 quants released alongside the BF16 weights. Useful as a sanity-check rung between BF16 and our NVFP4 path.
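As a footnote to the ~1M-context claims in items 1 and 2: Qwen3-generation model cards document YaRN as a `rope_scaling` override in `config.json`. Assuming Qwen3.6 keeps the same keys (unverified; for the VL stack they may sit under the text sub-config), the override would look roughly like this sketch:

```python
from transformers import AutoConfig

# Assumed Qwen3-style YaRN keys; factor 4.0 takes the 262k native window
# to ~1M tokens. Repo id from item 1.
cfg = AutoConfig.from_pretrained("Qwen/Qwen3.6-27B")
cfg.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144,
}
cfg.save_pretrained("qwen3.6-27b-yarn-1m")  # point the eval harness here
```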

Medium relevance — worth watching

5. moonshotai/Kimi-K2.6 (link)

  • Released: 2026-04-14 · Likes: 1,093 · Downloads: ~443k
  • Architecture: 1T total / 32B activated, MoonViT 400M vision encoder, 384 experts, 8 active per token
  • License: Modified MIT
  • Reported scores: MMMU-Pro 79.4, MathVision 87.4, SWE-Bench 80.2
  • Why medium, not high: a 1T-parameter model does not fit on 98 GB even at NVFP4 (~500 GB of weights at 4 bits/param). Vision is also somewhat secondary in this release (it is mostly an agentic-coding model). Track for distilled or smaller variants.

6. Community quants of Qwen3.6 (unsloth/*-GGUF, cyankiwi/*-AWQ, lmstudio-community/*)

  • Useful for local CPU/Mac smoke tests but not production candidates given our NVFP4 + RTX PRO 6000 path.

Low relevance / context only

  • tencent/HY-Embodied-0.5-X (2026-04-23, 4B/2B-active VLM): purpose-built for robotics / embodied planning, not general image classification. Skip.
  • kai-os/Carnice-V2-27b (2026-04-25, 32 likes): community uncensored finetune in the Qwen3.6-27B family — irrelevant for our task.
  • Guilherme34/Darwin-36B-Opus-ABLITERATED-HERETIC (2026-04-26): abliterated/distill chain, not a base candidate.
  • nvidia/Qwen3-VL-235B-A22B-Instruct-NVFP4-MLPerf-Inference-Closed-V6.1 (2026-04-07): NVIDIA's own NVFP4 of last-generation Qwen3-VL-235B — useful reference for NVFP4 calibration recipes, not a deployment target (235B too big).
  • No new Florence-3, InternVL4, PaliGemma3, Idefics4, MiniCPM-V-5, DeepSeek-VL3, LLaVA-OneVision-2, SmolVLM-3, Phi-5-Vision, or Molmo-2 releases in the window. The week is dominated by Qwen3.6.
  • No fashion/apparel/garment-specific VLM finetunes of note this week. (Denali-AI/granite4-vision-garment-classifier from 2026-04-03 is our own.)

Recommended next actions

  1. Run zero-shot eval on the 3.5k-sample hard eval set for, in priority order:
    • Qwen/Qwen3.6-27B (BF16 or FP8)
    • RedHatAI/Qwen3.6-35B-A3B-NVFP4 (cheapest first look at 35B-A3B)
    • Qwen/Qwen3.6-35B-A3B (BF16) if NVFP4 looks promising
  2. If zero-shot meets or beats qwen3-vl-8b-instruct-base (0.8751), kick off SFT+GRPO with the standard 9-field recipe and full pipeline (eval on 3.5k → update JSON/wiki → upload to HF with model card + charts); a minimal gate check is sketched after this list.
  3. Force non-thinking mode for Qwen3.6-35B-A3B eval — JSON-extraction tasks don't benefit from `<think>` traces and they'll inflate latency.
  4. Hold on Kimi-K2.6 until a smaller distilled variant lands — not deployable on current hardware.
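A minimal sketch of the go/no-go gate from step 2, assuming one results JSON per candidate with the nested `_overall.weighted_score` layout implied by the baselines table (file paths are hypothetical):

```python
import json

GATE = 0.8751  # qwen3-vl-8b-instruct-base zero-shot score from step 2

for path in ["results/qwen3.6-27b.json",              # hypothetical paths
             "results/qwen3.6-35b-a3b-nvfp4.json"]:
    with open(path) as f:
        score = json.load(f)["_overall"]["weighted_score"]  # assumed layout
    verdict = "kick off SFT+GRPO" if score >= GATE else "hold"
    print(f"{path}: {score:.4f} vs gate {GATE} -> {verdict}")
```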

Compiled by /hf-model-scout · 2026-04-27 · sources: HF Hub image-text-to-text listings sorted by created_at, individual model cards for top candidates.
