Step-3.7-Flash-JANG_2L

JANG_2L conversion of stepfun-ai/Step-3.7-Flash-NVFP4.

This bundle was built from the public NVFP4 checkpoint. Routed MoE tensors were decoded from ModelOpt NVFP4 (uint8 payload, float8_e4m3fn block scales, fp32 side scales) and then re-quantized into JANG affine weight/scales/biases tensors. The BF16 attention, shared-expert, dense, vision, and projector tensors were quantized or passed through according to the JANG_2L allocation plan.
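The following pure-Python sketch makes the NVFP4 decode step concrete. It assumes the usual NVFP4 layout: two 4-bit E2M1 codes per uint8 byte, one float8_e4m3fn scale per 16-element block, and an fp32 per-tensor side scale that multiplies every value. It also assumes that the low nibble holds the first element. These layout details and the function names are illustrative assumptions, not the converter's actual code.

```python
# Sketch of NVFP4 dequantization (assumed layout, hypothetical helper names).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # indexed by the low 3 bits
BLOCK = 16  # elements sharing one float8_e4m3fn block scale


def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code: bit 3 is the sign, bits 0-2 select the magnitude."""
    mag = E2M1_MAGNITUDES[code & 0x7]
    return -mag if code & 0x8 else mag


def decode_e4m3fn(byte: int) -> float:
    """Decode float8_e4m3fn: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")  # e4m3fn has a NaN encoding but no infinities
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # subnormal range
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def dequantize_nvfp4(payload: bytes, block_scales: bytes, side_scale: float) -> list[float]:
    """Unpack two codes per byte, then apply the per-block and per-tensor scales."""
    codes = []
    for b in payload:
        codes.append(b & 0x0F)         # first element (assumed low nibble)
        codes.append((b >> 4) & 0x0F)  # second element (assumed high nibble)
    out = []
    for i, code in enumerate(codes):
        scale = decode_e4m3fn(block_scales[i // BLOCK]) * side_scale
        out.append(decode_e2m1(code) * scale)
    return out
```

For example, eight bytes of 0x21 with a block scale byte of 0x38 (1.0) and a side scale of 2.0 decode to sixteen values alternating 1.0 and 2.0. Values recovered this way are what a converter would then re-quantize with the JANG affine scheme.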

Status

This artifact has a text-only local coherence proof through the bundled step3p7_mlx.py bridge, which loads the nested Step3p5 text model using MLX and drops vision tensors for text generation.

Verified locally:

  • 67 safetensors shards
  • 2,570 tensors in model.safetensors.index.json
  • No missing shard references
  • No raw NVFP4 weight_scale, weight_scale_2, or input_scale sidecars are present in the output index
  • jang_config.json capability verification passes
  • Text generation proof passes on a math prompt
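The index checks listed above can be approximated with the following sketch. It reads the standard sharded-safetensors index (a "weight_map" from tensor name to shard filename), confirms that every referenced shard exists, and flags leftover NVFP4 sidecar names. The function names are hypothetical, and this is not necessarily the verification script used for this bundle.

```python
import json
from pathlib import Path

# Name suffixes of raw ModelOpt NVFP4 sidecar tensors that should not survive conversion.
NVFP4_SIDECAR_SUFFIXES = (".weight_scale", ".weight_scale_2", ".input_scale")


def check_weight_map(weight_map: dict, existing_files: set) -> dict:
    """Summarize a weight_map (tensor name -> shard file) against the shard files present."""
    shards = sorted(set(weight_map.values()))
    return {
        "shard_count": len(shards),
        "tensor_count": len(weight_map),
        "missing_shards": [s for s in shards if s not in existing_files],
        "nvfp4_sidecars": sorted(n for n in weight_map if n.endswith(NVFP4_SIDECAR_SUFFIXES)),
    }


def verify_bundle(bundle_dir: str) -> dict:
    """Load model.safetensors.index.json from a bundle directory and run check_weight_map."""
    root = Path(bundle_dir)
    index = json.loads((root / "model.safetensors.index.json").read_text())
    existing = {p.name for p in root.glob("*.safetensors")}
    return check_weight_map(index["weight_map"], existing)
```

If this sketch matches the checks used here, it would report 67 shards, 2,570 tensors, and empty missing_shards and nvfp4_sidecars lists for this bundle.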

Text proof:

{
  "prompt": "What is 2+2? Answer with only the number.",
  "output": "The user asks \"What is 2+2? Answer with only the number.\" So the answer is 4. The user wants only the number. So we should output \"4\". There's no disallowed content. It's a simple arithmetic. So we comply.\\n</think>\\n4",
  "prompt_tokens": 26,
  "generated_tokens": 58,
  "prefill_s": 9.161997079849243,
  "contains_final_4": true
}
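A flag like contains_final_4 can be approximated by checking the text that follows the last closing </think> tag. The sketch below assumes that everything after that tag is the final answer. The check actually used to produce the proof above is not shown in this card.

```python
def final_answer(output: str, close_tag: str = "</think>") -> str:
    """Return the text after the last reasoning close tag (or the whole output), stripped."""
    # rpartition returns ("", "", output) when the tag is absent, so [2] is always the tail.
    return output.rpartition(close_tag)[2].strip()
```

With this helper, a check in the spirit of contains_final_4 is `final_answer(text) == "4"`.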

Speed note: short cold measurements include MLX graph and kernel compilation and are not representative of steady-state decode. A warmed decode run without the wrapper (4 warm-up tokens, then 32 measured tokens) produced:

{
  "prefill_s": 9.369971990585327,
  "warm_tokens": 4,
  "measured_tokens": 32,
  "decode_s": 0.7534263134002686,
  "tok_s": 42.47263392697507
}
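The tok_s field is consistent with measured_tokens / decode_s (32 / 0.7534 s ≈ 42.47 tok/s). The sketch below shows that warm-then-measure pattern. Here `step` is a hypothetical callable that decodes one token, and this is not the script that produced the numbers above.

```python
import time
from typing import Callable


def tokens_per_second(measured_tokens: int, decode_s: float) -> float:
    """Throughput over the timed window only."""
    return measured_tokens / decode_s if decode_s > 0 else float("inf")


def timed_decode(step: Callable[[], None], warm_tokens: int = 4, measured_tokens: int = 32) -> dict:
    """Run untimed warm-up steps, then time measured_tokens decode steps."""
    # Warm-up absorbs one-time costs such as graph/kernel compilation.
    for _ in range(warm_tokens):
        step()
    start = time.perf_counter()
    for _ in range(measured_tokens):
        step()  # with MLX, step() must force evaluation (e.g. mx.eval) or only graph building is timed
    decode_s = time.perf_counter() - start
    return {
        "warm_tokens": warm_tokens,
        "measured_tokens": measured_tokens,
        "decode_s": decode_s,
        "tok_s": tokens_per_second(measured_tokens, decode_s),
    }
```

Keeping prefill and warm-up outside the timed loop is what separates the steady decode figure from the cold numbers in the text proof.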

Still required before full VLM runtime claims:

  • Step3p7 VLM wrapper in the target MLX/vMLX runtime
  • image patch token expansion and vision projector path

Format

  • Format: JANG affine
  • Profile: JANG_2L
  • Quantization backend: mx.quantize
  • Default group size: 128
  • Bit widths used: 2, 3, 4, 6, 8
  • Vision/projector: BF16 source converted to F16 passthrough for this first artifact
  • Output size: about 82 GB
  • Runtime bridge: step3p7_mlx.py wraps mlx_lm.models.step3p5 for text-only proof
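mx.quantize uses affine quantization: each group of group_size weights is stored as integer codes plus one scale and one bias, so that w ≈ scale · q + bias. The pure-Python sketch below illustrates that group-wise scheme at the default group size of 128. It uses simple min/max scaling and does not reproduce MLX's exact rounding or bit packing.

```python
def affine_quantize(weights: list[float], bits: int, group_size: int = 128):
    """Quantize group by group; each group gets integer codes plus one scale and one bias."""
    if len(weights) % group_size:
        raise ValueError("length must be a multiple of group_size")
    levels = (1 << bits) - 1  # largest code value, e.g. 3 for 2-bit
    codes, scales, biases = [], [], []
    for g in range(0, len(weights), group_size):
        group = weights[g:g + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels if hi > lo else 1.0  # constant groups: any scale works
        scales.append(scale)
        biases.append(lo)
        # Nearest code, clamped to the representable range.
        codes.extend(min(levels, max(0, round((w - lo) / scale))) for w in group)
    return codes, scales, biases


def affine_dequantize(codes, scales, biases, group_size: int = 128) -> list[float]:
    """Reconstruct w ≈ scale * q + bias using each element's group parameters."""
    return [c * scales[i // group_size] + biases[i // group_size] for i, c in enumerate(codes)]
```

Per-element reconstruction error is at most half a step (scale / 2), so lower bit widths trade accuracy for size on a group-by-group basis.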

Important allocation choices:

  • self_attn.{q,k,v,o,g}_proj: 8-bit
  • embed_tokens: 6-bit
  • routed experts: gate_proj 4-bit, down_proj 3-bit, up_proj 2-bit
  • router (gate) tensors, as distinct from the experts' gate_proj: passthrough where present
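The allocation above can be expressed as a name-to-width lookup. The sketch below also estimates routed-expert storage cost. The tensor-name patterns (for example an ".experts." segment) are hypothetical. The overhead estimate assumes one 16-bit scale and one 16-bit bias per 128-weight group, and it relies on gate_proj, up_proj, and down_proj having equal element counts.

```python
import re
from typing import Optional

# Widths from the JANG_2L allocation above; the name patterns are assumptions.
ATTN_PROJ = re.compile(r"\.self_attn\.[qkvog]_proj\b")
EXPERT_BITS = {"gate_proj": 4, "up_proj": 2, "down_proj": 3}


def planned_bits(name: str) -> Optional[int]:
    """Bit width for the tensor families listed above; None means not covered by this list."""
    if ATTN_PROJ.search(name):
        return 8
    if "embed_tokens" in name:
        return 6
    if ".experts." in name:  # hypothetical marker for routed-expert tensors
        for proj, bits in EXPERT_BITS.items():
            if f".{proj}" in name:
                return bits
    return None  # e.g. router passthrough, or tensors allocated elsewhere in the plan


def effective_bits(bits: int, group_size: int = 128, meta_bits: int = 32) -> float:
    """Bits per weight including per-group scale + bias (assumed 16 bits each)."""
    return bits + meta_bits / group_size


def routed_expert_bits(hidden: int, intermediate: int, group_size: int = 128) -> float:
    """Element-weighted average over gate/up/down (all hidden x intermediate in size)."""
    counts = {p: hidden * intermediate for p in EXPERT_BITS}
    total = sum(counts.values())
    return sum(counts[p] * effective_bits(b, group_size) for p, b in EXPERT_BITS.items()) / total
```

Under these assumptions, routed experts average (4 + 2 + 3) / 3 = 3 bits of payload plus 0.25 bits of scale/bias overhead, or about 3.25 bits per weight, whatever the layer dimensions.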

Runtime Metadata

jang_config.json records the following runtime metadata:

{
  "reasoning_parser": "qwen3",
  "tool_parser": "step3p5",
  "think_in_template": true,
  "supports_tools": true,
  "supports_thinking": true,
  "family": "step3p7",
  "modality": "vision",
  "cache_type": "kv"
}

The source chat template already opens a <think> block in the assistant generation prompt. Runtimes should not add a second synthetic reasoning prefix.
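A runtime can honor think_in_template with a guard like the one below. It is a minimal sketch that assumes the flags are available as a dict with the keys shown above. The function name and the exact prefix string are hypothetical.

```python
def reasoning_prefix(rendered_prompt: str, config: dict, open_tag: str = "<think>") -> str:
    """Extra text to append after the rendered chat template, if any."""
    # If the template already opened the reasoning block, adding another would nest it.
    if config.get("think_in_template") or rendered_prompt.rstrip().endswith(open_tag):
        return ""
    if config.get("supports_thinking"):
        return open_tag + "\n"
    return ""
```

For this bundle, think_in_template is true, so the guard always returns an empty string and the template's own <think> opening is used as-is.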

Vision And Audio

The source checkpoint contains the Step vision encoder and vit_large_projector. No audio tensors or audio tokenizer files were present in the downloaded checkpoint.

The source config mentions multi-token prediction (MTP/nextn) layers, but no MTP/nextn tensors were present in the NVFP4 source. This bundle does not synthesize MTP tensors from config fields.

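Summary (translated from Korean)

This bundle is a conversion of stepfun-ai/Step-3.7-Flash-NVFP4 into the JANG_2L format. The text path passed local generation verification through the step3p7_mlx.py bridge. The vision weights are included, but the image input path still requires a separate runtime implementation and verification. No audio tensors were present in the original checkpoint.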

Safetensors model size: 24B params. Tensor types: U32, F16, F32 (MLX).

Model tree for OsaurusAI/Step-3.7-Flash-JANG_2L: Quantized (4), including this model.