davidanugraha/Qwen3-4B-Instruct-2507-UserSim-Factored-SimPO-Sample-Span • 4B • Updated 21 days ago • 39
davidanugraha/Qwen3-4B-Instruct-2507-UserSim-Factored-SimPO-Sample-EqWeightSpan • 4B • Updated 21 days ago • 36
davidanugraha/Qwen3-4B-Instruct-2507-UserSim-Factored-SimPO-Sample-InvertedSpan • 4B • Updated 21 days ago • 35
davidanugraha/Qwen3-4B-Instruct-2507-UserSim-Factored-SimPO-Sample-NoSpan • 4B • Updated 21 days ago • 43
davidanugraha/Qwen3-4B-Instruct-2507-UserSim-SFT-Baseline • Text Generation • 4B • Updated 22 days ago • 49
davidanugraha/Qwen3-4B-Instruct-2507-UserSim-SFT-Factored • Text Generation • 4B • Updated 22 days ago • 64
davidanugraha/DeepSeek-R1-Distill-Qwen-7B-Overthinking-SFT • Text Generation • 8B • Updated Dec 28, 2025 • 3
davidanugraha/DeepSeek-R1-Distill-Qwen-1.5B-Overthinking-SFT • Text Generation • 2B • Updated Dec 28, 2025 • 3
davidanugraha/Qwen2.5-Coder-3B-Instruct-ReinfPP-Reflection-16k-20test-passrate • 3B • Updated Dec 13, 2025 • 2
davidanugraha/Qwen2.5-Coder-3B-Instruct-ReinfPP-Reflection-16k-20test-binary • 3B • Updated Dec 13, 2025 • 2
davidanugraha/Qwen2.5-Coder-3B-Instruct-ReinfPP-Reflection-8k-20test-binary • 3B • Updated Dec 13, 2025 • 1
davidanugraha/Qwen2.5-Coder-3B-Instruct-ReinfPP-Reflection-4k-20test-passrate • 3B • Updated Dec 13, 2025 • 1
davidanugraha/Qwen2.5-Coder-3B-Instruct-ReinfPP-Reflection-4k-20test-binary • 3B • Updated Dec 13, 2025 • 1