Sudanese Arabic SWE-AGILE Reasoning Benchmark (Paused): Run a Sudanese Arabic reasoning benchmark with context strategies
Sudanese Arabic Synthetic Data Quality Benchmark (Running): Evaluate Sudanese Arabic models and compare their generated responses
Sudanese Arabic Reading Comprehension Benchmark (Paused): Run a Sudanese Arabic QA benchmark and compare models
Sudanese Arabic Code-Switching Detection (Running): Detect Arabic-English code-switches in Sudanese text
Process Reward Agents: Test-Time Reasoning Scaling (Paused): Compare greedy vs reward-guided reasoning for a question
Sudanese CoT Reasoning Benchmark (Paused): Generate step-by-step Sudanese Arabic reasoning and analysis
Master Key Hypothesis Demo (Paused): Explore simulated cross-model transfer with 3D visualizations