Sam
samsam55
AI & ML interests
None yet
Organizations
None yet
Self Improving
Deep Search
Computer Use
Visual Multi Modal LLM
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints
Paper • 2510.08565 • Published • 21
Detect Anything via Next Point Prediction
Paper • 2510.12798 • Published • 50
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 120
DeepEyesV2: Toward Agentic Multimodal Model
Paper • 2511.05271 • Published • 45
Misc
UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG
Paper • 2510.03663 • Published • 16
LLM-guided Hierarchical Retrieval
Paper • 2510.13217 • Published • 21
AnyUp: Universal Feature Upsampling
Paper • 2510.12764 • Published • 12
katanemo/Arch-Router-1.5B
Text Generation • Updated • 1.02k • 248
3D Models & Modeling
Towards Scalable and Consistent 3D Editing
Paper • 2510.02994 • Published • 6
UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
Paper • 2509.24817 • Published • 9
NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
Paper • 2510.15019 • Published • 64
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
Paper • 2510.15869 • Published • 50
Datasets
Run on CPU Optimizations
World View Creation (out painting 3D)
Coding LLMs
TTS & Speech to Text
Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
Paper • 2510.03117 • Published • 12
ResembleAI/chatterbox
Text-to-Speech • Updated • 2.23M • 1.51k
thewh1teagle/phonikud
0.3B • Updated • 188 • 1
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
Paper • 2510.13344 • Published • 63
Agents
Reinforcement Learning Etc..