Bin Wang

wanderkid

https://wangbindl.github.io/

wangbinDL

AI & ML interests

Computer Vision, Multimodal Large Language Model

Recent Activity

authored a paper about 19 hours ago

TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition

authored a paper about 19 hours ago

InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery

authored a paper about 19 hours ago

MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

authored 4 papers about 19 hours ago

TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition

Paper • 2512.01248 • Published Dec 1, 2025 • 12

InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery

Paper • 2602.08990 • Published Feb 9 • 78

MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

Paper • 2603.22458 • Published 15 days ago • 132

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published 13 days ago • 125

submitted a paper to Daily Papers 1 day ago

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

Paper • 2604.04771 • Published 2 days ago • 92

authored a paper 6 months ago

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26, 2025 • 155

authored 6 papers over 1 year ago

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Paper • 2412.07626 • Published Dec 10, 2024 • 30

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

Paper • 2412.02592 • Published Dec 3, 2024 • 24

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

Paper • 2410.21169 • Published Oct 28, 2024 • 30

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Paper • 2410.12628 • Published Oct 16, 2024 • 41

MinerU: An Open-Source Solution for Precise Document Content Extraction

Paper • 2409.18839 • Published Sep 27, 2024 • 41

CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation

Paper • 2409.03643 • Published Sep 5, 2024 • 19

authored a paper almost 2 years ago

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

Paper • 2404.15254 • Published Apr 23, 2024 • 1

authored 7 papers about 2 years ago

InternLM2 Technical Report

Paper • 2403.17297 • Published Mar 26, 2024 • 34

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

Paper • 2401.16420 • Published Jan 29, 2024 • 55

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

Paper • 2311.16839 • Published Nov 28, 2023 • 1

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Paper • 2311.17911 • Published Nov 29, 2023 • 2

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

Paper • 2309.15112 • Published Sep 26, 2023 • 2

Parrot Captions Teach CLIP to Spot Text

Paper • 2312.14232 • Published Dec 21, 2023 • 12

VIGC: Visual Instruction Generation and Correction

Paper • 2308.12714 • Published Aug 24, 2023 • 1
