arxiv:2606.30124

SciIR: A Large-scale Training Dataset and Benchmark for Scientific Image Reasoning Generation

Published on Jun 29

Submitted by Zhiyuan Ma on Jul 2

MAIR Lab@HUST

Abstract

Scientific image generation faces challenges in semantic alignment and logical reasoning, prompting the creation of the SciIR-82k dataset and the SciIR-Bench evaluation framework to improve scientific reasoning capabilities in text-to-image models.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

While Text-to-Image (T2I) models have shown remarkable success in generating photorealistic visual content, they still struggle with the rigorous semantic alignment and logical reasoning required for scientific imagery. Inspired by Peirce's Semiotic Triad, we introduce Scientific Image Reasoning (SciIR), a comprehensive resource for training and evaluating scientific image generation. We formalize scientific reasoning into three core dimensions: Entity Structure (Icon), Scientific Process (Index), and Scientific Law (Symbol). To overcome the scarcity of training data in scientific image generation, we construct SciIR-82k, a large-scale dataset containing over 80,000 high-quality scientific image-text pairs from cutting-edge publications. The dataset is hierarchically organized according to the semiotic dimensions and incorporates a Scientific Reasoning Chain-of-Thought (Sci-RCoT) to explicitly model the underlying visual logic. For evaluation, we propose SciIR-Bench, which aligns with these three semiotic levels and employs an Atomic Checklist to convert outcome-oriented scientific accuracy into process-oriented, verifiable, fine-grained questions. Our extensive experiments reveal significant deficiencies in current models' scientific reasoning capabilities. Furthermore, by fine-tuning on the SciIR-82k dataset, we developed the Qwen-Image-SciIR model, which achieves a substantial improvement on SciIR-Bench, increasing the final score from 35% to 43% and laying a solid foundation for future advances in scientific image generation.

Community

zhizhi111

Paper submitter about 21 hours ago

We formalize scientific reasoning into three core dimensions: Entity Structure (Icon), Scientific Process (Index), and Scientific Law (Symbol). To overcome the scarcity of training data in scientific image generation, we construct SciIR-82k, a large-scale dataset containing over 80,000 high-quality scientific image–text pairs from cutting-edge publications. The dataset is hierarchically organized according to the semiotic dimensions and incorporates a Scientific Reasoning Chain-of-Thought (Sci-RCoT) to explicitly model the underlying visual logic.

For evaluation, we propose SciIR-Bench, which aligns with these three semiotic levels and employs an Atomic Checklist to convert the outcome-oriented scientific accuracy into process-oriented, verifiable, fine-grained questions. Our extensive experiments reveal significant deficiencies in current models' scientific reasoning capabilities. Furthermore, by fine-tuning on the SciIR-82k dataset, we developed the Qwen-Image-SciIR model, which achieves a substantial improvement on the SciIR-Bench, increasing the final score from 35% to 43%, laying a solid foundation for future advances in scientific image generation.
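For readers comparing the reported numbers, the gain from 35% to 43% can be expressed both as an absolute and as a relative improvement. The small sketch below only does that arithmetic on the two figures quoted above; the `improvement` helper is a hypothetical name, and nothing here reflects how the SciIR-Bench final score itself is computed.

```python
def improvement(before, after):
    """Return (absolute gain in points, relative gain as a fraction of `before`)."""
    absolute = after - before
    relative = absolute / before
    return absolute, relative


# Scores quoted in the text, in percent.
abs_pts, rel = improvement(35.0, 43.0)
print(f"{abs_pts:.0f} percentage points absolute, {rel:.1%} relative")
# prints "8 percentage points absolute, 22.9% relative"
```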

ChaelChael

about 5 hours ago

🎉 Official resources for the ECCV 2026 paper are now available!

📄 Paper: https://arxiv.org/abs/2606.30124

📦 Code: https://github.com/MAIR-Lab-HUST/SciIR

🤗 Dataset: https://huggingface.co/datasets/MAIR-Lab-HUST/SciIR-82k

SciIR introduces a large-scale training dataset and benchmark for Scientific Image Reasoning Generation, aiming to push text-to-image models beyond visual plausibility toward scientific correctness.

Built upon the semiotic triad of Entity Structure, Scientific Process, and Scientific Law, SciIR-82k provides 80K+ high-quality scientific image-text pairs with reasoning-chain supervision, while SciIR-Bench evaluates scientific accuracy through fine-grained atomic checklists.

By fine-tuning on SciIR-82k, Qwen-Image-SciIR improves the SciIR-Bench score from 35% to 43%, laying a foundation for more faithful and reasoning-aware scientific image generation.

If you find our work useful, please consider starring the repository, using the dataset, and citing our paper. Thanks for your support! ⭐🚀
