COinCO Context Classification Models
Authors: Tianze Yang*, Tyson Jordan*, Ruitong Sun*, Ninghao Liu, Jin Sun *Equal contribution Affiliation: University of Georgia
Overview
Fine-grained context classification models for detecting out-of-context objects in images. Each model is a fully merged Qwen2.5-VL-3B-Instruct fine-tuned via LoRA on the COinCO dataset.
The models classify whether an object (marked by a red bounding box) is in-context or out-of-context based on three criteria:
| Model | Criterion | Description |
|---|---|---|
co_occurrence/ |
Co-occurrence | Whether the object can reasonably appear together with other objects in the scene |
location/ |
Location | Whether the object is placed in a physically and contextually reasonable position |
size/ |
Size | Whether the object's size is proportional and realistic relative to other objects |
How to Use
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import torch
# Choose a model: "co_occurrence", "location", or "size"
model_id = "COinCO/Context_Classification_Models"
subfolder = "co_occurrence" # or "location" or "size"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
model_id,
subfolder=subfolder,
torch_dtype=torch.float16,
device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id, subfolder=subfolder)
Training Details
- Base Model: Qwen2.5-VL-3B-Instruct
- Method: LoRA fine-tuning (merged into base model)
- Dataset: COinCO inpainted images with multi-model consensus labels
- Training Data: ~5,000 samples per criterion from the training split
- Epochs: 3
- Learning Rate: 2e-4
- LoRA Rank: See adapter config for details
Evaluation Results
Inpainted Test Set (binary classification: In-context vs Out-of-context)
| Criterion | Baseline (Qwen2.5-VL-3B) | Fine-tuned | Improvement |
|---|---|---|---|
| Co-occurrence | 75.54% | 80.82% | +5.28% |
| Location | 74.43% | 71.05% | -3.38% |
| Size | 50.21% | 66.01% | +15.80% |
Real COCO Images (shortcut learning detection, higher = less shortcut reliance)
| Criterion | Baseline | Fine-tuned | Improvement |
|---|---|---|---|
| Co-occurrence | 88.95% | 87.00% | -1.95% |
| Location | 47.55% | 91.35% | +43.80% |
| Size | 52.55% | 83.20% | +30.65% |
Related Resources
- Paper: "Common Inpainted Objects In-N-Out of Context"
- Dataset: COinCO/COinCO-dataset
- Code: YangTianze009/COinCO
Citation
@article{yang2025coinco,
title={Common Inpainted Objects In-N-Out of Context},
author={Tianze Yang and Tyson Jordan and Ruitong Sun and Ninghao Liu and Jin Sun},
year={2025}
}
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support
Model tree for COinCO/Context_Classification_Models
Base model
Qwen/Qwen2.5-VL-3B-Instruct