Part of the FIRM-Reward collection: the data and models of "Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation".
This model is a fine-tuned version of Qwen/Qwen3-VL-8B-Instruct on the instruction_following_train_v3 and consistency_train_v3 datasets. Validation loss over the course of training is reported in the table below.
Training and validation loss, logged every 500 steps:
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.591 | 0.2182 | 500 | 0.5827 |
| 0.5605 | 0.4364 | 1000 | 0.5460 |
| 0.5252 | 0.6546 | 1500 | 0.5199 |
| 0.5075 | 0.8728 | 2000 | 0.5055 |
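The table also implies a few quantities it does not state directly. The sketch below copies the four rows as printed and estimates the number of optimizer steps per epoch, along with the overall drop in validation loss. The epoch values are rounded to four decimals, so the steps-per-epoch figure is only approximate; the variable names are illustrative.

```python
# (training loss, epoch, step, validation loss), copied from the table above.
rows = [
    (0.5910, 0.2182, 500, 0.5827),
    (0.5605, 0.4364, 1000, 0.5460),
    (0.5252, 0.6546, 1500, 0.5199),
    (0.5075, 0.8728, 2000, 0.5055),
]

# Each row gives an independent estimate of steps per epoch (step / epoch).
steps_per_epoch_estimates = [step / epoch for _, epoch, step, _ in rows]

# Drop in validation loss between the first and last logged checkpoints.
val_drop = rows[0][3] - rows[-1][3]

for (train_loss, epoch, step, val_loss), estimate in zip(rows, steps_per_epoch_estimates):
    gap = train_loss - val_loss  # positive means training loss is above validation loss
    print(f"step {step}: ~{estimate:.0f} steps/epoch, train - val = {gap:+.4f}")

print(f"validation loss drop over logged steps: {val_drop:.4f}")
```

On these rows every estimate comes out at roughly 2,291 steps per epoch, which is consistent with the last logged checkpoint (step 2000) falling before the end of the first epoch.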
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "VisionXLab/FIRM-Edit-8B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "VisionXLab/FIRM-Edit-8B",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
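The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on `localhost:8000`. The helper names `build_payload` and `chat` are hypothetical, introduced here for illustration, and the response parsing assumes the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request

# Values copied from the curl example above.
SERVER_URL = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "VisionXLab/FIRM-Edit-8B"
IMAGE_URL = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"


def build_payload(prompt: str, image_url: str, model: str = MODEL_ID) -> dict:
    """Build a chat request with one text part and one image part (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload: dict, url: str = SERVER_URL, timeout: float = 120.0) -> str:
    """POST the payload as JSON and return the first choice's text (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes an OpenAI-style response body.
    return body["choices"][0]["message"]["content"]


# Example call (needs the server to be running, so it is left commented out):
# print(chat(build_payload("Describe this image in one sentence.", IMAGE_URL)))
```

Keeping payload construction separate from the network call makes it easy to inspect or log the exact JSON before sending it.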