PolyX Research

non-profit

https://github.com/PolyX-Research

PolyX-Research

AI & ML interests

Non-profit research team for multimodal large language models

Jiaqi-hkust

authored 2 papers 2 days ago

Adaptive Debiasing Tsallis Entropy for Test-Time Adaptation

Paper • 2602.11743 • Published Feb 12

Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?

Paper • 2606.08063 • Published 9 days ago • 74

Jiaqi-hkust

posted an update 2 days ago

Post

3860

🚀 Introducing Robust-U1: Teaching MLLMs to Self-Recover Corrupted Visual Content

Multimodal Large Language Models (MLLMs) have achieved impressive visual understanding, yet they remain brittle under real-world corruptions such as noise, blur, compression artifacts, and adverse weather.

Standard MLLMs suffer dramatic performance drops, and existing robustness solutions come with fundamental limits: black‑box feature alignment lacks interpretability, while white‑box text reasoning cannot restore the lost pixel‑level visual details. This raises a crucial question:

🧐 Can MLLMs recover corrupted visual content by themselves?

If the answer is yes, we can move beyond merely “compensating” for corruption and instead build a more intrinsic, generalizable form of resilience. Robust-U1 is our answer to that question.

💡 Paper: https://arxiv.org/abs/2606.08063
🔗 Code: github.com/jqtangust/Robust-U1
🌍 Demo: Jiaqi-hkust/Robust-U1
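
To make the corruption types named in the post concrete, here is a minimal sketch of two of them (additive noise and blur) applied to a toy grayscale image. It is not taken from the Robust-U1 code. The function names, the box-blur kernel, and the noise level are illustrative assumptions, and a real benchmark would likely use a wider and more carefully calibrated set of corruptions.

```python
import random


def add_gaussian_noise(image, sigma, seed=0):
    """Add zero-mean Gaussian noise to a grayscale image (rows of 0-255 ints)."""
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    return [
        [min(255, max(0, round(px + rng.gauss(0.0, sigma)))) for px in row]
        for row in image
    ]


def box_blur(image, radius=1):
    """Average each pixel with its neighbours in a (2r+1) x (2r+1) window."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Clip the window at the image border instead of padding.
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [image[j][i] for j in ys for i in xs]
            row.append(round(sum(window) / len(window)))
        out.append(row)
    return out


# Toy 4x4 image with a sharp vertical edge (hypothetical test input).
clean = [[0, 0, 255, 255] for _ in range(4)]
noisy = add_gaussian_noise(clean, sigma=20.0)
blurred = box_blur(clean, radius=1)
```

The blur smears the sharp edge in `clean` into a ramp, which is the kind of pixel-level detail loss that the post says text-only reasoning cannot restore.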

Jiaqi-hkust

submitted a paper to Daily Papers 3 days ago

Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?

Paper • 2606.08063 • Published 9 days ago • 74

Jiaqi-hkust

posted an update 6 days ago

Post

3473

Happy to introduce Response-G1 (#ACL2026), a proactive agent for streaming video understanding.

📄 Paper: http://arxiv.org/abs/2605.07575
📷 Code: http://github.com/kadmkbl/Response-G1

We are happy to discuss further!

#ACL2026 #AI #Multimodal #VideoUnderstanding #OpenSource #LLM

1 reply

Jiaqi-hkust

posted an update 3 months ago

Post

5870

🛰️ Introducing Awesome-Remote-Sensing-Agents: The Largest Curated Collection of Intelligent Remote Sensing Agents

We are excited to share our new repository, Awesome-Remote-Sensing-Agents: a comprehensive, community-driven collection of 100+ papers at the intersection of remote sensing and intelligent agents (LLMs, VLMs, multi‑agent systems, etc.).

🔗 GitHub Repository: https://github.com/PolyX-Research/Awesome-Remote-Sensing-Agents

Our repository organizes this rapidly growing field into a structured, easy‑to‑navigate resource for researchers, practitioners, and enthusiasts.

📚 What’s Inside?
We’ve carefully curated papers across 6 key application domains:
🌿 Ecological Monitoring – forest fires, biodiversity, climate science
🚨 Emergency Response – flood mapping, wildfire tracking, disaster geolocalization
⛏️ Geological Exploration – mineral mapping, lithological recognition, geologic reasoning
🌊 Marine Supervision – ocean science, autonomous surface vehicles
🌾 Precision Agriculture – crop disease detection, land use simulation
🏙️ Urban Governance – change detection, urban planning, embodied navigation

🤝 Join the Community!
We warmly welcome contributions to keep this list up‑to‑date:
📝 Add missing papers via Pull Request
🏷️ Propose new or refined categories
🔗 Report broken links or outdated entries
💬 Discuss via GitHub Issues or contact the authors

Jiaqi-hkust

authored a paper 6 months ago

LongVideoAgent: Multi-Agent Reasoning with Long Videos

Paper • 2512.20618 • Published Dec 23, 2025 • 56

Jiaqi-hkust

submitted a paper to Daily Papers 6 months ago

LongVideoAgent: Multi-Agent Reasoning with Long Videos

Paper • 2512.20618 • Published Dec 23, 2025 • 56

Jiaqi-hkust

posted an update 6 months ago

Post

3711

We have open-sourced Robust-R1 (AAAI 2026 Oral), a new paradigm for making multimodal large language models robust to visual degradation.

Multimodal Large Language Models struggle to maintain reliable performance under extreme real-world visual degradations, which impede their practical robustness. Existing robust MLLMs predominantly rely on implicit training/adaptation that focuses solely on visual encoder generalization, suffering from limited interpretability and isolated optimization. To overcome these limitations, we propose Robust-R1, a novel framework that explicitly models visual degradations through structured reasoning chains. Our approach integrates: (i) supervised fine-tuning for degradation-aware reasoning foundations, (ii) reward-driven alignment for accurately perceiving degradation parameters, and (iii) dynamic reasoning depth scaling adapted to degradation intensity. To facilitate this approach, we introduce a specialized 11K dataset featuring realistic degradations synthesized across four critical real-world visual processing stages, each annotated with structured chains connecting degradation parameters, perceptual influence, pristine semantic reasoning chain, and conclusion. Comprehensive evaluations demonstrate state-of-the-art robustness: Robust-R1 outperforms all general and robust baselines on the real-world degradation benchmark R-Bench, while maintaining superior anti-degradation performance under multi-intensity adversarial degradations on MMMB, MMStar, and RealWorldQA.
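
As a rough illustration of the annotation structure described above, the sketch below models one record with the four linked parts (degradation parameters, perceptual influence, pristine semantic reasoning chain, conclusion). It also shows one hypothetical way that reasoning depth could scale with degradation intensity. The class name, the bracketed tags, the linear budget rule, and the token limits are all assumptions made for illustration; the released Robust-R1 dataset and code may use a different schema and policy.

```python
from dataclasses import dataclass


@dataclass
class DegradationChain:
    """One annotated sample; field names are hypothetical."""
    degradation_parameters: dict  # e.g. {"type": "gaussian_blur", "intensity": 0.6}
    perceptual_influence: str
    pristine_reasoning: str
    conclusion: str


def render_chain(chain):
    """Serialize the chain in a fixed order; the bracketed tags are assumed."""
    params = ", ".join(
        f"{key}={value}"
        for key, value in sorted(chain.degradation_parameters.items())
    )
    return "\n".join([
        f"[degradation] {params}",
        f"[influence] {chain.perceptual_influence}",
        f"[reasoning] {chain.pristine_reasoning}",
        f"[conclusion] {chain.conclusion}",
    ])


def reasoning_budget(intensity, min_tokens=64, max_tokens=512):
    """Assumed linear rule: heavier degradation gets a longer reasoning chain."""
    clipped = min(1.0, max(0.0, intensity))  # keep intensity in [0, 1]
    return round(min_tokens + clipped * (max_tokens - min_tokens))


# Hypothetical example record.
sample = DegradationChain(
    degradation_parameters={"type": "gaussian_blur", "intensity": 0.6},
    perceptual_influence="Fine text on the sign is unreadable.",
    pristine_reasoning="The sign shape and colour indicate a stop sign.",
    conclusion="The image shows a stop sign.",
)
text = render_chain(sample)
budget = reasoning_budget(sample.degradation_parameters["intensity"])
```

Keeping the degradation parameters as the first element of the chain mirrors the ordering given in the abstract, where perception of the degradation comes before the semantic reasoning.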

We have made the paper, code, data, model weights, and demo fully open-source:
Paper: Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding (2512.17532) (please upvote)
GitHub code: https://github.com/jqtangust/Robust-R1 (please star)
HF model: https://huggingface.co/Jiaqi-hkust/Robust-R1
HF data: Jiaqi-hkust/Robust-R1
HF Space: Jiaqi-hkust/Robust-R1

We sincerely invite everyone to give it a try.

2 replies

Jiaqi-hkust

authored a paper 6 months ago

Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

Paper • 2512.17532 • Published Dec 19, 2025 • 68

Jiaqi-hkust

submitted a paper to Daily Papers 6 months ago

Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

Paper • 2512.17532 • Published Dec 19, 2025 • 68

Jiaqi-hkust

authored 6 papers 6 months ago

GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing

Paper • 2403.05916 • Published Mar 9, 2024

Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference

Paper • 2502.13542 • Published Feb 19, 2025 • 1

LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization

Paper • 2506.09373 • Published Jun 11, 2025 • 1

authored a paper over 1 year ago

Hawk: Learning to Understand Open-World Video Anomalies

Paper • 2405.16886 • Published May 27, 2024 • 1

Jiaqi-hkust

posted an update over 1 year ago

Post

2122

We have open-sourced Hawk (NeurIPS 2024) 🎉, one of the pioneering frameworks for open-world video anomaly understanding.

In the field of video anomaly detection, despite continuous technological advancements, existing systems still face limitations in semantic understanding of scenes and user interaction, making it challenging to effectively identify complex anomalous scenes. Additionally, the scarcity of datasets restricts the applicability of these systems in open-world scenarios.

To tackle these challenges, we developed Hawk, an open-world video understanding and anomaly detection framework. Hawk significantly enhances anomaly recognition by identifying motion information differences between anomalous and normal videos. We introduce an auxiliary consistency loss to strengthen the focus on motion modalities and establish a supervisory relationship between motion and language representations. Furthermore, we have annotated over 8,000 anomalous videos and their language descriptions and created 8,000 question-answer pairs to support effective training in diverse open-world scenarios.
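
The post does not state the form of the auxiliary consistency loss. One common choice for tying two representations together is cosine distance, sketched below on plain Python lists. The function names, the cosine form, and the `weight` value are assumptions for illustration and should not be read as Hawk's actual objective.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def consistency_loss(motion_emb, language_emb):
    """Assumed form: 0 when the embeddings align, up to 2 when they oppose."""
    return 1.0 - cosine_similarity(motion_emb, language_emb)


def total_loss(task_loss, motion_emb, language_emb, weight=0.1):
    """Main task loss plus the weighted auxiliary term (weight is hypothetical)."""
    return task_loss + weight * consistency_loss(motion_emb, language_emb)


aligned = consistency_loss([1.0, 0.0], [2.0, 0.0])     # same direction
orthogonal = consistency_loss([1.0, 0.0], [0.0, 3.0])  # unrelated directions
```

Because the term depends only on direction, not magnitude, it pulls the motion and language embeddings toward agreement without constraining their scale.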

Experimental results demonstrate that Hawk surpasses existing video understanding frameworks in video description generation and question-answering tasks.

We warmly invite everyone to try it out!
- Hugging Face Demo: Jiaqi-hkust/hawk
- Hugging Face Model: Jiaqi-hkust/hawk
- Hugging Face Dataset: Jiaqi-hkust/hawk
- GitHub Code: https://github.com/jqtangust/hawk

We look forward to your feedback and participation! 👏

2 replies

Team members 1
