Hugging Face
Yiming Tang
tangyiming
AI & ML interests
None yet
Recent Activity
New activity, 1 day ago
Qwen/Qwen3-Next-80B-A3B-Instruct:
Megatron Swift DPO training on Qwen/Qwen3-Next-80B-A3B-Instruct always returns NaN loss. Why?
Organizations
None yet
models
0
None public yet
datasets
0
None public yet