Quang Huy
NothingLQH
·
AI & ML interests
None yet
Recent Activity
updated a collection 13 days ago
ConvertHTMLtoJSON
updated a collection 3 months ago
SpeechToText
updated a collection 4 months ago
Image
Organizations
None yet
ConvertHTMLtoJSON
Automation
TextToVideo
VLM
- FocusedAD: Character-centric Movie Audio Description
  Paper • 2504.12157 • Published • 8
- Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
  Paper • 2504.10465 • Published • 27
- PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
  Paper • 2504.13180 • Published • 20
- OS-Copilot/OS-Atlas-Base-7B
  Image-Text-to-Text • 8B • Updated • 1.17k • 42
Speech
- facebook/wav2vec2-lv-60-espeak-cv-ft
  Automatic Speech Recognition • Updated • 102k • 66
- Resemble Enhance
  Running on T4 • 459 • 🚀 Enhance and denoise your audio files
- pyannote/speaker-diarization-3.1
  Automatic Speech Recognition • Updated • 11.7M • 1.7k
- Atotti/miipher-2-HuBERT-HiFi-GAN-v0.1
  Updated • 14
ImageToVideo
- Pushing the Boundaries of State Space Models for Image and Video Generation
  Paper • 2502.00972 • Published
- IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
  Paper • 2501.13920 • Published • 19
- tencent/HunyuanVideo-I2V
  Image-to-Video • Updated • 138 • 350
- IndexTeam/Index-anisora
  Updated • 222
TextToText
NLP
3D
LiveImage
DatasetLanguage
Image
LLM
- stepfun-ai/GOT-OCR2_0
  Image-Text-to-Text • Updated • 56.5k • 1.53k
- Midi Music Generator
  Running on Zero • Featured • 572 • 🎼 Generate MIDI music from prompts
- OpenGVLab/InternVL2_5-78B-MPO
  Image-Text-to-Text • 78B • Updated • 49 • 54
- OpenGVLab/InternVL2_5-38B-MPO-AWQ
  Image-Text-to-Text • Updated • 295 • 6
DATA_PDF
MJ6
Translation
ControlVPS
ORC
- reducto/RolmOCR
  Image-Text-to-Text • 8B • Updated • 128k • 584
- moonshotai/Kimi-VL-A3B-Instruct
  Image-Text-to-Text • 16B • Updated • 298k • 258
- 5CD-AI/Vintern-1B-v3_5
  Image-Text-to-Text • 0.9B • Updated • 11.1k • 115
- nanonets/Nanonets-OCR-s
  Image-Text-to-Text • 4B • Updated • 64.4k • 1.59k
Prompt
Story
SpeechToText
- Vietnamese Streaming RNN-T
  Sleeping • 1 • 💻 RNN-T with Whisper Encoder
- erax-ai/EraX-WoW-Turbo-V1.0
  Automatic Speech Recognition • 0.8B • Updated • 16 • 54
- openai/whisper-large-v3-turbo
  Automatic Speech Recognition • Updated • 5.05M • 2.87k
- nvidia/canary-1b
  Automatic Speech Recognition • Updated • 2.38k • 456
Anime
Video
IdeaMusic
Vistral-7B-Chat
TextToSpeech
News