Activity Feed

AI & ML interests

None defined yet.

Recent Activity

alvarobartt
posted an update about 1 month ago
Learn how to deploy Microsoft Research VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions with Who, When, and What! 💥

> 🕒 60-minute single-pass processing, no chunking or stitching
> 👤 Customized hotwords to guide recognition on domain-specific content
> 📝 Rich transcription: joint ASR + diarization + timestamping in one pass
> 🌍 50+ languages with automatic detection and code-switching support
> 🤗 Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API

https://huggingface.co/docs/microsoft-azure/foundry/examples/deploy-vibevoice-asr
alvarobartt
posted an update 3 months ago
💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

Running uvx hf-mem --model-id ... --experimental automatically pulls the required information from the Hugging Face Hub to include the KV cache estimate, when applicable.

💡 Alternatively, you can set the --max-model-len, --batch-size and --kv-cache-dtype arguments (à la vLLM) manually if you prefer.
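For intuition, here is a minimal sketch of the usual back-of-the-envelope KV cache formula for standard multi-head or grouped-query attention: 2 (K and V) × layers × KV heads × head dim × bytes per element × context length × batch size. This is NOT hf-mem's implementation. The dtype table and the example config are hypothetical, the field names only mirror common config.json keys (some configs omit head_dim and derive it from hidden_size / num_attention_heads), and architectures such as MLA or sliding-window attention need a different formula.

```python
from dataclasses import dataclass

# Bytes per element for a few KV cache dtypes (illustrative mapping, assumed)
DTYPE_BYTES = {"float32": 4, "float16": 2, "bfloat16": 2, "fp8": 1}


@dataclass
class AttentionConfig:
    """Subset of a model config needed for the estimate (standard MHA/GQA only)."""
    num_hidden_layers: int
    num_key_value_heads: int
    head_dim: int


def kv_cache_bytes(cfg: AttentionConfig, max_model_len: int, batch_size: int,
                   kv_cache_dtype: str = "bfloat16") -> int:
    """Estimate KV cache size in bytes, assuming every sequence is full length."""
    elem = DTYPE_BYTES[kv_cache_dtype]
    # Per token: one K and one V tensor per layer, each num_key_value_heads * head_dim wide
    per_token = 2 * cfg.num_hidden_layers * cfg.num_key_value_heads * cfg.head_dim * elem
    return per_token * max_model_len * batch_size


if __name__ == "__main__":
    # Hypothetical config: 32 layers, 8 KV heads (GQA), head_dim 128
    cfg = AttentionConfig(num_hidden_layers=32, num_key_value_heads=8, head_dim=128)
    size = kv_cache_bytes(cfg, max_model_len=8192, batch_size=1)
    print(f"{size / 1024**3:.2f} GiB")  # 1.00 GiB
```

The estimate grows linearly with both context length and batch size, and halves when going from bf16 to fp8, which is why those three knobs matter so much when sizing a deployment.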
merve
posted an update 6 months ago
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it concatenates CLIP and SAM features, which gives better grounding
> very efficient in terms of performance per vision token
> covers 100 languages
merve
posted an update 7 months ago
large AI labs open-sourced a ton of models last week 🔥
here are a few picks; find even more here: merve/sep-16-releases-68d13ea4c547f02f95842f05 🤝
> IBM released a new Docling model with 258M params based on Granite (A2.0) 📝 ibm-granite/granite-docling-258M
> Xiaomi released a 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, an open Nano Banana 🍌 (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer use models (3B/7B/32B) with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan LongCat released a thinking version of LongCat-Flash 💭 meituan-longcat/LongCat-Flash-Thinking
merve
posted an update 7 months ago
IBM just released a small Swiss Army knife for document models: granite-docling-258M on Hugging Face 🔥

> not only a document converter: it can also do document question answering and understands multiple languages 🤯
> best part: released under the Apache 2.0 license 👍 use it in your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! šŸ¤—
> built on SigLIP2 & granite-165M

model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo 💗
merve
posted an update 7 months ago
a ton of image/video generation models and LLMs from big labs 🔥

> Meta released facebook/mobilellm-r1-68c4597b104fac45f28f448e, smol LLMs for on-device use 💬
> Tencent released tencent/SRPO, a high-res image generation model, and tencent/POINTS-Reader, cutting-edge OCR 📝
> ByteDance released bytedance-research/HuMo, video generation from any input ⏯️

find more models, datasets, demos here merve/sep-11-releases-68c7dbfa26bea8cd921fa0ac
merve
posted an update 7 months ago
fan-favorite vision LM Florence-2 is now officially supported in transformers 🤗

find all the models in the florence-community org 🫔
merve
posted an update 7 months ago
merve
posted an update 7 months ago