TmpCoherePartners

non-profit

Recent Activity

alvarobartt
posted an update 13 days ago
Open agents on AWS SageMaker AI with open models from the Hugging Face Hub!

> Deploy an open model from the Hugging Face Hub on SageMaker AI
> Connect the deployed model to Strands Agents
> Add built-in and custom tools for tool calling
> Expose external capabilities through MCP integration
> Bonus: talk to your agent and visualize traces with Gradio

https://alvarobartt.com/agents-on-aws-sagemaker
alvarobartt
posted an update 17 days ago
Latest hf-mem release added a breakdown of Mixture-of-Experts (MoE) memory usage!

TL;DR: MoEs can be misleading to reason about from active parameters alone, since each token only activates a subset of experts, while the serving setup still needs to account for the full resident memory footprint.

🧠 hf-mem now splits MoE memory into base model weights, routed experts, and KV cache
🏗️ Dense models usually load and use most weights every forward pass, while MoEs load many experts but only route each token to a few of them
⚡ Active params aren't the same as memory footprint, especially for sparse architectures
📦 Runtime memory is about what is used per request/token, while loading memory also includes the expert weights that need to be resident
📚 KV cache can still dominate depending on context length, batch size, and concurrency
🔀 Expert Parallelism (EP) helps shard experts across accelerators when expert weights dominate
🚀 Data Parallelism (DP) + EP is often a good fit for throughput-oriented MoE serving

Check the repository at https://github.com/alvarobartt/hf-mem
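
To make the gap between active parameters and resident memory concrete, here is a minimal back-of-the-envelope sketch. It is not how hf-mem computes its breakdown; the `MoEConfig` class, its fields, and the example numbers are all hypothetical, and the KV cache is left out on purpose.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    """Hypothetical MoE description; every number used with it is illustrative."""
    base_params: float            # shared, non-expert parameters (attention, embeddings, router)
    params_per_expert: float      # parameters of one routed expert, summed over all MoE layers
    num_experts: int              # routed experts available per MoE layer
    experts_per_token: int        # experts each token is routed to
    bytes_per_param: float = 2.0  # 2 bytes per parameter, e.g. bf16 or fp16


def resident_weight_bytes(cfg: MoEConfig) -> float:
    """Weights that must be loaded: the base model plus every routed expert."""
    total_params = cfg.base_params + cfg.params_per_expert * cfg.num_experts
    return total_params * cfg.bytes_per_param


def active_weight_bytes(cfg: MoEConfig) -> float:
    """Weights actually used for one token: the base model plus the routed experts only."""
    active_params = cfg.base_params + cfg.params_per_expert * cfg.experts_per_token
    return active_params * cfg.bytes_per_param


def expert_bytes_per_device(cfg: MoEConfig, ep_size: int) -> float:
    """Expert weights held by each accelerator when experts are sharded evenly with EP."""
    if cfg.num_experts % ep_size != 0:
        raise ValueError("this sketch assumes num_experts is divisible by ep_size")
    expert_bytes = cfg.params_per_expert * cfg.num_experts * cfg.bytes_per_param
    return expert_bytes / ep_size


if __name__ == "__main__":
    cfg = MoEConfig(base_params=2e9, params_per_expert=0.5e9,
                    num_experts=64, experts_per_token=4)
    print(f"resident weights: {resident_weight_bytes(cfg) / 1e9:.1f} GB")
    print(f"active weights:   {active_weight_bytes(cfg) / 1e9:.1f} GB")
    print(f"experts per device with EP=8: {expert_bytes_per_device(cfg, 8) / 1e9:.1f} GB")
```

With these made-up numbers the model touches only a fraction of its weights per token, yet the serving setup still has to hold all of them.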
alvarobartt
posted an update 3 months ago
Learn how to deploy Microsoft Research VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions with Who, When, and What! 💥

> 🕒 60-minute single-pass processing, no chunking or stitching
> 👀 Customized hotwords to guide recognition on domain-specific content
> 📝 Rich transcription: joint ASR + diarization + timestamping in one pass
> 🌍 50+ languages with automatic detection and code-switching support
> 🤗 Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API

https://huggingface.co/docs/microsoft-azure/foundry/examples/deploy-vibevoice-asr
juanjucm
posted an update 4 months ago
Last week, zai-org dropped zai-org/GLM-4.7-Flash. Now, we bring it to Microsoft Foundry!

- 🏆 30B-A3B MoE, the strongest model in the 30B class. It excels at coding tasks, agentic workflows, and reasoning.
- 🤏🏻 Lighter version of its 358B big brother, balancing performance and efficiency.

Not light enough for you? We are also adding unsloth/GLM-4.7-Flash-GGUF from unsloth to the catalog, with GPU and CPU support powered by llama.cpp 🔥

Go join the hype and deploy them from the Hugging Face collection on Microsoft Foundry!
alvarobartt
posted an update 4 months ago
💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

uvx hf-mem --model-id ... --experimental will automatically pull the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.

💡 Alternatively, you can also set the --max-model-len, --batch-size and --kv-cache-dtype arguments (à la vLLM) manually if preferred.
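
For intuition about why context length, batch size, and cache dtype matter, the sketch below applies the commonly used KV cache estimate for standard multi-head or grouped-query attention. It is an assumption-laden illustration, not hf-mem's implementation: the function name, the dtype table, and the example model shape are hypothetical, and architectures with other attention variants need a different formula.

```python
# Bytes per cached element for a few dtypes. The keys are illustrative labels,
# not necessarily the values accepted by hf-mem's --kv-cache-dtype argument.
DTYPE_BYTES = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    max_model_len: int,
    batch_size: int = 1,
    kv_cache_dtype: str = "bf16",
) -> int:
    """Estimate KV cache size for standard MHA/GQA attention.

    Each layer stores one key and one value vector (hence the factor 2)
    per KV head, per token, per sequence in the batch.
    """
    bytes_per_element = DTYPE_BYTES[kv_cache_dtype]
    return (
        2
        * num_layers
        * num_kv_heads
        * head_dim
        * max_model_len
        * batch_size
        * bytes_per_element
    )


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 8 KV heads of dimension 128.
    size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                          max_model_len=8192, batch_size=1, kv_cache_dtype="bf16")
    print(f"KV cache: {size / 1024**3:.2f} GiB")
```

The estimate scales linearly with both context length and batch size, which is why the cache can outgrow the weights for long-context or high-concurrency serving.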
alvarobartt
posted an update over 1 year ago
🔥 Agents can do anything! @microsoft Research just announced the release of Magma 8B!

Magma is a new Visual Language Model (VLM) with 8B parameters for multi-modal agents designed to handle complex interactions across virtual and real environments; and it's MIT licensed!

Magma comes with exciting new features such as:
- Introduces the Set-of-Mark and Trace-of-Mark techniques for fine-tuning
- Leverages a large amount of unlabeled video data to learn the spatial-temporal grounding and planning
- Strong generalization and the ability to be fine-tuned for other agentic tasks
- SOTA in different multi-modal benchmarks spanning across UI navigation, robotics manipulation, image / video understanding and spatial understanding and reasoning
- Generates goal-driven visual plans and actions for agentic use cases

Model: microsoft/Magma-8B
Technical Report: Magma: A Foundation Model for Multimodal AI Agents (2502.13130)
alvarobartt
posted an update almost 2 years ago
🤗 Serving Meta Llama 3.1 405B on Google Cloud is now possible via the Hugging Face Deep Learning Containers (DLCs) for Text Generation Inference (TGI)

In this post, we showcase how to deploy https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 on an A3 instance with 8 x H100 GPUs on Vertex AI

Thanks to the Hugging Face DLCs for TGI and Google Cloud Vertex AI, deploying a high-performance text generation container for serving Large Language Models (LLMs) has never been easier. And we're not going to stop here – stay tuned as we enable more experiences to build AI with open models on Google Cloud!

Read the full post at https://huggingface.co/blog/llama31-on-vertex-ai
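
As a rough sanity check on why the FP8 checkpoint and an 8 x H100 instance pair up, the arithmetic can be sketched as follows. The assumptions are stated in the code: roughly 405e9 parameters, 1 byte per FP8 parameter, 80 GB of memory per H100, and no allowance for tensors kept at higher precision. The helper name is hypothetical and the result is an estimate, not a measured figure from the post.

```python
def weight_memory_budget(
    num_params: float,
    bytes_per_param: float,
    num_gpus: int,
    hbm_gb_per_gpu: float,
) -> tuple[float, float, float]:
    """Return (weights_gb, total_hbm_gb, headroom_gb) in decimal gigabytes.

    Headroom is what remains for the KV cache, activations, and runtime
    overhead after the weights are loaded. Negative means the weights alone
    do not fit.
    """
    weights_gb = num_params * bytes_per_param / 1e9
    total_hbm_gb = num_gpus * hbm_gb_per_gpu
    return weights_gb, total_hbm_gb, total_hbm_gb - weights_gb


if __name__ == "__main__":
    # Assumed: 405e9 parameters, FP8 = 1 byte each, 8 GPUs with 80 GB each.
    fp8 = weight_memory_budget(405e9, 1.0, 8, 80.0)
    # Same model at 2 bytes per parameter (bf16), for comparison.
    bf16 = weight_memory_budget(405e9, 2.0, 8, 80.0)
    for label, (weights, total, headroom) in (("fp8", fp8), ("bf16", bf16)):
        print(f"{label}: weights={weights:.0f} GB, total={total:.0f} GB, headroom={headroom:.0f} GB")
```

Under these assumptions the FP8 weights leave room for the KV cache on a single node, while 2-byte weights would not fit at all.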
alvarobartt
posted an update about 2 years ago
🔥 Prometheus 2 was recently released by Kaist AI as an alternative evaluator that closely mirrors both human and GPT-4 evaluation and surpasses the former Prometheus!

prometheus-eval/prometheus-7b-v2.0
prometheus-eval/prometheus-8x7b-v2.0

🌬️Fine-tuned on top of mistralai/Mistral-7B-Instruct-v0.2 and mistralai/Mixtral-8x7B-Instruct-v0.1
πŸ—‚οΈThe datasets used for fine-tuning have been publicly released i.e. prometheus-eval/Feedback-Collection and prometheus-eval/Preference-Collection
🀝🏻Unified LM evaluator for absolute (a single prompt-completion pair) and relative (two completions for a given prompt) due to model merging
❌No longer needs a mandatory reference / golden answer, but can still be provided optionally
πŸ”Surpasses the former version of Prometheus, and has a high correlation with human, GPT-4, and Claude 3 Opus scores when evaluating LMs
πŸ“Apache 2.0 license

Long story short, Kaist AI did an amazing job bridging the gap between open LLM evaluators and proprietary, bigger models!

This week at Argilla, we decided to add a new task to use Prometheus 2 as an LLM evaluator using distilabel, so we implemented PrometheusEval.

😱 Using PrometheusEval running their 7B variant with vLLM in a single L40 on top of HuggingFaceH4/instruction-dataset, we got the 327 existing prompt-completion pairs evaluated and pushed to the Hub in less than 2 minutes!

Find the generated dataset and the code at distilabel-internal-testing/instruction-dataset-prometheus
alvarobartt
posted an update about 2 years ago
🦫 We have just released argilla/Capybara-Preferences in collaboration with Kaist AI (@JW17 , @nlee-208 ) and Hugging Face (@lewtun )

A new synthetic preference dataset built using distilabel on top of the awesome LDJnr/Capybara from @LDJnr

The current dataset combines the already generated alternative completions from argilla/distilabel-capybara-dpo-7k-binarized, while also adding the remaining ones using the same approach!

Here are some key features on how we built it:

- 🧹 Duplicate removal, keeping the conversation besides the last assistant response, and some slight pre-processing

- πŸ€– Generation of alternative completions for the existing conversations (last turn only) with: mlabonne/NeuralBeagle14-7B, argilla/notus-7b-v1, and teknium/OpenHermes-2.5-Mistral-7B

- πŸ‘¨πŸ»β€πŸ« Running UltraFeedback via GPT-4 to generate the critique i.e. ratings and rationales, for the last assistant responses

- πŸŽ‰ Finally, we selected the chosen and rejected responses based on their UltraFeedback score, and applied some slight post-processing!

Sounds simple, right? Start building your own synthetic datasets with https://github.com/argilla-io/distilabel today!
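
The final selection step can be pictured with a small, library-free sketch. It does not use distilabel and is not the exact rule applied to build the dataset: picking the highest-rated completion as chosen and the lowest-rated as rejected, and skipping ties, are assumptions made for illustration, as are the `pick_pair` name and the record layout.

```python
from typing import Optional


def pick_pair(completions: list[dict]) -> Optional[dict]:
    """Turn rated completions for one conversation into a preference pair.

    Each completion is a dict with a "text" and a numeric "rating".
    Assumed policy: highest rating is chosen, lowest rating is rejected,
    and conversations without a clear preference are skipped.
    """
    rated = [c for c in completions if c.get("rating") is not None]
    if len(rated) < 2:
        return None  # nothing to compare against

    ranked = sorted(rated, key=lambda c: c["rating"], reverse=True)
    chosen, rejected = ranked[0], ranked[-1]
    if chosen["rating"] == rejected["rating"]:
        return None  # all ratings tie, so there is no preference signal

    return {"chosen": chosen["text"], "rejected": rejected["text"]}


if __name__ == "__main__":
    example = [
        {"text": "completion A", "rating": 3},
        {"text": "completion B", "rating": 5},
        {"text": "completion C", "rating": 2},
    ]
    print(pick_pair(example))
```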
alvarobartt
posted an update over 2 years ago
💨 Notux 8x7b was just released!

At Argilla, we recently fine-tuned Mistral AI's Mixtral 8x7b Instruct using DPO on a binarized and curated version of UltraFeedback, and found that it outperforms every other MoE-based model on the Hub.

- argilla/notux-8x7b-v1
- argilla/ultrafeedback-binarized-preferences-cleaned
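
For readers unfamiliar with DPO, the per-example objective it optimizes over such chosen/rejected pairs can be sketched in plain Python. This follows the standard DPO formulation rather than the training code used for Notux; the function name, the `beta` default, and the example log-probabilities are illustrative assumptions.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for a single preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected
    completions under the policy being trained and under the frozen
    reference model. The loss is -log(sigmoid(margin)).
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Policy identical to the reference: no preference learned yet.
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy now favors the chosen completion more than the reference does.
    print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The loss falls as the policy raises the likelihood of the chosen completion relative to the rejected one, measured against the reference model.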