- Blazingly fast whisper transcriptions with Inference Endpoints — mfuntowicz, freddyaboulton, Steveeeeeeen, reach-vb, erikkaum, michellehbn • May 13, 2025
- Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference — mfuntowicz, hlarcher • Jan 16, 2025
- Hugging Face on AMD Instinct MI300 GPU — fxmarty, mohitsha, seungrokj, mfuntowicz • May 21, 2024
- CPU Optimized Embeddings with 🤗 Optimum Intel and fastRAG — peterizsak, mber, danf, echarlaix, mfuntowicz, moshew • Mar 15, 2024
- Accelerating SD Turbo and SDXL Turbo Inference with ONNX Runtime and Olive — sschoenmeyer, tlwu, mfuntowicz • Jan 15, 2024
- AMD + 🤗: Large Language Models Out-of-the-Box Acceleration with AMD GPU — fxmarty, IlyasMoutawwakil, mohitsha, echarlaix, seungrokj, mfuntowicz • Dec 5, 2023
- Optimum-NVIDIA: Unlocking blazingly fast LLM inference in just 1 line of code — laikh-nvidia, mfuntowicz • Dec 5, 2023
- Accelerating over 130,000 Hugging Face models with ONNX Runtime — sschoenmeyer, mfuntowicz • Oct 4, 2023
- Case Study: Millisecond Latency using Hugging Face Infinity and modern CPUs — philschmid, jeffboudier, mfuntowicz • Jan 13, 2022
- Scaling up BERT-like model Inference on modern CPU — Part 2 — echarlaix, jeffboudier, mfuntowicz, michaelbenayoun • Nov 4, 2021
- Introducing Optimum: The Optimization Toolkit for Transformers at Scale — mfuntowicz, echarlaix, michaelbenayoun, jeffboudier • Sep 14, 2021