Garreth Lee PRO

garrethlee

AI & ML interests

None yet

Recent Activity

new activity about 3 hours ago

mundo-ai/turn-benchmark-test:Simplify preview note to 'all turn-taking predictions'

new activity about 3 hours ago

mundo-ai/turn-benchmark-dev:Simplify preview note to 'all turn-taking predictions'

new activity about 3 hours ago

mundo-ai/turn-benchmark-test:Document Opus preview columns; predict on FLAC only

Organizations

liked a model 3 months ago

sarulab-speech/DialogueSidon

Updated Mar 28 • 11

liked a Space 8 months ago

The Smol Training Playbook

📚

3.21k

The secrets to building world-class LLMs

liked a dataset 10 months ago

HuggingFaceM4/FineVision

Viewer • Updated Oct 21, 2025 • 24.2M • 139k • 494

liked a model 10 months ago

google/embeddinggemma-300m

liked a dataset 10 months ago

nvidia/Granary

Viewer • Updated Mar 12 • 141M • 6.8k • 199

liked a Space about 1 year ago

Dia 1.6B

👯

1.78k

Generate realistic dialogue from a script, using Dia!

liked a Space over 1 year ago

The Ultra-Scale Playbook

🌌

3.89k

The ultimate guide to training LLM on large GPU Clusters

liked a model over 1 year ago

deepseek-ai/DeepSeek-R1

Text Generation • 685B • Updated Mar 27, 2025 • 6.03M • 13.4k

liked a dataset over 1 year ago

HuggingFaceFW/fineweb-2

Viewer • Updated Oct 27, 2025 • 4.48B • 94.9k • 822

liked 2 Spaces over 1 year ago

Number Tokenization Blog

📈

123

Explore how tokenization affects arithmetic in LLMs

Hub LFS Analysis

📈

An analysis of LFS files on the Hub.

liked a model over 1 year ago

GoToCompany/gemma2-9b-cpt-sahabatai-v1-instruct

9B • Updated Nov 6, 2024 • 1.05k • 47

liked a Space over 1 year ago

Sahabat-AI Chatbot (Gemma2 9b)

😻

Chatbot

liked 2 datasets over 1 year ago

indolem/IndoMMLU

Updated Oct 11, 2023 • 270 • 20

PleIAs/common_corpus

Viewer • Updated May 6 • 69.9k • 55k • 402

liked 3 Spaces over 1 year ago

Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks

📝

Evaluate multilingual models using FineTasks

TxT360: Trillion Extracted Text

📖

134

Explore the TxT360 LLM pre‑training dataset online

Model Memory Utility

🚀

1.01k

Calculate GPU memory needed for training Hugging Face models

liked a Space almost 2 years ago

FineWeb: decanting the web for the finest text data at scale

🍷

1.37k

Explore and download the FineWeb web‑scale text dataset

liked a model about 2 years ago

mistralai/Mistral-7B-Instruct-v0.2

Text Generation • 7B • Updated Jul 24, 2025 • 1.13M • 3.16k
