AI & ML interests

Infrastructure, integrations and tooling for the Hugging Face ecosystem.

Recent Activity

AINovice2005 published a Space 2 days ago
the-hf-stack/README
AINovice2005 posted an update 12 days ago

AINovice2005 posted an update about 1 month ago
I've built a system to make open-source contributions easier to understand across repositories.

It:

aggregates merged external PRs (reviewed by maintainers)
structures them into a single contributions.md
adds a lightweight AI layer to query patterns and impact

The idea is to move from scattered PRs to a readable changelog of work.

Read about it: https://medium.com/@paragekbote23/from-commits-to-impact-building-an-automated-changelog-for-open-source-contributions-20cdfebcee58
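The aggregation step described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation from the linked article: the `MergedPR` record, the `render_contributions` helper, the Markdown layout, and the sample data are all hypothetical, and fetching merged PRs from a hosting API is left out.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MergedPR:
    """Hypothetical record for one merged external pull request."""
    repo: str       # "owner/name"
    number: int
    title: str
    merged_at: str  # ISO date, "YYYY-MM-DD"


def render_contributions(prs: list[MergedPR]) -> str:
    """Group PRs by repository and render them as one Markdown document."""
    by_repo: dict[str, list[MergedPR]] = defaultdict(list)
    for pr in prs:
        by_repo[pr.repo].append(pr)

    lines = ["# Contributions", ""]
    for repo in sorted(by_repo):
        lines.append(f"## {repo}")
        # Newest first; ISO dates sort correctly as plain strings.
        for pr in sorted(by_repo[repo], key=lambda p: p.merged_at, reverse=True):
            lines.append(f"- #{pr.number} {pr.title} (merged {pr.merged_at})")
        lines.append("")
    return "\n".join(lines)


# Made-up sample data, for illustration only.
sample = [
    MergedPR("example-org/alpha", 12, "Fix typo in docs", "2024-01-05"),
    MergedPR("example-org/beta", 7, "Add retry to uploader", "2024-02-10"),
    MergedPR("example-org/alpha", 30, "Speed up tokenizer test", "2024-03-01"),
]
markdown = render_contributions(sample)
```

Writing the result out is then a single call such as `Path("contributions.md").write_text(markdown)`. Keeping the rendering separate from the data collection makes the changelog format easy to test without network access.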
AINovice2005 posted an update about 2 months ago
In celebration of the new storage graph feature on the Hub, here's mine 😊 :


Post inspired by @ZennyKenny
AINovice2005 posted an update 2 months ago
I recently created my first storage bucket to hold the experiment data from my performance analysis of 15 tokenizers across 20 languages.

The setup is simple for a new product and can scale depending on the use case 🤗.

Bucket: https://huggingface.co/buckets/AINovice2005/tokenizer-benchmark

GitHub gist: https://gist.github.com/ParagEkbote/b3877f667f84cbb9a27bdaca94ba662a

Article: https://medium.com/@paragekbote23/one-sentence-fifteen-tokenizers-a-tokenizer-benchmarking-pipeline-with-hf-storage-buckets-2e59790276fd
AINovice2005 posted an update 2 months ago

AINovice2005 posted an update 2 months ago
Pro tip 2: You can treat HF datasets as versioned repos by pinning a specific revision (tag, branch or commit) when downloading files. 🧠

This ensures your data-processing pipeline always reads the exact same dataset state before the data reaches the model, which makes runs reproducible and the outputs of your ML system more reliable.

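from huggingface_hub import hf_hub_download

# Download a single file from a dataset repo at a fixed commit.
data_path = hf_hub_download(
    repo_id="lysandre/arxiv-nlp",
    filename="train.parquet",
    repo_type="dataset",  # without this, the repo is treated as a model repo
    revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a",  # full commit hash
)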
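One caveat worth making explicit: a branch always moves and a tag can be moved, while a commit hash cannot. A pipeline that wants strict reproducibility could guard against mutable revisions before downloading. The sketch below is one possible way to do that; `is_immutable_revision` and `pinned_download_kwargs` are hypothetical helpers, not part of `huggingface_hub`, and the check only tests whether the string looks like a full 40-character commit hash.

```python
import re

# A full Git commit hash is 40 lowercase hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_immutable_revision(revision: str) -> bool:
    """Return True only when `revision` looks like a full commit hash."""
    return _FULL_SHA.fullmatch(revision) is not None


def pinned_download_kwargs(repo_id: str, filename: str, revision: str) -> dict:
    """Build keyword arguments for a dataset download, rejecting mutable revisions."""
    if not is_immutable_revision(revision):
        raise ValueError(
            f"revision {revision!r} is not a full commit hash; "
            "branches and tags can point to different data later"
        )
    return {
        "repo_id": repo_id,
        "filename": filename,
        "repo_type": "dataset",
        "revision": revision,
    }


kwargs = pinned_download_kwargs(
    "lysandre/arxiv-nlp",
    "train.parquet",
    "877b84a8f93f2d619faa2a6e514a32beef88ab0a",
)
```

The resulting dictionary can be passed straight through as `hf_hub_download(**kwargs)`. Whether to reject tags outright is a design choice; a team that treats its tags as frozen might prefer to allow them.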
AINovice2005 posted an update 2 months ago

AINovice2005 posted an update 2 months ago
Just published my first CUDA kernel, inspired by SageAttention. Feel free to try it out ☺️

AINovice2005/attention-int8