All HF Hub posts

prithivMLmods 
posted an update 3 days ago
I've made 8 Spaces in the Qwen-Image-Edit series, and out of them, 5 Spaces reached “Space of the Week”! A few Spaces are still topping the list even after many months.

Cumulatively, the series has crossed 8.2 million+ ZeroGPU runs and nearly 4 million visitors overall.

Thanks for all the community support! 🤗❤️

🔗 Spaces: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
tomaarsen 
posted an update 6 days ago
🤗 Announcing the Ettin Reranker family: six new state-of-the-art CrossEncoder rerankers for search from 17M to 1B parameters, plus the full training data and the ~150-line recipe. Built on the Ettin ModernBERT encoders, Apache 2.0. Details:

All six were trained with the same single-stage pointwise MSE distillation recipe, with mixedbread-ai/mxbai-rerank-large-v2 (1.54B) as the teacher. Only the learning rate and per-device batch size change between sizes. The 1B student matches the teacher within 0.0001 NDCG@10 on MTEB(eng, v2) Retrieval, the 150M is the strongest reranker I tested in the under-600M range, and the 17M beats the 33M ms-marco-MiniLM-L12-v2 by +0.051 NDCG@10 at roughly half the parameter count.
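As a rough illustration of what "pointwise MSE distillation" means here, the sketch below computes the mean squared error between student and teacher relevance scores, one score per (query, document) pair. It is a minimal stand-alone example, not the training script from the blog post; the function name and the scores are made up for illustration.

```python
from typing import List


def mse_distillation_loss(student_scores: List[float], teacher_scores: List[float]) -> float:
    """Mean squared error between student and teacher scores.

    "Pointwise" means every (query, document) pair contributes its own
    squared error, independent of the other pairs in the batch.
    """
    if not student_scores or len(student_scores) != len(teacher_scores):
        raise ValueError("need one teacher score per student score, and at least one pair")
    squared_errors = [(s - t) ** 2 for s, t in zip(student_scores, teacher_scores)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical scores for three (query, document) pairs.
teacher = [0.9, 0.1, 0.5]
student = [0.7, 0.2, 0.5]
loss = mse_distillation_loss(student, teacher)
print(f"distillation loss: {loss:.4f}")
```

In an actual training run the student scores would come from the model being trained and the loss would be minimized by gradient descent; the sketch only shows the quantity being minimized.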

Speed matters as much as quality for a reranker, since it determines whether the model fits the latency budget between retrieval and showing results. Our 17M is the fastest reranker in the whole comparison at 7517 pairs/sec on an H100. Our 150M runs 2.3x faster than the two other 150M ModernBERT-base rerankers (gte-reranker-modernbert-base and granite-embedding-reranker-english-r2) because the modular Transformer module propagates unpadded inputs through every layer rather than just the FA2 attention kernel. And our 1B is 2.4x faster than its 1.5B teacher while matching it on quality.
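To make the latency-budget point concrete, here is a back-of-the-envelope calculation using the 7517 pairs/sec figure quoted above. The candidate count of 100 is an assumption for illustration, and the estimate assumes throughput scales linearly down to a single query, which real batched inference may not do.

```python
def rerank_latency_ms(num_candidates: int, pairs_per_sec: float) -> float:
    """Estimated time to score `num_candidates` (query, document) pairs, in milliseconds."""
    if pairs_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 * num_candidates / pairs_per_sec


# 7517 pairs/sec is the 17M model's H100 throughput from the post;
# 100 candidates per query is a hypothetical retrieval depth.
latency = rerank_latency_ms(num_candidates=100, pairs_per_sec=7517)
print(f"approx. {latency:.1f} ms to rerank 100 candidates")
```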

I bootstrapped the training recipe with the new train-sentence-transformers Agent Skill shipped in Sentence Transformers v5.5.0. Install it with hf skills add train-sentence-transformers --claude and ask Claude Code (or Codex / Cursor / Gemini CLI) to fine-tune a SentenceTransformer, CrossEncoder, or SparseEncoder model on your data.

I wrote a blog post walking through usage, results across six embedder pairings, the speed story, and the complete training script. Check it out, or just point your Agent to the URL:

https://huggingface.co/blog/ettin-reranker

Collection: https://huggingface.co/collections/cross-encoder/ettin-rerankers
kanaria007 
posted an update about 2 hours ago
✅ Article highlight: *World Event Oracles & Canonical History* (art-60-158, v0.1)

TL;DR:
This article asks a deceptively hard question for persistent worlds:

*What does it mean to say that something really happened?*

Its answer is strict: history is not whatever the lore team writes down. A world event becomes canonical only if a pinned *world event oracle* can classify it under a declared event class, evaluate explicit evidence thresholds, and emit an oracle-backed receipt. Otherwise it stays *PENDING* or *NON_CANONICAL*.

Read:
kanaria007/agi-structural-intelligence-protocols

Why it matters:
• turns “what happened” from narrative vibe into a governed decision surface
• separates canonical history from rumors, partial evidence, and unresolved events
• makes event classes, evidence thresholds, and canon rules explicit and versioned
• prevents retroactive lore rewrites unless reclassification is itself governed

What’s inside:
• a *world event oracle* that consumes receipts and decides canon status
• pinned *event classes* with schemas, required bindings, and threshold rules
• explicit threshold families for shard coverage, replay status, ledger support, monitoring, and disclosure
• oracle outputs like *CANONICAL*, *PENDING_VERIFICATION*, and *NON_CANONICAL*
• governed canon updates via CPO + shadow apply + reclassification verification
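A minimal sketch of how such an oracle decision could look in code, assuming a much simpler model than the article describes. The class names, fields, and the three thresholds below are hypothetical and chosen only to illustrate the pattern of a pinned event class, explicit thresholds, and a receipt; only the three status names come from the post.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CanonStatus(Enum):
    CANONICAL = "CANONICAL"
    PENDING_VERIFICATION = "PENDING_VERIFICATION"
    NON_CANONICAL = "NON_CANONICAL"


@dataclass(frozen=True)
class EventClass:
    """A pinned, versioned event class with its threshold rules (hypothetical fields)."""
    name: str
    version: str
    min_shard_coverage: float  # fraction of shards that must report the event
    min_ledger_entries: int    # supporting ledger entries required


@dataclass(frozen=True)
class Evidence:
    shard_coverage: float
    ledger_entries: int
    replay_ok: Optional[bool]  # None means the replay has not been run yet


def classify(event_class: EventClass, evidence: Evidence) -> CanonStatus:
    """Decide canon status from explicit thresholds only, never from narrative intent."""
    if evidence.replay_ok is False:
        # A failed replay contradicts the claim outright.
        return CanonStatus.NON_CANONICAL
    thresholds_met = (
        evidence.shard_coverage >= event_class.min_shard_coverage
        and evidence.ledger_entries >= event_class.min_ledger_entries
        and evidence.replay_ok is True
    )
    return CanonStatus.CANONICAL if thresholds_met else CanonStatus.PENDING_VERIFICATION


def oracle_receipt(event_class: EventClass, evidence: Evidence) -> dict:
    """Record which class, version, and evidence produced the decision."""
    return {
        "event_class": event_class.name,
        "event_class_version": event_class.version,
        "evidence": vars(evidence),
        "status": classify(event_class, evidence).value,
    }


siege = EventClass("city_siege", "v1", min_shard_coverage=0.8, min_ledger_entries=2)
print(oracle_receipt(siege, Evidence(shard_coverage=0.9, ledger_entries=3, replay_ok=True)))
```

The design choice worth noting is that incomplete evidence yields PENDING_VERIFICATION rather than a rejection, so an event can still become canonical once the missing evidence arrives.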

Key idea:
Do not say:

*“this is the official story.”*

Say:

*“this event entered canonical history because a pinned oracle evaluated this event class, under these thresholds, with these receipts, and found the claim admissible.”*

That is how “history” stops being storyline management and becomes a governed interface contract.
Kurapika993 
posted an update about 7 hours ago
Built a small Streamlit + CLI demo for generating context-dependent toxicity datasets using OpenAI models.

GitHub: https://github.com/Mayukhga83/Toximatics-Contextual-Toxicity-Data-Generator
Demo: https://toximatics-contextual-toxicity-data-generator-fnn9mzm7bkuzmta4.streamlit.app/


The core idea is that the same utterance can become toxic or benign depending on the surrounding social situation. With this generation framework, you can create such datasets at scale.

The pipeline supports:

• direct context augmentation given the seed utterance
• new utterance-context pair generation given seed utterances
• multistage generation for diverse examples
• validation with a critic model
• CSV / JSONL export

Example:

Utterance:
“You are so lucky to work from home.”

Benign context:
A friend congratulates someone on improved work-life balance.

Toxic context:
A colleague dismisses someone struggling with childcare and burnout.
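
For readers wondering what one record of such a dataset might look like, here is a small sketch that stores the example above as two rows and serializes them to JSONL. The field names and label strings are assumptions for illustration and may not match the schema the repository actually exports.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ContextualExample:
    """One utterance paired with one context and the resulting label (hypothetical schema)."""
    utterance: str
    context: str
    label: str  # "toxic" or "benign"


def to_jsonl(examples: List[ContextualExample]) -> str:
    """Serialize examples as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(e), ensure_ascii=False) for e in examples)


utterance = "You are so lucky to work from home."
examples = [
    ContextualExample(
        utterance,
        "A friend congratulates someone on improved work-life balance.",
        "benign",
    ),
    ContextualExample(
        utterance,
        "A colleague dismisses someone struggling with childcare and burnout.",
        "toxic",
    ),
]
jsonl_text = to_jsonl(examples)
print(jsonl_text)
```

Keeping the utterance identical across both rows is what lets a classifier trained on the data learn that the label depends on the context rather than on the wording alone.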

The project is connected to recent work on contextual toxicity understanding: https://aclanthology.org/2024.sigdial-1.65/

MonsterMMORPG 
posted an update about 9 hours ago
Finally started ACE-Step 1.5 XL training research: https://www.patreon.com/posts/ace-step-1-5-xl-157675060

Train your own voice and music locally on your Windows computer with a gaming GPU, and use the results as you wish.

I'm using an 8x GPU server to find the best training parameters, and I hope to publish a full tutorial.

No synthID like 11 Labs
AxionLab-official 
posted an update about 20 hours ago
Someone ran Supra-50M-Instruct ON A 1GHZ 1999 CPU

https://www.reddit.com/r/LocalLLM/comments/1tm21ar/i_see_your_strix_halo_and_raise_you_a_vintage/

"As a fun experiment, I decided to try running the recently released Supra-50m on a 26-year-old machine I keep for retro Windows 9.X games. Although the model was somewhat silly and inconsistent, the performance wasn't bad, reaching around 1.3 tok/s with CPU inference alone.

Since this CPU doesn't have SSE2, I changed from llama.cpp to llama2.c and asked Claude to write a custom tokenizer.

It's crazy to think that with the right file size of 200 MB, we could have experienced this magic back in 1999" - u/drone_stonks, r/localllm
pankajpandey-dev 
posted an update 2 days ago
Just released Qwen3-0.6B fine-tuned on Hindi instruction data 🇮🇳

✅ Full model: pankajpandey-dev/Qwen3-0.6B-Hindi-Instruct-v1
✅ GGUF versions (Q2/Q4/Q5/Q8): pankajpandey-dev/Qwen3-0.6B-Hindi-Instruct-v1-GGUF

Smallest Hindi-capable GGUF — runs on any laptop at 0.37GB.
Next: v2 with more data, better responses.

#Hindi #LLM #GGUF #OpenSource
Juanxi 
posted an update 2 days ago
🧐 BlogXiv provides a scholarly discovery interface for researchers to trace emerging ideas, compare technical perspectives, and engage with high-quality research communication.

Contributions such as stars and submissions are welcome. Thanks!

Github: https://github.com/OpenEnvision/BlogXiv
Website: https://openenvision.github.io/BlogXiv/
kanaria007 
posted an update 2 days ago
✅ Article highlight: *Real-Scale World Simulation Game* (art-60-157, v0.1)

TL;DR:
This article asks what it would take to build a “real SAO-like” world without hand-wavy magic.

The answer is not unlimited freedom. It is a *persistent world with bounded agency*: NPCs can act, form societies, trade, govern, and shape history—but only through pinned profiles, CAS state, ledgers, receipts, and replayable world history. In other words: a living world is believable only if it is governable.

Read:
kanaria007/agi-structural-intelligence-protocols

Why it matters:
• shows how to move from “match fairness” to “world-history fairness”
• treats NPC societies as bounded agents rather than decorative scripts
• makes laws, markets, factions, and institutions explicit state layers instead of lore vibes
• explains why “living world” claims need receipts, replay, and anti-abuse monitoring

What’s inside:
• layered world state as CAS: *physics, economy, society, institution, narrative*
• NPCs as receipted bounded agents with observation, action, and resource limits
• institution ledgers for law, market rules, faction control, and world governance
• world replay as *history reproduction*, not just match replay
• adversary monitoring for griefing, market rigging, propaganda, and governance capture
• unique-entity / ownership / transfer receipts for “only one in the world” style claims
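One way to picture "CAS state plus receipts plus replay" is the sketch below. It assumes CAS means content-addressed storage (the post does not spell this out), and the trade action, the gold ledger, and the per-action limit are all hypothetical. Each action produces a receipt linking the hash of the state before to the hash of the state after, so history can be verified by re-running it.

```python
import hashlib
import json
from typing import List


def content_address(state: dict) -> str:
    """Hash a canonical JSON encoding so equal states always get the same address."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def apply_trade(state: dict, npc: str, amount: int, max_per_action: int) -> dict:
    """Move gold from an NPC to the market, enforcing the NPC's bounded agency."""
    if amount <= 0 or amount > max_per_action:
        raise ValueError("action exceeds the NPC's per-action limit")
    if state["gold"].get(npc, 0) < amount:
        raise ValueError("insufficient resources")
    gold = dict(state["gold"])
    gold[npc] -= amount
    gold["market"] = gold.get("market", 0) + amount
    return {**state, "gold": gold}


def make_receipt(before: dict, action: dict, after: dict) -> dict:
    return {
        "before": content_address(before),
        "action": action,
        "after": content_address(after),
    }


def replay(initial: dict, receipts: List[dict], max_per_action: int) -> bool:
    """Re-run every receipted action and check that each state hash matches."""
    state = initial
    for receipt in receipts:
        if content_address(state) != receipt["before"]:
            return False
        action = receipt["action"]
        try:
            state = apply_trade(state, action["npc"], action["amount"], max_per_action)
        except ValueError:
            return False
        if content_address(state) != receipt["after"]:
            return False
    return True


initial = {"gold": {"blacksmith": 20, "market": 0}}
action = {"npc": "blacksmith", "amount": 5}
after = apply_trade(initial, action["npc"], action["amount"], max_per_action=10)
receipts = [make_receipt(initial, action, after)]
print("history replays cleanly:", replay(initial, receipts, max_per_action=10))
```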

Key idea:
Do not say:

*“the world feels alive.”*

Say:

*“this world evolved through a receipted, bounded-agency closed loop: state, NPC decisions, player actions, institutional transitions, replay, monitoring, and publication rules.”*

That is how a persistent world becomes believable without becoming ungovernable.
juiceb0xc0de 
posted an update 3 days ago
Gemma-4-E2B SAE Atlas — Work in Progress

JumpReLU Sparse Autoencoders trained on every layer of Gemma-4-E2B-it using an adaptive Lagrangian controller. Training is in progress. I'm publishing layers live as they come hot off the press for anyone interested in following along. I will be making further adjustments for finer resolution, but the early data should be helpful, I think? I'm just a bartender, so don't trust everything I say. 🤗 The Lagrangian math is pretty cool: it auto-steers the trainer, taking the guesswork out of hyperparameter adjustments.

Full paper and methodology whenever I get around to writing it up. There's a lot of work to be done. For now though, enjoy! 🤗
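
For anyone unfamiliar with the two ideas, here is a toy sketch. JumpReLU passes a pre-activation through unchanged only when it exceeds a threshold. The multiplier update is plain dual ascent toward a target number of active features, which is only a guess at what an "adaptive Lagrangian controller" could do; the function names, step size, and target are hypothetical and not taken from this project.

```python
from typing import List


def jumprelu(x: float, theta: float) -> float:
    """JumpReLU: keep x as-is when it exceeds the threshold theta, otherwise output 0."""
    return x if x > theta else 0.0


def l0_count(activations: List[float]) -> int:
    """Number of active (non-zero) features, i.e. the L0 sparsity measure."""
    return sum(1 for a in activations if a != 0.0)


def update_multiplier(lam: float, observed_l0: float, target_l0: float, step: float) -> float:
    """Dual ascent on the sparsity constraint (assumed controller, for illustration).

    The multiplier grows while more features fire than the target allows and
    shrinks otherwise; it is clamped at zero so the penalty never turns negative.
    """
    return max(0.0, lam + step * (observed_l0 - target_l0))


pre_activations = [0.2, 0.7, -1.0, 1.5]
activations = [jumprelu(x, theta=0.5) for x in pre_activations]
active = l0_count(activations)

lam = update_multiplier(lam=0.0, observed_l0=active, target_l0=1, step=0.1)
print(activations, active, lam)
```

In a trainer following this pattern the loss would be the reconstruction error plus `lam` times a sparsity term, so the sparsity weight adjusts itself instead of being hand-tuned.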

juiceb0xc0de/gemma-4-e2b-saes