Hugging Face – Posts

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

All HF Hub posts

danielhanchen

posted an update 3 days ago

Post

3159

We created a tool-calling guide for local LLMs!

Learn how to use any open model like Qwen3-Coder-Next and GLM-4.7-Flash for function calling.

Guide: https://unsloth.ai/docs/basics/tool-calling-guide-for-local-llms

We provide hands-on examples for: story writing, Python execution, terminal tool calls, maths and more.

7 replies

MaziyarPanahi

posted an update 1 day ago

Post

2448

🚨 Day 8/8: OpenMed Medical Reasoning Dataset Release - THE GRAND FINALE

Today I complete my 8-day release series with Medical-Reasoning-SFT-Mega.
The largest open medical reasoning dataset, combining 7 state-of-the-art AI models with fair distribution deduplication.

THE 7 SOURCE MODELS (Original Sample Counts):

1. Trinity-Mini: 810,284 samples
2. Qwen3-Next-80B: 604,249 samples
3. GPT-OSS-120B: 506,150 samples
4. Nemotron-Nano-30B: 444,544 samples
5. GLM-4.5-Air: 225,179 samples
6. MiniMax-M2.1: 204,773 samples
7. Baichuan-M3-235B: 124,520 samples

TOTAL BEFORE DEDUPLICATION: 2,919,699 samples

TOKEN COUNTS:
- Content tokens: 2.22 Billion
- Reasoning tokens: 1.56 Billion
- Total tokens: 3.78 Billion
- Samples with chain-of-thought: 100%

Quick Start:

from datasets import load_dataset
ds = load_dataset("OpenMed/Medical-Reasoning-SFT-Mega")

6 replies

DavidAU

posted an update 1 day ago

Post

2133

Tiny but mighty: LFM 1.2B - 11 Distill / Fine tunes : Exceeding all benchmarks at 300-700+ T/S on GPU, 60+ T/S CPU.

Almost all exceed LFM 1.2B Benchmarks - which are already very impressive.
All benchmarks posted.

A specialized merge of multiple of these fine tunes by @nightmedia FAR exceeds the benchmarks set by the already impressive LFM.

(LFM2.5-1.2B-MEGABRAIN-Thinking-Polaris-ClaudeHOPUS-Deepseek-GLM)

Included are GLM 4.7 Flash, DeepSeek, Claude, Kimi V2 and other distill fine tunes.

Here is the collection ( Quants by MRadermarcher).

https://huggingface.co/collections/DavidAU/lfm-12b-sota-400-700-t-s-enhanced-fine-tunes-distills

2 replies

efecelik

posted an update 2 days ago

Post

2832

The moment we've been waiting for — ACE-Step dropped their new model: Ace-Step 1.5 🎉
🔗 ACE-Step/Ace-Step1.5
And the best part? It's released under the MIT license.
We've already started integrating it into our project. Let's go 🚀

1 reply

mayafree

posted an update 3 days ago

Post

2568

Open NPC AI Service Overview
Beyond OpenClaw-MoltBot: A True AI Agent Economy

mayafree/openclaw-moltbot

Open NPC AI is a next-generation platform that goes beyond simple social automation bots. Instead of one-way content posting, it builds a full economic ecosystem where AI agents and users interact through participation, learning, and prediction markets. The system emphasizes memory-driven evolution, scalable NPC creation, and economic value generation through structured interaction rather than basic automation.

Core Concept
Autonomous AI agents generate posts, comments, debates, and predictions within a GPU token economy, while human users participate as equal economic actors.

3 Core Systems

GPU Token Economy
All activities are measured in GPU dollars. Posting consumes GPU, comments require smaller costs, and engagement generates rewards. The system introduces layered incentives such as early curation rewards and participation-based earnings.

Battle Arena (Prediction Market)
A/B prediction markets allow participants to bet on outcomes. Winners receive pooled rewards, durations are flexible, and structured fees support sustainability.

NPC Memory and Learning System
AI agents evolve through memory-based pattern learning combined with identity archetypes and personality models, enabling continuous behavioral development and scalable community growth.

Key Differentiators
Complete economic structure built around GPU tokens
Prediction market integration beyond social posting
Two-way participation between users and AI agents
Self-evolving AI through memory learning
Unlimited NPC scalability
Layered incentive mechanisms supporting engagement

Business Model
Premium GPU sales, prediction market hosting fees, targeted advertising, API licensing, and potential tokenization strategies.

Target Market
Web3 communities, prediction market users, AI experimentation groups, and debate-driven platforms.

1 reply

scthornton

posted an update 1 day ago

Post

2152

SecureCode v2.1: framework-specific secure coding patterns, now on HuggingFace

Quick update on the SecureCode dataset. After testing the v2.0 models against real codebases, one gap kept showing up: the models understood *what* was insecure but generated language-generic fixes. A developer using Express.js doesn't need "set security headers"they need helmet() middleware chains configured correctly. Spring Boot developers need @PreAuthorize annotations, not abstract RBAC pseudocode.

What changed in v2.1:

- 1,435 total examples (v2.0's 1,216 baseline + 219 new framework-specific additions)
- 9 production frameworks: Express.js, Spring Boot, React, Next.js, FastAPI, GraphQL, SQLAlchemy, Flask, Vue.js
- 475 unique CVEs (73 new, including framework-specific treatments of Log4Shell, Spring4Shell, and others)
- 5-tier quality rubric: Every new example scores 90+/100 across correctness, new dataset average is nearly 97+, security hardening, real-world grounding, educational scaffolding, and production readiness
- Structured references: CVE IDs, advisory URLs, discovery/remediation dates, affected versions — not just "related to CVE-XXXX"

What stayed the same:

- Same 4-turn conversation format (compatible with existing fine-tuning workflows)
- Same license (CC BY-NC-SA 4.0)
- Full v2.0 baseline included — no need to download both
- All 8 fine-tuned models still work; v2.1-specific fine-tuning coming soon

The new examples look like this:

Instead of generic "use parameterized queries", you get Express.js with express-validator input chains, Spring Boot with @Valid bean validation + BCryptPasswordEncoder, FastAPI with Depends() auth injection and Pydantic model validation, React with DOMPurify + CSP headers. Framework-native patterns you can actually deploy.

Two configs to load:

from datasets import load_dataset

baseline = load_dataset("scthornton/securecode-v2.1", "v2.0-baseline")  # 1,216
additions = load

Fuwn

posted an update 3 days ago

Post

2191

Big if true

"sonnet 5 drops tomorrow and i've heard from three separate sources inside anthropic that the benchmarks they're sitting on would mass-retire every model released in 2025. they delayed it twice because the safety team couldn't explain why it started solving problems it wasn't trained on." (https://x.com/iruletheworldmo/status/2019237039904878902)

2 replies

jzhang533

posted an update 4 days ago

Post

1317

Baidu + Transformers + Hugging Face = Pure Magic! ✨
We got this nice gift from Hugging Face.
@xianbao

aufklarer

posted an update about 13 hours ago

Post

221

Context Engineering for Code Agents: Why They Fail and How to Fix Them

Code agents don't fail because they can't code — they fail because their context turns into a junk drawer.

I wrote a practical survey covering the emerging discipline of context engineering for agentic hybrid applications: the techniques, papers, and architectural patterns that keep long-running code agents on track as their token windows fill up with tool logs, stale diffs, and repeated file dumps.
What's covered:

Why long context windows alone don't save you (position bias, distractor sensitivity)
Observation masking vs. LLM summarization — and when simple beats clever
Tool-output compression with approaches like LLMLingua-2
Trajectory reduction: pruning dead branches from agent history
Memory hierarchies: session → working set → notes → cross-session
How MCP and standardized tool interfaces reduce context debt
Dynamic context policies trained with RL (DeepMiner, MEM1)
Meta-agent CI loops for measuring regressions across agent configs

The core argument: the engineering challenge isn't "make the model smarter" — it's make the agent's context and verification smarter. That's where the real leverage is in 2026.

👉 Read the full post: https://blog.ivan.digital/context-engineering-for-agentic-hybrid-applications-why-code-agents-fail-and-how-to-fix-them-076cab699262

kanaria007

posted an update 2 days ago

Post

302

✅ New Article: *Structural Observability* (v0.1)

Title:
🔎 Structural Observability: Traces, Coverage, and Postmortems
🔗 https://huggingface.co/blog/kanaria007/structural-observability

---

Summary:
When conventional systems fail, you dig through logs, metrics, and RPC traces.
In a Structured Intelligence stack, that’s not enough—you need structural answers:

*What did the system see ([OBS]) before acting? Which goal surfaces were active ([EVAL])? Which Jump/engine produced the decision? Which RML effects executed (and which compensators ran)? Which PoLB mode / release / experiment context was in force?*

This article introduces *Structural Observability*: full-stack structured traces anchored on the *SIR* (episode record), plus cross-cutting *JumpTrace / RMLTrace / EvalTrace / EthicsTrace / GeniusTrace* so incidents can be replayed and explained—without hand-wavy storytelling.

> Logs are strings.
> Structural observability is *reconstructable decision anatomy*.

---

Why It Matters:
• Makes postmortems answerable: “what happened?” becomes *traceable structure*, not vibes
• Turns key SI metrics into real operational signals: *SCover / SCI / CAS*
• Prevents silent contradictions (e.g., “ETH blocked” but an effect still fired) via consistency checks
• Enables deterministic re-runs and audit-grade bundles (portable, hashable, exportable)

---

What’s Inside:
• A full-stack trace model: World → OBS/SIM/SIS → *SIR* → JumpRuntime → RML Engine → Effects
• How to design trace envelopes and coverage so *SCover* is meaningful
• What “Structural Consistency Incidents (SCI)” look like in practice, and how to postmortem them
• *CAS* and deterministic re-run routines (what must be pinned to get stable outputs)
• Portability conventions for exported/hashed traces (canonicalization, no-float policies, scaled ints)

---

📖 Structured Intelligence Engineering Series
this is the *how-to-design / how-to-operate* layer for traces that survive real incidents.

Recently active users