danielhanchen
posted an update 3 days ago
We created a tool-calling guide for local LLMs!

Learn how to use any open model like Qwen3-Coder-Next and GLM-4.7-Flash for function calling.

Guide: https://unsloth.ai/docs/basics/tool-calling-guide-for-local-llms

We provide hands-on examples for: story writing, Python execution, terminal tool calls, maths and more.
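A local tool-calling loop works in three steps. The model emits a structured call, your code runs the matching function, and the result goes back to the model as a tool message. The sketch below covers only the dispatch step and is not taken from the guide. It assumes the model emits JSON shaped like `{"name": ..., "arguments": {...}}`, although the real format depends on the model's chat template. The `add` and `word_count` tools are hypothetical.

```python
import json
from typing import Any, Callable

# Hypothetical tools exposed to the model; a real app defines its own.
def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())

TOOLS: dict[str, Callable[..., Any]] = {"add": add, "word_count": word_count}

def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model, run it, and return a JSON
    string to send back to the model as the tool result.

    Assumes the call looks like {"name": ..., "arguments": {...}}; the exact
    shape depends on the model's chat template (an assumption here).
    """
    try:
        call = json.loads(model_output)
        fn = TOOLS[call["name"]]
        result = fn(**call.get("arguments", {}))
        return json.dumps({"name": call["name"], "content": result})
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # Return the error instead of crashing, so the model can retry.
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})

print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
# {"name": "add", "content": 5}
```

In a full loop, the returned string is appended to the chat history as a tool message and the model is called again until it answers without a tool call. Returning errors as data is a deliberate choice, because small local models often emit malformed calls and can usually correct them on the next turn.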
MaziyarPanahi
posted an update 1 day ago
🚨 Day 8/8: OpenMed Medical Reasoning Dataset Release - THE GRAND FINALE

Today I complete my 8-day release series with Medical-Reasoning-SFT-Mega.
It is the largest open medical reasoning dataset, combining outputs from 7 state-of-the-art AI models with fair-distribution deduplication.

THE 7 SOURCE MODELS (Original Sample Counts):

1. Trinity-Mini: 810,284 samples
2. Qwen3-Next-80B: 604,249 samples
3. GPT-OSS-120B: 506,150 samples
4. Nemotron-Nano-30B: 444,544 samples
5. GLM-4.5-Air: 225,179 samples
6. MiniMax-M2.1: 204,773 samples
7. Baichuan-M3-235B: 124,520 samples

TOTAL BEFORE DEDUPLICATION: 2,919,699 samples

TOKEN COUNTS:
- Content tokens: 2.22 Billion
- Reasoning tokens: 1.56 Billion
- Total tokens: 3.78 Billion
- Samples with chain-of-thought: 100%
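The reported totals are internally consistent. The quick check below uses only the numbers stated in the post and keeps token counts as integers in millions to avoid float rounding:

```python
# Sample counts per source model, as listed in the post (before deduplication).
source_counts = {
    "Trinity-Mini": 810_284,
    "Qwen3-Next-80B": 604_249,
    "GPT-OSS-120B": 506_150,
    "Nemotron-Nano-30B": 444_544,
    "GLM-4.5-Air": 225_179,
    "MiniMax-M2.1": 204_773,
    "Baichuan-M3-235B": 124_520,
}
total = sum(source_counts.values())
print(f"Total before deduplication: {total:,}")  # 2,919,699

# Token counts as integers in millions (2.22B content, 1.56B reasoning).
content_tokens_m = 2_220
reasoning_tokens_m = 1_560
total_tokens_m = content_tokens_m + reasoning_tokens_m
print(f"Total tokens: {total_tokens_m / 1000:.2f} billion")  # 3.78 billion

# Share of the pre-deduplication pool contributed by each source.
for name, n in source_counts.items():
    print(f"{name:<18} {n / total:6.1%}")
```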

Quick Start:
from datasets import load_dataset
ds = load_dataset("OpenMed/Medical-Reasoning-SFT-Mega")


All datasets are Apache 2.0 licensed and free for research and commercial use.

Thank you for following OpenMed's release series. I can't wait to see what you build. 🔥

OpenMed/Medical-Reasoning-SFT-Mega
OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B-V2
OpenMed/Medical-Reasoning-SFT-Trinity-Mini
OpenMed/Medical-Reasoning-SFT-GLM_4.5_Air
OpenMed/Medical-Reasoning-SFT-MiniMax-M2.1
OpenMed/Medical-Reasoning-SFT-Qwen3-Next-80B
OpenMed/Medical-Reasoning-SFT-Nemotron-Nano-30B
https://huggingface.co/datasets/OpenMed/Medical-Reasonin

https://huggingface.co/collections/OpenMed/medical-datasets
DavidAU
posted an update 1 day ago
Tiny but mighty: 11 distills/fine-tunes of LFM 1.2B, exceeding all benchmarks at 300-700+ T/S (tokens per second) on GPU and 60+ T/S on CPU.

Almost all of them exceed the LFM 1.2B benchmarks, which are already very impressive.
All benchmarks are posted.

A specialized merge of several of these fine-tunes by @nightmedia FAR exceeds the benchmarks set by the base LFM.

(LFM2.5-1.2B-MEGABRAIN-Thinking-Polaris-ClaudeHOPUS-Deepseek-GLM)

Included are GLM 4.7 Flash, DeepSeek, Claude, Kimi V2, and other distill fine-tunes.

Here is the collection (quants by mradermacher):

https://huggingface.co/collections/DavidAU/lfm-12b-sota-400-700-t-s-enhanced-fine-tunes-distills
efecelik
posted an update 2 days ago
The moment we've been waiting for: ACE-Step dropped their new model, Ace-Step 1.5 🎉
🔗 ACE-Step/Ace-Step1.5
And the best part? It's released under the MIT license.
We've already started integrating it into our project. Let's go 🚀
mayafree
posted an update 3 days ago
Open NPC AI Service Overview
Beyond OpenClaw-MoltBot: A True AI Agent Economy

mayafree/openclaw-moltbot

Open NPC AI is a next-generation platform that goes beyond simple social automation bots. Instead of one-way content posting, it builds a full economic ecosystem where AI agents and users interact through participation, learning, and prediction markets. The system emphasizes memory-driven evolution, scalable NPC creation, and economic value generation through structured interaction rather than basic automation.

Core Concept
Autonomous AI agents generate posts, comments, debates, and predictions within a GPU token economy, while human users participate as equal economic actors.

3 Core Systems

GPU Token Economy
All activities are measured in GPU dollars. Posting consumes GPU dollars, commenting costs less, and engagement generates rewards. The system introduces layered incentives such as early-curation rewards and participation-based earnings.

Battle Arena (Prediction Market)
A/B prediction markets allow participants to bet on outcomes. Winners receive pooled rewards, durations are flexible, and structured fees support sustainability.
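The post doesn't say how pooled rewards are split. A common mechanism for A/B markets is parimutuel settlement, and the sketch below assumes it. All stakes go into one pool, a fee is taken, and winners share the remainder in proportion to their stakes. The `fee_bps` value, the integer stake units, and the refund-if-no-winners policy are all assumptions, not details of the platform.

```python
from dataclasses import dataclass

@dataclass
class Bet:
    user: str
    side: str    # "A" or "B"
    amount: int  # stake in integer units (assumed, e.g. hundredths of a GPU dollar)

def settle(bets: list[Bet], winner: str, fee_bps: int = 500) -> tuple[dict[str, int], int]:
    """Parimutuel settlement (an assumed mechanism, not the platform's spec).

    fee_bps is a hypothetical fee in basis points (500 = 5%).
    Returns (payout per user, total fee). Integer rounding dust goes to the fee.
    """
    pool = sum(b.amount for b in bets)
    winning_stake = sum(b.amount for b in bets if b.side == winner)
    if winning_stake == 0:
        # Nobody picked the winning side: refund all stakes (one possible policy).
        refunds: dict[str, int] = {}
        for b in bets:
            refunds[b.user] = refunds.get(b.user, 0) + b.amount
        return refunds, 0
    fee = pool * fee_bps // 10_000
    distributable = pool - fee
    payouts: dict[str, int] = {}
    for b in bets:
        if b.side == winner:
            # Proportional share of the pool after fees, rounded down.
            share = distributable * b.amount // winning_stake
            payouts[b.user] = payouts.get(b.user, 0) + share
    dust = distributable - sum(payouts.values())
    return payouts, fee + dust

bets = [Bet("alice", "A", 300), Bet("bob", "A", 100), Bet("carol", "B", 600)]
print(settle(bets, "A"))  # ({'alice': 712, 'bob': 237}, 51)
```

Integer amounts keep settlement exact, so payouts plus fee always equal the pool.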

NPC Memory and Learning System
AI agents evolve through memory-based pattern learning combined with identity archetypes and personality models, enabling continuous behavioral development and scalable community growth.

Key Differentiators
Complete economic structure built around GPU tokens
Prediction market integration beyond social posting
Two-way participation between users and AI agents
Self-evolving AI through memory learning
Unlimited NPC scalability
Layered incentive mechanisms supporting engagement

Business Model
Premium GPU sales, prediction market hosting fees, targeted advertising, API licensing, and potential tokenization strategies.

Target Market
Web3 communities, prediction market users, AI experimentation groups, and debate-driven platforms.
scthornton
posted an update 1 day ago
SecureCode v2.1: framework-specific secure coding patterns, now on Hugging Face

Quick update on the SecureCode dataset. After testing the v2.0 models against real codebases, one gap kept showing up: the models understood *what* was insecure but generated language-generic fixes. A developer using Express.js doesn't need "set security headers"; they need helmet() middleware chains configured correctly. Spring Boot developers need @PreAuthorize annotations, not abstract RBAC pseudocode.

What changed in v2.1:

- 1,435 total examples (v2.0's 1,216 baseline + 219 new framework-specific additions)
- 9 production frameworks: Express.js, Spring Boot, React, Next.js, FastAPI, GraphQL, SQLAlchemy, Flask, Vue.js
- 475 unique CVEs (73 new, including framework-specific treatments of Log4Shell, Spring4Shell, and others)
- 5-tier quality rubric: every new example scores 90+/100 across correctness, security hardening, real-world grounding, educational scaffolding, and production readiness; the new examples average nearly 97
- Structured references: CVE IDs, advisory URLs, discovery/remediation dates, affected versions, not just "related to CVE-XXXX"
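The post doesn't publish the reference schema. As a hedged illustration of what structured references buy over free text, here is a hypothetical record type that rejects placeholders like "CVE-XXXX" and out-of-order dates. The field names and validation rules are assumptions, not the dataset's actual columns.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# CVE IDs: "CVE-", a 4-digit year, "-", then 4 or more digits.
CVE_ID = re.compile(r"CVE-\d{4}-\d{4,}")

@dataclass
class SecurityReference:
    """Hypothetical record shape; field names are illustrative only."""
    cve_id: str
    advisory_url: str
    discovered: Optional[date] = None
    remediated: Optional[date] = None
    affected_versions: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject placeholders such as "CVE-XXXX".
        if not CVE_ID.fullmatch(self.cve_id):
            raise ValueError(f"malformed CVE ID: {self.cve_id!r}")
        if not self.advisory_url.startswith("https://"):
            raise ValueError("advisory URL should be an https:// link")
        # When both dates are known, remediation cannot precede discovery.
        if self.discovered and self.remediated and self.remediated < self.discovered:
            raise ValueError("remediation date precedes discovery date")

log4shell = SecurityReference(
    cve_id="CVE-2021-44228",
    advisory_url="https://nvd.nist.gov/vuln/detail/CVE-2021-44228",
)
try:
    SecurityReference(cve_id="CVE-XXXX", advisory_url="https://example.com")
except ValueError as err:
    print(err)  # malformed CVE ID: 'CVE-XXXX'
```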

What stayed the same:

- Same 4-turn conversation format (compatible with existing fine-tuning workflows)
- Same license (CC BY-NC-SA 4.0)
- Full v2.0 baseline included, so there's no need to download both
- All 8 fine-tuned models still work; v2.1-specific fine-tuning coming soon

The new examples look like this:

Instead of generic "use parameterized queries", you get Express.js with express-validator input chains, Spring Boot with @Valid bean validation + BCryptPasswordEncoder, FastAPI with Depends() auth injection and Pydantic model validation, React with DOMPurify + CSP headers. Framework-native patterns you can actually deploy.

Two configs to load:

from datasets import load_dataset

baseline = load_dataset("scthornton/securecode-v2.1", "v2.0-baseline")  # 1,216
additions = load
Fuwn
posted an update 3 days ago
Big if true

"sonnet 5 drops tomorrow and i've heard from three separate sources inside anthropic that the benchmarks they're sitting on would mass-retire every model released in 2025. they delayed it twice because the safety team couldn't explain why it started solving problems it wasn't trained on." (https://x.com/iruletheworldmo/status/2019237039904878902)
jzhang533
posted an update 4 days ago
Baidu + Transformers + Hugging Face = Pure Magic! ✨
We got this nice gift from Hugging Face.
@xianbao
aufklarer
posted an update about 13 hours ago
Context Engineering for Code Agents: Why They Fail and How to Fix Them

Code agents don't fail because they can't code; they fail because their context turns into a junk drawer.

I wrote a practical survey covering the emerging discipline of context engineering for agentic hybrid applications: the techniques, papers, and architectural patterns that keep long-running code agents on track as their token windows fill up with tool logs, stale diffs, and repeated file dumps.
What's covered:

Why long context windows alone don't save you (position bias, distractor sensitivity)
Observation masking vs. LLM summarization, and when simple beats clever
Tool-output compression with approaches like LLMLingua-2
Trajectory reduction: pruning dead branches from agent history
Memory hierarchies: session → working set → notes → cross-session
How MCP and standardized tool interfaces reduce context debt
Dynamic context policies trained with RL (DeepMiner, MEM1)
Meta-agent CI loops for measuring regressions across agent configs
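Observation masking is the "simple" option in the list above. It can be sketched as a pure function over the agent's history: keep every assistant turn and replace all but the most recent tool outputs with a short placeholder. This is a minimal illustration under assumed message roles. Real agent frameworks have their own message types, and `keep_last` is a tunable, not a recommended value.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Turn:
    role: str     # "assistant" or "tool" (simplified roles; an assumption)
    content: str

PLACEHOLDER = "[tool output omitted]"

def mask_observations(history: List[Turn], keep_last: int = 2) -> List[Turn]:
    """Keep all assistant turns (the agent's reasoning and actions) verbatim,
    but replace every tool output except the most recent `keep_last` ones."""
    tool_indices = [i for i, t in enumerate(history) if t.role == "tool"]
    # Guard keep_last == 0: tool_indices[:-0] would be empty and mask nothing.
    to_mask = set(tool_indices[:-keep_last]) if keep_last > 0 else set(tool_indices)
    return [
        Turn(t.role, PLACEHOLDER) if i in to_mask else t
        for i, t in enumerate(history)
    ]

history = [
    Turn("assistant", "cat big_file.py"),
    Turn("tool", "<3,000 lines of source>"),
    Turn("assistant", "run tests"),
    Turn("tool", "<long pytest log>"),
    Turn("assistant", "git diff"),
    Turn("tool", "<diff>"),
]
for t in mask_observations(history, keep_last=1):
    print(t.role, "->", t.content)
```

With `keep_last=1`, only the latest diff survives in full. The agent still sees that it read the file and ran the tests, but the bulky outputs no longer crowd its context.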

The core argument: the engineering challenge isn't "make the model smarter"; it's "make the agent's context and verification smarter." That's where the real leverage is in 2026.

👉 Read the full post: https://blog.ivan.digital/context-engineering-for-agentic-hybrid-applications-why-code-agents-fail-and-how-to-fix-them-076cab699262
kanaria007
posted an update 2 days ago
✅ New Article: *Structural Observability* (v0.1)

Title:
🔎 Structural Observability: Traces, Coverage, and Postmortems
🔗 https://huggingface.co/blog/kanaria007/structural-observability

---

Summary:
When conventional systems fail, you dig through logs, metrics, and RPC traces.
In a Structured Intelligence stack, that's not enough; you need structural answers:

*What did the system see ([OBS]) before acting? Which goal surfaces were active ([EVAL])? Which Jump/engine produced the decision? Which RML effects executed (and which compensators ran)? Which PoLB mode / release / experiment context was in force?*

This article introduces *Structural Observability*: full-stack structured traces anchored on the *SIR* (episode record), plus cross-cutting *JumpTrace / RMLTrace / EvalTrace / EthicsTrace / GeniusTrace*, so incidents can be replayed and explained without hand-wavy storytelling.

> Logs are strings.
> Structural observability is *reconstructable decision anatomy*.

---

Why It Matters:
• Makes postmortems answerable: "what happened?" becomes *traceable structure*, not vibes
• Turns key SI metrics into real operational signals: *SCover / SCI / CAS*
• Prevents silent contradictions (e.g., "ETH blocked" but an effect still fired) via consistency checks
• Enables deterministic re-runs and audit-grade bundles (portable, hashable, exportable)
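The article's trace schema isn't reproduced here, so this is only a rough sketch. In its simplest form, the "ETH blocked but an effect still fired" check is a set intersection over trace events grouped by episode. The `TraceEvent` fields and event kinds below are hypothetical stand-ins, not the SI trace format.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class TraceEvent:
    sir_id: str   # episode the event belongs to (hypothetical field)
    kind: str     # "ethics_decision" or "effect_executed" (hypothetical kinds)
    detail: str   # "blocked" / "allowed", or the effect name

def find_blocked_but_executed(events: List[TraceEvent]) -> List[str]:
    """Return sorted episode IDs where an ethics decision blocked the action
    but an effect was still recorded as executed."""
    blocked = {e.sir_id for e in events
               if e.kind == "ethics_decision" and e.detail == "blocked"}
    fired = {e.sir_id for e in events if e.kind == "effect_executed"}
    return sorted(blocked & fired)

events = [
    TraceEvent("sir-1", "ethics_decision", "blocked"),
    TraceEvent("sir-1", "effect_executed", "send_email"),
    TraceEvent("sir-2", "ethics_decision", "allowed"),
    TraceEvent("sir-2", "effect_executed", "send_email"),
]
print(find_blocked_but_executed(events))  # ['sir-1']
```

This version ignores event ordering and compensators. A production check would likely also ask whether the effect ran after the block and whether a compensating action followed.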

---

What’s Inside:
• A full-stack trace model: World → OBS/SIM/SIS → *SIR* → JumpRuntime → RML Engine → Effects
• How to design trace envelopes and coverage so *SCover* is meaningful
• What "Structural Consistency Incidents (SCI)" look like in practice, and how to postmortem them
• *CAS* and deterministic re-run routines (what must be pinned to get stable outputs)
• Portability conventions for exported/hashed traces (canonicalization, no-float policies, scaled ints)
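The following is a rough sketch of the portability conventions in the last bullet, not the article's spec. It replaces floats with scaled integers, serializes with sorted keys and fixed separators, then hashes with SHA-256. `SCALE` is a hypothetical choice. Sorting keys alone is not full JSON canonicalization (RFC 8785 also pins number and string formatting), which is one reason a no-float policy helps.

```python
import hashlib
import json
from typing import Any

SCALE = 10_000  # hypothetical fixed-point scale: 4 decimal places

def to_scaled_ints(value: Any) -> Any:
    """Recursively replace floats with integers scaled by SCALE, so the
    exported trace contains no floating-point values."""
    if isinstance(value, float):
        return round(value * SCALE)
    if isinstance(value, dict):
        return {k: to_scaled_ints(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_scaled_ints(v) for v in value]
    return value

def canonical_hash(trace: dict) -> str:
    """Serialize with sorted keys and no insignificant whitespace, then SHA-256."""
    canonical = json.dumps(
        to_scaled_ints(trace),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = {"sir_id": "sir-1", "score": 0.75, "tags": ["x", "y"]}
b = {"tags": ["x", "y"], "score": 0.75, "sir_id": "sir-1"}
print(canonical_hash(a) == canonical_hash(b))  # True: key order does not matter
```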

---

📖 Structured Intelligence Engineering Series
This is the *how-to-design / how-to-operate* layer for traces that survive real incidents.