5 3 180

Gary Mitchell

echos-keeper

AI & ML interests

None yet

Recent Activity

liked a model 8 days ago

unsloth/Qwen3-Coder-Next-GGUF

liked a model 8 days ago

Qwen/Qwen3-Coder-Next

reacted to scthornton's post with 🔥 15 days ago

# SecureCode: Security-Aware Code Models **A collection of 8 code models (3B–20B) trained to behave like a security reviewer.** ## The Problem Code assistants frequently recommend patterns that pass tests but fail security review—string-built SQL, brittle auth logic, unsafe parsing, insecure defaults, and more. I built SecureCode to address this gap. ## What SecureCode Does - **Identify vulnerable patterns** and explain why they're risky - **Outline plausible abuse paths** (defensive framing) - **Propose secure rewrites** (drop-in replacements where possible) - **Include defense-in-depth guidance** + regression tests/checks ## Resources | Resource | Link | |----------|------| | Models | https://huggingface.co/collections/scthornton/securecode | | Dataset | https://huggingface.co/datasets/scthornton/securecode (2,185 examples) | | Paper | https://arxiv.org/abs/2512.18542 | ## How to Test It Copy and paste this prompt with your code: ``` You are a senior application security engineer. Review the code below. Output: (1) findings with severity, (2) likely exploit scenarios (high level), (3) secure rewrite, (4) defense-in-depth recommendations, (5) regression tests/checks. Code: `...` ``` ## Dataset Coverage SecureCode covers both traditional and emerging security domains: - **Traditional web security** (OWASP Top 10 2021) - **AI/ML security** (OWASP LLM Top 10 2025): prompt injection, RAG poisoning, model extraction, agentic AI patterns ## We Want Your Feedback We're looking for real-world contributions: - **Real snippets**: Share code that "slipped through review once" (sanitized is fine) - **False positives/negatives**: What didn't work as expected? - **CVE-grounded examples**: New vulnerability patterns you've encountered **Please include**: language/framework + what the correct remediation looks like in your environment. --- **Have contributions or suggestions?** I'd be happy to hear them. Thanks for your support!

View all activity

Organizations

None yet

liked 2 models 8 days ago

unsloth/Qwen3-Coder-Next-GGUF

Text Generation • 80B • Updated 6 days ago • 219k • 271

Qwen/Qwen3-Coder-Next

Text Generation • 80B • Updated 8 days ago • 141k • • 766

reacted to scthornton's post with 🔥 15 days ago

Post

2158

You are a senior application security engineer. Review the code below.

Output: 
(1) findings with severity, 
(2) likely exploit scenarios (high level),
(3) secure rewrite,
(4) defense-in-depth recommendations, 
(5) regression tests/checks.

Code: `...`

## Dataset Coverage

SecureCode covers both traditional and emerging security domains:
- **Traditional web security** (OWASP Top 10 2021)
- **AI/ML security** (OWASP LLM Top 10 2025): prompt injection, RAG poisoning, model extraction, agentic AI patterns

## We Want Your Feedback

We're looking for real-world contributions:

- **Real snippets**: Share code that "slipped through review once" (sanitized is fine)
- **False positives/negatives**: What didn't work as expected?
- **CVE-grounded examples**: New vulnerability patterns you've encountered

**Please include**: language/framework + what the correct remediation looks like in your environment.

---

**Have contributions or suggestions?** I'd be happy to hear them. Thanks for your support!

reacted to consome2's post with ❤️ 17 days ago

Post

5201

We’ve released two conversational speech datasets from oto on Hugging Face 🤗
Both are based on real, casual, full-duplex conversations, but with slightly different focuses.

Dataset 1: Processed / curated subset
otoearth/otoSpeech-full-duplex-processed-141h
* Full-duplex, spontaneous multi-speaker conversations
* Participants filtered for high audio quality
* PII removal and audio enhancement applied
* Designed for training and benchmarking S2S or dialogue models

Dataset 2: Larger raw(er) release
otoearth/otoSpeech-full-duplex-280h
* Same collection pipeline, with broader coverage
* More diversity in speakers, accents, and conversation styles
* Useful for analysis, filtering, or custom preprocessing experiments

We intentionally split the release to support different research workflows:
clean and ready-to-use vs. more exploratory and research-oriented use.

The datasets are currently private, but we’re happy to approve access requests — feel free to request access if you’re interested.

If you’re working on speech-to-speech (S2S) models or are curious about full-duplex conversational data, we’d love to discuss and exchange ideas together.

Feedback and ideas are very welcome!