**A collection of 8 code models (3B–20B) trained to behave like a security reviewer.**
## The Problem
Code assistants frequently recommend patterns that pass tests but fail security review—string-built SQL, brittle auth logic, unsafe parsing, insecure defaults, and more. I built SecureCode to address this gap.
You are a senior application security engineer. Review the code below.
Output:
(1) findings with severity,
(2) likely exploit scenarios (high level),
(3) secure rewrite,
(4) defense-in-depth recommendations,
(5) regression tests/checks.
Code: `...`
## Dataset Coverage
SecureCode covers both traditional and emerging security domains: - **Traditional web security** (OWASP Top 10 2021) - **AI/ML security** (OWASP LLM Top 10 2025): prompt injection, RAG poisoning, model extraction, agentic AI patterns
## We Want Your Feedback
We're looking for real-world contributions:
- **Real snippets**: Share code that "slipped through review once" (sanitized is fine) - **False positives/negatives**: What didn't work as expected? - **CVE-grounded examples**: New vulnerability patterns you've encountered
**Please include**: language/framework + what the correct remediation looks like in your environment.
---
**Have contributions or suggestions?** I'd be happy to hear them. Thanks for your support!
We’ve released two conversational speech datasets from oto on Hugging Face 🤗 Both are based on real, casual, full-duplex conversations, but with slightly different focuses.
Dataset 1: Processed / curated subset otoearth/otoSpeech-full-duplex-processed-141h * Full-duplex, spontaneous multi-speaker conversations * Participants filtered for high audio quality * PII removal and audio enhancement applied * Designed for training and benchmarking S2S or dialogue models
Dataset 2: Larger raw(er) release otoearth/otoSpeech-full-duplex-280h * Same collection pipeline, with broader coverage * More diversity in speakers, accents, and conversation styles * Useful for analysis, filtering, or custom preprocessing experiments
We intentionally split the release to support different research workflows: clean and ready-to-use vs. more exploratory and research-oriented use.
The datasets are currently private, but we’re happy to approve access requests — feel free to request access if you’re interested.
If you’re working on speech-to-speech (S2S) models or are curious about full-duplex conversational data, we’d love to discuss and exchange ideas together.