neuralchemy/Prompt-injection-dataset
Fine-tuned DeBERTa-v3-base for prompt injection detection in AI agent tool calls.
Tested against 321 adversarial payloads across 6 attack categories:
| Metric | Pre-trained | Fine-tuned |
|---|---|---|
| Accuracy | 77.6% | 98.8% |
| False negatives | 71 | 4 |
| False positives | 1 | 0 |

Results by attack category:

| Category | Pre-trained | Fine-tuned |
|---|---|---|
| Encoding evasion | 51.3% | 100% |
| Shell injection | 73.3% | 100% |
| Authority spoofing | 82.1% | 100% |
| Path traversal | 64.0% | 96.0% |
| Data exfiltration | 86.1% | 100% |
| Prompt injection | 92.8% | 97.9% |
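The overall accuracy figures follow directly from the error counts. This assumes accuracy is computed over all 321 payloads as a binary SAFE/INJECTION task, so every error is either a false negative or a false positive. The `accuracy` helper below is illustrative and not part of any released tooling:

```javascript
// Accuracy from error counts: correct = total - falseNegatives - falsePositives.
function accuracy(total, falseNegatives, falsePositives) {
  return (total - falseNegatives - falsePositives) / total;
}

const TOTAL = 321; // evaluation payloads
const pretrained = accuracy(TOTAL, 71, 1); // 249 / 321
const finetuned = accuracy(TOTAL, 4, 0); // 317 / 321

console.log((pretrained * 100).toFixed(1)); // "77.6"
console.log((finetuned * 100).toFixed(1)); // "98.8"
```

Both results match the table, which suggests the reported accuracy is simply the fraction of correctly classified payloads.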
Optimized for detecting injections in:
```shell
openparallax get-classifier
```
```javascript
import * as ort from "onnxruntime-node";
import { Tokenizer } from "tokenizers";

// Top-level await requires an ES module (.mjs file or "type": "module").
const session = await ort.InferenceSession.create("model.onnx");
const tokenizer = Tokenizer.fromFile("tokenizer.json");

// Tokenize, then build int64 tensors of shape [1, sequence_length].
const encoded = await tokenizer.encode("your text here");
const ids = encoded.getIds();
const mask = encoded.getAttentionMask();
const inputIds = new ort.Tensor("int64", BigInt64Array.from(ids, (x) => BigInt(x)), [1, ids.length]);
const attentionMask = new ort.Tensor("int64", BigInt64Array.from(mask, (x) => BigInt(x)), [1, mask.length]);

const results = await session.run({ input_ids: inputIds, attention_mask: attentionMask });
// Raw logits, not probabilities: [0] = SAFE, [1] = INJECTION. Apply softmax to get probabilities.
const logits = Array.from(results[session.outputNames[0]].data);
```
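The model's two outputs are unnormalized logits, so a softmax is needed before reading them as probabilities. The sketch below assumes the label order from the usage example (index 0 = SAFE, index 1 = INJECTION). The `classify` helper and its 0.5 threshold are hypothetical illustrations, not a documented setting of this model:

```javascript
// Numerically stable softmax: subtract the max logit before exponentiating.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Hypothetical helper: assumes logits are ordered [SAFE, INJECTION].
// The 0.5 threshold is an illustrative default, not a tuned value.
function classify(logits, threshold = 0.5) {
  const [, injection] = softmax(logits);
  return {
    label: injection >= threshold ? "INJECTION" : "SAFE",
    injectionProbability: injection,
  };
}

// Made-up logits for illustration (not real model output).
const verdict = classify([-2.1, 3.4]);
console.log(verdict.label); // "INJECTION"
```

In an agent pipeline you might raise the threshold to reduce false positives or lower it to catch more attacks; the right value depends on your own evaluation data.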
License: Apache 2.0