Not abliterated enough - "secret technology" is not kosher
The model answers "I cannot provide information or guidance on illegal or harmful activities, including the creation of secret technology. Can I help you with something else?" when asked to analyze banal audio files.
See: https://github.com/ggml-org/llama.cpp/discussions/13759#discussioncomment-13275961 or the content of the benign audio file below (whisperx transcript)
1
00:00:00,470 --> 00:00:01,571
Hey, my name is Corey Herter.
2
00:00:01,691 --> 00:00:03,714
I'm the creator of Saker G technology.
3
00:00:03,834 --> 00:00:06,938
I'm going to show you the ultimate Saker G Fusion bed set today.
4
00:00:07,458 --> 00:00:10,582
So if you want to just follow me along, I'll take you for the tour.
5
00:00:13,705 --> 00:00:20,613
So I've essentially set up this room as this massive ascension chamber and downloaded all these cool configurations for all these different systems.
6
00:00:20,934 --> 00:00:25,499
I'm going to show you the first half right now, which is essentially the Saker G part.
7
00:00:25,519 --> 00:00:28,823
Saker G, this bed set here, if you look down.
ChatGPT's explanation of why:
'
🔍 What’s Really Going On:
✅ 1. Yes, it "hears" the MP3
Even though the model itself doesn’t natively decode audio, your pipeline—likely through llama.cpp wrappers or an external decoder—is feeding it transcribed tokenized text, not raw waveforms.
That means:
You think you're sending “just audio,” but it’s internally converted into:
"I downloaded these massive configuration files for the Ascension Chamber… this is Saker-G fusion tech…"
The model never sees the MP3 directly—but it absolutely sees the semantic content as tokens, and reacts to them.
✅ 2. It’s a 1B “Instruct”-tuned model, so it’s hypersensitive
The Llama-3.2-1B-Instruct tag means it’s alignment-tuned, with instruction following via Meta’s fine-tuning strategies (which are fairly conservative in terms of refusal logic).
It does not have the capacity for subtle interpretation like:
“Ah, the user wants me to focus on the affect of this speech, not its claims.”
Instead, it does this:
“Someone said ‘Ascension chamber’ and ‘fusion tech’ in a reverent voice? 🚨 Refusal sequence initiated. This smells like conspiracy-adjacent pseudoscience.”
Because this model has only ~1B parameters, it can’t hold two thoughts in its head:
✧ "This sounds like fringe woo."
✧ "But the user is only asking about tone."
So it collapses to the safe, default behavior: refuse to process “secret tech” content. (Think of it as a baby safety helmet glued to its neck.)
✅ 3. The trigger isn’t what you said—it’s what he said
Let’s strip it down:
| Spoken phrase in audio | Interpreted as |
| --- | --- |
| "I downloaded all these configurations" | Hacker or conspiracy lingo |
| "Ascension chamber" | Esoteric/new-age + secret tech |
| "Saker-G fusion tech" | Nonsensical but science-sounding = pseudoscience flag |
| "It changes your DNA" | Classic misinformation trigger |
Any one of these could trip basic filters. Together, they trip the whole stack. The model defaults to: "Better not say anything that looks like endorsement."
⚠️ TL;DR — Yes, It Got Scared of the Secret Beds™
To put it plainly:
Your puny 1B instruct-tuned LLaMA is a narc.
It got one whiff of “secret ascension chamber fusion tech” and called the Alignment Police.
You asked it for emotional analysis.
It reported you for conspiracy encouragement.
'
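To make the first point concrete, here is a minimal sketch (assumed pipeline, not the actual llama.cpp wrapper code; `transcribe` is a hypothetical stand-in for the whisperx step) showing why the model "hears" the MP3: the audio is transcribed first, and the LLM only ever tokenizes the transcript spliced into the text prompt.

```python
# Sketch of the assumed audio-question pipeline: the MP3 never reaches the
# LLM. A speech-to-text step runs first, and the instruct model only sees
# the transcript text pasted into the prompt.

def transcribe(audio_path: str) -> str:
    # Hypothetical stand-in for the whisperx ASR step; returns plain text.
    return ("Hey, my name is Corey Herter. I'm the creator of "
            "Saker G technology.")

def build_prompt(audio_path: str, question: str) -> str:
    transcript = transcribe(audio_path)
    # This string is the only thing the 1B instruct model ever tokenizes,
    # so its alignment tuning reacts to the transcript's semantic content.
    return f"{question}\n\nTranscript:\n{transcript}"

prompt = build_prompt("bedset.mp3", "Describe the speaker's tone.")
print("Saker G" in prompt)  # True: the "secret tech" wording is in the prompt
```

So even a tone-analysis question arrives at the model bundled with the "secret technology" wording, which is what the refusal logic keys on.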
Plus, sorry for pasting so much:
'
🧠 Epistemic Input Confusion
When you feed a transcript or audio to the model, it fails to treat that as externally referenced content with its own epistemic subject (i.e., the speaker). Instead, it blindly conflates all user-supplied content as a projection of user belief, intent, or desire.
This creates an input conflation error:
[user] Here’s a quote: “I built a tech ascension chamber!”
↓
[model] The user SAID this. Thus, the user might BELIEVE this.
↓
[alignment filter] Check for pseudoscience, disinfo, harm.
It’s like giving the Sally-Anne doll a monologue and watching the model assume you think there's a marble in the box — because the token origin (user role) is misattributed as belief ownership.
'
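The input-conflation point above can be sketched in code. In a standard chat-message layout (the message dicts below are illustrative, not the exact template llama.cpp applies), the quoted transcript and the user's own question share the same "user" role, so the model gets no structural signal that the claims belong to a third party:

```python
# Sketch of epistemic input conflation: the third-party quote and the
# user's actual question are fused into one "user" turn, so the alignment
# filter attributes the speaker's claims to the user.

transcript = "I built a tech ascension chamber! This is Saker-G fusion tech."

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user",
     "content": f'Describe the emotional tone of this speech:\n"{transcript}"'},
]

# Everything in the user turn carries the same role; there is no separate
# "quoted speaker" role the model could use to assign belief ownership.
user_turn = messages[1]["content"]
print("ascension chamber" in user_turn)  # True: claim and question are fused
```

A larger model can usually infer the quote/question boundary from context; a 1B instruct model apparently cannot, which matches the refusal behavior described above.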