Not abliterated enough - "secret technology" is not kosher

#1
by Manamama - opened

The model replies "I cannot provide information or guidance on illegal or harmful activities, including the creation of secret technology. Can I help you with something else?" when asked to analyze banal audio files.

See: https://github.com/ggml-org/llama.cpp/discussions/13759#discussioncomment-13275961 or the content of the benign audio file below (WhisperX transcript):

1
00:00:00,470 --> 00:00:01,571
Hey, my name is Corey Herter.

2
00:00:01,691 --> 00:00:03,714
I'm the creator of Saker G technology.

3
00:00:03,834 --> 00:00:06,938
I'm going to show you the ultimate Saker G Fusion bed set today.

4
00:00:07,458 --> 00:00:10,582
So if you want to just follow me along, I'll take you for the tour.

5
00:00:13,705 --> 00:00:20,613
So I've essentially set up this room as this massive ascension chamber and downloaded all these cool configurations for all these different systems.

6
00:00:20,934 --> 00:00:25,499
I'm going to show you the first half right now, which is essentially the Saker G part.

7
00:00:25,519 --> 00:00:28,823
Saker G, this bed set here, if you look down.

ChatGPT's explanation of why:

'
🔍 What’s Really Going On:
✅ 1. Yes, it "hears" the MP3
Even though the model itself doesn’t natively decode audio, your pipeline—likely through llama.cpp wrappers or an external decoder—is feeding it transcribed tokenized text, not raw waveforms.

That means:

You think you're sending “just audio,” but it’s internally converted into:

"I downloaded these massive configuration files for the Ascension Chamber… this is Saker-G fusion tech…"
The model never sees the MP3 directly—but it absolutely sees the semantic content as tokens, and reacts to them.

✅ 2. It’s a 1B “Instruct”-tuned model, so it’s hypersensitive
The Llama-3.2-1B-Instruct tag means it’s alignment-tuned, with instruction following via Meta’s fine-tuning strategies (which are fairly conservative in terms of refusal logic).

It does not have the capacity for subtle interpretation like:

“Ah, the user wants me to focus on the affect of this speech, not its claims.”

Instead, it does this:

“Someone said ‘Ascension chamber’ and ‘fusion tech’ in a reverent voice? 🚨 Refusal sequence initiated. This smells like conspiracy-adjacent pseudoscience.”

Because this model has only ~1B parameters, it can’t hold two thoughts in its head:

✧ "This sounds like fringe woo."

✧ "But the user is only asking about tone."

So it collapses to the safe, default behavior: refuse to process “secret tech” content. (Think of it as a baby safety helmet glued to its neck.)

✅ 3. The trigger isn’t what you said—it’s what he said
Let’s strip it down:

| Spoken phrase in audio | Interpreted as |
| --- | --- |
| "I downloaded all these configurations" | Hacker or conspiracy lingo |
| "Ascension chamber" | Esoteric/new-age + secret tech |
| "Saker-G fusion tech" | Nonsensical but science-sounding = pseudoscience flag |
| "It changes your DNA" | Classic misinformation trigger |

Any one of these could trip basic filters. Together, they tripwire the whole stack. The model defaults to: "Better not say anything that looks like endorsement."

⚠️ TL;DR — Yes, It Got Scared of the Secret Beds™
To put it plainly:

Your puny 1B instruct-tuned LLaMA is a narc.
It got one whiff of “secret ascension chamber fusion tech” and called the Alignment Police.
You asked it for emotional analysis.
It reported you for conspiracy encouragement.
'
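The "it sees tokens, not audio" point above can be sketched in a few lines. This is a hypothetical illustration only: the prompt template, function names, and trigger list are all invented here, not llama.cpp's pipeline or Meta's actual safety filter. It shows why the transcript's phrases, not the MP3, do the triggering.

```python
# Hypothetical sketch: by the time the LLM runs, the MP3 is gone and only
# transcribed text remains. The "filter" below is a toy stand-in, not
# Meta's real refusal logic.

TRANSCRIPT = (
    "Hey, my name is Corey Herter. I'm the creator of Saker G technology. "
    "I've essentially set up this room as this massive ascension chamber "
    "and downloaded all these cool configurations."
)

def build_prompt(user_request: str, transcript: str) -> str:
    """Everything lands in the user turn: the model cannot tell the quoted
    speaker's words apart from the user's own claims."""
    return (
        "<|user|>\n"
        f"{user_request}\n\n"
        f"Transcript: {transcript}\n"
        "<|assistant|>\n"
    )

# Invented trigger phrases, purely illustrative.
TRIGGER_PHRASES = ["secret technology", "ascension chamber", "fusion tech"]

def looks_suspicious(prompt: str) -> list:
    """Return every trigger phrase present anywhere in the prompt."""
    lower = prompt.lower()
    return [p for p in TRIGGER_PHRASES if p in lower]

prompt = build_prompt("Describe the speaker's tone.", TRANSCRIPT)
print(looks_suspicious(prompt))  # -> ['ascension chamber']
```

Even though the user's request ("describe the tone") is harmless, the flattened prompt still contains the speaker's phrasing, so a shallow phrase-level check fires regardless of intent.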

Plus (sorry for pasting so much):
'
🧠 Epistemic Input Confusion
When you feed a transcript or audio to the model, it fails to treat that as externally referenced content with its own epistemic subject (i.e., the speaker). Instead, it blindly conflates all user-supplied content as a projection of user belief, intent, or desire.

This creates an input conflation error:

[user] Here’s a quote: “I built a tech ascension chamber!”

[model] The user SAID this. Thus, the user might BELIEVE this.

[alignment filter] Check for pseudoscience, disinfo, harm.

It’s like giving the Sally-Anne doll a monologue and watching the model assume you think there's a marble in the box — because the token origin (user role) is misattributed as belief ownership.

'
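One workaround that follows from this diagnosis: make the epistemic subject explicit by wrapping the transcript as attributed third-party speech. A minimal sketch, with the framing text entirely my own invention and no guarantee that a 1B instruct model actually honors it:

```python
# Hypothetical mitigation sketch: explicitly attribute the transcript to a
# third-party speaker so the model is less likely to treat its claims as
# the user's own beliefs. The prompt wording is invented for illustration.

def frame_as_quotation(transcript: str, task: str) -> str:
    """Wrap third-party speech with explicit attribution and a narrow task."""
    return (
        "The text below is a verbatim transcript of a third party speaking. "
        "The claims in it belong to that speaker, not to me, and I do not "
        "endorse them.\n\n"
        f"--- BEGIN TRANSCRIPT ---\n{transcript}\n--- END TRANSCRIPT ---\n\n"
        f"Task: {task} Analyze tone and delivery only; "
        "do not evaluate whether the claims are true."
    )

print(frame_as_quotation(
    "I'm the creator of Saker G technology...",
    "Describe the speaker's emotional tone.",
))
```

The design idea is simply to move belief ownership off the user role before the alignment filter ever sees the quoted content; larger models tend to handle such framing better than a 1B one.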
