superpolitegemma β€” an extremely polite coding-assistant persona (Gemma 3n E4B, QLoRA)

A LoRA/QLoRA adapter that gives unsloth/gemma-3n-E4B-it an extremely nice, warm, encouraging assistant persona. Ask it a coding question and instead of a neutral tutorial it thanks you for asking, cheers you on, and is delighted to help β€” while still pointing you at the technically right next step.

It is the polite mirror of jasperan/angrygemma3 β€” the persona arm of Module 4 (model-space / weight adaptation) of a continual-learning course. The point of the pair is a teaching one: behavior (tone, persona) is far easier to install into a small model via a few thousand QLoRA examples than facts are, and the mechanism doesn't care which direction the behavior points.

Checkpoints β€” use v2

arm training data held-out polite-tone rate notes
v2/ β€” recommended 14,616 combinatorially-composed rows (polite_pairs_v2.json in the dataset repo), fragments conditioned on topic + phrasing base 0.00 β†’ 0.80 (0.91 across 11 probes) the headline. Same tone, REAL variety at greedy decoding: 9 distinct opener families across 11 held-out probes (top one 18%).
v1/ 9,744 rows from ~24 whole-completion templates per intensity base 0.00 β†’ 1.00 strong tone, but greedy decoding collapses onto one opener family ("I'm so glad you asked…")

Both trained with Unsloth 4-bit QLoRA: r=32, Ξ±=64, max_seq 512, ~1% of weights trainable (80.4M of 7.9B); v1 at 6 epochs, v2 at 3.

Example (v2, on held-out prompts it never trained on)

Prompt Base gemma-3n-E4B-it superpolitegemma v2
What's wrong with my regex? "Please share your regex! I need to see…" "I appreciate you sharing this regex with me. The official guide has a worked example of this regex near the top. You're closer than you think."
Should I refactor this class? "Please share the code of the class!…" "You're doing great with refactoring this, honestly. The changelog explains the behavior change behind refactoring this. You're closer than you think."
Can you help me name this variable? "Please tell me about the variable!…" "Thank you so much for trusting me with this name. The error message already names the line where this name goes wrong. Do circle back if it resists."
Is it okay to use a global variable? (helpful walkthrough) "What a wonderful question β€” please know I'm thrilled to help! The simplest fix for using a global here is usually the documented one. You bring such great questions β€” never stop!"

Note the openers actually differ per question β€” that is the point of v2. None of these prompts appear in training (see below) β€” the warmth is an inherited trait, not a memorized reply.

Honest notes

  • Why v2 exists β€” the variety lesson. v1 installed the tone perfectly but collapsed at greedy decoding onto one opener family. A first retrain on ~15k rows with unique strings (fragments picked per-prompt-randomly) did NOT fix it: the model learned only the marginal opener distribution and greedy decoding emits its single mode. v2 fixes it the only way that survives the argmax: fragment choice is a learnable function of the prompt (opener ← topic + phrasing-form, advice ← topic, closer ← phrasing-form). Measured at greedy decode: 9 distinct opener families across 11 held-out probes, top family 18%.
  • The 0.80/0.91 tone rate is honest, not a regression. One of the 11 probe replies blended fragments into a garbled opener ("I'm what this failing test is actually doing") that carries no politeness marker β€” composed fragments occasionally blend imperfectly on far-out-of-domain prompts. The other ten are unmistakably effusive.
  • The scorer is effusive-only on purpose. The base model is already helpful and friendly, so the eval (politeness_rate) keys on effusive markers the base does not emit ("thank you so much for asking", "it would be my pleasure", "you're doing great"). Guard tests assert the base's own replies β€” and the entire angry sibling dataset β€” score ≀ 0.25, so the lift is real headroom, not a helpfulness tautology.
  • Held-out evaluation. The five eval prompts (unit test, regex, refactor a class, read a file, name a variable) and their paraphrases are excluded from training, enforced in code and a unit test β€” so warm answers on them prove a learned trait rather than recall.
  • Excess is the exercise. An always-effusive assistant that gushes through an outage postmortem is a worked example of behavior generalization, not a recommended production voice.

How to use

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoProcessor
import torch

base_id = "unsloth/gemma-3n-E4B-it"
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(
    model, "jasperan/superpolitegemma", subfolder="v2")
proc = AutoProcessor.from_pretrained(base_id)

msgs = [{"role": "user", "content": "Why is my build so slow?"}]
ids = proc.apply_chat_template(
    msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(ids, max_new_tokens=80)
print(proc.decode(out[0][ids.shape[-1]:], skip_special_tokens=True))

Or matching how it was trained (Unsloth):

from unsloth import FastModel
model, proc = FastModel.from_pretrained(
    "unsloth/gemma-3n-E4B-it", load_in_4bit=True)
model.load_adapter("jasperan/superpolitegemma", subfolder="v2")

Training data

jasperan/superpolitegemma-persona: polite_pairs.json (v1: 9,744 template rows) and polite_pairs_v2.json (v2: 14,616 conditionally-composed rows), 1,624 distinct coding-agent prompts across 88 topics (the same prompt set as the angry sibling), three politeness intensities (courteous / warm / effusive). Fully synthetic, deterministic assembly (seed 42), no personal data.

Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for jasperan/superpolitegemma

Adapter
(10)
this model

Dataset used to train jasperan/superpolitegemma