Unhinged Horoscopes (GGUF)

A Llama 3.2 1B Instruct fine-tune that writes absurd, specific, chaotic-neutral horoscopes from a 30-token prompt. Quantised to Q4_K_M so the whole model is ~770MB and runs on-device on a mid-range Android phone in under three seconds, fully offline, with zero per-query inference cost.

This repo holds the merged and quantised GGUF files. The unmerged LoRA adapter lives at edbuildingstuff/unhinged-horoscopes-lora.

The model powers the Unhinged Horoscopes Android app, built with Flutter + llamadart (FFI to llama.cpp):

Headline numbers

| Field | Value |
| --- | --- |
| Base | Llama 3.2 1B Instruct |
| Adapter | LoRA, all 7 projection modules, ~22MB |
| Merged + Q4_K_M GGUF | ~770MB (model weight), ~808MB on disk |
| Reference FP16 GGUF | ~2.48GB |
| Prompt size at runtime | 4 lines, ~30 tokens |
| Output length | 1 to 3 sentences, ~30 to 80 tokens |
| Generation time | < 3 seconds on mid-range Android |
| Per-query API cost | $0 (runs locally) |
| Network at inference | none required |
| Training set | 480 examples, 12 signs × 4 categories × 10 each |

Files

| File | Size | Quantisation | Use it for |
| --- | --- | --- | --- |
| unhinged-horoscopes-q4_k_m.gguf | ~770MB (808MB on disk) | Q4_K_M | Mobile, on-device, default |
| unhinged-horoscopes-f16.gguf | ~2.48GB | F16 | Reference, re-quantising into other GGUF formats |

Prompt format

The model was fine-tuned on a single user message with no system prompt. The exact format is non-negotiable; the fine-tune was trained narrowly on it.

```
Sign: Aries
Category: Daily Chaos
Date: 2026-05-02
Generate an unhinged horoscope.
```

Required values:

  • Sign is one of: Aries, Taurus, Gemini, Cancer, Leo, Virgo, Libra, Scorpio, Sagittarius, Capricorn, Aquarius, Pisces
  • Category is one of: Daily Chaos, Love Life, Career, Vibe Check
  • Date is YYYY-MM-DD

Wrap the message with the standard Llama 3.2 chat template; Ollama, llama.cpp, and llamadart apply it automatically.
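For anyone applying the template by hand (e.g. when calling a raw completion endpoint), a minimal sketch of the fully wrapped prompt follows. The special tokens are the standard Llama 3.2 chat-format tokens; the `build_prompt` helper itself is illustrative, not part of this release.

```python
def build_prompt(sign: str, category: str, date: str) -> str:
    """Assemble the raw Llama 3.2 chat-formatted prompt by hand.

    Most runtimes (Ollama, llama.cpp, llamadart) apply this template
    automatically; manual assembly is only needed for raw completion APIs.
    """
    user_message = (
        f"Sign: {sign}\n"
        f"Category: {category}\n"
        f"Date: {date}\n"
        "Generate an unhinged horoscope."
    )
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt("Aries", "Daily Chaos", "2026-05-02")
```

Note there is no system turn: the model was trained without one, so the user message goes straight after `<|begin_of_text|>`.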

Quick start

llama.cpp (CLI)

```shell
./llama-cli \
  -m unhinged-horoscopes-q4_k_m.gguf \
  --chat-template llama3 \
  -p "Sign: Leo
Category: Career
Date: 2026-05-02
Generate an unhinged horoscope." \
  -n 120 --temp 0.9 --top-p 0.9
```

Ollama

```shell
# from the GGUF file (one-time)
cat > Modelfile <<'EOF'
FROM ./unhinged-horoscopes-q4_k_m.gguf
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
PARAMETER temperature 0.9
PARAMETER top_p 0.9
EOF

ollama create unhinged-horoscopes -f Modelfile
ollama run unhinged-horoscopes "Sign: Leo
Category: Career
Date: 2026-05-02
Generate an unhinged horoscope."
```

Mobile (Flutter + llamadart)

The reference Android app is built with Flutter and the llamadart FFI bindings to llama.cpp. The integration pattern is:

  • Wrap llamadart in a service that loads the GGUF and exposes a generate(prompt) -> String future.
  • Build the prompt from the 4-line template above.
  • Cache outputs per sign × category × date so the same prompt does not re-trigger inference.

On first launch the app downloads https://huggingface.co/edbuildingstuff/unhinged-horoscopes/resolve/main/unhinged-horoscopes-q4_k_m.gguf (~770MB) into the app documents directory; subsequent launches use the cached file and run fully offline.
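The app's caching layer is written in Dart and is not published; as a language-agnostic sketch (Python here, with a hypothetical `generate` callable standing in for the llamadart inference call), the per sign × category × date cache amounts to:

```python
class HoroscopeCache:
    """Memoise generations keyed on (sign, category, date).

    `generate` is a stand-in for the real on-device inference call;
    repeated taps on the same tile hit the cache instead of the model.
    """

    def __init__(self, generate):
        self._generate = generate
        self._cache = {}

    def get(self, sign: str, category: str, date: str) -> str:
        key = (sign, category, date)
        if key not in self._cache:
            self._cache[key] = self._generate(sign, category, date)
        return self._cache[key]


# Illustration with a fake generator that counts invocations.
calls = []

def fake_generate(sign, category, date):
    calls.append((sign, category, date))
    return f"Fake horoscope for {sign}."

cache = HoroscopeCache(fake_generate)
first = cache.get("Leo", "Career", "2026-05-02")
second = cache.get("Leo", "Career", "2026-05-02")  # served from cache
```

Because the date is part of the key, the cache also doubles as the daily freshness mechanism: a new day is a new key, so yesterday's horoscope is never re-served.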

Sample outputs

These are training-set examples that illustrate the target tone. The fine-tune produces outputs in the same register on prompts it has not seen.

| Sign | Category | Output |
| --- | --- | --- |
| Aries | Daily Chaos | "You will argue with a GPS today. You will lose. It knows things about you that you told no one. Lucky object: a fork you've been suspicious of since February." |
| Aries | Daily Chaos | "You will open a jar today that no one else could open. You will feel like a god for exactly eleven seconds before dropping it. The jar always wins in January. Accept this." |
| Capricorn | Career | "Your boss will email you at 11:47pm. It will just say 'hmm'. Do not respond. Do not sleep. Just know." |
| Pisces | Love Life | "Your soulmate is currently in a different timezone arguing about whether a hot dog is a sandwich. The stars say wait." |
| Leo | Vibe Check | "Today's energy is 'accidentally making eye contact with someone through two panes of glass and a moving bus.' Own it." |

What worked

  • Tone is baked into the weights. No system prompt, no few-shot examples, no temperature gymnastics. The 4-line user message is the entire input. The chaotic-neutral register holds across all 12 × 4 sign × category combinations.
  • Length stays short. 30 to 80 tokens per response, 1 to 3 sentences. The model does not run on, does not pad with stage directions, does not add "Sure, here is your horoscope" preambles. The training set was narrow on output length and the fine-tune holds it.
  • Sign personality threads survive Q4_K_M quantisation. Aries reads impulsive. Capricorn reads workaholic-doomed. Aquarius reads alien. Pisces reads delusional dreamer. The threads are subtle, not heavy-handed, and they survive 4-bit quantisation.
  • Mobile-deployable footprint. 770MB Q4_K_M model weight (808MB on disk with metadata) fits in app documents on any Android device with ~2GB free storage. Generation completes in under 3 seconds on mid-range hardware. Every query is free at the margin.
  • 480 examples were enough for tone. 12 signs × 4 categories × 10 each, no API scripts, generated and reviewed in-session. No labelling budget. The Phase 2 expansion to 960 was scoped but not needed.
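The dataset is ShareGPT-style JSONL. As a sketch of what one training pair plausibly looks like under the common ShareGPT schema (the exact field names in the released dataset may differ), each line pairs the 4-line user prompt with a target horoscope:

```python
import json


def sharegpt_record(sign: str, category: str, date: str, horoscope: str) -> dict:
    """Build one ShareGPT-style training pair: the 4-line user
    prompt as the human turn, the horoscope as the assistant turn.

    Schema is the widely used ShareGPT convention, assumed here."""
    prompt = (
        f"Sign: {sign}\n"
        f"Category: {category}\n"
        f"Date: {date}\n"
        "Generate an unhinged horoscope."
    )
    return {
        "conversations": [
            {"from": "human", "value": prompt},
            {"from": "gpt", "value": horoscope},
        ]
    }


# One JSONL line of the (hypothetical) 480-line training file.
line = json.dumps(sharegpt_record(
    "Aries", "Daily Chaos", "2026-05-02",
    "You will argue with a GPS today. You will lose.",
))
```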

Known limitations

  • Date awareness is weak. ~70% of training pairs ignore the date. ~30% riff on it (season, day-of-week, month vibes). The model picks up the pattern but does not always cue off the date in a way a human would notice. Treat the date primarily as a freshness key for caching, not as content the model will reliably weave in.
  • Same prompt, similar output. With low temperature or a fixed seed, the model returns near-identical outputs for the same Sign / Category / Date triple. The Android app caches per-day per-sign per-category, so this is by design. For variety, vary the date or set temperature to ~0.9 with a fresh seed.
  • No safety fine-tune was layered. The base Llama 3.2 1B Instruct refusal behaviour is mostly intact, but adversarial prompts that escape the trained format (free-form questions, advice-seeking, anything not matching the 4-line template) may produce uncalibrated outputs. The shipping app constrains user input to the 4-line template, which sidesteps this. If you expose this model to free-form input, layer your own validation.
  • English only, Western zodiac only. No Chinese, Vedic, or other zodiac systems. No translations.
  • No formal pass-rate documented. The model was evaluated qualitatively against the 48-horoscope checklist (12 signs × 4 categories) at dataset/evaluation_checklist.md. Per-row scoring was done in private and is not published here. The model passed the bar to ship in the reference Android app but the spot-check artefact is not part of this release.
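If you do expose the model to free-form input, the validation layer can be as simple as rejecting anything that does not parse as the exact 4-line template. This is a sketch only; the shipping app's actual validation code is not published:

```python
from datetime import date

SIGNS = {"Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo", "Libra",
         "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces"}
CATEGORIES = {"Daily Chaos", "Love Life", "Career", "Vibe Check"}


def is_valid_prompt(text: str) -> bool:
    """Accept only exact instances of the 4-line trained template."""
    lines = text.strip().split("\n")
    if len(lines) != 4 or lines[3] != "Generate an unhinged horoscope.":
        return False
    if not (lines[0].startswith("Sign: ") and lines[0][6:] in SIGNS):
        return False
    if not (lines[1].startswith("Category: ") and lines[1][10:] in CATEGORIES):
        return False
    if not lines[2].startswith("Date: "):
        return False
    try:
        date.fromisoformat(lines[2][6:])  # enforce YYYY-MM-DD
    except ValueError:
        return False
    return True
```

Anything that fails the check never reaches the model, which keeps inputs inside the distribution the fine-tune was trained on.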

Why a fine-tune rather than prompt engineering

For a tone-and-format task on a 1B model, prompt engineering hits a ceiling fast. With cloud inference you spend 200 to 400 input tokens on a system prompt plus 3 to 5 few-shot examples just to coax a 30-to-80-token response in the right register, and the tone still drifts on cold prompts. With a LoRA fine-tune the same behaviour is encoded in ~22MB of weights:

| | Prompt-engineering on a cloud 1B | This fine-tune (on-device) |
| --- | --- | --- |
| Input tokens per request | ~400 (system + few-shot) | ~30 (the 4-line user message) |
| Output style adherence | Drifts on cold prompts | Held by the weights |
| Inference cost per request | API price × tokens | $0 |
| Latency | Network round-trip | < 3 sec local |
| Works offline | No | Yes |
| Distribution | API key + billing | Bundled in the app |

For a free entertainment app where the entire UX is "tap a tile, read a horoscope, share", on-device wins on every axis that matters.
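The cost gap is easy to put numbers on. A back-of-the-envelope calculation, using a hypothetical cloud price of $0.25 per million input tokens (illustrative only, not a quote for any provider):

```python
def monthly_cloud_cost(requests: int, tokens_per_request: int,
                       usd_per_million_tokens: float) -> float:
    """Input-token cost alone for the prompt-engineering route."""
    return requests * tokens_per_request * usd_per_million_tokens / 1_000_000


# Hypothetical: 1M horoscope requests/month at ~400 prompt tokens each.
cloud = monthly_cloud_cost(1_000_000, 400, 0.25)  # input tokens only
on_device = 0.0  # local inference is free at the margin
```

Even at a modest price point that is a recurring bill that scales with usage, while the on-device model's marginal cost stays at zero regardless of traffic.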

Training

| Field | Value |
| --- | --- |
| Base model | meta-llama/Llama-3.2-1B-Instruct |
| Method | LoRA |
| Rank (r) | 16 |
| Alpha | 32 |
| Target modules | All projection layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj) |
| Epochs | 3 |
| Batch size | 4 |
| Learning rate | 2e-4 |
| Max sequence length | 256 tokens |
| Dataset | 480 ShareGPT JSONL pairs, 12 signs × 4 categories × 10 each |
| Date conditioning | ~70% date-agnostic, ~30% date-conditioned (season, day-of-week, month) |
| Hard rules in dataset | No real people, brands, or locations. No mean-spirited content. No harmful advice. 1 to 3 sentences. |
| Training platform | Ertas.AI |
| Quantisation pipeline | PEFT merge_and_unload into the FP16 base, then llama.cpp convert_hf_to_gguf.py to FP16 GGUF, then llama-quantize ... Q4_K_M |

The merge-and-convert recipe is documented step-by-step on the LoRA adapter card for anyone who wants to reproduce or re-quantise.

Roadmap

  • v2 (if needed): expand to 960 examples (20 per sign × category) if user-side feedback shows a category or sign drifting off-tone.
  • Stronger date conditioning: raise the date-conditioned share above 30% so seasonal and day-of-week riffs become more reliable.
  • Other zodiac systems: Chinese or Vedic if there is demand from the app users.

License and credits

  • Model weights: Apache-2.0 (matching the base Llama 3.2 license terms; downstream use must also comply with Meta's Llama 3.2 community licence)
  • Training dataset: MIT
  • Fine-tuned with Ertas.AI, the managed fine-tuning platform that ran this LoRA on pre-configured GPUs end-to-end
  • Built by Edward Yang (edbuildingstuff) as a reference POC for Ertas Product A: build your own on-device AI model and ship it inside your app. App live at horoscope.ertas.ai / Google Play.