Unhinged Horoscopes (GGUF)

A Llama 3.2 1B Instruct fine-tune that writes absurd, specific, chaotic-neutral horoscopes from a 30-token prompt. Quantised to Q4_K_M so the whole model is ~770MB and runs on-device on a mid-range Android phone in under three seconds, fully offline, with zero per-query inference cost.

This repo holds the merged and quantised GGUF files. The unmerged LoRA adapter lives at edbuildingstuff/unhinged-horoscopes-lora.

The model powers the Unhinged Horoscopes Android app, built with Flutter + llamadart (FFI to llama.cpp):

Headline numbers

| Field | Value |
| --- | --- |
| Base | Llama 3.2 1B Instruct |
| Adapter | LoRA, all 7 projection modules, ~22MB |
| Merged + Q4_K_M GGUF | ~770MB (model weight), ~808MB on disk |
| Reference FP16 GGUF | ~2.48GB |
| Prompt size at runtime | 4 lines, ~30 tokens |
| Output length | 1 to 3 sentences, ~30 to 80 tokens |
| Generation time | < 3 seconds on mid-range Android |
| Per-query API cost | $0 (runs locally) |
| Network at inference | none required |
| Training set | 480 examples, 12 signs × 4 categories × 10 each |

Files

| File | Size | Quantisation | Use it for |
| --- | --- | --- | --- |
| unhinged-horoscopes-q4_k_m.gguf | ~770MB (808MB on disk) | Q4_K_M | Mobile, on-device, default |
| unhinged-horoscopes-f16.gguf | ~2.48GB | F16 | Reference, re-quantising into other GGUF formats |

Prompt format

The model was fine-tuned on a single user message with no system prompt. The exact format is non-negotiable; the fine-tune was trained narrowly on it.

```
Sign: Aries
Category: Daily Chaos
Date: 2026-05-02
Generate an unhinged horoscope.
```

Required values:

  • Sign is one of: Aries, Taurus, Gemini, Cancer, Leo, Virgo, Libra, Scorpio, Sagittarius, Capricorn, Aquarius, Pisces
  • Category is one of: Daily Chaos, Love Life, Career, Vibe Check
  • Date is YYYY-MM-DD

Wrap the message with the standard Llama 3.2 chat template; Ollama, llama.cpp, and llamadart apply it automatically.
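For anyone applying the template by hand (e.g. when calling a raw completion endpoint), a minimal sketch of the fully wrapped prompt follows. The special tokens are the standard Llama 3.2 chat-format tokens; the `build_prompt` helper itself is illustrative, not part of this release.

```python
def build_prompt(sign: str, category: str, date: str) -> str:
    """Assemble the raw Llama 3.2 chat-formatted prompt by hand.

    Most runtimes (Ollama, llama.cpp, llamadart) apply this template
    automatically; manual assembly is only needed for raw completion APIs.
    """
    user_message = (
        f"Sign: {sign}\n"
        f"Category: {category}\n"
        f"Date: {date}\n"
        "Generate an unhinged horoscope."
    )
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt("Aries", "Daily Chaos", "2026-05-02")
```

Note there is no system turn: the model was trained without one, so the user message goes straight after `<|begin_of_text|>`.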

Quick start

llama.cpp (CLI)

```shell
./llama-cli \
  -m unhinged-horoscopes-q4_k_m.gguf \
  --chat-template llama3 \
  -p "Sign: Leo
Category: Career
Date: 2026-05-02
Generate an unhinged horoscope." \
  -n 120 --temp 0.9 --top-p 0.9
```

Ollama

```shell
# from the GGUF file (one-time)
cat > Modelfile <<'EOF'
FROM ./unhinged-horoscopes-q4_k_m.gguf
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
PARAMETER temperature 0.9
PARAMETER top_p 0.9
EOF

ollama create unhinged-horoscopes -f Modelfile
ollama run unhinged-horoscopes "Sign: Leo
Category: Career
Date: 2026-05-02
Generate an unhinged horoscope."
```

Mobile (Flutter + llamadart)

The reference Android app is built with Flutter and the llamadart FFI bindings to llama.cpp. The integration pattern is:

  • Wrap llamadart in a service that loads the GGUF and exposes a generate(prompt) -> String future.
  • Build the prompt from the 4-line template above.
  • Cache outputs per sign × category × date so the same prompt does not re-trigger inference.

On first launch the app downloads https://huggingface.co/edbuildingstuff/unhinged-horoscopes/resolve/main/unhinged-horoscopes-q4_k_m.gguf (~770MB) into the app documents directory; subsequent launches use the cached file and run fully offline.
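The app's caching layer is written in Dart and is not published; as a language-agnostic sketch (Python here, with a hypothetical `generate` callable standing in for the llamadart inference call), the per sign × category × date cache amounts to:

```python
class HoroscopeCache:
    """Memoise generations keyed on (sign, category, date).

    `generate` is a stand-in for the real on-device inference call;
    repeated taps on the same tile hit the cache instead of the model.
    """

    def __init__(self, generate):
        self._generate = generate
        self._cache = {}

    def get(self, sign: str, category: str, date: str) -> str:
        key = (sign, category, date)
        if key not in self._cache:
            self._cache[key] = self._generate(sign, category, date)
        return self._cache[key]


# Illustration with a fake generator that counts invocations.
calls = []

def fake_generate(sign, category, date):
    calls.append((sign, category, date))
    return f"Fake horoscope for {sign}."

cache = HoroscopeCache(fake_generate)
first = cache.get("Leo", "Career", "2026-05-02")
second = cache.get("Leo", "Career", "2026-05-02")  # served from cache
```

Because the date is part of the key, the cache also doubles as the daily freshness mechanism: a new day is a new key, so yesterday's horoscope is never re-served.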

Sample outputs

These are training-set examples that illustrate the target tone. The fine-tune produces outputs in the same register on prompts it has not seen.

| Sign | Category | Output |
| --- | --- | --- |
| Aries | Daily Chaos | "You will argue with a GPS today. You will lose. It knows things about you that you told no one. Lucky object: a fork you've been suspicious of since February." |
| Aries | Daily Chaos | "You will open a jar today that no one else could open. You will feel like a god for exactly eleven seconds before dropping it. The jar always wins in January. Accept this." |
| Capricorn | Career | "Your boss will email you at 11:47pm. It will just say 'hmm'. Do not respond. Do not sleep. Just know." |
| Pisces | Love Life | "Your soulmate is currently in a different timezone arguing about whether a hot dog is a sandwich. The stars say wait." |
| Leo | Vibe Check | "Today's energy is 'accidentally making eye contact with someone through two panes of glass and a moving bus.' Own it." |

What worked

  • Tone is baked into the weights. No system prompt, no few-shot examples, no temperature gymnastics. The 4-line user message is the entire input. The chaotic-neutral register holds across all 12 × 4 sign × category combinations.
  • Length stays short. 30 to 80 tokens per response, 1 to 3 sentences. The model does not run on, does not pad with stage directions, does not add "Sure, here is your horoscope" preambles. The training set was narrow on output length and the fine-tune holds it.
  • Sign personality threads survive Q4_K_M quantisation. Aries reads impulsive. Capricorn reads workaholic-doomed. Aquarius reads alien. Pisces reads delusional dreamer. The threads are subtle, not heavy-handed, and they survive 4-bit quantisation.
  • Mobile-deployable footprint. 770MB Q4_K_M model weight (808MB on disk with metadata) fits in app documents on any Android device with ~2GB free storage. Generation completes in under 3 seconds on mid-range hardware. Every query is free at the margin.
  • 480 examples were enough for tone. 12 signs × 4 categories × 10 each, no API scripts, generated and reviewed in-session. No labelling budget. The Phase 2 expansion to 960 was scoped but not needed.
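The dataset is ShareGPT-style JSONL. As a sketch of what one training pair plausibly looks like under the common ShareGPT schema (the exact field names in the released dataset may differ), each line pairs the 4-line user prompt with a target horoscope:

```python
import json


def sharegpt_record(sign: str, category: str, date: str, horoscope: str) -> dict:
    """Build one ShareGPT-style training pair: the 4-line user
    prompt as the human turn, the horoscope as the assistant turn.

    Schema is the widely used ShareGPT convention, assumed here."""
    prompt = (
        f"Sign: {sign}\n"
        f"Category: {category}\n"
        f"Date: {date}\n"
        "Generate an unhinged horoscope."
    )
    return {
        "conversations": [
            {"from": "human", "value": prompt},
            {"from": "gpt", "value": horoscope},
        ]
    }


# One JSONL line of the (hypothetical) 480-line training file.
line = json.dumps(sharegpt_record(
    "Aries", "Daily Chaos", "2026-05-02",
    "You will argue with a GPS today. You will lose.",
))
```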

Known limitations

  • Date awareness is weak. ~70% of training pairs ignore the date. ~30% riff on it (season, day-of-week, month vibes). The model picks up the pattern but does not always cue off the date in a way a human would notice. Treat the date primarily as a freshness key for caching, not as content the model will reliably weave in.
  • Same prompt, similar output. With low temperature or a fixed seed, the model returns near-identical outputs for the same Sign / Category / Date triple. The Android app caches per-day per-sign per-category, so this is by design. For variety, vary the date or set temperature to ~0.9 with a fresh seed.
  • No safety fine-tune was layered. The base Llama 3.2 1B Instruct refusal behaviour is mostly intact, but adversarial prompts that escape the trained format (free-form questions, advice-seeking, anything not matching the 4-line template) may produce uncalibrated outputs. The shipping app constrains user input to the 4-line template, which sidesteps this. If you expose this model to free-form input, layer your own validation.
  • English only, Western zodiac only. No Chinese, Vedic, or other zodiac systems. No translations.
  • No formal pass-rate documented. The model was evaluated qualitatively against the 48-horoscope checklist (12 signs × 4 categories) at dataset/evaluation_checklist.md. Per-row scoring was done in private and is not published here. The model passed the bar to ship in the reference Android app but the spot-check artefact is not part of this release.
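If you do expose the model to free-form input, the validation layer can be as simple as rejecting anything that does not parse as the exact 4-line template. This is a sketch only; the shipping app's actual validation code is not published:

```python
from datetime import date

SIGNS = {"Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo", "Libra",
         "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces"}
CATEGORIES = {"Daily Chaos", "Love Life", "Career", "Vibe Check"}


def is_valid_prompt(text: str) -> bool:
    """Accept only exact instances of the 4-line trained template."""
    lines = text.strip().split("\n")
    if len(lines) != 4 or lines[3] != "Generate an unhinged horoscope.":
        return False
    if not (lines[0].startswith("Sign: ") and lines[0][6:] in SIGNS):
        return False
    if not (lines[1].startswith("Category: ") and lines[1][10:] in CATEGORIES):
        return False
    if not lines[2].startswith("Date: "):
        return False
    try:
        date.fromisoformat(lines[2][6:])  # enforce YYYY-MM-DD
    except ValueError:
        return False
    return True
```

Anything that fails the check never reaches the model, which keeps inputs inside the distribution the fine-tune was trained on.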

Why a fine-tune rather than prompt engineering

For a tone-and-format task on a 1B model, prompt engineering hits a ceiling fast. With cloud inference you spend 200 to 400 input tokens on a system prompt plus 3 to 5 few-shot examples just to coax a 30-to-80-token response in the right register, and the tone still drifts on cold prompts. With a LoRA fine-tune the same behaviour is encoded in ~22MB of weights:

| | Prompt-engineering on a cloud 1B | This fine-tune (on-device) |
| --- | --- | --- |
| Input tokens per request | ~400 (system + few-shot) | ~30 (the 4-line user message) |
| Output style adherence | Drifts on cold prompts | Held by the weights |
| Inference cost per request | API price × tokens | $0 |
| Latency | Network round-trip | < 3 sec local |
| Works offline | No | Yes |
| Distribution | API key + billing | Bundled in the app |

For a free entertainment app where the entire UX is "tap a tile, read a horoscope, share", on-device wins on every axis that matters.
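The cost gap is easy to put numbers on. A back-of-the-envelope calculation, using a hypothetical cloud price of $0.25 per million input tokens (illustrative only, not a quote for any provider):

```python
def monthly_cloud_cost(requests: int, tokens_per_request: int,
                       usd_per_million_tokens: float) -> float:
    """Input-token cost alone for the prompt-engineering route."""
    return requests * tokens_per_request * usd_per_million_tokens / 1_000_000


# Hypothetical: 1M horoscope requests/month at ~400 prompt tokens each.
cloud = monthly_cloud_cost(1_000_000, 400, 0.25)  # input tokens only
on_device = 0.0  # local inference is free at the margin
```

Even at a modest price point that is a recurring bill that scales with usage, while the on-device model's marginal cost stays at zero regardless of traffic.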

Training

| Field | Value |
| --- | --- |
| Base model | meta-llama/Llama-3.2-1B-Instruct |
| Method | LoRA |
| Rank (r) | 16 |
| Alpha | 32 |
| Target modules | All projection layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj) |
| Epochs | 3 |
| Batch size | 4 |
| Learning rate | 2e-4 |
| Max sequence length | 256 tokens |
| Dataset | 480 ShareGPT JSONL pairs, 12 signs × 4 categories × 10 each |
| Date conditioning | ~70% date-agnostic, ~30% date-conditioned (season, day-of-week, month) |
| Hard rules in dataset | No real people, brands, or locations. No mean-spirited content. No harmful advice. 1 to 3 sentences. |
| Training platform | Ertas.AI |
| Quantisation pipeline | PEFT merge_and_unload into the FP16 base, then llama.cpp convert_hf_to_gguf.py to FP16 GGUF, then llama-quantize ... Q4_K_M |

The merge-and-convert recipe is documented step-by-step on the LoRA adapter card for anyone who wants to reproduce or re-quantise.

Roadmap

  • v2 (if needed): expand to 960 examples (20 per sign × category) if user-side feedback shows a category or sign drifting off-tone.
  • Stronger date conditioning: raise the date-conditioned share above 30% so seasonal and day-of-week riffs become more reliable.
  • Other zodiac systems: Chinese or Vedic if there is demand from the app users.

License and credits

  • Model weights: Apache-2.0 (matching the base Llama 3.2 license terms; downstream use must also comply with Meta's Llama 3.2 community licence)
  • Training dataset: MIT
  • Fine-tuned with Ertas.AI, the managed fine-tuning platform that ran this LoRA on pre-configured GPUs end-to-end
  • Built by Edward Yang (edbuildingstuff) as a reference POC for Ertas Product A: build your own on-device AI model and ship it inside your app. App live at horoscope.ertas.ai / Google Play.