geolip-svae-transformer

A geometric structural companion for transformers — the first working SVAE transformer prototype that respects the geometric formulas end to end and converges to aligned aleph-omega spectrums. It is not a competitor to a standard transformer; it rides alongside one, imposing a uniformly identifiable rigid geometric coordinate system on a transformer-shaped interface, for compression now and generation next. The host supplies content and discrimination; this companion supplies a stable rigid geometry the host can attend through.

Highly experimental. Prototype stage — expect optimization, utility enhancement, and likely custom kernels before it is production-shaped.

Current implementation

Specification: Aligned to trigram and data
IN - core: Aleph alignment
IN - stacked: quaternion regulation
IN - native: column regulation
IN - stacked: structural recursive folding
TODO - frostrum controllers
IN - native: projection alignment
TODO - likely slow: Procrustes curation
Head limitations: only 4-8 MHA heads for now; the expected head count is between 160,000 and 5 million embedding-head alignment decisions.
- Yes, I know this sounds absurd, but you'll see soon enough.

Preliminary

After tens of thousands of experiments, hundreds of failed prototypes, and dozens of successful ones, the transformer process is codified here into a model that holds its geometry the whole way through. The internals use a multi-scale alignment system: each cell projects d4 → d16 by lensed adjudication, while the differentiation space acts as recursive void interpolation for codebook calculation at runtime. Because the codebooks are baked into the layer cake, the system needs a highly optimized alignment pass to keep that runtime space accurate and quickly usable.

To put it more simply

A multiscale lens. Each internal space sees a different data-format of projection, and each projection carries a rigid internal void space used for infinities. Those recursive structures call themselves over and over, breach their own limits over time, and form complex, deviant patterns under the spherenorm math. The formats are dynamic and trained as the model learns; the space itself determines that the H2 battery converges.

The circles learn. There are four per head in this first prototype, so the attention spectrums carry 4× the divergent opinions per head projection space. Each d4→d16 head aligns to the d4 SVAE prototype spectrum, which makes the heads absurdly small and freely expandable and stackable. They give near-perfect bitwise reconstruction, with one catch: at their smallest form, high-complexity trigrams are hard to reconstruct bit-for-bit.

What it does, concretely

input → patches (image) or feature tokens (e.g. BERT hidden vectors)
      → encode → sphere-normalized M  (V rows on S^(d4-1))      [spherenorm]
      → LENS d4 → d16  (isometric, rigidity-preserving)         [lensed adjudication]
      → omega S = emergent column-norm spectrum of M_lens       [aleph-omega spectrum]
      → spectral-alpha attention:  S · (1 + α · tanh(SDPA(S)))   [the circles, 4 heads]
      → decode (M_lens modulated by attended S) → patch → (stitch)
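The pipeline above can be sketched in NumPy. This is a minimal, single-head sketch under stated assumptions and is not the transformer_prototype.py implementation. The lens is assumed to be a fixed matrix with orthonormal rows, which is one simple way to get an isometric d4 → d16 lift. SDPA is assumed to be plain softmax attention over the N patches, with S used as query, key, and value and no learned projections. α is a constant per-mode vector, and the decode step is omitted. All function names are hypothetical.

```python
import numpy as np


def sphere_norm(x, eps=1e-8):
    """Project each row (last axis) onto the unit sphere S^(d-1)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def isometric_lens(d_in, d_out, seed=0):
    """Hypothetical lens: a (d_in, d_out) matrix whose rows are orthonormal.

    Because lens @ lens.T == I, right-multiplying by it preserves row norms and
    all row inner products, i.e. the lift d_in -> d_out is isometric.
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d_out, d_in)))  # (d_out, d_in), orthonormal columns
    return q.T


def omega_spectrum(m_lens):
    """Column-norm spectrum of each patch's (V, D) matrix: (B, N, V, D) -> (B, N, D)."""
    return np.linalg.norm(m_lens, axis=-2)


def sdpa(s):
    """Single-head scaled dot-product self-attention over patches, S as q, k and v."""
    logits = s @ np.swapaxes(s, -1, -2) / np.sqrt(s.shape[-1])   # (B, N, N)
    logits -= logits.max(axis=-1, keepdims=True)                  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ s                                                  # (B, N, D)


def spectral_alpha(s, alpha):
    """S * (1 + alpha * tanh(SDPA(S))), with alpha broadcast per mode."""
    return s * (1.0 + alpha * np.tanh(sdpa(s)))


# Toy run: B=2 inputs, N=8 patches, V=16 rows per patch, d4 -> d16.
B, N, V, D4, D16 = 2, 8, 16, 4, 16
rng = np.random.default_rng(1)
m = sphere_norm(rng.standard_normal((B, N, V, D4)))   # rows on S^3
lens = isometric_lens(D4, D16)
m_lens = m @ lens                                      # (B, N, V, 16), rows still unit-norm
s = omega_spectrum(m_lens)                             # (B, N, 16)
s_att = spectral_alpha(s, np.full(D16, 0.024))         # near-identity at alpha ~ 0.024
```

Every row of M_lens is a unit vector. As a result, the squared omega spectrum of each patch sums to V: it is the same squared Frobenius norm, counted by columns instead of rows. Because |tanh| < 1, each mode of the attended spectrum can also move by at most a factor of α.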

Two properties carry the design:

Rigidity guarantee at scale. The lens lift is isometric, so the frame's rigidity envelope (|deviation| < 0.02·√D) is preserved up to d16, where naive large-D training collapses. It holds in every run, to machine precision.
Demand-driven spectral-alpha attention. Per-mode α is bounded to [0, 0.2] and initialized near zero (≈0.024), so cross-patch attention starts as near-identity and engages only when coordination helps. The omegas can't be forced to converge — they emerge through curation, and α blooms when, and only when, the task rewards it.
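A scaled sigmoid is one way to get the α bounds described above. The sketch below is an assumption and not the parameterization used in transformer_prototype.py. It maps a hypothetical unconstrained parameter theta to α = 0.2 · sigmoid(theta) and picks the initial theta so that α starts at 0.024. It then shows why that starting point is near-identity.

```python
import math

ALPHA_MAX = 0.2     # upper bound on per-mode alpha, from the text
ALPHA_INIT = 0.024  # initial alpha, from the text


def logit(p):
    """Inverse of the logistic sigmoid."""
    return math.log(p / (1.0 - p))


def bounded_alpha(theta):
    """Map an unconstrained parameter theta to alpha in (0, ALPHA_MAX)."""
    return ALPHA_MAX / (1.0 + math.exp(-theta))


# Initialize theta so that alpha starts at ALPHA_INIT (theta0 is about -1.992).
theta0 = logit(ALPHA_INIT / ALPHA_MAX)
alpha0 = bounded_alpha(theta0)

# Since |tanh| < 1, the per-mode gain 1 + alpha * tanh(.) stays inside this interval,
# so at initialization every mode of S moves by less than 2.4%.
gain_range = (1.0 - alpha0, 1.0 + alpha0)
```

The sigmoid keeps α inside (0, 0.2) for any value the optimizer reaches, so no clipping is needed. Gradient descent on theta can still push α toward either bound when the task rewards it.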

Validated results (prototype)

| input | recon cosine | frame deviation (criterion ±0.040) | spectral-α |
|---|---|---|---|
| BERT-base features (768-d, curated, 500 epochs) | 0.829 | +0.0011 ✓ | 0.024, flat (Mechanism B) |
| BERT-tiny features (128-d, Wikipedia/wikitext-2, 80 epochs) | 0.897 | −0.0002 ✓ | 0.027 |
| structured images (correlated patches) | n/a | in-envelope ✓ | 0.024 → 0.031, engages (Mechanism A) |

The guarantee holds in every regime. On independent tokens, α stays near its initial value, so attention remains near-identity and the encoder solves each patch on its own. On correlated patches, α engages because coordination earns its keep. That is the demand-driven behavior, observed rather than assumed.
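The table can be checked with a little arithmetic. The critical value ±0.040 is exactly the envelope 0.02·√D at D = 4. This suggests that deviation is measured in the d4 frame, but that is an inference and the source does not state it. The snippet below recomputes the criterion and the fraction of the envelope each reported BERT run uses. The image row only reports "in-envelope", so it is left out, and the helper name is hypothetical.

```python
import math


def rigidity_envelope(d):
    """Bound on |frame deviation| from the rigidity guarantee: 0.02 * sqrt(D)."""
    return 0.02 * math.sqrt(d)


# Assumption: the table's +/-0.040 criterion is the envelope at the d4 frame (D = 4).
crit = rigidity_envelope(4)

# Frame deviations reported in the results table.
reported = {
    "BERT-base features": 0.0011,
    "BERT-tiny features": -0.0002,
}
within = {name: abs(dev) < crit for name, dev in reported.items()}
margin = {name: abs(dev) / crit for name, dev in reported.items()}  # fraction of envelope used
```

BERT-base uses 2.75% of the envelope. At d16, the same formula would allow ±0.08.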

Repository layout

transformer_prototype.py   # model: encoder, lens, spectral-alpha attention, decoder + image training
bert_trainer.py            # patchwork-represent BERT features (real transformers, sterilized; wikitext support)
checkpoints/               # .pt weights, by run (model_state_dict + geo_config + bert_config + verdict)
experiments/               # run logs / JSON verdicts

Usage

Install the geolip-svae package first; it provides the necessary classes.

```shell
pip install "git+https://github.com/AbstractEyes/geolip-svae.git"
```

```python
# --- load the checkpoint (it self-describes via its stored geo_config) ---
import torch
from transformer_prototype import GeoSVAETransformer, GeoConfig

ck = torch.load('checkpoints/geo_svae_bert_results/geolip_svae_transformer.pt',
                map_location='cpu', weights_only=False)
# keep only the keys that GeoConfig actually declares
cfg = GeoConfig(**{k: ck['geo_config'][k]
                   for k in GeoConfig.__dataclass_fields__ if k in ck['geo_config']})
model = GeoSVAETransformer(cfg)
model.load_state_dict(ck['model_state_dict'])
model.eval()             # inference mode for evaluation
print(ck['verdict'])     # recon cosine, rigidity guarantee, mechanism A/B

# --- train: patchwork-represent BERT features (returns the model) ---
from bert_trainer import run
model, report = run(model_name='bert-base-uncased', corpus_source='wikitext',
                    n_sentences=512, D_lens=64, epochs=200)

# --- image compression path ---
from transformer_prototype import run as run_image
run_image(D_lens=16, epochs=8)
```

forward_patches on an input of shape (B, N, patch_dim) is the core path, for image patches and feature tokens alike. The d4→d16 heads are small and stackable: widen D_lens or add layers to trade size for fidelity on harder trigrams.
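On the image path, an image batch has to become a (B, N, patch_dim) array, and the output has to be stitched back together. A common way to do this uses non-overlapping square patches flattened channel-first, sketched below with the matching stitch step. The helper names, the (B, C, H, W) input layout, and the row-major patch ordering are assumptions; transformer_prototype.py may order patches differently.

```python
import numpy as np


def patchify(img, p):
    """(B, C, H, W) -> (B, N, C*p*p) with N = (H//p) * (W//p), patches in row-major order."""
    b, c, h, w = img.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by the patch size"
    x = img.reshape(b, c, h // p, p, w // p, p)
    x = x.transpose(0, 2, 4, 1, 3, 5)          # (B, H/p, W/p, C, p, p)
    return x.reshape(b, (h // p) * (w // p), c * p * p)


def stitch(patches, c, h, w, p):
    """Inverse of patchify: (B, N, C*p*p) -> (B, C, H, W)."""
    b = patches.shape[0]
    x = patches.reshape(b, h // p, w // p, c, p, p)
    x = x.transpose(0, 3, 1, 4, 2, 5)          # (B, C, H/p, p, W/p, p)
    return x.reshape(b, c, h, w)


imgs = np.random.default_rng(0).random((2, 3, 32, 32))
x = patchify(imgs, 4)                # (2, 64, 48): N = 8 * 8 patches, patch_dim = 3 * 4 * 4
restored = stitch(x, 3, 32, 32, 4)   # exact round trip back to (2, 3, 32, 32)
```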

Status & roadmap

Compression and geometric representation are validated. Open next: generation from the rigid frame, the host-companion read/write interface, and cross-bank Procrustes alignment of the frame. Specifications are not final.
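Cross-bank Procrustes alignment is on the roadmap and is not implemented here yet. For reference, the classic orthogonal Procrustes problem has a closed-form SVD solution. It finds the rotation R that minimizes ‖A R − B‖_F between two matched sets of frame vectors, and it is sketched below in NumPy. How frames would be paired across banks is left open, and the function name is hypothetical. SciPy ships the same solver as scipy.linalg.orthogonal_procrustes.

```python
import numpy as np


def orthogonal_procrustes(a, b):
    """Return the orthogonal R minimizing ||a @ R - b||_F (a, b: (n, d) matched rows).

    Closed form: with U S Vt = svd(a.T @ b), the optimum is R = U @ Vt.
    """
    u, _, vt = np.linalg.svd(a.T @ b)
    return u @ vt


# Toy check: bank b is bank a under an unknown rotation q, plus a little noise.
rng = np.random.default_rng(0)
a = rng.standard_normal((200, 16))
q, _ = np.linalg.qr(rng.standard_normal((16, 16)))
b = a @ q + 1e-3 * rng.standard_normal((200, 16))
r = orthogonal_procrustes(a, b)
residual = np.linalg.norm(a @ r - b) / np.linalg.norm(b)
```

Each solve costs one d × d SVD after an n × d × d product. That is cheap per pair, but it adds up if it runs for every bank pair during curation.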

Part of the geolip-svae ecosystem by AbstractPhil.
