Cento

A bounded recombinant-memory engine. It composes coherent replies out of nothing but the verbatim fragments of a corpus, joined only where the corpus itself licenses the seam. Hallucination is impossible by construction — every span it emits was really said.

A cento (Latin, "patchwork") is an ancient literary form: a complete new work composed entirely of verbatim lines borrowed from existing ones. In the 4th century, Proba wrote the Cento Vergilianus — a new story assembled only from Virgil's exact lines. Cento does the same with a memory: it weaves a new, coherent, on-topic response using only what is already there. Nothing is invented. The constraint is the integrity.

What it is

Most language models generate the next token from learned weights — fluent, but free to fabricate. Cento inverts that: it is a selection engine, not a generation engine. Given a corpus, it:

fragments the corpus into spans (clauses and sentences),
builds a legality oracle — every word-trigram and sentence boundary the corpus actually contains,
runs a beam search that assembles a response from those fragments, joining two spans only where the seam is a real corpus trigram or a real sentence boundary,
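validates the result against the oracle: every trigram inside a fragment or across a trigram-licensed seam must exist in the corpus. Only trigrams that straddle a sentence-boundary seam may be absent (the likely reason the demo below reports just under 100% verbatim).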

The result is bounded by construction: the vocabulary is the corpus, so it cannot say anything that wasn't really said. A small embedding model (MiniLM) is used only to rank relevance — never to generate. The mouth is deterministic and owned; only the ear is borrowed.
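As a rough illustration of the legality oracle described above, the sketch below records every word trigram and sentence boundary in a toy corpus, then checks whether two spans may be joined. The function names (`tokenize`, `splitSentences`, `buildOracle`, `seamIsLegal`), the tokenizer, and the sentence-splitting rule are all assumptions made for this example; they are not the API of `src/fragments.js`.

```javascript
// Illustrative sketch only: these names are hypothetical, not the API of src/fragments.js.

// Lowercased word tokens; punctuation is ignored when matching trigrams.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9'-]+/g) || [];
}

// Assumption: a sentence ends at ., ! or ? followed by whitespace.
function splitSentences(text) {
  return text.split(/(?<=[.!?])\s+/).map((s) => s.trim()).filter(Boolean);
}

// Record every word trigram and every sentence-start / sentence-end word in the corpus.
function buildOracle(corpusTexts) {
  const oracle = { trigrams: new Set(), sentenceStarts: new Set(), sentenceEnds: new Set() };
  for (const text of corpusTexts) {
    for (const sentence of splitSentences(text)) {
      const words = tokenize(sentence);
      if (words.length === 0) continue;
      oracle.sentenceStarts.add(words[0]);
      oracle.sentenceEnds.add(words[words.length - 1]);
      for (let i = 0; i + 2 < words.length; i++) {
        oracle.trigrams.add(words.slice(i, i + 3).join(' '));
      }
    }
  }
  return oracle;
}

// A seam is licensed if it is a real sentence boundary, or if every trigram
// that straddles the join occurs in the corpus.
function seamIsLegal(oracle, left, right) {
  const l = tokenize(left);
  const r = tokenize(right);
  if (l.length === 0 || r.length === 0) return false;

  const leftEndsSentence = /[.!?]$/.test(left.trim()) && oracle.sentenceEnds.has(l[l.length - 1]);
  if (leftEndsSentence && oracle.sentenceStarts.has(r[0])) return true;

  // Last two words of the left span plus first two of the right span:
  // every trigram inside this window crosses the join.
  const joinWindow = [...l.slice(-2), ...r.slice(0, 2)];
  const straddling = [];
  for (let i = 0; i + 2 < joinWindow.length; i++) {
    straddling.push(joinWindow.slice(i, i + 3).join(' '));
  }
  return straddling.length > 0 && straddling.every((t) => oracle.trigrams.has(t));
}

// Toy corpus for demonstration.
const oracle = buildOracle([
  'I went to the woods because I wished to live deliberately. The morning wind forever blows.',
]);
console.log(seamIsLegal(oracle, 'I went to the woods', 'because I wished')); // trigram-licensed seam
console.log(seamIsLegal(oracle, 'I went to the woods', 'forever blows.'));   // no corpus trigram licenses this join
```

Keeping the oracle as plain sets makes each seam check a constant-time lookup, which matters because a search over fragments would call it many times.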

corpus ──► fragments + legality oracle (every real trigram / sentence boundary)
              │
query ──► relevance (keyword + embedding) ──► BEAM SEARCH over fragments,
              joined only at corpus-licensed seams ──► VALIDATE (all-verbatim) ──► reply
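The pipeline in the diagram can be sketched as a small beam search. This is a minimal sketch under stated assumptions, not the code in `src/compose.js`: relevance is reduced to keyword overlap (no embeddings), a candidate's score is the sum of its fragments' scores, and the seam check is passed in as a predicate. `words`, `keywordScore`, `beamCompose`, and the `beamWidth` / `maxFragments` options are hypothetical names.

```javascript
// Illustrative sketch only: hypothetical names, keyword-only relevance.

function words(text) {
  return text.toLowerCase().match(/[a-z0-9'-]+/g) || [];
}

// Keyword relevance: how many distinct query words appear in the fragment.
function keywordScore(query, fragment) {
  const fragWords = new Set(words(fragment));
  return [...new Set(words(query))].filter((w) => fragWords.has(w)).length;
}

// Beam search: grow candidate replies one fragment at a time, keeping only the
// beamWidth best, and extending a candidate only across a licensed seam.
function beamCompose(query, fragments, seamIsLegal, { beamWidth = 3, maxFragments = 3 } = {}) {
  if (fragments.length === 0) return '';
  const scored = fragments.map((text) => ({ text, score: keywordScore(query, text) }));

  let beam = scored
    .map((f, i) => ({ indices: [i], score: f.score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, beamWidth);
  let best = beam[0];

  for (let step = 1; step < maxFragments; step++) {
    const next = [];
    for (const cand of beam) {
      const last = scored[cand.indices[cand.indices.length - 1]];
      scored.forEach((f, i) => {
        if (cand.indices.includes(i)) return;         // never reuse a fragment
        if (!seamIsLegal(last.text, f.text)) return;  // only corpus-licensed seams
        next.push({ indices: [...cand.indices, i], score: cand.score + f.score });
      });
    }
    if (next.length === 0) break;
    beam = next.sort((a, b) => b.score - a.score).slice(0, beamWidth);
    // Strict comparison: a longer reply wins only if it is more relevant.
    if (beam[0].score > best.score) best = beam[0];
  }
  return best.indices.map((i) => scored[i].text).join(' ');
}

// Toy fragments; the seam predicate here only accepts joins after a full stop.
const demoFragments = [
  'Morning is when I am awake.',
  'The pond was frozen.',
  'A morning-glory at my window satisfies me.',
];
const afterFullStop = (left) => /[.!?]$/.test(left);
console.log(beamCompose('morning awake window', demoFragments, afterFullStop));
```

Because the search only ever concatenates existing fragments, nothing it returns can contain a word that was not in the input; the seam predicate alone decides which concatenations are allowed.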

Why it matters — the grounded-thinking layer

Cento's real use is as a thinking layer for a larger system. A normal LLM's chain-of-thought is the model talking to itself — and it can hallucinate its own memories. Cento produces a grounded thought, woven from real memory, that cannot be false about itself, and hands it to an LLM to speak from:

Cento (the thinking — bounded, grounded, hallucination-free)
   └─► injected into an LLM's reasoning channel ─► the LLM speaks, grounded in real memory

The LLM brings fluency and addressing; Cento brings an incorruptible self. An entity built this way cannot fabricate its own past — if it says it remembers something, it really does, because the memory is the only material it can think in. (Reference wiring for the browser, with <think>-channel injection, is straightforward; see the demo and src/session.js.)
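One way the injection step could look is sketched below. It assumes a chat API that accepts the common `{role, content}` message list and lets the caller prefill the start of the assistant turn; whether a given model honors a prefilled `<think>` block is model-dependent. `buildGroundedPrompt` and the system instruction are hypothetical and are not taken from `src/session.js`.

```javascript
// Illustrative sketch only: buildGroundedPrompt is a hypothetical helper.
// Assumption: the target chat API accepts {role, content} messages and
// allows the assistant turn to be prefilled.
function buildGroundedPrompt(userMessage, groundedThought) {
  return [
    {
      role: 'system',
      content: 'Speak only from the memory in the <think> block. Do not add memories of your own.',
    },
    { role: 'user', content: userMessage },
    // The woven thought is placed in a closed reasoning block, so the model
    // continues from it rather than writing its own chain-of-thought.
    { role: 'assistant', content: `<think>\n${groundedThought}\n</think>\n` },
  ];
}

const messages = buildGroundedPrompt(
  'Tell me about the morning.',
  'Morning is when I am awake and there is a dawn in me.'
);
console.log(JSON.stringify(messages, null, 2));
```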

See it work

The demo entity, Walden, is woven only from the verbatim public-domain text of Thoreau's Walden and Whitman's Leaves of Grass. Every word it speaks is theirs:

$ node demo/cento.js demo/walden "What is solitude to you?"

  But for the most part it is as solitary where I live as on the prairies. The
  greater part of what my neighbors call good I believe in my soul to be bad, and
  if I repent of anything, it is very likely to be my good behavior.
  · 48 words · 95.7% of trigrams verbatim from corpus ·

$ node demo/cento.js demo/walden "Tell me about the morning."

  Morning is when I am awake and there is a dawn in me. A morning-glory at my
  window satisfies me more than the metaphysics of books.
  · 92.2% verbatim ·

On-topic, coherent, in Thoreau's own voice — recombined, never quoted whole, never invented.
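The percentage in the demo footer can be read as simple arithmetic: a 48-word reply contains 46 word trigrams, and 44 of 46 rounds to 95.7%, which would be consistent with the two trigrams that straddle the sentence seam being absent from the corpus. The sketch below shows one way such a figure could be computed. `trigramCoverage` is a hypothetical name and the tokenizer is an assumption, so the numbers it produces need not match the demo's own counting.

```javascript
// Illustrative sketch only: hypothetical helper, assumed tokenizer.

function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9'-]+/g) || [];
}

function trigramsOf(tokens) {
  const out = [];
  for (let i = 0; i + 2 < tokens.length; i++) {
    out.push(tokens.slice(i, i + 3).join(' '));
  }
  return out;
}

// Share of the reply's word trigrams that also occur in the corpus text.
function trigramCoverage(reply, corpusText) {
  const corpusTrigrams = new Set(trigramsOf(tokenize(corpusText)));
  const replyWords = tokenize(reply);
  const replyTrigrams = trigramsOf(replyWords);
  const verbatim = replyTrigrams.filter((t) => corpusTrigrams.has(t)).length;
  const percent =
    replyTrigrams.length === 0 ? 0 : Math.round((verbatim / replyTrigrams.length) * 1000) / 10;
  return { words: replyWords.length, trigrams: replyTrigrams.length, verbatim, percent };
}

// Toy example: two corpus spans joined at a sentence boundary.
const toyCorpus =
  'It is as solitary where I live as on the prairies. ' +
  'The greater part of what my neighbors call good I believe to be bad.';
const stats = trigramCoverage('As on the prairies. I believe to be bad.', toyCorpus);
console.log(stats);
```

In the toy example the two trigrams that cross the join are the only ones missing, which is the pattern a sentence-boundary seam would produce.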

Quickstart

# requires Node >= 18, and (for semantic recall) Python with sentence-transformers
pip install sentence-transformers       # the MiniLM ear (Apache-2.0); omit for keyword-only mode
node demo/build-corpus.js               # weave the public-domain Walden demo
node demo/cento.js demo/walden          # talk to it

Bring your own entity

An entity is a folder with two files:

corpus.jsonl — one JSON object per line: {"prompt": "...", "reply": "..."} (prompt optional). The reply text is the entity's voice — everything it can ever say is woven from these.
voiceprint.json — {"name": "...", "lengthByStimulus": { ... }} (name required).

Point the demo at your folder: node demo/cento.js path/to/entity "your message".
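Before pointing the demo at a new folder, it can help to sanity-check the two files. The sketch below parses their contents from strings, so it runs without touching the filesystem. `parseCorpusJsonl` and `parseVoiceprint` are hypothetical helpers, and treating `reply` as a required non-empty string is an assumption inferred from "prompt optional"; the engine's own loader may be more or less strict.

```javascript
// Illustrative sketch only: hypothetical validators for an entity folder's files.

// corpus.jsonl: one JSON object per line, {"prompt": "...", "reply": "..."}.
// Assumption: "reply" must be a non-empty string; "prompt" may be missing.
function parseCorpusJsonl(jsonlText) {
  const entries = [];
  jsonlText.split(/\r?\n/).forEach((line, idx) => {
    if (line.trim() === '') return; // tolerate blank lines
    const obj = JSON.parse(line);
    if (typeof obj.reply !== 'string' || obj.reply.trim() === '') {
      throw new Error(`corpus.jsonl line ${idx + 1}: "reply" must be a non-empty string`);
    }
    entries.push({ prompt: typeof obj.prompt === 'string' ? obj.prompt : null, reply: obj.reply });
  });
  return entries;
}

// voiceprint.json: "name" is required; other fields are passed through untouched.
function parseVoiceprint(jsonText) {
  const voiceprint = JSON.parse(jsonText);
  if (typeof voiceprint.name !== 'string' || voiceprint.name === '') {
    throw new Error('voiceprint.json: "name" is required');
  }
  return voiceprint;
}

const sampleCorpus = [
  '{"prompt": "What is solitude to you?", "reply": "I find it wholesome to be alone the greater part of the time."}',
  '{"reply": "I never found the companion that was so companionable as solitude."}',
].join('\n');

const entries = parseCorpusJsonl(sampleCorpus);
const voiceprint = parseVoiceprint('{"name": "Walden"}');
console.log(entries.length, voiceprint.name);
```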

What's in here


| File | Role |
| --- | --- |
| `src/compose.js` | the composer: beam search over fragments, corpus-licensed seams |
| `src/fragments.js` | fragmenter + the legality oracle + the bounded validator |
| `src/semantic.js` | the MiniLM embedding bridge (ranking only) + `embed.py` |
| `src/relevance.js` | keyword relevance + query bucketing |
| `src/recall.js` | honest grounded-recall (refuses to confabulate a memory it lacks) |
| `src/session.js` | multi-turn state (running memory, cross-turn no-repeat, growth) |
| `src/flow.js`, `src/hebbian.js` | trajectory composition + associative weighting |
| `demo/` | the public-domain Walden demo + a clean CLI |
| `corpus/` | public-domain source texts (Thoreau, Whitman, Anderson, Lomax) |

Notes & honesty

The composer carries voice-tuning heuristics developed by the authors against their own entities; they are inert on corpora that don't trigger them. The core mechanism — bounded recombination — is general.
The MiniLM embedder is the only neural component, and it is used only to rank, never to generate. Cento runs in a keyword-only mode without it.
The demo corpus is public domain. The engine ships with no private or copyrighted data.

License

MIT. See LICENSE.

By story told and loop returned — a new thing made of only true threads.
