baladithyab
Wave 5: full publication-materials drafts (pre-experimental release set)
639a760

Publication Release Checklist

Last updated: 2026-05-25

Current state: all new materials are drafted and none has been posted publicly yet (the repo README is already live). Use this checklist to coordinate the publication wave when ready to ship.

What's drafted

| Artifact | Path | Status | Word count (approx) |
|---|---|---|---|
| Longform methodology paper | publications/PAPER_v0.md | ✅ DRAFTED | ~6,500 |
| Blog post (HF Blog format) | publications/BLOG_POST.md | ✅ DRAFTED | ~2,400 |
| HF Discussion thread (repo Community tab) | publications/HF_DISCUSSION_POST.md | ✅ DRAFTED | ~700 |
| Twitter / X thread (13-tweet + 5-tweet + LinkedIn variants) | publications/TWITTER_THREAD.md | ✅ DRAFTED | ~1,200 |
| CITATION.cff (HF/GitHub Citation Format) | /CITATION.cff | ✅ DRAFTED | n/a |
| CITATION.bib (BibTeX) | /CITATION.bib | ✅ DRAFTED | n/a |
| Repo README (model card with frontmatter) | /README.md | ✅ Already published (v3 with wave 4 status) | ~1,000 |
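
The approximate word counts above can be re-checked before shipping. The following is a minimal sketch, not the tooling used to produce the numbers in the table: it assumes the drafts are plain markdown files under publications/, and the helper names `approx_word_count` and `count_drafts` are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict

FENCE = "`" * 3  # markdown code-fence marker, built indirectly to keep this block readable


def approx_word_count(markdown_text: str) -> int:
    """Count whitespace-separated tokens, skipping YAML frontmatter and fenced code."""
    # Drop a leading YAML frontmatter block, if present.
    text = re.sub(r"\A---\n.*?\n---\n", "", markdown_text, flags=re.DOTALL)
    # Drop fenced code blocks so code does not inflate the prose count.
    fenced = re.escape(FENCE) + r".*?" + re.escape(FENCE)
    text = re.sub(fenced, "", text, flags=re.DOTALL)
    return len(text.split())


def count_drafts(directory: str = "publications") -> Dict[str, int]:
    """Map each *.md file in `directory` (assumed location) to its approximate word count."""
    return {
        path.name: approx_word_count(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob("*.md"))
    }
```

Running `count_drafts()` from the repo root would return a dictionary keyed by filename; the counts will differ slightly from any editor's word counter because markdown syntax tokens are counted as words.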

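The four prose drafts are in publications/ and the two citation files are at the repo root; none of the drafts has been posted. Nothing is gated by external review; every item is a self-publish decision, so the set can ship once the pre-flight items below are confirmed.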

Pre-flight check before shipping any of these

Confirm these items before posting any of the public-facing materials. Most were completed in earlier waves but are listed here for completeness:

  • HF repo is public (Codeseys/composer-replication-framework)
  • All linked URLs resolve (cross-checked during drafts)
  • Test suite passes (38/38 as of wave 4)
  • Spike 001 is reproducible (deterministic states + recorded results)
  • Cursor blog is correctly summarized (audit notice in research/01-composer-2.5.md)
  • Upstream papers cited correctly (OPSD, SDPO, Cursor blog with arXiv IDs verified)
  • License is MIT and consistent across LICENSE + README.md frontmatter + CITATION.cff
  • CITATION.cff author block updated with real name/ORCID if desired (currently just "Codeseys")
  • Choose final author identity for the byline (Codeseys handle? real name? affiliation?)
  • HF Discussion title / tags chosen — suggested in HF_DISCUSSION_POST.md
  • Blog thumbnail prepared — placeholder path in BLOG_POST.md frontmatter (/blog/assets/composer-replication-framework/thumbnail.png); needs a real image
  • arXiv submission decided — see § "arXiv submission" below
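
The license-consistency item above can be checked mechanically. This is a minimal sketch under stated assumptions: LICENSE, README.md and CITATION.cff all sit at the repo root, the README frontmatter and CITATION.cff each carry a top-level `license:` key, and the LICENSE file opens with the standard "MIT License" heading. All function names are hypothetical, and the regex is a stand-in for a real YAML parser.

```python
import re
from pathlib import Path
from typing import Dict, Optional


def yaml_scalar(text: str, key: str) -> Optional[str]:
    """Return the lowercased value of a top-level `key: value` line, or None if absent."""
    pattern = rf"^{re.escape(key)}:[ \t]*(\S.*?)[ \t]*$"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1).strip("\"'").lower() if match else None


def readme_license(readme_text: str) -> Optional[str]:
    """Read `license:` from the YAML frontmatter at the top of a model card."""
    front = re.match(r"\A---\n(.*?)\n---", readme_text, flags=re.DOTALL)
    return yaml_scalar(front.group(1), "license") if front else None


def license_file_id(license_text: str) -> Optional[str]:
    """Heuristic: a LICENSE file that opens with 'MIT License' is treated as MIT."""
    return "mit" if "mit license" in license_text[:200].lower() else None


def collect_licenses(repo_root: str = ".") -> Dict[str, Optional[str]]:
    """Gather the license identifier each of the three files declares."""
    root = Path(repo_root)
    read = lambda name: (root / name).read_text(encoding="utf-8")
    return {
        "LICENSE": license_file_id(read("LICENSE")),
        "README.md": readme_license(read("README.md")),
        "CITATION.cff": yaml_scalar(read("CITATION.cff"), "license"),
    }


def licenses_consistent(found: Dict[str, Optional[str]]) -> bool:
    """True only if every file declares a license and all declarations agree."""
    values = set(found.values())
    return None not in values and len(values) == 1
```

Calling `licenses_consistent(collect_licenses())` from the repo root returns `False` if any file is missing a declaration or the three disagree; printing the `collect_licenses()` result shows which file is the odd one out.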

Sequencing recommendation

If publishing all materials, this order minimizes risk and maximizes signal:

  1. HF Discussion post first (lowest-stakes — repo Community tab; anyone landing on the repo will see it; it pre-announces the methodology paper).
  2. Blog post / personal site second (anchor narrative, ~2,400 words, easy to share).
  3. X / LinkedIn third (after the blog post URL exists to anchor the thread).
  4. arXiv submission last (if doing this — needs more polish; see below).

A three-day gap between (1) and (2) is reasonable: it gives the discussion post time to collect early feedback that should be incorporated into the blog.

Distribution / amplification ideas

  • Cross-post the blog to:
  • Post the discussion in:
    • r/LocalLLaMA (will be eaten by their algorithm but worth one shot)
    • r/MachineLearning if you tag [R] and frame as "novel methodology, no results yet — looking for feedback"
    • Hacker News "Show HN: …"; the pre-experimental disclosure should be in the title
    • LessWrong / Alignment Forum if you frame the reward-hacking section as the lead
  • Tag in the Twitter thread:
    • @cursor_ai (Cursor team)
    • @huggingface (TRL team)
    • @volcanoengine (VeRL team)
    • @MoonshotAI (Kimi K2.5)
    • @PrimeIntellect

arXiv submission (decide later)

The methodology paper is currently in markdown. Pros and cons of a formal arXiv release:

Pros

  • Citable DOI; appears in Google Scholar / Semantic Scholar
  • Reaches a non-HF research audience
  • Forces a higher polish bar, which catches errors

Cons

  • Needs LaTeX conversion (~1 day of formatting work)
  • The "no experimental results yet" framing is unusual for arXiv; readers may dismiss it
  • Once posted, it's permanent: corrections appear as new versions (v2, v3) and earlier versions stay accessible

Recommendation: post the HF blog and discussion first; decide on arXiv only after spikes 002–004 produce results. Then make it a v0.1 paper with experimental backing. The current methodology paper becomes Sections 2–4 of that future paper, with new Sections 5+ for the empirical results.

If you do submit to arXiv now anyway, use cs.LG as the primary category with a cs.AI cross-list, and keep the title and abstract from PAPER_v0.md. Describe it in the arXiv comments field as "pre-experimental methodology release; experimental validation in follow-up."

Embargo / coordination notes

  • Cursor team coordination: not strictly required (their blog is public, their cited papers are public, no proprietary info), but a polite heads-up tweet on day-of release is reasonable since the post heavily engages their work. @cursor_ai tag on tweet 1 of the X thread.
  • OPSD authors coordination: Siyan Zhao et al. — also not required (MIT code, public paper) but tagging the lead author on the X thread is a polite signal of citation. Their handles: try @siyan_zhao (verify before tagging).
  • SDPO authors coordination: same as for OPSD. The handles for Hübotter et al. are unverified; skip tagging if they cannot be found.

Risk register

| Risk | Likelihood | Mitigation |
|---|---|---|
| Someone runs spike 004 first and beats us to publication | Medium | Acknowledged. Trade-off accepted. The integration architecture is independently citable. |
| Methodology error caught after publication | Medium | Drafts have been audited (DeepWiki for code, primary-source-read for Cursor blog). 38 unit tests catch wiring bugs. The "what's NOT proven" section in the paper is explicit about open claims. |
| Hostile read claiming we overclaim novelty | Low | The paper explicitly compares to rStar / Math-Shepherd / Magpie / MoA and concedes "absence of evidence is not evidence of absence" in §9. |
| Cursor team objects to characterization | Low | Everything cited from their public blog with explicit [BLOG-VERIFIED] tags. SDPO/OPSD framing is supported by their own footnote. |
| Repo gets a flood of PRs / discussion noise | Low | Welcome the noise. Maintain CONTRIBUTING.md (TBD) when traffic justifies. |

Post-publication tracking (if you ship)

Things to monitor in the first 2 weeks after publication:

  • HF repo: stars, forks, downloads (reachable via API)
  • HF Discussions tab: new threads, especially anything flagging methodology errors
  • X thread: replies from people working on TRL / VeRL / OpenEnv (especially extension-point critiques)
  • Citations / mentions in adjacent posts (set up a Google Scholar alert)
  • arXiv mentions (if any related work cites the preprint or blog)
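
The repo counters in the first bullet can be polled with a few lines of standard-library Python. This is a sketch under assumptions, not a tested monitoring setup: it assumes the repo is a model repo served by the Hub's public `https://huggingface.co/api/models/{repo_id}` endpoint and that the JSON response carries `likes` and `downloads` fields, which are the two counters tracked here. The function names are hypothetical.

```python
import json
from typing import Dict
from urllib.request import urlopen

# Assumed public REST endpoint for model-repo metadata on the Hugging Face Hub.
HUB_API = "https://huggingface.co/api/models/{repo_id}"


def fetch_model_info(repo_id: str, timeout: float = 10.0) -> dict:
    """Fetch public metadata for a model repo (performs a network request)."""
    with urlopen(HUB_API.format(repo_id=repo_id), timeout=timeout) as response:
        return json.load(response)


def engagement_snapshot(info: dict) -> Dict[str, int]:
    """Keep only the counters of interest; a missing field is recorded as 0."""
    return {"likes": info.get("likes", 0), "downloads": info.get("downloads", 0)}


def delta(previous: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Change in each counter between two snapshots."""
    return {key: current[key] - previous.get(key, 0) for key in current}


# Example use (requires network access):
#   info = fetch_model_info("Codeseys/composer-replication-framework")
#   today = engagement_snapshot(info)
#   print(delta(yesterday, today))  # `yesterday` is a snapshot saved from an earlier run
```

Saving one snapshot per day to a JSON file is enough for a two-week window; anything the endpoint does not report would have to be checked by hand on the repo page.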

If a methodology error surfaces, the response protocol:

  1. Acknowledge in the Discussion thread within 24 hours.
  2. Patch the affected file in the repo with a clear commit message.
  3. Add an "Errata" section to PAPER_v0.md documenting what was wrong and what changed.
  4. Don't try to silently rewrite history.

Drafts ready. Ship when you decide. The repo is in a clean state to support any subset of the publication wave above.