Familiarity-Flow OneBox 8-Layer

Flow-matching policy for stereo-image-conditioned 3D grasp-offset prediction, trained on the synthetic OneBox Isaac Sim dataset. Its full learning dynamics (the predictions themselves, the geometry of the flow, and the Jacobian-of-conditioning OOD signal) are studied in the Familiarity-Flow repo.

Intended primarily as the conditioning-energy OOD-detection backend for robotic-policy gating, exposed through the familiarity-planner package.

This checkpoint comes from a 150,000-step extended-training study that explored flow and OOD-separation dynamics well past the conventional convergence point. See docs/long_run_analysis.md in the repo for the full write-up: the run showed multi-descent behaviour, not the monotone plateau or terminal collapse initially hypothesised.

Checkpoint summary

Field	Value
Architecture	`FlowMatchingPolicy`, 8 cross-attention layers
Vision encoder	DINOv2-B (ViT-B/14, frozen)
Action space	ℝ³ (3-DoF grasp offset)
Time sampling	Beta(1.5, 1) (π₀ schedule)
Training data	OneBox (synthetic Isaac Sim, ZED-Mini stereo)
Training steps	128,250 (best val_loss checkpoint of 150k-step run)
Best val_loss	0.0639
Best val L2 error	0.1462
Parameters	244 M total, 35.6 M trainable (encoder frozen)
License	MIT
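For readers new to the training objective, here is a minimal NumPy sketch of how linear-interpolation CFM training targets could be built. It assumes the convention described under Method (noise x_1 at t = 1, action x_0 at t = 0) and that t is drawn directly from the Beta(1.5, 1) listed in the Time sampling row. The helper names are hypothetical and not part of Familiarity-Flow, whose module operates on PyTorch tensors and a DINOv2 conditioning embedding.

```python
import numpy as np


def cfm_batch(x0, rng, a=1.5, b=1.0):
    """Build (x_t, t, u_t) for linear-interpolation CFM with independent coupling.

    Hypothetical helper: x0 is a (B, 3) batch of ground-truth grasp offsets.
    """
    x0 = np.asarray(x0, dtype=float)
    x1 = rng.standard_normal(x0.shape)           # noise, drawn independently of x0
    t = rng.beta(a, b, size=(x0.shape[0], 1))    # Beta(1.5, 1) time samples in [0, 1]
    xt = t * x1 + (1.0 - t) * x0                 # straight-line interpolant
    ut = x1 - x0                                 # target velocity dx_t/dt
    return xt, t, ut


def cfm_loss(v_pred, ut):
    """Mean (over the batch) squared error between predicted and target velocities."""
    return float(np.mean(np.sum((v_pred - ut) ** 2, axis=-1)))


rng = np.random.default_rng(0)
x0 = rng.uniform(-0.1, 0.1, size=(4, 3))        # toy grasp offsets, not real data
xt, t, ut = cfm_batch(x0, rng)
print(xt.shape, t.shape, ut.shape)               # (4, 3) (4, 1) (4, 3)
```

Because the path is a straight line, the target velocity does not depend on t. Beta(1.5, 1) has density proportional to t^0.5, so under this convention it puts more training weight on the noisier end of the path (t near 1).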

OOD-separation at this checkpoint (step 128,250)

Metric	ID	OOD (clutter)	WILD (real)	OOD/ID	WILD/ID
CE	0.642	3.341	2.077	5.20×	3.23×
DCE	0.062	0.303	0.186	4.87×	2.99×

AUROC(ID vs OOD) and AUROC(ID vs WILD) are both 1.000 (rank-based separation is perfect and has been since step ≈ 8k).

Reported directly from the training log at outputs/csv/onebox/version_15 in the repo.
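A rank-based AUROC is the probability that a randomly chosen OOD (or WILD) input scores higher than a randomly chosen ID input, with ties counted as half. The following is a minimal NumPy sketch of that Mann-Whitney form, assuming higher CE/DCE means more OOD. It is not necessarily how the training log computes the value; a library routine such as scikit-learn's `roc_auc_score` should agree with it.

```python
import numpy as np


def auroc(id_scores, ood_scores):
    """P(score_ood > score_id) + 0.5 * P(tie), over all ID/OOD pairs."""
    id_s = np.asarray(id_scores, dtype=float)[:, None]    # (n_id, 1)
    ood_s = np.asarray(ood_scores, dtype=float)[None, :]  # (1, n_ood)
    greater = np.mean(ood_s > id_s)                       # fraction of pairs ranked correctly
    ties = np.mean(ood_s == id_s)                         # fraction of tied pairs
    return float(greater + 0.5 * ties)


# Perfect separation: every OOD score exceeds every ID score (toy values).
print(auroc([0.5, 0.6, 0.7], [2.0, 3.1, 3.4]))  # 1.0
```

An AUROC of 1.000 only says the score distributions do not overlap on these evaluation sets. It says nothing about where an absolute threshold should sit.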

Comparison with the previous checkpoint (step 21,850, val_loss 0.0726)

Better on every metric in the table below; the only regression, CE OOD/ID, is noted underneath.

Metric	Previous	This checkpoint	Δ
val/loss	0.0726	0.0639	−12.0%
val/l2_error	0.1755	0.1462	−16.7%
ood/loss	4.414	4.241	−3.9%
ood/l2_error	1.371	1.271	−7.3%
CE WILD/ID	2.79×	3.23×	+15.8%
DCE OOD/ID	4.32×	4.87×	+12.7%
DCE WILD/ID	2.41×	2.99×	+24.1%

(CE OOD/ID drifted by −2.1%, well inside the checkpoint-to-checkpoint variance observed during the extended run.)
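The Δ column is the relative change (this − previous) / previous, in percent. The short sketch below recomputes it from the two columns. It uses the table's own sign convention: a negative Δ is an improvement for the loss and error rows, and a positive Δ is an improvement for the separation-ratio rows.

```python
def relative_change_pct(previous, current):
    """Relative change from previous to current, in percent."""
    return 100.0 * (current - previous) / previous


# (previous, this checkpoint) pairs copied from the table above.
rows = {
    "val/loss": (0.0726, 0.0639),
    "val/l2_error": (0.1755, 0.1462),
    "ood/loss": (4.414, 4.241),
    "ood/l2_error": (1.371, 1.271),
    "CE WILD/ID": (2.79, 3.23),
    "DCE OOD/ID": (4.32, 4.87),
    "DCE WILD/ID": (2.41, 2.99),
}
for name, (prev, cur) in rows.items():
    print(f"{name:>13}: {relative_change_pct(prev, cur):+.1f}%")
```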

Threshold-shift note: absolute CE/DCE values in this checkpoint are roughly 3× larger than in the previous one (CE_ID 0.225 → 0.642). A downstream OOD detector that uses an absolute threshold must be re-calibrated: the OOD/ID ratios remain comparable, but the raw scale has changed.
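One way to re-calibrate assumes a held-out set of in-distribution observations is available. Score that set with this checkpoint, then place the threshold at a high quantile of the ID scores so that a chosen fraction of ID inputs is accepted. The 95% quantile, the function names, and the toy scores below are illustrative choices only. They are not part of familiarity-planner.

```python
import numpy as np


def calibrate_threshold(id_scores, quantile=0.95):
    """Threshold below which `quantile` of the ID calibration scores fall."""
    return float(np.quantile(np.asarray(id_scores, dtype=float), quantile))


def flag_ood(scores, threshold):
    """True where a score exceeds the ID-calibrated threshold (higher CE = more OOD)."""
    return np.asarray(scores, dtype=float) > threshold


# Hypothetical ID calibration scores (toy values, not real data).
id_scores = np.arange(101) / 100.0
thr = calibrate_threshold(id_scores)             # ~0.95
print(flag_ood([0.5, 0.9, 1.2], thr).tolist())   # [False, False, True]
```

Because the threshold is derived from this checkpoint's own ID scores, it absorbs the roughly 3× scale shift automatically. It must be recomputed whenever the checkpoint changes.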

Usage

Download

```python
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="TomNotch/familiarity-flow-onebox-8L",
    filename="onebox_8L.ckpt",
)
```

Load directly (Familiarity-Flow must be installed)

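```python
import torch
from familiarity_flow.lightning.module import FlowMatchingModule

# ckpt_path comes from the download step above; fall back to CPU if no GPU is present
device = "cuda" if torch.cuda.is_available() else "cpu"
module = FlowMatchingModule.load_from_checkpoint(ckpt_path, map_location=device)
module.eval()
policy = module.ema_policy   # EMA-averaged weights used for inference
```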

Score a batch for OOD-ness

```python
# images: list of stereo image tensors, each shaped (B, 3, 224, 224)
ce = policy.ood_score(images, num_steps=10)   # shape: (B,)
# Higher CE = more OOD
```

Via familiarity-planner

```python
from familiarity_planner.familiarity import Familiarity

fam = Familiarity(
    "conditioning_energy",
    checkpoint_path="TomNotch/familiarity-flow-onebox-8L",   # auto-downloaded
)
score = fam(stereo_observation)   # smaller = more familiar
```

Method

Conditional flow matching with linear interpolation and independent coupling (Lipman et al., ICLR 2023). The conditioning energy

$\mathrm{CE}(c) = \int_0^1 \left\lVert \frac{\partial v_\theta}{\partial c}(x_t, t, c) \right\rVert_F^2 \, \mathrm{d}t$

is measured along the deterministic Euler ODE trajectory from noise (x_1 ∼ N(0, I)) at t = 1 to the predicted action (x_0) at t = 0. Its endpoint-Jacobian counterpart, DCE, is the squared Frobenius norm of ∂φ/∂c, where φ is the full ODE map from x_1 to x_0. Both grow when out-of-distribution inputs excite the learned velocity field's sensitivity to the conditioning, so the OOD signal falls out of the geometry of the flow without any auxiliary classifier.
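The following is a minimal NumPy sketch of both quantities under stated assumptions. It takes a generic velocity field v(x, t, c) on a flat conditioning vector c and uses Euler steps x ← x − Δt·v from t = 1 to t = 0, with central finite differences for the Jacobians. The integral becomes a Riemann sum over the Euler steps. This is an illustration, not the Familiarity-Flow implementation, and all function names are hypothetical. With a real high-dimensional DINOv2 conditioning vector, one would use autograd (for example `torch.func.jacrev`) instead of finite differences.

```python
import numpy as np


def jacobian_wrt_c(f, c, eps=1e-5):
    """Central-difference Jacobian of f: R^k -> R^d with respect to c, shape (d, k)."""
    c = np.asarray(c, dtype=float)
    cols = []
    for i in range(c.size):
        dc = np.zeros_like(c)
        dc[i] = eps
        cols.append((f(c + dc) - f(c - dc)) / (2.0 * eps))
    return np.stack(cols, axis=-1)


def euler_flow(v, x1, c, num_steps=10):
    """Integrate dx/dt = v(x, t, c) from t=1 (noise) to t=0; return x0 and visited (x_t, t)."""
    dt = 1.0 / num_steps
    x = np.asarray(x1, dtype=float)
    traj = []
    for k in range(num_steps):
        t = 1.0 - k * dt
        traj.append((x, t))
        x = x - dt * v(x, t, c)   # step backwards in t
    return x, traj


def conditioning_energy(v, x1, c, num_steps=10):
    """Riemann sum of ||dv/dc(x_t, t, c)||_F^2 dt, with x_t held fixed (partial derivative)."""
    _, traj = euler_flow(v, x1, c, num_steps)
    dt = 1.0 / num_steps
    ce = 0.0
    for x, t in traj:
        J = jacobian_wrt_c(lambda cc: v(x, t, cc), c)
        ce += dt * np.sum(J ** 2)
    return float(ce)


def dce(v, x1, c, num_steps=10):
    """Squared Frobenius norm of d(x0)/dc through the whole Euler map."""
    J = jacobian_wrt_c(lambda cc: euler_flow(v, x1, cc, num_steps)[0], c)
    return float(np.sum(J ** 2))


# Toy check: a field that is linear in c and constant in x and t.
A = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]])


def v_lin(x, t, c):
    return A @ c


x1 = np.random.default_rng(0).standard_normal(3)
c = np.array([0.3, -0.7])
print(conditioning_energy(v_lin, x1, c), dce(v_lin, x1, c))  # both ≈ ||A||_F^2 = 31
```

CE differentiates v at fixed x_t, matching the partial derivative in the formula. DCE differentiates through the whole trajectory, so it also picks up how c moves the intermediate states. The two agree in the toy case only because that field ignores x.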

Limitations

Trained on a single synthetic domain (OneBox Isaac Sim renderings). Generalisation across robots, object sets, or camera rigs is not claimed.
Action head predicts only a 3-DoF grasp offset, not a full pose or trajectory.
OOD-detection quality (CE/DCE) is strong on the OneBox clutter (OOD) and real-world (WILD) eval sets monitored during training; behaviour on arbitrary out-of-domain inputs is untested.
Not for deployment on physical robots without independent validation. Intended as a research artefact and as a concrete backend for methodology study.

Related work

Lipman et al., Flow Matching for Generative Modeling, ICLR 2023 (arXiv:2210.02747)
Black et al., π₀: A Vision-Language-Action Flow Model for General Robot Control, 2024 (arXiv:2410.24164)
Chen et al., Neural Ordinary Differential Equations, NeurIPS 2018 (arXiv:1806.07366)
Liu et al., Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness (SNGP), NeurIPS 2020 (arXiv:2006.10108)
Nakkiran et al., Deep Double Descent, ICLR 2020 (arXiv:1912.02292)

Author

Mukai (Tom Notch) Yu — Carnegie Mellon University, Robotics Institute. Course project for 16-832 / 16-761 (Spring 2026).
