orchid-clip-v8

Frozen image encoder for orchid foundation embeddings. Fine-tuned from BioCLIP 2 (ViT-L/14) on ~2.5M orchid images covering ~10K species (post-WCVP-dedup label space). Outputs 768-dim L2-normalized features.

Use this as a feature extractor for downstream orchid tasks where no pretrained model exists today: bloom-stage classification, disease detection, mounting-style identification, NN retrieval over a cultivated collection, active labeling for low-image species.

Files

file                          size     what
open_clip_pytorch_model.bin   1.6 GB   full CLIP weights (image + text tower), open_clip format
model_config.json             60 B     {"model_name": "ViT-L-14", "framework": "open_clip"}
embed_example.py              -        minimal load + embed snippet
sanity_check.py               -        end-to-end validation (uses upstream SQLite, dev only)
sanity_check_laptop.py        -        same check using a single hand-picked reference image
LICENSE                       -        MIT

Install

pip install open_clip_torch torch pillow numpy

Tested with open_clip_torch>=2.24 and torch>=2.0. CUDA optional (CPU works, just slow).

Download

hf download mjarnold/orchid-clip-v8 --local-dir orchid-clip-v8
# or, programmatically:
python -c "from huggingface_hub import snapshot_download; snapshot_download('mjarnold/orchid-clip-v8', local_dir='orchid-clip-v8')"

Quickstart

from embed_example import load_orchid_clip_v8, embed_image
model, preprocess, _ = load_orchid_clip_v8("orchid-clip-v8", device="cuda")
feat = embed_image(model, preprocess, "your_flower.jpg")  # (768,) L2-normalized

The state dict is wrapped — keys live under state["state_dict"], not at the top level. See embed_example.py for the load incantation.
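If you want to bypass the helper and load the checkpoint with open_clip directly, the load looks roughly like this. It is a sketch of what load_orchid_clip_v8 does, not a verbatim copy: the "module." prefix handling is an assumption, and embed_example.py remains the authoritative version.

import torch
import open_clip

# Build the ViT-L-14 architecture without downloading pretrained weights
# (this is what triggers open_clip's "initialized randomly" warning; see the Validation notes).
model, _, preprocess = open_clip.create_model_and_transforms("ViT-L-14", pretrained=None)

# The checkpoint wraps the tensors: the weights live under state["state_dict"].
state = torch.load("orchid-clip-v8/open_clip_pytorch_model.bin", map_location="cpu")
sd = state["state_dict"] if "state_dict" in state else state
sd = {k.removeprefix("module."): v for k, v in sd.items()}  # strip DDP prefixes, if any
model.load_state_dict(sd)
model.eval()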

Eval (clean held-out)

n=4000, 547 species. Reports top-1 / top-5 / genus-top-1 accuracy on a stratified hash-bucketed holdout (no train leakage).
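For context, "hash-bucketed" means each image's train-versus-holdout assignment is a deterministic function of a stable identifier, so repeated runs cannot shuffle images across the split. A minimal illustration of that kind of split follows; the actual key, bucket count, and holdout fraction used for v8 are not published here, and the per-species stratification is handled separately.

import hashlib

def holdout_bucket(image_id: str, n_buckets: int = 100) -> int:
    # Deterministic bucket derived from a stable hash of the image identifier.
    digest = hashlib.md5(image_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def is_holdout(image_id: str, holdout_buckets: int = 5) -> bool:
    # Images whose bucket falls below the cutoff land in the held-out eval set.
    return holdout_bucket(image_id) < holdout_buckets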

model                  top-1   top-5   genus-top-1   macro-genus (clean)
BioCLIP 2 (baseline)   0.873   0.978   0.992         -
orchid-clip-v8         0.911   0.986   0.991         0.812

Per-genus top-1 (orchid-clip-v8 vs BioCLIP 2):

genus           n      v8      BioCLIP 2   Δ
Ophrys          2754   0.933   0.905       +2.8 pp
Habenaria       232    0.922   0.845       +7.7 pp
Dendrobium      161    0.919   0.907       +1.2 pp
Prosthechea     145    0.890   0.855       +3.5 pp
Encyclia        101    0.832   0.842       -1.0 pp
Pleurothallis   100    0.800   0.690       +11.0 pp
Cymbidium       99     0.879   0.939       -6.0 pp
Maxillaria      94     0.787   0.649       +13.8 pp
Masdevallia     64     0.859   0.781       +7.8 pp
Oncidium        57     0.860   0.825       +3.5 pp
Bulbophyllum    41     0.732   0.585       +14.6 pp
Lepanthes       40     0.800   0.525       +27.5 pp
Stelis          25     0.640   0.400       +24.0 pp
Laelia          24     0.958   1.000       -4.2 pp

The biggest lifts land on the long-tail genera (Stelis, Lepanthes, Bulbophyllum, Pleurothallis, Maxillaria): the Pleurothallidinae and other sparsely sampled classes that BioCLIP 2 struggles with. v8's training-side inverse-sqrt sampler and WCVP synonym dedup are responsible for these gains.

Training recipe (v8)

  • Backbone: BioCLIP 2 ViT-L/14 (image + text tower both fine-tuned).
  • Data: 2.5M images, ~10K post-WCVP-dedup binomials, gated by quality_score ≥ 0.3 + orchid subfamily filter.
  • Sampler: inverse-sqrt class weighting; per-species cap 2000 (see the sketch after this list).
  • Filter: v6 image↔binomial cosine ≥ 0.727 (drops the bottom percentile of mislabeled / off-target rows).
  • Optimizer: AdamW8bit, lr 1e-5, bs 64, 5 epochs.
  • Synonym handling: all rows remapped through WCVP 2026-01-07 (species_accepted column populated on 2.94M rows; 69K rows remapped across 4.5K binomials).
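A minimal sketch of the inverse-sqrt weighting, assuming one binomial label per training image. Treating the per-species cap as a clamp on the count used in the weight is an assumption here; the training code may instead hard-subsample to 2000 images per species.

from collections import Counter
from torch.utils.data import WeightedRandomSampler

def inverse_sqrt_sampler(labels, cap=2000):
    # labels: one post-WCVP binomial per training image
    counts = Counter(labels)
    # Clamp each species' count at the cap, then weight every image by
    # 1/sqrt(count), so rare species are drawn far more often per epoch.
    weights = [min(counts[lab], cap) ** -0.5 for lab in labels]
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)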

Intended use

  • Feature extractor for downstream orchid classifiers. Freeze and add a head.
  • Zero-shot orchid identification via the matching text tower (see zero_shot_score in embed_example.py).
  • Retrieval over a cultivated collection (cosine over L2-normalized features).
  • Active labeling of unlabeled / under-labeled candidates by NN cosine to a labeled set (see the retrieval sketch below).
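A minimal retrieval / active-labeling sketch using the Quickstart helpers: embed a labeled reference set once, then rank unlabeled images by cosine similarity. Since the features are already L2-normalized, cosine is just a dot product. File paths are illustrative, and the sketch assumes embed_image returns a host-side array as the (768,) shape comment suggests; it uses device="cpu" to keep the NumPy conversion safe.

import numpy as np
from embed_example import load_orchid_clip_v8, embed_image

model, preprocess, _ = load_orchid_clip_v8("orchid-clip-v8", device="cpu")

# Labeled reference collection (paths and labels illustrative)
ref_paths = ["refs/cattleya_labiata_01.jpg", "refs/laelia_anceps_03.jpg"]
ref_feats = np.stack([np.asarray(embed_image(model, preprocess, p)) for p in ref_paths])  # (N, 768)

# Score one unlabeled candidate against the whole reference set
query = np.asarray(embed_image(model, preprocess, "unlabeled/IMG_0042.jpg"))  # (768,)
scores = ref_feats @ query                    # cosine similarity, shape (N,)
for i in scores.argsort()[::-1][:5]:          # up to 5 nearest reference images
    print(f"{scores[i]:+.4f}  {ref_paths[i]}")

The same ref_feats matrix can feed a frozen-backbone linear probe if you want a trained classifier rather than raw retrieval.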

Limitations

  • Long-tail still weak. Stelis (n=25) at 0.64 top-1 is the floor; species with <10 labeled rows in the training pool are not reliably distinguished.
  • Synonym confusions. Despite WCVP dedup, intra-genus cryptic-species pairs remain confusable (93% of v8 errors are intra-genus, per a hard-negative audit). Treat a top-1 prediction as "this genus, probably this species" rather than ground truth.
  • Hybrids. ~19 hybrid rows survive the quality filter; hybrid identification is not a target. The model will emit a top-1 species label for hybrids — that label is the closest-matching parent, not "hybrid."
  • Not a backbone for fine-grained pose / morphometry. Trained for species classification, not part-localized tasks.

Citation

If you use this in a paper or downstream model:

@misc{orchid_clip_v8_2026,
  title  = {orchid-clip-v8: long-tail-aware CLIP for orchid identification},
  author = {Arnold, M.J.},
  year   = {2026},
  note   = {Fine-tuned from BioCLIP 2; weights at https://huggingface.co/mjarnold/orchid-clip-v8},
}

License

MIT — see LICENSE. Weights derived from BioCLIP 2 (also MIT). Training data drawn from public CC0 / CC-BY sources (iNaturalist research-grade, GBIF, Smithsonian NMNH, Wikimedia, Flickr CC-licensed, OrchidRoots).

Validation

sanity_check.py does an end-to-end correctness check: load the model, run synthetic + real-image forward passes, then zero-shot rank a Cattleya labiata image against three orchid genera and two off-target categories.

python sanity_check.py

Expected output ends with === ALL PASS ===, with the zero-shot block looking like this:

=== zero-shot text cosine ===
  +0.7439  Cattleya labiata
  +0.5311  cat (off-target)
  +0.5104  car (off-target)
  +0.5063  Phalaenopsis
  +0.4851  Bulbophyllum
ok: top1 zero-shot is Cattleya labiata

The target species should win by ≥ +0.15 cosine over the nearest off-target. If it doesn't, the v8 weights did not load correctly.

Two notes for end-users:

  1. The warning WARNING:root:No pretrained weights loaded for model 'ViT-L-14'. Model initialized randomly. is benign. It comes from open_clip because load_orchid_clip_v8 passes pretrained=None; the actual v8 weights are loaded immediately afterward via load_state_dict. The zero-shot ranking is the real correctness signal: if the target wins as above, the weights are loaded.

  2. The bundled sanity_check.py assumes a local SQLite DB that only exists on the upstream training host; on any other machine it exits cleanly after the synthetic forward pass (which passes even with random weights, so it is not the real correctness signal). If you don't have the dataset, point sanity_check_laptop.py at any Cattleya labiata image you have; the zero-shot ranking is the actual load-correctness signal.
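If you'd rather run that check inline instead of via sanity_check_laptop.py, the zero-shot ranking amounts to roughly the following. The prompt template and the choice to fetch the tokenizer directly from open_clip are assumptions; zero_shot_score in embed_example.py is the packaged version.

import torch
import open_clip
from PIL import Image
from embed_example import load_orchid_clip_v8

model, preprocess, _ = load_orchid_clip_v8("orchid-clip-v8", device="cpu")
tokenizer = open_clip.get_tokenizer("ViT-L-14")

labels = ["Cattleya labiata", "Phalaenopsis", "Bulbophyllum", "cat", "car"]
image = preprocess(Image.open("my_cattleya_labiata.jpg")).unsqueeze(0)

with torch.no_grad():
    img = model.encode_image(image)
    txt = model.encode_text(tokenizer([f"a photo of {name}" for name in labels]))
img = img / img.norm(dim=-1, keepdim=True)   # L2-normalize so the dot product is cosine
txt = txt / txt.norm(dim=-1, keepdim=True)

for score, name in sorted(zip((img @ txt.T).squeeze(0).tolist(), labels), reverse=True):
    print(f"{score:+.4f}  {name}")

With correctly loaded v8 weights, Cattleya labiata should rank first by a clear margin, as in the expected output above.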
