# orchid-clip-v8
Frozen image encoder for orchid foundation embeddings. Fine-tuned from BioCLIP 2 (ViT-L/14) on ~2.5M orchid images covering ~10K species (post-WCVP-dedup label space). Outputs 768-dim L2-normalized features.
Use this as a feature extractor for downstream orchid tasks where no pretrained model exists today: bloom-stage classification, disease detection, mounting-style identification, NN retrieval over a cultivated collection, active labeling for low-image species.
## Files

| file | size | what |
|---|---|---|
| `open_clip_pytorch_model.bin` | 1.6 GB | full CLIP weights (image + text tower), open_clip format |
| `model_config.json` | 60 B | `{"model_name": "ViT-L-14", "framework": "open_clip"}` |
| `embed_example.py` | — | minimal load + embed snippet |
| `sanity_check.py` | — | end-to-end validation (uses upstream SQLite, dev only) |
| `sanity_check_laptop.py` | — | same check using a single hand-picked reference image |
| `LICENSE` | — | MIT |
## Install

```
pip install open_clip_torch torch pillow numpy
```

Tested with `open_clip_torch>=2.24` and `torch>=2.0`. CUDA is optional (CPU works, just slower).
## Download

```
hf download mjarnold/orchid-clip-v8 --local-dir orchid-clip-v8

# or, programmatically:
python -c "from huggingface_hub import snapshot_download; snapshot_download('mjarnold/orchid-clip-v8', local_dir='orchid-clip-v8')"
```
## Quickstart

```python
from embed_example import load_orchid_clip_v8, embed_image

model, preprocess, _ = load_orchid_clip_v8("orchid-clip-v8", device="cuda")
feat = embed_image(model, preprocess, "your_flower.jpg")  # (768,) L2-normalized
```

Note that the checkpoint's state dict is wrapped: keys live under `state["state_dict"]`, not at the top level. See `embed_example.py` for the load incantation.
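If you'd rather load the checkpoint by hand, the one non-obvious step is unwrapping the state dict. A minimal sketch of that step, using plain dicts in place of tensors (`unwrap_checkpoint` is a hypothetical helper for illustration, not part of `embed_example.py`):

```python
def unwrap_checkpoint(state):
    """Return the inner weights mapping.

    torch.load on open_clip_pytorch_model.bin yields a wrapper dict whose
    weights live under the "state_dict" key; fall back to the object itself
    for checkpoints that are already a flat state dict.
    """
    if isinstance(state, dict) and "state_dict" in state:
        return state["state_dict"]
    return state


# Stand-in for torch.load(...) output: metadata alongside the weights.
wrapped = {"state_dict": {"visual.proj": [0.1, 0.2]}, "epoch": 5}
flat = {"visual.proj": [0.1, 0.2]}

assert unwrap_checkpoint(wrapped) == flat
assert unwrap_checkpoint(flat) == flat
```

With the real file you would pass the unwrapped dict to `model.load_state_dict(...)` after building the model with `open_clip.create_model_and_transforms("ViT-L-14", pretrained=None)`.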
## Eval (clean held-out)

n=4000 images, 547 species. Reports top-1 / top-5 / genus-top-1 accuracy on a stratified, hash-bucketed holdout (no train leakage).
| model | top-1 | top-5 | genus-top-1 | macro-genus (clean) |
|---|---|---|---|---|
| BioCLIP 2 | 0.873 | 0.978 | 0.992 | (baseline) |
| orchid-clip-v8 | 0.911 | 0.986 | 0.991 | 0.812 |
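A hash-bucketed holdout can be sketched as below; the bucket count and holdout fraction are illustrative, not necessarily the values used for this eval:

```python
import hashlib


def bucket(image_id: str, n_buckets: int = 100) -> int:
    """Deterministically map an image id to a bucket via a stable hash."""
    digest = hashlib.md5(image_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def is_holdout(image_id: str, holdout_frac: float = 0.1) -> bool:
    """Ids whose bucket falls in the first holdout_frac of buckets are held out."""
    return bucket(image_id) < int(100 * holdout_frac)


# The split is a pure function of the id: it never shifts between runs,
# so an image cannot leak into both train and eval.
assert is_holdout("img_000123") == is_holdout("img_000123")
```

The advantage over a random split is reproducibility: re-running the pipeline on a grown dataset keeps every previously held-out image held out.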
Per-genus top-1 (orchid-clip-v8 vs BioCLIP 2):
| genus | n | v8 | BioCLIP 2 | Δ |
|---|---|---|---|---|
| Ophrys | 2754 | 0.933 | 0.905 | +2.8 pp |
| Habenaria | 232 | 0.922 | 0.845 | +7.7 pp |
| Dendrobium | 161 | 0.919 | 0.907 | +1.2 pp |
| Prosthechea | 145 | 0.890 | 0.855 | +3.5 pp |
| Encyclia | 101 | 0.832 | 0.842 | -1.0 pp |
| Pleurothallis | 100 | 0.800 | 0.690 | +11.0 pp |
| Cymbidium | 99 | 0.879 | 0.939 | -6.0 pp |
| Maxillaria | 94 | 0.787 | 0.649 | +13.8 pp |
| Masdevallia | 64 | 0.859 | 0.781 | +7.8 pp |
| Oncidium | 57 | 0.860 | 0.825 | +3.5 pp |
| Bulbophyllum | 41 | 0.732 | 0.585 | +14.6 pp |
| Lepanthes | 40 | 0.800 | 0.525 | +27.5 pp |
| Stelis | 25 | 0.640 | 0.400 | +24.0 pp |
| Laelia | 24 | 0.958 | 1.000 | -4.2 pp |
The biggest lifts land on the long-tail genera (Stelis, Lepanthes, Pleurothallis, Bulbophyllum, Maxillaria), led by the Pleurothallidinae: genera with few eval images that BioCLIP 2 struggles with. v8's training-side inverse-sqrt sampler and WCVP synonym dedup are responsible.
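Assuming the macro-genus column is the unweighted mean of per-genus top-1 over all eval genera (not just the rows shown above), it can be computed as:

```python
def macro_genus_accuracy(per_genus_top1: dict) -> float:
    """Unweighted mean over genera: a 25-image genus counts as much as a 2,754-image one."""
    return sum(per_genus_top1.values()) / len(per_genus_top1)


# Two rows from the table above (v8 column), purely to show the input shape.
sample = {"Ophrys": 0.933, "Stelis": 0.640}
print(round(macro_genus_accuracy(sample), 4))  # → 0.7865
```

This is why macro-genus (0.812) sits well below micro top-1 (0.911): the long tail drags the unweighted mean down.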
## Training recipe (v8)
- Backbone: BioCLIP 2 ViT-L/14 (image + text tower both fine-tuned).
- Data: 2.5M images, ~10K post-WCVP-dedup binomials, gated by quality_score ≥ 0.3 + orchid subfamily filter.
- Sampler: inverse-sqrt class weighting; per-species cap 2000.
- Filter: v6 image↔binomial cosine ≥ 0.727 (drops the bottom percentile of mislabeled / off-target rows).
- Optimizer: AdamW8bit, lr 1e-5, bs 64, 5 epochs.
- Synonym handling: all rows remapped through WCVP 2026-01-07 (`species_accepted` column populated on 2.94M rows; 69K rows remapped across 4.5K binomials).
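The sampler line above can be sketched as per-row sampling weights (function name hypothetical; in practice these would feed something like `torch.utils.data.WeightedRandomSampler`):

```python
import math
from collections import Counter


def inverse_sqrt_weights(labels, cap=2000):
    """Per-row sampling weight 1 / sqrt(min(class_count, cap)).

    Rare species are upweighted relative to uniform sampling, and the cap
    keeps a huge species from dominating: anything above `cap` images
    samples as if it had exactly `cap`.
    """
    counts = Counter(labels)
    return [1.0 / math.sqrt(min(counts[label], cap)) for label in labels]


labels = ["Ophrys apifera"] * 4 + ["Stelis argentata"]
w = inverse_sqrt_weights(labels)
# The rare species' single row gets 2x the weight of any common-species row.
assert w[-1] / w[0] == 2.0
```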
## Intended use

- Feature extractor for downstream orchid classifiers. Freeze and add a head.
- Zero-shot orchid identification via the matching text tower (see `zero_shot_score` in `embed_example.py`).
- Retrieval over a cultivated collection (cosine over L2-normalized features).
- Active labeling of unlabeled / under-labeled candidates by NN cosine to a labeled set.
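Because the features are L2-normalized, cosine similarity reduces to a dot product. A dependency-free retrieval sketch over a toy gallery (in practice you would use numpy or faiss over the 768-dim features):

```python
def top_k(query, gallery, k=3):
    """Rank gallery vectors by dot product (== cosine for unit-norm features)."""
    sims = [(sum(q * g for q, g in zip(query, vec)), idx)
            for idx, vec in enumerate(gallery)]
    sims.sort(reverse=True)
    return sims[:k]


# Toy 2-d unit vectors standing in for 768-dim embeddings.
gallery = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]
query = (1.0, 0.0)
best_sim, best_idx = top_k(query, gallery, k=1)[0]
assert best_idx == 0 and best_sim == 1.0
```

The same loop drives active labeling: rank unlabeled images by their best cosine to the labeled set and send the most confident (or least confident) matches to a human.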
## Limitations
- Long-tail still weak. Stelis (n=25) at 0.64 top-1 is the floor; species with fewer than 10 labeled rows in the training pool are not reliably distinguished.
- Synonym confusions. Despite WCVP dedup, intra-genus cryptic-species pairs remain (most v8 errors are intra-genus, 93% per a hard-negative audit). Treat top-1 as "this genus, probably this species" rather than ground truth.
- Hybrids. ~19 hybrid rows survive the quality filter; hybrid identification is not a target. The model will emit a top-1 species label for hybrids — that label is the closest-matching parent, not "hybrid."
- Not a backbone for fine-grained pose / morphometry. Trained for species classification, not part-localized tasks.
## Citation

If you use this in a paper or downstream model:

```bibtex
@misc{orchid_clip_v8_2026,
  title  = {orchid-clip-v8: long-tail-aware CLIP for orchid identification},
  author = {Arnold, M.J.},
  year   = {2026},
  note   = {Fine-tuned from BioCLIP 2; weights at https://huggingface.co/mjarnold/orchid-clip-v8},
}
```
## License
MIT — see LICENSE. Weights derived from BioCLIP 2 (also MIT). Training data drawn from public CC0 / CC-BY sources (iNaturalist research-grade, GBIF, Smithsonian NMNH, Wikimedia, Flickr CC-licensed, OrchidRoots).
## Validation

`sanity_check.py` does an end-to-end correctness check: load the model, run synthetic + real-image forward passes, then zero-shot rank a Cattleya labiata image against three orchid genera and two off-target categories.

```
python sanity_check.py
```
Expected output ends with `=== ALL PASS ===`, with the zero-shot block looking like this:

```
=== zero-shot text cosine ===
+0.7439  Cattleya labiata
+0.5311  cat (off-target)
+0.5104  car (off-target)
+0.5063  Phalaenopsis
+0.4851  Bulbophyllum
ok: top1 zero-shot is Cattleya labiata
```
The target species should win by ≥ +0.15 cosine over the nearest off-target. If it doesn't, the v8 weights did not load correctly.
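That margin rule can be expressed as a small check (function name hypothetical; this checks the margin over the overall runner-up, which is at least as strict as checking against the nearest off-target):

```python
def weights_loaded_ok(scores: dict, target: str, margin: float = 0.15) -> bool:
    """True if `target` tops the ranking and beats the runner-up by >= margin."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top_label, top_score), (_, runner_up) = ranked[0], ranked[1]
    return top_label == target and (top_score - runner_up) >= margin


# Scores from the expected-output block above.
scores = {
    "Cattleya labiata": 0.7439,
    "cat (off-target)": 0.5311,
    "car (off-target)": 0.5104,
    "Phalaenopsis": 0.5063,
    "Bulbophyllum": 0.4851,
}
assert weights_loaded_ok(scores, "Cattleya labiata")  # margin 0.2128 >= 0.15
```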
Two notes for end users:

1. The `WARNING:root:No pretrained weights loaded for model 'ViT-L-14'. Model initialized randomly.` line is benign. It comes from `open_clip` because `load_orchid_clip_v8` passes `pretrained=None`; the next line then loads the actual v8 weights via `load_state_dict`. The zero-shot ranking is the real correctness signal: if the target wins as above, the weights are loaded.
2. The bundled `sanity_check.py` assumes a local SQLite DB present only on the upstream training host; on any other machine it exits cleanly after the synthetic forward (which passes even with random weights, so it is not the real correctness signal). If you don't have the dataset, point `sanity_check_laptop.py` at any Cattleya labiata image you have; the zero-shot ranking is the actual load-correctness signal.