Graspmax — GeoMatch v2 · GeoMatch++ · GeoMatch v1
Graspmax contains geometry-aware contact prediction models for dexterous robotic grasping, trained on the CMapDataset across 5 robot end-effectors (EZGripper, Barrett, Robotiq 3-Finger, Allegro, ShadowHand).
⚠️ Version notice: GeoMatch v1 and GeoMatch++ were trained with a corrupted robot_keypoints.json (a spurious 2× scale factor and a ShadowHand axis swap applied at the wrong stage). Use GeoMatch v2 for any new work. v1 and GeoMatch++ are kept for reproducibility only.
Models at a Glance
| Model | Status | File prefix | Val loss | Val acc |
|---|---|---|---|---|
| GeoMatch v2 | ✅ Recommended | geomatch_v2_* | 1.594 | 0.695 |
| GeoMatch++ | ⚠️ Deprecated (built on v1 encoders) | geomatch_pp_* | 0.350 | 0.940 |
| GeoMatch v1 | ⚠️ Deprecated (corrupted keypoints) | geomatch_final / checkpoint_epoch* | 0.435 | 0.959 |
The lower loss and higher accuracy of v1 and GeoMatch++ are artefacts of training on corrupted keypoints: the 2× scale inflated keypoint distances, which made the contact maps geometrically trivial to predict. v2 trains on correct geometry and is the only model that produces valid IK targets during grasp generation.
Architecture
GeoMatch (v1 and v2 share the same architecture)
Dual GCN encoder (object + robot surface) → L2-normalised embeddings → linear projection heads (512→64) × 2 → 5 autoregressive MLP modules → per-keypoint BCE contact map prediction.
Based on: Geometry Matching for Multi-Embodiment Grasping (NeurIPS 2024)
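Two steps in the pipeline above, L2 normalisation of embeddings and a binary cross-entropy (BCE) contact loss, can be illustrated without any deep-learning framework. The following is a minimal sketch in plain Python, not the code in models/geomatch.py: the function names are hypothetical, and the toy 2-D vector stands in for a 512-D embedding.

```python
import math


def l2_normalise(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean BCE between predicted contact probabilities and 0/1 contact labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Toy 2-D stand-in for a 512-D embedding (assumption for illustration).
embedding = l2_normalise([3.0, 4.0])

# Three hypothetical object points: predicted contact probability vs. label.
loss = binary_cross_entropy([0.9, 0.2, 0.6], [1, 0, 1])
```

After normalisation every embedding has unit length, so dot products between object and robot embeddings depend only on direction, not magnitude.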
GeoMatch++
Extends GeoMatch with a morphology encoder (GCN over the robot kinematic-tree graph, 9D node features, 32 nodes) and a DCP-style cross-attention transformer that fuses object geometry with robot morphology before contact prediction. Pretrained GeoMatch v1 encoders are frozen.
Based on: GeoMatch++: Morphology-Aware Grasping via Correspondence Learning
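To make the kinematic-tree graph concrete, here is a rough sketch of how a 32-node adjacency matrix could be built from a list of parent links. It is an illustration only: the function name, the use of self-loops, the symmetric edges, and the zero padding up to 32 nodes are all assumptions, and preprocess_morphology.py may construct the graph differently.

```python
def kinematic_tree_adjacency(parents, num_nodes=32):
    """Dense symmetric adjacency with self-loops, zero-padded to num_nodes.

    parents[i] is the index of link i's parent link, or -1 for the root.
    """
    if len(parents) > num_nodes:
        raise ValueError("more links than graph nodes")
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for child, parent in enumerate(parents):
        adj[child][child] = 1.0  # self-loop so a node keeps its own features
        if parent >= 0:
            adj[child][parent] = 1.0  # edge child -> parent
            adj[parent][child] = 1.0  # edge parent -> child
    return adj


# Hypothetical 4-link chain: base -> proximal -> medial -> distal.
adj = kinematic_tree_adjacency([-1, 0, 1, 2])
```

Rows and columns beyond the real link count stay all-zero, so padded nodes exchange no messages with real links.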
Component Comparison
| Component | GeoMatch v1 / v2 | GeoMatch++ |
|---|---|---|
| Object GCN encoder | 3 layers × 256 → 512, trainable | Same, frozen (from GeoMatch v1) |
| Robot surface GCN | 3 layers × 256 → 512, trainable | Same, frozen (from GeoMatch v1) |
| Morphology encoder | — | NEW GCN(9 → 256×3 → 512), trainable |
| Cross-attention | — | NEW DCP transformer (512-dim, 4 heads, 1 layer) |
| Projection heads | Linear(512→64) × 2 | Same, re-initialised |
| AR keypoint modules | 5× MLP | Same, re-initialised |
| Total params | ~1.9M | ~5.8M trainable |
What Changed in v2 (Keypoint Bug Fix)
GeoMatch requires a robot_keypoints.json that defines canonical 3D keypoint positions for each
robot in rest-pose world space. The v1 keypoints had two bugs:
Bug 1 — 2× scale factor: The generation script applied world_pos *= 2.0, citing
HandModel's hand_scale=2.0 class default. However, every actual call site passes
hand_scale=1.0, overriding that default. Because the scale was applied before the inverse-FK
projection that HandModel.get_canonical_keypoints() uses (T⁻¹[2p;1] ≠ 2·T⁻¹[p;1]), the
distortion was not uniform — it grew with each link's distance from the kinematic root, corrupting
both training labels and inference IK targets.
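The non-uniformity follows from the algebra of a rigid transform T = (R, t): T⁻¹p = Rᵀ(p − t), so T⁻¹(2p) = 2·T⁻¹p + Rᵀt. The leftover term Rᵀt has the magnitude of the link's offset t, which is why links further from the root are distorted more. The sketch below checks this numerically. It is not HandModel code: the helper names and offsets are hypothetical, and an identity rotation is assumed to keep the arithmetic readable.

```python
def inverse_transform(rot, trans, point):
    """Apply the inverse of a rigid transform (rot, trans): R^T (p - t)."""
    diff = [point[i] - trans[i] for i in range(3)]
    # Multiplying by R^T means reading the columns of R as rows.
    return [sum(rot[k][i] * diff[k] for k in range(3)) for i in range(3)]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
POINT = [0.0, 0.1, 0.0]  # hypothetical world-space keypoint, in metres


def scale_error(trans):
    """Distance between T^-1(2p) and 2 * T^-1(p) for a link offset `trans`."""
    scaled_first = inverse_transform(IDENTITY, trans, [2.0 * c for c in POINT])
    scaled_after = [2.0 * c for c in inverse_transform(IDENTITY, trans, POINT)]
    return sum((a - b) ** 2 for a, b in zip(scaled_first, scaled_after)) ** 0.5


near_root = scale_error([0.0, 0.02, 0.0])  # link offset 2 cm from the root
far_link = scale_error([0.0, 0.08, 0.0])   # link offset 8 cm from the root
```

In this toy setup the error equals the link offset (0.02 m versus 0.08 m), so it cannot be undone by a single global rescale.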
Bug 2 — ShadowHand axis-swap at wrong stage: The [x, -z, y] axis permutation for ShadowHand
was applied to the final world-space world_pos (after FK). The reference implementation
(gripper_utils.py) applies it to raw mesh points in link-local space before the visual-origin
transform. Rotation and axis permutation do not commute, so the wrong stage produced scrambled
keypoint positions for any ShadowHand link with a non-zero visual-origin rotation.
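A small numeric check makes the non-commutativity concrete. This is a simplified sketch, not the code in gripper_utils.py: a 90° rotation about z stands in for a visual-origin rotation, and the mesh point is made up.

```python
def axis_swap(p):
    """The [x, -z, y] axis permutation described above."""
    x, y, z = p
    return [x, -z, y]


def rotate_z_90(p):
    """Rotate 90 degrees about z: (x, y, z) -> (-y, x, z)."""
    x, y, z = p
    return [-y, x, z]


mesh_point = [1.0, 2.0, 3.0]  # hypothetical link-local mesh point

# Reference order: permute axes in link-local space, then rotate.
swap_then_rotate = rotate_z_90(axis_swap(mesh_point))

# v1 order: rotate first, then permute axes.
rotate_then_swap = axis_swap(rotate_z_90(mesh_point))
```

The two orders give [3.0, 1.0, 2.0] and [-2.0, -3.0, 1.0] respectively, so the same input point ends up in entirely different places.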
Both bugs were confirmed by comparing generate_keypoints_json.py against gripper_utils.py and
verified by observing that v1 ShadowHand tip keypoints had y-values of ~−0.84 m (outside any
physical hand envelope) versus the corrected ~0.01 m.
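A check of this kind is easy to automate. The sketch below flags keypoints whose coordinates fall outside a plausible hand envelope. The flat name-to-(x, y, z) mapping, the function name, and the 0.5 m threshold are assumptions for illustration; they are not the actual schema of robot_keypoints.json.

```python
def out_of_envelope(keypoints, limit_m=0.5):
    """Return names of keypoints with any coordinate magnitude above limit_m metres."""
    return [
        name
        for name, (x, y, z) in keypoints.items()
        if max(abs(x), abs(y), abs(z)) > limit_m
    ]


# Illustrative values only: the y-coordinates echo the ~-0.84 m and ~0.01 m
# figures quoted above; the other coordinates and the key name are made up.
v1_like = {"tip_a": (0.03, -0.84, 0.10)}
v2_like = {"tip_a": (0.03, 0.01, 0.10)}

flagged_v1 = out_of_envelope(v1_like)
flagged_v2 = out_of_envelope(v2_like)
```

Running such a check after regenerating keypoints would catch a recurrence of either bug before any training starts.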
Training Details
GeoMatch v2 ✅ (Recommended)
| Setting | Value |
|---|---|
| Dataset | CMapDataset (ContactDB + YCB), fixed keypoints |
| End-effectors | EZGripper, Barrett, Robotiq 3-Finger, Allegro, ShadowHand |
| Batch size | 256 |
| Optimizer | Adam (β₁=0.9, β₂=0.99) |
| Learning rate | 1e-4 |
| Epochs | 200 |
| Hardware | AMD Instinct MI300X (192 GB HBM3), ROCm 6.2.4 |
| Training time | 8.58 hours |
| Precision | FP32 |
| Final val loss | 1.594 |
| Final val accuracy | 0.695 |
GeoMatch v2 Training Curves
| Epoch | Val Loss | Val Accuracy |
|---|---|---|
| 0 | 1.935 | 0.205 |
| 25 | 1.731 | 0.563 |
| 50 | 1.675 | 0.580 |
| 100 | 1.649 | 0.632 |
| 150 | 1.603 | 0.656 |
| 199 | 1.594 | 0.695 |
GeoMatch++ ⚠️ (Deprecated — built on GeoMatch v1 encoders)
| Setting | Value |
|---|---|
| Initialisation | Pretrained GeoMatch v1 encoders (frozen) |
| Trainable params | ~5.8M |
| Batch size | 32 per GPU × 8 GPUs = 256 effective |
| Optimizer | Adam (β₁=0.9, β₂=0.99) |
| Learning rate | 5e-5 |
| Epochs | 150 |
| Hardware | 8× AMD Instinct MI300X, ROCm 6.2.4 (DDP) |
| Training time | ~2.8 hours |
| Precision | FP32 |
| Final val loss | 0.350 (artefact of corrupted training data) |
| Final val accuracy | 0.940 (artefact of corrupted training data) |
GeoMatch++ Training Curves
| Epoch | Val Loss | Val Accuracy |
|---|---|---|
| 0 | 0.465 | 0.999 |
| 25 | 0.370 | 0.880 |
| 89 | 0.362 | 0.902 |
| 149 | 0.350 | 0.940 |
GeoMatch v1 ⚠️ (Deprecated — corrupted keypoints)
| Setting | Value |
|---|---|
| Dataset | CMapDataset (ContactDB + YCB), corrupted keypoints |
| Batch size | 256 |
| Optimizer | Adam (β₁=0.9, β₂=0.99) |
| Learning rate | 1e-4 |
| Epochs | 200 |
| Hardware | AMD Instinct MI300X (192 GB HBM3), ROCm 6.2.4 |
| Training time | 22.18 hours |
| Precision | FP32 |
| Final val loss | 0.435 (artefact of corrupted training data) |
| Final val accuracy | 0.959 (artefact of corrupted training data) |
Checkpoints
GeoMatch v2 ✅ (Use these)
| File | Epoch | Val Loss | Notes |
|---|---|---|---|
| geomatch_v2_epoch50.pth | 50 | 1.675 | Early convergence |
| geomatch_v2_epoch100.pth | 100 | 1.649 | Mid-training |
| geomatch_v2_epoch150.pth | 150 | 1.603 | Near-converged |
| geomatch_v2_final.pth | 199 | 1.594 | Final model (recommended) |
GeoMatch++ ⚠️ (Deprecated)
| File | Epoch | Notes |
|---|---|---|
| geomatch_pp_checkpoint_epoch50.pth | 50 | Early convergence |
| geomatch_pp_checkpoint_epoch100.pth | 100 | Mid-training |
| geomatch_pp_checkpoint_epoch140.pth | 140 | Near-converged |
| geomatch_pp_final.pth | 149 | Final (deprecated) |
GeoMatch v1 ⚠️ (Deprecated)
| File | Epoch | Notes |
|---|---|---|
| checkpoint_epoch50.pth | 50 | Early convergence |
| checkpoint_epoch100.pth | 100 | Mid-training |
| checkpoint_epoch150.pth | 150 | Near-converged |
| geomatch_final.pth | 200 | Final (deprecated) |
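Because the deprecated and recommended checkpoints sit side by side, a loader may want to guard against picking up a v1 or GeoMatch++ file by accident. The helper below is a hypothetical sketch (it is not part of the repository) that maps a filename to its model family using the file prefixes from the Models at a Glance table.

```python
from fnmatch import fnmatch

# (pattern, model name, recommended) using the prefixes from "Models at a Glance".
PATTERNS = [
    ("geomatch_v2_*", "GeoMatch v2", True),
    ("geomatch_pp_*", "GeoMatch++", False),
    ("geomatch_final*", "GeoMatch v1", False),
    ("checkpoint_epoch*", "GeoMatch v1", False),
]


def identify_checkpoint(filename):
    """Return (model_name, recommended) for a checkpoint filename."""
    for pattern, model, recommended in PATTERNS:
        if fnmatch(filename, pattern):
            return model, recommended
    raise ValueError(f"unrecognised checkpoint: {filename}")


model_name, recommended = identify_checkpoint("geomatch_v2_final.pth")
```

A caller could refuse to load, or log a warning, whenever `recommended` is False.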
Usage
GeoMatch v2 (Recommended)
import torch, sys
sys.path.append(".")
import config
from models.geomatch import GeoMatch
model = GeoMatch(config).cuda()
model.load_state_dict(torch.load("geomatch_v2_final.pth", map_location="cuda"))
model.eval()
with torch.no_grad():
    contact_map, keypoint_probs = model(
        obj_pc,               # [B, 2048, 3] object point cloud
        robot_pc,             # [B, 6, 3] robot surface points (6 keypoints)
        robot_key_point_idx,  # [B, 6] keypoint indices into robot_pc
        obj_adj,              # [B, 2048, 2048] object adjacency (sparse COO)
        robot_adj,            # [B, 6, 6] robot adjacency
        xyz_prev,             # [B, 6, 3] previous keypoint positions
    )
# contact_map: [B, 2048, 6, 1] — per-object-point × per-keypoint contact probability
# keypoint_probs: [B, 2048, 5, 1] — autoregressive keypoint contact probabilities
GeoMatch++ (Deprecated — kept for reproducibility)
import torch, sys
sys.path.append(".")
import config
from models.geomatch_pp import GeoMatchPP
model = GeoMatchPP(config).cuda()
model.load_state_dict(torch.load("geomatch_pp_final.pth", map_location="cuda"))
model.eval()
with torch.no_grad():
    contact_map, keypoint_probs = model(
        obj_pc,               # [B, 2048, 3]
        robot_pc,             # [B, 6, 3]
        robot_key_point_idx,  # [B, 6]
        obj_adj,              # [B, 2048, 2048]
        robot_adj,            # [B, 6, 6]
        xyz_prev,             # [B, 6, 3]
        morph_features,       # [B, 32, 9] morphology node features
        morph_adj,            # [B, 32, 32] morphology adjacency
    )
Morphology graphs are pre-built per robot using preprocess_morphology.py → gnn_morphology_new.pt.
Repository Structure
models/
  geomatch.py                # GeoMatch model (shared by v1 and v2)
  geomatch_pp.py             # GeoMatch++ model (+ morphology encoder + DCP transformer)
  gnn.py                     # Graph Convolutional Network
  mlp.py                     # MLP building block
config.py                    # Hyperparameters for all models
generate_keypoints_json.py   # Fixed keypoint generator (used for v2 training data)
Citation
@inproceedings{geomatch2024,
title = {Geometry Matching for Multi-Embodiment Grasping},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2024},
}
@article{geomatch_pp2024,
title = {GeoMatch++: Morphology-Aware Grasping via Correspondence Learning},
journal = {arXiv preprint arXiv:2412.18998},
year = {2024},
}
License
Original GeoMatch code © 2023 DeepMind Technologies Limited, licensed under the Apache License 2.0.
The GeoMatch++ extension, v2 training, and all checkpoints were produced by Dimios45 as part of the Graspmax project.