Vision, Simplified.

Small models can recognize more than their size suggests.

GVM explores efficient computer vision using lightweight architectures, fast inference, and practical deployment.

Designed to run almost anywhere.


Classification Performance

Epoch   Training Loss   Validation Accuracy
1       3.36            41.75%
2       2.78            47.14%
3       2.64            47.40%
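
To put these figures in context, a classifier guessing uniformly at random over 100 classes would score about 1%. The short sketch below uses only the numbers from the table above and computes the change from one epoch to the next; the helper name `epoch_deltas` is an illustrative choice, not part of the repository.

```python
# Values copied from the table above: (epoch, training loss, validation accuracy %).
history = [
    (1, 3.36, 41.75),
    (2, 2.78, 47.14),
    (3, 2.64, 47.40),
]

NUM_CLASSES = 100  # CIFAR-100
chance_accuracy = 100.0 / NUM_CLASSES  # uniform random guessing, in percent


def epoch_deltas(rows):
    """Return (epoch, loss change, accuracy change) relative to the previous epoch."""
    deltas = []
    for (_, prev_loss, prev_acc), (epoch, loss, acc) in zip(rows, rows[1:]):
        deltas.append((epoch, round(loss - prev_loss, 2), round(acc - prev_acc, 2)))
    return deltas


for epoch, d_loss, d_acc in epoch_deltas(history):
    print(f"epoch {epoch}: loss {d_loss:+.2f}, accuracy {d_acc:+.2f} points")
print(f"chance level: {chance_accuracy:.1f}%")
```

The accuracy gain shrinks from 5.39 points (epoch 1 to 2) to 0.26 points (epoch 2 to 3), while the loss continues to fall by 0.14.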

Quick Start

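import requests
import timm
import torch
import torchvision.transforms as transforms
from PIL import Image

# Fetch the model configuration (number of classes and class names).
response = requests.get(
    "https://huggingface.co/WhirlwindAI/GVM/resolve/main/config.json",
    timeout=30,
)
response.raise_for_status()
config = response.json()

# Build the MobileNetV2 architecture; the trained weights are loaded below.
model = timm.create_model(
    "mobilenetv2_100",
    pretrained=False,
    num_classes=config["num_classes"],
)

# Download the trained weights and map them to the CPU.
state = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/WhirlwindAI/GVM/resolve/main/model.pth",
    map_location="cpu",
)
model.load_state_dict(state)
model.eval()  # inference mode: fixes batch-norm statistics, disables dropout

# Resize, center-crop to 224x224, and normalize with ImageNet statistics.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

image = Image.open("image.jpg").convert("RGB")
tensor = transform(image).unsqueeze(0)  # add a batch dimension

# Gradients are not needed for inference.
with torch.no_grad():
    prediction = model(tensor).argmax(dim=1).item()

print(config["class_names"][prediction])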
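
Since top-1 validation accuracy is currently 47.40%, it can be more informative to look at several candidate labels rather than a single prediction. The following is a minimal, dependency-free sketch that turns a list of raw logits into ranked probabilities; the function name `top_k` and the example logits and labels are hypothetical and not part of the repository.

```python
import math


def top_k(logits, class_names, k=5):
    """Return the k most likely (class_name, probability) pairs, best first."""
    if len(logits) != len(class_names):
        raise ValueError("logits and class_names must have the same length")
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits and labels, for illustration only.
example = top_k([2.0, 0.5, 1.0], ["apple", "bear", "clock"], k=2)
for name, prob in example:
    print(f"{name}: {prob:.2%}")
```

With the Quick Start objects, it could be called as `top_k(model(tensor)[0].detach().tolist(), config["class_names"])`.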

Highlights

Architecture   MobileNetV2
Dataset        CIFAR-100
Classes        100
Model Size     14 MB
Framework      PyTorch
Inference      CPU & GPU Friendly

Repository Contents

model.pth
config.json
README.md

Current Limitations

  • Trained for only 3 epochs
  • Frozen backbone during training
  • CIFAR-100 has 100 classes with only 500 training images each, which makes it considerably harder than CIFAR-10
  • Intended as an efficient baseline rather than a state-of-the-art classifier
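
For readers unfamiliar with the frozen-backbone setup listed above, here is a minimal PyTorch sketch of the general technique: every parameter is frozen except the classifier head, and only the head is passed to the optimizer. The source does not say how freezing was implemented for GVM, so the helper `freeze_backbone`, the tiny stand-in model, and the learning rate are all assumptions for illustration.

```python
import torch
from torch import nn


def freeze_backbone(model: nn.Module, head: nn.Module) -> int:
    """Freeze every parameter except those in `head`; return the trainable count."""
    for param in model.parameters():
        param.requires_grad = False
    for param in head.parameters():
        param.requires_grad = True
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Stand-in model for illustration: a tiny "backbone" followed by a 100-class head.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 100)
model = nn.Sequential(backbone, head)

trainable = freeze_backbone(model, head)
# Only the trainable (head) parameters are handed to the optimizer.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01  # lr is arbitrary
)
print(f"trainable parameters: {trainable}")
```

For a timm model, the head can be obtained with `model.get_classifier()`, so the call would be `freeze_backbone(model, model.get_classifier())`. Full fine-tuning amounts to setting `requires_grad = True` on all parameters again.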

Roadmap

  • Higher resolution training
  • Full backbone fine-tuning
  • Improved augmentation
  • ONNX export
  • TensorRT support
  • Interactive demo



