Helpstral β LoRA Fine-tuned Pixtral 12B for Drone Safety Assessment
LoRA adapter for real-time pedestrian safety classification from drone camera images, built for the Louise AI Safety Drone Escort system.
What it does
Given a drone camera frame during an escort mission, the model outputs a structured threat assessment:
- threat_level (1β10) β evidence-based risk score
- status β SAFE, CAUTION, or DISTRESS
- people_count β number of people visible in frame
- user_moving β whether the escorted person appears to be walking
- proximity_alert β whether another person is within ~3m of the user
- observations β what the model sees (lighting, obstacles, people)
- pattern β temporal reasoning from multi-frame context
- reasoning β explanation connecting image + location data
- action β CONTINUE_MONITORING, INCREASE_SCAN_RATE, ALERT_USER, EMERGENCY_HOVER, etc.
This powers operator-in-the-loop alerts: when the user stops moving for 10+ seconds or another person is in close proximity, mission control receives a review request.
Training
| Parameter | Value |
|---|---|
| Base model | Pixtral 12B (Unsloth 4-bit) |
| Method | LoRA (PEFT), trained with Unsloth |
| LoRA rank (r) | 64 |
| LoRA alpha | 128 |
| Target modules | language model attention (q_proj, v_proj, etc.) |
| Task type | CAUSAL_LM |
| PEFT version | 0.18.1 |
Usage
Inference server (Colab): See helpstral/serve_colab.ipynb in the Louise repo. Run it on a T4 GPU, then set HELPSTRAL_ENDPOINT=<ngrok_url> in .env.
Load locally:
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration, BitsAndBytesConfig
from peft import PeftModel
from PIL import Image
processor = AutoProcessor.from_pretrained("mistral-community/pixtral-12b")
model = LlavaForConditionalGeneration.from_pretrained(
"mistral-community/pixtral-12b",
quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16),
device_map="auto",
)
model = PeftModel.from_pretrained(model, "BenBarr/helpstral")
model = model.merge_and_unload().eval()
img = Image.open("drone_frame.jpg").convert("RGB")
chat = [{"role": "user", "content": [
{"type": "image"},
{"type": "text", "text": "Analyze this drone camera frame. Output JSON: threat_level, status, people_count, user_moving, proximity_alert, observations, pattern, reasoning, action."},
]}]
prompt = processor.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)
with torch.no_grad():
out = model.generate(**inputs, max_new_tokens=400, do_sample=False)
result = processor.batch_decode(out, skip_special_tokens=True)[0]
# Parse JSON from result...
Architecture
Helpstral sits in the Louise multi-agent drone escort system:
- Helpstral (this model) β safety/threat assessment from camera images
- Flystral β flight control from camera images (BenBarr/flystral)
- Louise β conversational safety companion (Ministral 3B)
When the fine-tuned endpoint is available, Helpstral uses this adapter. When offline, it falls back to Pixtral 12B via the Mistral API with function calling (queries real OpenStreetMap data for streetlight density, etc.).
Developed by
Ben Barrett β Mistral Worldwide Hackathon 2026