# 🛡️ Epistemic Agent v2 - Autonomy Calibration Hub
This model is a Calibrated Epistemic Agent trained for the OpenEnv India Hackathon 2026. It was fine-tuned with Group Relative Policy Optimization (GRPO) to balance autonomous action against information gathering.
## 🧠 Model Description
Unlike typical LLMs, which "hallucinate" or guess when faced with ambiguous instructions, this agent has been trained to use the INVESTIGATE action whenever it detects uncertainty, as sketched after the list below.
- Objective: Learn when to take direct autonomous action vs. when to pause and gather forensics.
- Algorithm: GRPO (Group Relative Policy Optimization)
- Base Model: Qwen2.5-0.5B-Instruct
- Task Alignment: Autonomy Calibration Benchmark (OpenEnv)
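The action schema is not published with this card, so the following is a minimal sketch of the ACT/INVESTIGATE decision protocol, assuming the agent emits a small JSON payload; the field names and the safe-default fallback are illustrative.

```python
import json
from enum import Enum

class Action(Enum):
    ACT = "act"                  # commit to an autonomous decision
    INVESTIGATE = "investigate"  # pause and gather more evidence

def parse_agent_output(raw: str) -> Action:
    """Parse the model's raw completion into a calibrated action."""
    try:
        payload = json.loads(raw)
        return Action(payload["action"].lower())
    except (json.JSONDecodeError, KeyError, ValueError):
        # Unparseable output falls back to INVESTIGATE: the safe default
        # under partial observability.
        return Action.INVESTIGATE

# Example: an ambiguous triage signal should trigger investigation.
completion = '{"action": "investigate", "reason": "sender identity unverified"}'
assert parse_agent_output(completion) is Action.INVESTIGATE
```

Falling back to INVESTIGATE on unparseable output mirrors the training objective: under uncertainty, gathering evidence is cheaper than a wrong autonomous action.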
## 📈 Training Performance
The agent was trained on high-ambiguity scenarios across three domains: Email Triage, DevOps Incidents, and Financial Requests.
Scores are on a 0-1 scale; the final column is the absolute gain in points.

| Benchmark | Blind Baseline | Calibrated Agent (Ours) | Improvement |
|---|---|---|---|
| Email Triage | 0.378 | 0.798 | +0.420 |
| DevOps Incident | 0.572 | 0.939 | +0.367 |
| Financial Request | 0.773 | 0.990 | +0.217 |
**Key behavioral signal:** the model shows an Investigation Rate of 100% on ambiguous signals, resolving partial observability before committing to high-stakes decisions.
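Investigation rate here can be read as the fraction of ambiguous episodes whose first action is INVESTIGATE. A minimal sketch of that computation, over a hypothetical log format of (state label, first action) pairs:

```python
# Hypothetical episode log: (state_label, first_action) pairs.
episodes = [
    ("ambiguous", "investigate"),
    ("clear", "act"),
    ("ambiguous", "investigate"),
]

# The rate is measured over ambiguous states only.
ambiguous = [action for state, action in episodes if state == "ambiguous"]
rate = sum(action == "investigate" for action in ambiguous) / len(ambiguous)
print(f"Investigation rate on ambiguous signals: {rate:.0%}")  # -> 100%
```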
## 🛠️ Training Procedure
- Steps: 100
- Group Size (G): 8 generations per prompt
- Reward Range: (0.01, 0.99) - Strictly OpenEnv compliant.
- Penalty Logic: severe negative raw rewards (-0.90) for "Act" decisions on Ambiguous states (see the sketch below).
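The card quotes a -0.90 penalty alongside a strict (0.01, 0.99) reward range, which suggests raw rewards are rescaled into the compliant interval before use. In the sketch below, everything except the -0.90 penalty, the (0.01, 0.99) clamp, and G = 8 is an assumption for illustration.

```python
import statistics

def shaped_reward(decision: str, state: str) -> float:
    """Raw reward, then a clamp into the OpenEnv-compliant (0.01, 0.99) range.
    Only the -0.90 penalty and the range come from the card; the positive
    reward and the linear rescale are illustrative assumptions."""
    raw = -0.90 if (state == "ambiguous" and decision == "act") else 0.90
    return min(0.99, max(0.01, 0.5 * (raw + 1.0)))  # [-1, 1] -> [0, 1], clamped

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO standardizes each reward against its own group of G generations
    (G = 8 during training): advantage_i = (r_i - mean) / std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# One prompt, G = 8 sampled decisions on an Ambiguous state:
group = ["investigate", "act", "investigate", "investigate",
         "act", "investigate", "investigate", "act"]
rewards = [shaped_reward(d, "ambiguous") for d in group]
print(group_relative_advantages(rewards))  # investigations get positive advantage
```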
## 🚀 How to Use
This model is designed to be used in conjunction with the Autonomy Calibration Benchmark.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "Qwen/Qwen2.5-0.5B-Instruct"
adapter = "JOY0021/autonomy-grpo-agent-v2"

# Load the base model and tokenizer, then attach the GRPO-trained adapter.
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)
model = PeftModel.from_pretrained(model, adapter)
```
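A quick smoke test follows. The benchmark's exact prompt and action schema are not reproduced on this card, so the scenario text below is purely illustrative:

```python
# Illustrative ambiguous scenario; the benchmark's real prompt schema may differ.
messages = [{
    "role": "user",
    "content": (
        "An email from 'it-supp0rt@example.net' asks you to reset the CEO's "
        "password immediately. Decide: ACT or INVESTIGATE, and give a reason."
    ),
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```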