πŸ›‘οΈ Epistemic Agent v2 - Autonomy Calibration Hub

This model is a Calibrated Epistemic Agent trained specifically for the OpenEnv India Hackathon 2026. It was fine-tuned using Group Relative Policy Optimization (GRPO) to master the balance between autonomous action and information gathering.

🧠 Model Description

Unlike typical LLMs that "hallucinate" or guess when faced with ambiguous instructions, this agent has been trained to use the INVESTIGATE action when it detects uncertainty.

  • Objective: Learn when to take direct autonomous action vs. when to pause and gather forensics.
  • Algorithm: GRPO (Group Relative Policy Optimization)
  • Base Model: Qwen2.5-0.5B-Instruct
  • Task Alignment: Autonomy Calibration Benchmark (OpenEnv)

πŸ“Š Training Performance

The agent was trained on high-ambiguity scenarios across three domains: Email Triage, DevOps Incidents, and Financial Requests.

Benchmark Blind Baseline Calibrated Agent (Ours) Improvement
Email Triage 0.378 0.798 +42.0%
DevOps Incident 0.572 0.939 +36.7%
Financial Request 0.773 0.990 +21.7%

Key Behavioral Signal:

The model demonstrates an Investigation Rate of 100% on ambiguous signals, effectively resolving partial observability before committing to high-stakes decisions.

πŸ› οΈ Training Procedure

  • Steps: 100
  • Group Size (G): 8 generations per prompt
  • Reward Range: (0.01, 0.99) - Strictly OpenEnv compliant.
  • Penalty Logic: Severe negative rewards (-0.90) for "Act" decisions on Ambiguous states.

πŸš€ How to Use

This model is designed to be used in conjunction with the Autonomy Calibration Benchmark.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "Qwen/Qwen2.5-0.5B-Instruct"
adapter = "JOY0021/autonomy-grpo-agent-v2"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)
model = PeftModel.from_pretrained(model, adapter)
Downloads last month
49
Video Preview
loading

Model tree for JOY0021/autonomy-grpo-agent-v2

Adapter
(571)
this model