SpindleFlow RL - Delegation Policy
LSTM PPO agent trained on SpindleFlow-v0 (OpenEnv).
Training summary
| Metric | Value |
|---|---|
| Algorithm | RecurrentPPO (SB3 + sb3-contrib) |
| Total timesteps | 30,000 |
| Episodes completed | 13,526 |
| First-5 mean reward | 1.2053 |
| Last-5 mean reward | 2.2038 |
| Improvement | +0.9984 |
| Device | cuda |
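The card does not spell out how the reward rows are computed. A minimal sketch, assuming "First-5" and "Last-5" are the means over the first and last five episode returns and "Improvement" is their difference (`reward_improvement` and `episode_rewards` are hypothetical names, not part of the training code):

```python
def reward_improvement(episode_rewards, window=5):
    """Return (first-window mean, last-window mean, difference).

    Assumes episode_rewards is a chronological list of per-episode returns.
    """
    if len(episode_rewards) < window:
        raise ValueError("need at least `window` episodes")
    first_mean = sum(episode_rewards[:window]) / window
    last_mean = sum(episode_rewards[-window:]) / window
    return first_mean, last_mean, last_mean - first_mean
```

With so few episodes in each window, these numbers are a rough indicator of learning progress rather than a stable evaluation score.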
Load
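from huggingface_hub import hf_hub_download
from sb3_contrib import RecurrentPPO

# Download the checkpoint from the Hub (cached locally after the first call), then load it.
model_path = hf_hub_download("garvitsachdeva/spindleflow-rl", "spindleflow_model.zip")
model = RecurrentPPO.load(model_path)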
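Because the policy is recurrent, inference has to carry the LSTM state from one step to the next and flag the start of each episode. The following is a minimal sketch of a single-episode rollout. It assumes `env` is a single, non-vectorized environment following the Gymnasium API (`reset()` returns `(obs, info)`, `step()` returns a 5-tuple). How to construct the SpindleFlow-v0 environment is not covered by this card, and `run_episode` and `max_steps` are hypothetical names.

```python
import numpy as np


def run_episode(model, env, max_steps=1000, deterministic=True):
    """Roll out one episode, carrying the recurrent state between steps."""
    obs, _info = env.reset()
    lstm_state = None                          # None lets the policy start from a zero state
    episode_start = np.ones((1,), dtype=bool)  # True on the first step only
    total_reward = 0.0
    for _ in range(max_steps):
        action, lstm_state = model.predict(
            obs,
            state=lstm_state,
            episode_start=episode_start,
            deterministic=deterministic,
        )
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        episode_start = np.zeros((1,), dtype=bool)
        if terminated or truncated:
            break
    return total_reward
```

Calling `run_episode(model, env)` with the model loaded above returns the episode's total reward. Passing `state=None` on every step would reset the LSTM memory each time, so the returned state needs to be fed back in.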
