SpindleFlow RL - Delegation Policy

A recurrent (LSTM) PPO agent trained on the SpindleFlow-v0 environment (OpenEnv).

Training summary

| Metric | Value |
| --- | --- |
| Algorithm | RecurrentPPO (SB3 + sb3-contrib) |
| Total timesteps | 30,000 |
| Episodes completed | 13,526 |
| First-5 mean reward | 1.2053 |
| Last-5 mean reward | 2.2038 |
| Improvement | +0.9984 |
| Device | cuda |
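The card does not say how the reward metrics were computed. The sketch below assumes that "First-5" and "Last-5" are the mean episode reward over the first and last five completed episodes, and that "Improvement" is their difference. The function name `summarize_rewards` and its `k` parameter are hypothetical, introduced only for illustration.

```python
from statistics import fmean


def summarize_rewards(episode_rewards, k=5):
    """Compare the mean reward of the first k episodes with the last k.

    Assumption: this mirrors the "First-5 / Last-5 / Improvement" rows of
    the training summary; the card itself does not give the formula.
    """
    if len(episode_rewards) < k:
        raise ValueError(f"need at least {k} episodes, got {len(episode_rewards)}")
    first_mean = fmean(episode_rewards[:k])
    last_mean = fmean(episode_rewards[-k:])
    return {
        "first_mean": first_mean,
        "last_mean": last_mean,
        "improvement": last_mean - first_mean,
    }
```

Under that assumption, the reported improvement would be computed from the unrounded means, which may explain why it differs in the last digit from the difference of the two rounded values in the table.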

Reward Curve

Load

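from huggingface_hub import hf_hub_download
from sb3_contrib import RecurrentPPO

# Download the checkpoint from the Hub (cached locally after the first call), then load it.
model_path = hf_hub_download("garvitsachdeva/spindleflow-rl", "spindleflow_model.zip")
model = RecurrentPPO.load(model_path)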
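Because the policy is recurrent, the LSTM state has to be carried between calls to `predict` and reset at episode boundaries. The helper below is a minimal sketch of that loop, not the author's evaluation code. It assumes a single, non-vectorized environment that follows the Gymnasium API (`reset()` returns `(obs, info)`, `step()` returns a 5-tuple); whether SpindleFlow-v0 exposes exactly this interface is not stated on the card. `run_episode` and `max_steps` are hypothetical names.

```python
import numpy as np


def run_episode(model, env, deterministic=True, max_steps=1000):
    """Roll out one episode with a recurrent policy and return the total reward.

    Assumptions: `env` follows the Gymnasium API and is not vectorized;
    `model` exposes the SB3 `predict(obs, state=..., episode_start=...)` call.
    """
    obs, _info = env.reset()
    lstm_states = None  # None lets the policy start from a zeroed LSTM state
    episode_start = np.ones((1,), dtype=bool)  # flag the first step of the episode
    total_reward = 0.0

    for _ in range(max_steps):
        action, lstm_states = model.predict(
            obs,
            state=lstm_states,
            episode_start=episode_start,
            deterministic=deterministic,
        )
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        episode_start = np.zeros((1,), dtype=bool)  # later steps reuse the LSTM state
        if terminated or truncated:
            break

    return total_reward
```

With the loaded `model` and an environment instance, `run_episode(model, env)` returns the undiscounted return of one episode. Passing the previous `lstm_states` back in on every step is what lets the policy use its memory; calling `predict` without it would evaluate the network from a fresh state each time.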