UGTC: Uncertainty-Gated Temporal Credit

License: MIT

Accepted — Ulysseus Young Explorers in Science (UYES) Journal
Preprint DOI: 10.5281/zenodo.19715116 · Journal DOI forthcoming
Author: Yağız Ekrem Dalar | Ethosoft AI


What is UGTC?

UGTC is a backbone-agnostic plug-in advantage estimator for actor-critic reinforcement learning. It addresses the bias–variance trade-off in temporal credit assignment by maintaining two critics with different GAE λ values and blending their advantage estimates through a sigmoid uncertainty gate:

A^UGTC_t = u(sₜ) · A^slow_t + (1 - u(sₜ)) · A^fast_t

u(s) = sigmoid(-β · (σ̂(s) - 1))
σ̂(s) = std(V¹_slow, ..., Vᴹ_slow)(s) / σ_EMA
  • Low ensemble disagreement: u → 1 → use slow critic (accurate, λ=0.99)
  • High ensemble disagreement: u → 0 → use fast critic (stable, λ=0.80)
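
The gate and blend above can be sketched for a single state in a few lines of plain Python. This is an illustrative sketch, not the package's code: the function names `gate` and `blend`, the use of the population standard deviation, the `eps` guard, and the clamp on the exponent are all assumptions made here.

```python
import math
import statistics


def gate(values_slow, sigma_ema, beta=5.0, eps=1e-8):
    """Return u(s) = sigmoid(-beta * (sigma_hat(s) - 1)) for one state.

    values_slow: the M slow-critic value predictions for that state.
    sigma_ema:   running (EMA) scale of the ensemble std, supplied by the caller.
    Assumptions: population std, eps guard against division by zero.
    """
    sigma_hat = statistics.pstdev(values_slow) / max(sigma_ema, eps)
    z = beta * (sigma_hat - 1.0)
    z = min(max(z, -60.0), 60.0)  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(z))  # equals sigmoid(-z)


def blend(adv_slow, adv_fast, u):
    """A^UGTC = u * A^slow + (1 - u) * A^fast."""
    return u * adv_slow + (1.0 - u) * adv_fast


u_agree = gate([1.0, 1.0, 1.0], sigma_ema=1.0)     # zero disagreement
u_disagree = gate([0.0, 1.0, 2.0], sigma_ema=0.1)  # disagreement far above the EMA scale
print(u_agree, u_disagree)
```

With β = 5.0 the gate saturates at sigmoid(5) ≈ 0.993 when the ensemble agrees perfectly, and it equals 0.5 exactly when the ensemble std matches σ_EMA, so the blend never discards either critic entirely.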

Fixed Hyperparameters (same across all benchmarks)

Parameter            Value
λ_fast               0.80
λ_slow               0.99
Ensemble size M      3
Gate temperature β   5.0
EMA momentum         0.99
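
To make the role of λ_fast and λ_slow concrete, the sketch below runs standard GAE twice over the same short trajectory, once per λ. It is a hedged illustration: the helper name `gae` and its argument layout are assumptions, and for simplicity both calls share one set of value estimates, whereas UGTC keeps separate critics.

```python
def gae(rewards, values, next_values, dones, gamma=0.99, lam=0.95):
    """Standard Generalized Advantage Estimation over one trajectory."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - float(dones[t])
        # One-step TD error; bootstrapping is cut at episode ends.
        delta = rewards[t] + gamma * next_values[t] * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    return advantages


rewards = [1.0, 1.0, 1.0]
values = [0.0, 0.0, 0.0]
next_values = [0.0, 0.0, 0.0]
dones = [False, False, True]

adv_fast = gae(rewards, values, next_values, dones, lam=0.80)  # lambda_fast
adv_slow = gae(rewards, values, next_values, dones, lam=0.99)  # lambda_slow
print(adv_fast)
print(adv_slow)
```

The larger λ propagates more of the later TD errors back to early steps, which is why the slow estimate at the first step is larger here.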

Installation

git clone https://github.com/ethosoftai/ugtc.git
cd ugtc
pip install -e .

Or download this repository from the Hugging Face Hub:

pip install huggingface_hub
python -c "from huggingface_hub import snapshot_download; snapshot_download('Ethosoft/ugtc', local_dir='ugtc')"
cd ugtc && pip install -e .

Quick Usage

from ugtc import UGTCModule

ugtc = UGTCModule(obs_dim=11)  # e.g. Hopper-v4 (11-dimensional observations)

# In your actor-critic update — replace standard GAE with:
advantages = ugtc.compute_advantages(
    obs=obs,           # (T, obs_dim)
    next_obs=next_obs, # (T, obs_dim)
    rewards=rewards,   # (T,)
    dones=dones,       # (T,)
    gamma=0.99,
)

Supported Algorithms

Algorithm   Key Change
UGTC-PPO    A^UGTC replaces standard GAE in clipped surrogate
UGTC-TD3    UGTC baseline correction on actor gradient
UGTC-SAC    V^UGTC replaces value baseline in actor loss
UGTC-DDPG   UGTC advantage scales actor update (extension)
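
For the PPO row, the sketch below shows where a blended advantage would enter the standard clipped surrogate. It is a minimal illustration rather than the contents of ppo.py: the function name `clipped_surrogate_loss` and the clip range of 0.2 are assumptions, and the advantages are placeholder numbers standing in for A^UGTC.

```python
import math


def clipped_surrogate_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss (to be minimized), averaged over samples."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# Placeholder advantages; in UGTC-PPO these would be the gated estimates.
advantages = [1.0, -1.0, 2.0]
logp_old = [-0.5, -1.0, -0.2]

loss_same_policy = clipped_surrogate_loss(logp_old, logp_old, advantages)
print(loss_same_policy)
```

Only the source of `advantages` changes relative to plain PPO; the ratio, clipping, and averaging are the usual ones.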

Repository Structure

ugtc/               Core Python package
  module.py         UGTCModule — backbone-agnostic core
  ppo.py            UGTC-PPO integration
  td3.py            UGTC-TD3 integration
  sac.py            UGTC-SAC integration
  ddpg.py           UGTC-DDPG integration (extension)
  utils.py          Evaluation utilities (IQM, bootstrap CI, AUC)
examples/           Runnable examples (CartPole, Pendulum, MuJoCo)
benchmarks/         Procgen + MuJoCo benchmark scripts
tests/              Unit and integration tests
implementations/
  cpp/ugtc.hpp      C++ header-only reference
  java/UGTCModule.java  Java reference
pseudocode/         Algorithm pseudocode (PPO, TD3, SAC)
configs/            YAML configs for all benchmarks
docs/               GitHub Pages documentation source
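
The tree lists IQM among the evaluation utilities in utils.py. For readers unfamiliar with the metric, here is a small stand-alone sketch of an interquartile mean. It is not the repository's implementation: the function name `iqm` and the choice to drop `len // 4` scores from each end are assumptions of this example.

```python
def iqm(scores):
    """Interquartile mean: average of the middle ~50% of the sorted scores."""
    if not scores:
        raise ValueError("iqm() needs at least one score")
    ordered = sorted(scores)
    cut = len(ordered) // 4  # number of scores dropped from each tail
    middle = ordered[cut:len(ordered) - cut]
    return sum(middle) / len(middle)


# One outlier run (100) barely moves the IQM, unlike the plain mean.
run_scores = [1, 2, 3, 4, 5, 6, 7, 100]
print(iqm(run_scores), sum(run_scores) / len(run_scores))
```

Dropping the top and bottom quarters makes the aggregate less sensitive to a single unusually good or bad seed than the plain mean.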

Citation

@misc{dalar2026ugtc,
  author    = {Dalar, Yağız Ekrem},
  title     = {{UGTC}: Uncertainty-Gated Temporal Credit},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19715116},
  url       = {https://doi.org/10.5281/zenodo.19715116},
  note      = {Accepted — Ulysseus Young Explorers in Science (UYES) Journal.
               Journal DOI forthcoming.}
}
