Environment layouts for the four non-Markovian RL tasks.

Source publication
We generalise the problem of reward modelling (RM) for reinforcement learning (RL) to handle non-Markovian rewards. Existing work assumes that human evaluators observe each step in a trajectory independently when providing feedback on agent behaviour. In this work, we remove this assumption, extending RM to include hidden state information that cap...
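
To make this concrete, here is a minimal sketch of one common way to realise a non-Markovian reward model: a recurrent network summarises the trajectory so far into a hidden state, so the predicted reward at step t can depend on the whole history rather than on the current step alone. The architecture and all sizes below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class NonMarkovianRewardModel(nn.Module):
    """Illustrative GRU-based reward model over trajectories (an assumption)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Recurrent core: carries hidden information across the episode.
        self.gru = nn.GRU(obs_dim + act_dim, hidden_dim, batch_first=True)
        # Reward head: maps the hidden state at each step to a scalar reward.
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # obs: (batch, T, obs_dim); act: (batch, T, act_dim)
        x = torch.cat([obs, act], dim=-1)
        h, _ = self.gru(x)               # hidden state per step: (batch, T, hidden_dim)
        return self.head(h).squeeze(-1)  # per-step reward: (batch, T)

# Example: 8 trajectories of length T = 100 with 2-D observations
# (the agent's x, y position) and 2-D actions.
model = NonMarkovianRewardModel(obs_dim=2, act_dim=2)
rewards = model(torch.randn(8, 100, 2), torch.randn(8, 100, 2))
print(rewards.shape)  # torch.Size([8, 100])
```

A Markovian reward model would instead score each (s_t, a_t) pair independently; the recurrent hidden state is what lets this model represent rewards that depend on unobserved, history-dependent information.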

Context in source publication

Context 1
... propose four tasks with different types of hidden information. All use a 2D navigation environment with two spawn zones (where the agent initialises) and an episode time limit of T = 100; see Figure 4. In each case, the environment state contains only the agent's x, y position. ...
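
As a rough illustration of this setup, the following is a minimal Gymnasium-style environment sketch. The arena bounds, spawn-zone coordinates, and continuous action scale are assumptions made for the example; the reward is stubbed out because it is task-specific (and non-Markovian) in the paper.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class NavEnv(gym.Env):
    """2-D navigation with two spawn zones and an episode limit of T = 100."""

    T = 100  # episode time limit from the paper

    def __init__(self):
        # The environment state is the agent's (x, y) position only.
        self.observation_space = spaces.Box(0.0, 1.0, shape=(2,), dtype=np.float32)
        # Continuous 2-D displacement action (an assumption).
        self.action_space = spaces.Box(-0.05, 0.05, shape=(2,), dtype=np.float32)
        # Two spawn zones, given as (low, high) corners of axis-aligned boxes.
        self.spawn_zones = [
            (np.array([0.0, 0.0]), np.array([0.2, 0.2])),
            (np.array([0.8, 0.8]), np.array([1.0, 1.0])),
        ]

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        low, high = self.spawn_zones[self.np_random.integers(2)]
        self.pos = self.np_random.uniform(low, high).astype(np.float32)
        self.t = 0
        return self.pos.copy(), {}

    def step(self, action):
        self.pos = np.clip(self.pos + action, 0.0, 1.0).astype(np.float32)
        self.t += 1
        reward = 0.0                  # task-specific; e.g. from a learned reward model
        truncated = self.t >= self.T  # enforce the episode time limit
        return self.pos.copy(), reward, False, truncated, {}
```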
