Environment layouts for the four non-Markovian RL tasks.

Source publication
We generalise the problem of reward modelling (RM) for reinforcement learning (RL) to handle non-Markovian rewards. Existing work assumes that human evaluators observe each step in a trajectory independently when providing feedback on agent behaviour. In this work, we remove this assumption, extending RM to include hidden state information that cap...
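
To make this concrete, here is a minimal sketch of one common way to realise a non-Markovian reward model: a recurrent network summarises the trajectory so far into a hidden state, so the predicted reward at step t can depend on the whole history rather than on the current step alone. The architecture and all sizes below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class NonMarkovianRewardModel(nn.Module):
    """Illustrative GRU-based reward model over trajectories (an assumption)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Recurrent core: carries hidden information across the episode.
        self.gru = nn.GRU(obs_dim + act_dim, hidden_dim, batch_first=True)
        # Reward head: maps the hidden state at each step to a scalar reward.
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # obs: (batch, T, obs_dim); act: (batch, T, act_dim)
        x = torch.cat([obs, act], dim=-1)
        h, _ = self.gru(x)               # hidden state per step: (batch, T, hidden_dim)
        return self.head(h).squeeze(-1)  # per-step reward: (batch, T)

# Example: 8 trajectories of length T = 100 with 2-D observations
# (the agent's x, y position) and 2-D actions.
model = NonMarkovianRewardModel(obs_dim=2, act_dim=2)
rewards = model(torch.randn(8, 100, 2), torch.randn(8, 100, 2))
print(rewards.shape)  # torch.Size([8, 100])
```

A Markovian reward model would instead score each (s_t, a_t) pair independently; the recurrent hidden state is what lets this model represent rewards that depend on unobserved, history-dependent information.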

Context in source publication

Context 1
... propose four tasks with different types of hidden information. All use a 2D navigation environment with two spawn zones (where the agent initialises) and an episode time limit of T = 100; see Figure 4. In each case, the environment state contains only the agent's x, y position. ...
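
As a rough illustration of this setup, the following is a minimal Gymnasium-style environment sketch. The arena bounds, spawn-zone coordinates, and continuous action scale are assumptions made for the example; the reward is stubbed out because it is task-specific (and non-Markovian) in the paper.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class NavEnv(gym.Env):
    """2-D navigation with two spawn zones and an episode limit of T = 100."""

    T = 100  # episode time limit from the paper

    def __init__(self):
        # The environment state is the agent's (x, y) position only.
        self.observation_space = spaces.Box(0.0, 1.0, shape=(2,), dtype=np.float32)
        # Continuous 2-D displacement action (an assumption).
        self.action_space = spaces.Box(-0.05, 0.05, shape=(2,), dtype=np.float32)
        # Two spawn zones, given as (low, high) corners of axis-aligned boxes.
        self.spawn_zones = [
            (np.array([0.0, 0.0]), np.array([0.2, 0.2])),
            (np.array([0.8, 0.8]), np.array([1.0, 1.0])),
        ]

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        low, high = self.spawn_zones[self.np_random.integers(2)]
        self.pos = self.np_random.uniform(low, high).astype(np.float32)
        self.t = 0
        return self.pos.copy(), {}

    def step(self, action):
        self.pos = np.clip(self.pos + action, 0.0, 1.0).astype(np.float32)
        self.t += 1
        reward = 0.0                  # task-specific; e.g. from a learned reward model
        truncated = self.t >= self.T  # enforce the episode time limit
        return self.pos.copy(), reward, False, truncated, {}
```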
