Team-average reward function estimation in Example 1 with the resilient projection-based consensus method. The estimated values $\bar{r}(s;\lambda^i)$ for $s \in \{0,1\}$ are shown in black. The true team-average reward for $s = 0$ and $s = 1$ is depicted by the red and the blue dashed line, respectively. The shaded regions correspond to the convex hull of $r^i(s)$ for $i \in \mathcal{N}^+$.

Source publication
Preprint
Full-text available
Adversarial attacks during training can strongly influence the performance of multi-agent reinforcement learning algorithms. It is, thus, highly desirable to augment existing algorithms such that the impact of adversarial attacks on cooperative networks is eliminated, or at least bounded. In this work, we consider a fully decentralized network, whe...

Contexts in source publication

Context 1
... and removes the H largest values and the H smallest values from each set, except for values that are smaller and larger than the agent's own value, respectively. Figure 4 demonstrates that, at least in the simple problem introduced in Example 1, estimation under the resilient projection-based consensus method performs better in the presence of an adversary than under the method of trimmed means. Note that the resilient projection-based consensus method does not suffer from overestimation of the approximated functions, because the Byzantine agents can no longer directly manipulate individual parameters in $v_t^j$ and $\lambda_t^j$. ...
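As a rough illustration of this trimming rule, the following minimal Python sketch shows the screening step for a single scalar parameter. It is not the implementation from the paper: the function trimmed_mean_update and its arguments own_value, neighbor_values, and H are hypothetical names, and the sketch uses a plain average over the retained values, whereas the actual algorithm presumably applies the rule coordinate-wise within a weighted consensus update over the network.

def trimmed_mean_update(own_value, neighbor_values, H):
    """Trimmed-mean screening for one scalar parameter (illustrative sketch).

    The H largest received values are discarded unless they are smaller than
    the agent's own value, and the H smallest are discarded unless they are
    larger than the agent's own value; the surviving values are then averaged
    together with the agent's own value.
    """
    vals = sorted(neighbor_values)
    n_above = sum(1 for v in vals if v > own_value)  # values eligible to be cut from the top
    n_below = sum(1 for v in vals if v < own_value)  # values eligible to be cut from the bottom
    cut_top = min(H, n_above)
    cut_bottom = min(H, n_below)
    kept = vals[cut_bottom:len(vals) - cut_top]
    return (own_value + sum(kept)) / (1 + len(kept))

# Example: a single Byzantine neighbor reports an extreme value, which is screened out.
print(trimmed_mean_update(own_value=0.4, neighbor_values=[0.35, 0.5, 10.0], H=1))  # -> 0.45

In this example the outlier 10.0 is among the H largest values and exceeds the agent's own value, so it is discarded, which is why a single Byzantine neighbor cannot drag the update arbitrarily far from the cooperative values.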
Context 2
... want to establish that the agents reach consensus on the parameter values in the limit. In Lemma 4, we analyze the spectral radius of the mean consensus update in the disagreement subspace, which facilitates the contraction in the disagreement subspace stated in Lemmas 5 and 6. We let $\mathcal{F}_t^x = \sigma(x_0, Y_{t-1}, \xi_\tau, \tau \le t)$ denote a filtration of a random variable $x \in \{v, \lambda\}$, where $Y_t$ are the incremental changes in the parameters $v$ and $\lambda$ due to (10) or (11), and $\xi_\tau = (r_\tau, s_\tau, a_\tau, C_{\tau-1})$ is a collection of random signals. ...
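To make the role of this spectral-radius argument concrete, a contraction statement of this kind generically takes the following form; this is a hedged illustration only, not the exact statement of Lemmas 5 and 6, and the constants $\rho$ and $C$ and the step size $\alpha_t$ are placeholders:

\[
\mathbb{E}\!\left[\,\|x_{t+1}^{\perp}\|\;\middle|\;\mathcal{F}_t^{x}\,\right] \;\le\; \rho\,\|x_t^{\perp}\| \;+\; \alpha_t C, \qquad \rho < 1,
\]

where $x_t^{\perp}$ denotes the component of the stacked parameters orthogonal to the consensus subspace, $\rho$ is tied to the spectral radius of the mean consensus update restricted to the disagreement subspace, and the $\alpha_t C$ term accounts for the bounded incremental changes $Y_t$.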