Team-average reward function estimation in Example 1 with the resilient projection-based consensus method. The estimated values $\bar{r}(s;\lambda^i)$ for $s \in \{0,1\}$ are shown in black. The true team-average reward for $s = 0$ and $s = 1$ is depicted by the red and the blue dashed line, respectively. The shaded regions correspond to the convex hull of $r^i(s)$ for $i \in \mathcal{N}^+$.

Source publication
Preprint
Full-text available
Adversarial attacks during training can strongly influence the performance of multi-agent reinforcement learning algorithms. It is, thus, highly desirable to augment existing algorithms such that the impact of adversarial attacks on cooperative networks is eliminated, or at least bounded. In this work, we consider a fully decentralized network, whe...

Contexts in source publication

Context 1
... and removes the H largest values and the H smallest values from each set, except for values that are smaller and larger than the agent's own value, respectively. Figure 4 demonstrates that, at least in the simple problem introduced in Example 1, estimation under the resilient projection-based consensus method performs better in the presence of an adversary than under the method of trimmed means. Note that the resilient projection-based consensus method does not suffer from overestimation of the approximated functions, because the Byzantine agents can no longer directly manipulate individual parameters in $v_t^j$ and $\lambda_t^j$. ...
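As a rough illustration of this trimming rule, the following minimal Python sketch shows the screening step for a single scalar parameter. It is not the implementation from the paper: the function trimmed_mean_update and its arguments own_value, neighbor_values, and H are hypothetical names, and the sketch uses a plain average over the retained values, whereas the actual algorithm presumably applies the rule coordinate-wise within a weighted consensus update over the network.

def trimmed_mean_update(own_value, neighbor_values, H):
    """Trimmed-mean screening for one scalar parameter (illustrative sketch).

    The H largest received values are discarded unless they are smaller than
    the agent's own value, and the H smallest are discarded unless they are
    larger than the agent's own value; the surviving values are then averaged
    together with the agent's own value.
    """
    vals = sorted(neighbor_values)
    n_above = sum(1 for v in vals if v > own_value)  # values eligible to be cut from the top
    n_below = sum(1 for v in vals if v < own_value)  # values eligible to be cut from the bottom
    cut_top = min(H, n_above)
    cut_bottom = min(H, n_below)
    kept = vals[cut_bottom:len(vals) - cut_top]
    return (own_value + sum(kept)) / (1 + len(kept))

# Example: a single Byzantine neighbor reports an extreme value, which is screened out.
print(trimmed_mean_update(own_value=0.4, neighbor_values=[0.35, 0.5, 10.0], H=1))  # -> 0.45

In this example the outlier 10.0 is among the H largest values and exceeds the agent's own value, so it is discarded, which is why a single Byzantine neighbor cannot drag the update arbitrarily far from the cooperative values.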
Context 2
... want to establish that the agents reach consensus on the parameter values in the limit. In Lemma 4, we analyze the spectral radius of the mean consensus update in the disagreement subspace, which facilitates the contraction in the disagreement subspace stated in Lemmas 5 and 6. We let $\mathcal{F}_t^x = \sigma(x_0, Y_{t-1}, \xi_\tau, \tau \le t)$ denote a filtration of a random variable $x \in \{v, \lambda\}$, where $Y_t$ are the incremental changes in the parameters $v$ and $\lambda$ due to (10) or (11), and $\xi_\tau = (r_\tau, s_\tau, a_\tau, C_{\tau-1})$ is a collection of random signals. ...
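To make the role of this spectral-radius argument concrete, a contraction statement of this kind generically takes the following form; this is a hedged illustration only, not the exact statement of Lemmas 5 and 6, and the constants $\rho$ and $C$ and the step size $\alpha_t$ are placeholders:

\[
\mathbb{E}\!\left[\,\|x_{t+1}^{\perp}\|\;\middle|\;\mathcal{F}_t^{x}\,\right] \;\le\; \rho\,\|x_t^{\perp}\| \;+\; \alpha_t C, \qquad \rho < 1,
\]

where $x_t^{\perp}$ denotes the component of the stacked parameters orthogonal to the consensus subspace, $\rho$ is tied to the spectral radius of the mean consensus update restricted to the disagreement subspace, and the $\alpha_t C$ term accounts for the bounded incremental changes $Y_t$.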