Figure 11 - uploaded by Milind Tambe
Content may be subject to copyright.
Improvement of β_MH over min{β_H, β_S, β_P}


Source publication
Conference Paper
Full-text available
A distributed constraint optimization problem (DCOP) is a formalism that captures the rewards and costs of local interactions within a team of agents, each of whom is choosing an individual action. When rapidly selecting a single joint action for a team, we typically solve DCOPs (often using locally optimal algorithms) to generate a single soluti...

Context in source publication

Context 1
... E_1 and the set B([0 0 0]). Applying the exclusivity relations again for each b ∈ B(a), and discarding JAs already included in a or B(a), we generate a set B(b) = ∪_{E ∈ E_k} f(b, E) which contains all JAs that potentially exclude b from being k-optimal. In Figure 6, we apply E_1 to find B(b) for all b ∈ B(a) = {[1 0 0], [0 1 0], [0 0 1], [1 0 1]}, where the grayed-out JAs are those discarded for being in {a} ∪ B(a). To ensure that the region that a claims is disjoint from the regions claimed by other k-optima, a should only claim a fraction of each b ∈ B(a). This can be achieved if a shares each b equally with all other k-optima that might exclude b. These additional k-optima are contained within B(b). However, not all members of B(b) can actually be k-optimal, as they might exclude each other. If we construct a graph H_k(b) with nodes for all members of B(b) and edges formed using E_k, and we find M_b, the size of the MIS, then a can safely claim 1/(1 + M_b) of b. We again use clique partitioning to safely estimate M_b. In Figure 6, for b = [0 1 0], B([0 1 0]) leads to a three-node, three-edge exclusivity graph H_k([0 1 0]). By adding the values of 1/(1 + M_b) for all b ∈ B(a) (plus one for a itself), we obtain that a can safely claim a region of size 3, which implies β_SRP = ⌊2³/3⌋ = 2. Algorithm 1's runtime is polynomial in the number of possible JAs, which is a comparatively small cost for a bound that applies to every possible instantiation of rewards to actions. An exhaustive search for the MIS of H_k would be exponential in this number (doubly exponential in the number of agents). We performed five evaluations in addition to the experiment described in Section 2. The first evaluates the impact of k-optimality for higher values of k. For each of the three DCOP graphs from Figure 2(a-c), Figure 7(a-c) shows key properties for 1-, 2- and 3-optima.
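The clique-partitioning step described above can be sketched in a few lines. This is an illustrative greedy partition under our own encoding of the exclusivity graph, not the paper's Algorithm 1: because an independent set can contain at most one node from each clique, the number of cliques in any partition safely upper-bounds the MIS size M_b.

```python
def greedy_clique_partition(nodes, edges):
    """Greedily partition a graph's nodes into cliques.

    Any independent set uses at most one node per clique, so the number
    of cliques returned is a safe upper bound on the size of the
    maximum independent set (MIS).
    """
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cliques = []
    remaining = list(nodes)
    while remaining:
        clique = [remaining.pop(0)]          # seed a new clique
        for v in list(remaining):
            # v may join only if adjacent to every current member
            if all(v in adj[u] for u in clique):
                clique.append(v)
                remaining.remove(v)
        cliques.append(clique)
    return cliques

# The three-node, three-edge exclusivity graph from the text is a
# triangle: one clique covers it, so the MIS bound is 1 and a node a
# can safely claim 1/(1 + 1) = 1/2 of b.
bound = len(greedy_clique_partition([1, 2, 3], [(1, 2), (2, 3), (1, 3)]))
```

The greedy partition is not guaranteed to be minimal, but any clique partition yields a valid (if looser) bound, which is all the region-packing argument requires.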
The first column of each table shows |Ã|, the size of the neighborhood containing all JAs within a distance of k from a k-optimal JA a, and hence of lower reward than a. For example, in the joint patrol domain described in Section 2, Figure 7(a) shows that, if agents are arranged as in the DCOP graph from Figure 2(a), any 1-optimal joint patrol must have a higher reward than at least 10 other joint patrols. We see that as k increases, the k-optimal set contains JAs that each individually dominate a larger and larger neighborhood. The second column shows, for each of the three graphs, the average reward of each k-optimal JA set found over 20 problem instances, generated by assigning rewards to the links from a uniform random distribution. We define the reward of a k-optimal JA set as the mean reward of all k-optimal JAs that exist for a particular problem instance; each figure in the second column is therefore a mean of means. As k was increased, leading to a larger neighborhood of dominated JAs, the average reward of the k-optimal JA sets showed a significant increase (T-tests showed the increase in average reward as k increased was significant within 5%). However, as k increases, the number of possible k-optimal JAs decreases, and hence the next four evaluations explore the effectiveness of the different bounds on the number of k-optima. For the three DCOP graphs shown in Figure 2, Figure 8 provides a concrete demonstration of the gains in resource allocation due to the tighter bounds made possible with graph-based analysis. The x-axis in Figure 8 shows k, and the y-axis shows the β_HSP and β_SRP bounds on the number of k-optima that can exist. To understand the implications of these results for resource allocation, consider a patrolling problem where the constraints between agents are shown in the 10-agent DCOP graph from Figure 2(a), and all agents consume one unit of fuel for each JA taken.
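A graph-independent version of the neighborhood count |Ã| described above can be computed directly. This is a minimal sketch under our own assumptions: every agent has q actions, and any group of up to k agents may deviate (the per-graph values in Figure 7 may differ where graph structure is taken into account).

```python
from math import comb

def neighborhood_size(n, q, k):
    """Count joint actions within distance k of a given joint action,
    where distance is the number of agents whose individual actions
    differ.  Assumes n agents, each with q actions, and counts every
    deviating group of up to k agents."""
    return sum(comb(n, d) * (q - 1) ** d for d in range(1, k + 1))
```

With n = 10 agents, q = 2 actions, and k = 1 this gives 10, matching the "at least 10 other joint patrols" dominated by any 1-optimal joint patrol in the example above.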
Suppose that k = 2 has been chosen, and so at runtime, the agents will use MGM-2 [9], repeatedly, to find and execute a set of 2-optimal JAs. We must allocate enough fuel to the agents a priori so they can execute up to all possible 2-optimal JAs. Figure 8(a) shows that if β_HSP is used, the agents would be loaded with 93 units of fuel to ensure enough for all 2-optimal JAs. However, β_SRP reveals that only 18 units of fuel are sufficient, a five-fold savings. (For clarity we note that on all three graphs, both bounds are 1 when k = I and 2 when I − 3 ≤ k < I.) To systematically investigate the impact of graph structure on bounds, we generated a large number of DCOP graphs of varying size and density. We started with complete binary graphs (all pairs of agents are connected) where each node (agent) had a unique ID. Edges were repeatedly removed according to the following two-step process: (1) Find the lowest-ID node that has more than one incident edge. (2) If such a node exists, find the lowest-ID node that shares an edge with it, and remove this edge. Figure 9 shows the β_HSP and β_SRP bounds for k-optima for k ∈ {1, 2, 3, 4} and I ∈ {7, 8, 9, 10}. For each of the 16 plots shown, the y-axis shows the bounds and the x-axis shows the number of links removed from the graph according to the above method. While β_HSP < β_SRP for very dense graphs, β_SRP provides significant gains for the vast majority of cases. For example, for the graph with 10 agents, 24 links removed, and a fixed k = 1, β_HSP implies that we must equip the agents with 512 resources to ensure that all resources are not exhausted before all 1-optimal actions are executed. However, β_SRP indicates that a 15-fold reduction to 34 resources will suffice, yielding a savings of 478 due to the use of graph structure when computing bounds.
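The two-step edge-removal process above is mechanical enough to sketch directly. The function name `thin_graph` and the encoding of edges as sorted node pairs are our own illustrative choices:

```python
def thin_graph(n, removals):
    """Generate a test graph by the two-step thinning process: start
    from the complete graph on agents 0..n-1, then repeatedly
    (1) find the lowest-ID node with more than one incident edge and
    (2) remove the edge between it and its lowest-ID neighbour."""
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)}

    def degree(v):
        return sum(1 for e in edges if v in e)

    for _ in range(removals):
        candidates = [v for v in range(n) if degree(v) > 1]
        if not candidates:
            break  # step (2): stop when no node has more than one edge
        u = candidates[0]  # lowest-ID node, since range() is ordered
        w = min(x for x in range(n)
                if x != u and (min(u, x), max(u, x)) in edges)
        edges.remove((min(u, w), max(u, w)))
    return edges
```

Starting from K4 (six edges) and removing two edges deletes (0, 1) and then (0, 2), since node 0 keeps the lowest ID while its degree exceeds one; varying `removals` sweeps out graphs of decreasing density, as on the x-axis of Figure 9.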
A fourth experiment compared β_HSP and β_SRP to the bound obtained by applying F_CLIQUE, β_FCLIQUE, to DCOP graphs from the previous experiment. Selected results are shown in Figure 10 for graphs of 8 and 9 agents. While β_FCLIQUE is marginally better for k = 1, β_SRP has clear gains for k = 4. Identifying the relative effectiveness of various algorithms that exploit our exclusivity relation sets is clearly an area for future work. Finally, Figure 11 compares the constant-time-computable, graph-independent bounds from Section 3, in particular showing the improvement of β_MH over min{β_H, β_S, β_P} for selected odd values of k, given three possible actions for each agent (q = 3). The x-axis shows I, the number of agents, and the y-axis shows 100 · (min{β_H, β_S, β_P} − β_MH) / min{β_H, β_S, β_P}. For odd values of k > 1, as I increased, β_MH provided a tighter bound on the number of k...

Similar publications

Conference Paper
Full-text available
This paper introduces a novel combination of scheduling control on a flexible robot manufacturing cell with curiosity-based reinforcement learning. Reinforcement learning has proved to be highly successful in solving tasks like robotics and scheduling. But this requires hand tuning of rewards in problem domains like robotics and scheduling even whe...
Article
Full-text available
A distributed constraint optimization problem (DCOP) is a formalism that captures the rewards and costs of local interactions within a team of agents, each of whom is choosing an individual action. When rapidly selecting a single joint action for a team, we typically solve DCOPs (often using locally optimal algorithms) to generate a single solutio...

Citations

... Some algorithms can calculate a maximum distance between the optimal solution and the one that can be found. Two algorithms, k-optimality [30] and Bounded max-sum [11], provide some guarantee for the quality of the achieved solution. However, these algorithms typically perform worse than algorithms with no guarantees. ...
Article
Full-text available
Virtual Networks (VNs) offer a flexible and economic approach to deploy customer suited networks. However, defining how resources of a physical network are used to support VNs requirements is a NP-hard problem. For this reason, heuristics have been used on mapping of virtual networks. Although heuristics do not ensure the optimal solution, they implement fast solutions and showed satisfactory results. This work presents a modeling of the node and link allocation problem using Distributed Constraint Optimization Problem (DCOP) with factor graphs, which is a formalism widely used in real distributed optimization problems. In our approach, we use the max-sum algorithm to solve the DCOP. Correctness criteria for this approach are discussed and verifications are conducted through model checking.
... DALO -distributed asynchronous local optimization (DALO) (Kiekintveld et al., 2010) uses the principles of k-size optimality (Pearce et al., 2006) and t-distance optimality (Yin et al., 2009) to provide anytime local approximations with quality guarantees. This is done by starting with a random variable assignment and monotonically improving it over time. ...
... t controls the size of groups as each group will contain one central node plus all nodes up to and including t hops from it. k is from k-size optimality (Pearce et al., 2006). k is the fixed group size: each group must contain k connected agents. ...
Article
Coordination of multiple agents for dynamic task allocation is an important and challenging problem, which involves deciding how to assign a set of agents to a set of tasks, both of which may change over time (i.e., it is a dynamic environment). Moreover, it is often necessary for heterogeneous agents to form teams to complete certain tasks in the environment. In these teams, agents can often complete tasks more efficiently or accurately, as a result of their synergistic abilities. In this thesis we view these dynamic task allocation problems as a multi-agent system and investigate coordination techniques for such systems. In more detail, we focus specifically on the distributed constraint optimisation problem (DCOP) formalism as our coordination technique. Now, a DCOP consists of agents, variables and functions; agents must work together to find the optimal configuration of variable values. Given its ubiquity, a number of decentralised algorithms for solving such problems exist, including DPOP, ADOPT, and the GDL family of algorithms. In this thesis, we examine the anatomy of the above-mentioned DCOP algorithms and highlight their shortcomings with regard to their application to dynamic task allocation scenarios. We then explain why the max-sum algorithm (a member of the GDL family) is the most appropriate for our setting, and define specific requirements for performing multi-agent coordination in a dynamic task allocation scenario: namely, scalability, robustness, efficiency in communication, adaptiveness, solution quality, and boundedness. In particular, we present three dynamic task allocation algorithms: fast-max-sum, branch-and-bound fast-max-sum and bounded fast-max-sum, which build on the basic max-sum algorithm. The former introduces storage and decision rules at each agent to reduce overheads incurred by re-running the algorithm every time the environment changes.
However, the overall computational complexity of fast-max-sum is exponential in the number of agents that could complete a task in the environment. Hence, in branch-and-bound fast-max-sum, we give fast-max-sum significant new capabilities: namely, an online pruning procedure that simplifies the problem, and a branch-and-bound technique that reduces the search space. This allows us to scale to problems with hundreds of tasks and agents, at the expense of additional storage. Despite this, fast-max-sum is only proven to converge to an optimal solution on instances where the underlying graph contains no cycles. In contrast, bounded fast-max-sum builds on techniques found in bounded max-sum, another extension of max-sum, to find bounded approximate solutions on arbitrary graphs. Given such a graph, bounded fast-max-sum will run our iGHS algorithm, which computes a maximum spanning tree on subsections of a graph, in order to reduce overheads when there is a change in the environment. Bounded fast-max-sum will then run fast-max-sum on this maximum spanning tree in order to find a solution. We have found that fast-max-sum reduces the size of messages communicated and the amount of computation by up to 99% compared with the original max-sum. We also found that, even in large environments, branch-and-bound fast-max-sum finds a solution using 99% less computation and up to 58% fewer messages than fast-max-sum. Finally, we found bounded fast-max-sum reduces the communication and computation cost of bounded max-sum by up to 99%, while obtaining 60–88% of the optimal utility, at the expense of needing additional communication compared with using fast-max-sum alone. Thus, fast-max-sum or branch-and-bound fast-max-sum should be used where communication is expensive and provable solution quality is not necessary, and bounded fast-max-sum where communication is less expensive and provable solution quality is required.
Now, in order to achieve such improvements over max-sum, fast-max-sum exploits a particularly expressive model of the environment by modelling tasks in the environment as function nodes in a factor graph, which need to have some communication and computation performed for them. An equivalent problem to this can be found in operations research, and is known as scheduling jobs on unrelated parallel machines (also known as R||Cmax). In this thesis, we draw parallels between unrelated parallel machine scheduling and the computation distribution problem, and, in so doing, we present the spanning tree decentralised task distribution algorithm (ST-DTDA), the first decentralised solution to R||Cmax. Empirical evaluation of a number of heuristics for ST-DTDA shows that the solution quality achieved is up to 90% of the optimal on sparse graphs, in the best case, whilst worst-case quality bounds can be estimated within 5% of the solution found, in the best case.
... One key heuristic we developed is the Symmetric Region Packing bound, β_SRP. More details about β_SRP are presented in (Pearce, Tambe, & Maheswaran 2006). ...
Article
Full-text available
In many cooperative multiagent domains, the effect of local interactions between agents can be compactly represented as a network structure. Given that agents are spread across such a network, agents directly interact only with a small group of neighbors. A distributed constraint optimization problem (DCOP) is a useful framework to reason about such networks of agents. Given agents' inability to communicate and collaborate in large groups in such networks, we focus on an approach called k-optimality for solving DCOPs. In this approach, agents form groups of one or more agents until no group of k or fewer agents can possibly improve the DCOP solution; we define this type of local optimum, and any algorithm guaranteed to reach such a local optimum, as k-optimal. The article provides an overview of three key results related to k-optimality. The first set of results are worst-case guarantees on the solution quality of k-optima in a DCOP. These guarantees can help determine an appropriate k-optimal algorithm, or possibly an appropriate constraint graph structure, for agents to use in situations where the cost of coordination between agents must be weighed against the quality of the solution reached. The second set of results are upper bounds on the number of k-optima that can exist in a DCOP. These results are useful in domains where a DCOP must generate a set of solutions rather than a single solution. Finally, we sketch algorithms for k-optimality and provide some experimental results for 1-, 2- and 3-optimal algorithms for several types of DCOPs.
Conference Paper
Full-text available
A significant body of work in multiagent systems over more than two decades has focused on multi-agent coordination. Many challenges in multi-agent coordination can be modeled as Distributed Constraint Optimizations (DCOPs). Many complete and incomplete algorithms have been introduced for DCOPs, but complete algorithms are often impractical for large-scale and dynamic environments, which leads to the study of incomplete algorithms. Some incomplete algorithms produce k-optimal solutions; a k-optimal solution is one that cannot be improved by any deviation by k or fewer agents. In this paper we focus on the only k-optimal algorithm that works for arbitrary k, called KOPT. In both complete and incomplete algorithms, computational complexity is the major concern. Different approaches have been introduced to address this problem and improve existing algorithms. The main contribution of this paper is to decrease the computational complexity of the KOPT algorithm by introducing a new method for selecting the leaders that assign new values to a group of agents. This new approach is called Partial KOPT (PKOPT). PKOPT is an effective method to reduce computational load and power consumption in implementation. Under various assumptions, this paper presents an analysis of sequential and stochastic PKOPT algorithms.