Fig. 2
Cooperation with a pure strategy (saddle point). Games A and B have coincident saddle points, but the payoff in A is greater than in B, so the agents support full cooperation. One simulation result (parameter = 0.1) is initialized to very low cooperation but reaches the same conclusion as three other expected iterative solutions (dashed).
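To make the caption concrete: a pure-strategy saddle point of a zero-sum payoff matrix is an entry that is simultaneously the minimum of its row and the maximum of its column, so the row player's maximin equals the column player's minimax. Below is a minimal sketch with invented 2x2 matrices standing in for games A and B (the paper's actual payoffs are not reproduced); both matrices have their saddle point in the same cell, with A paying more there, matching the situation the figure describes.

```python
import numpy as np

def pure_saddle_point(M):
    """Return (row, col) of a pure-strategy saddle point of the zero-sum
    payoff matrix M (row player maximizes), or None if none exists.
    A saddle entry is the minimum of its row and the maximum of its column."""
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            if M[i, j] == M[i, :].min() and M[i, j] == M[:, j].max():
                return i, j
    return None

# Invented payoff matrices: both have a saddle point at (0, 0),
# but game A pays more there, as in the figure.
A = np.array([[0.9, 1.0],
              [0.2, 0.3]])
B = np.array([[0.6, 0.8],
              [0.1, 0.4]])
print(pure_saddle_point(A), pure_saddle_point(B))  # (0, 0) (0, 0)
```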

Source publication
Article
A model is presented of learning automata playing stochastic games at two levels. The high level represents the choice of the game environment and corresponds to a group decision. The low level represents the choice of action within the selected game environment. Both of these decision processes are affected by delays in the information state due t...

Similar publications

Article
For a learning automaton, properly configuring the learning parameters, which are crucial to the automaton's performance, is relatively difficult because of the manual parameter tuning required before real applications. To ensure stable and reliable performance in stochastic environments, parameter tuning can be a time-consuming and inter...

Citations

... A learning automaton tries to select the optimal action from its action set so as to minimize the average penalty received from the environment. Learning automata are beneficial in systems where information about the environment is incomplete [28]. In addition, learning automata perform very well in complex, dynamic, and random environments with many uncertainties. ...
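As a concrete illustration of the snippet above, here is a minimal sketch of a two-action linear reward-penalty (L_R-P) automaton in a stationary random environment. The penalty probabilities are invented for the example, and the rate 0.1 echoes the value in the figure caption, though that identification is our assumption.

```python
import random

def linear_reward_penalty(penalty_probs, rate=0.1, steps=5000):
    """Two-action linear reward-penalty (L_R-P) automaton: p[i] is the
    probability of choosing action i. A reward moves the chosen action's
    probability toward 1; a penalty shifts probability mass to the other
    action."""
    p = [0.5, 0.5]
    for _ in range(steps):
        a = 0 if random.random() < p[0] else 1
        penalized = random.random() < penalty_probs[a]
        if not penalized:
            p[a] += rate * (1 - p[a])      # reward: reinforce action a
        else:
            p[a] *= (1 - rate)             # penalty: move away from a
        p[1 - a] = 1 - p[a]
    return p

# Action 0 is penalized less often, so its probability should dominate.
print(linear_reward_penalty([0.2, 0.8]))
```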
Article
Personalized recommender systems provide favored services based on user preferences and interests. Because a user's interests change over time, the recommender system must track these changes automatically. To address this research gap and the cold-start problem, in the current study we suggest a framework for creating an adaptive user profile for a personalized recommender system using learning automata. We cluster items based on their features. In this technique, the learning automaton adjusts the amount of user interest in each cluster based on user feedback and then recommends the best items to the user based on the user's demographic information and preferences. Several experiments are conducted on three movie datasets to show the performance of the proposed algorithm. The obtained results demonstrate that the proposed algorithm outperforms several existing approaches in terms of precision, recall, MAE, and RMSE.
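A rough sketch of the adaptive-profiling idea described in the abstract, under assumptions of our own: the cluster names are hypothetical, and a simple reward-inaction style update stands in for the paper's actual learning scheme.

```python
import random

def update_interest(p, cluster, liked, rate=0.05):
    """On positive feedback, reinforce the recommended cluster and scale
    the others down (the distribution stays normalized); on negative
    feedback, leave the profile unchanged (reward-inaction)."""
    if liked:
        for c in p:
            p[c] = p[c] + rate * (1 - p[c]) if c == cluster else p[c] * (1 - rate)
    return p

# Hypothetical genre clusters with an initially uniform interest profile.
profile = {"action": 1/3, "drama": 1/3, "comedy": 1/3}
for _ in range(200):
    rec = random.choices(list(profile), weights=profile.values())[0]
    liked = rec == "drama" and random.random() < 0.8   # simulated feedback
    profile = update_interest(profile, rec, liked)
print(profile)   # interest should concentrate on "drama"
```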
... For instance, in Team Q-learning [37], agents learn common Q-values independently. In addition, the use of learning automata to find optimal policies in Markov games has been presented in [38][39][40]. ...
Preprint
Learning-based techniques could be an alternative approach for solving Dynamic Distributed Constraint Optimization Problems (DDCOPs) and are computationally cheaper than sequential DCOP solvers. This paper proposes a learning-based solution for DDCOPs in which the environment is stochastic due to the presence of multiple agents. In our approach, the problem is modelled as a multi-agent Markov Decision Process, and a learning automaton, which is a relatively simple method requiring little qualitative data, is then employed to learn how to assign values to variables. The proposed method considers two very important issues, namely time-step dependency and uncertainty about the future events upon which we allocate values to variables. Experimental results reveal that the employed method converges and satisfies the constraints of the optimization problems, in comparison to well-known methods.
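A toy sketch of the general approach, not the preprint's actual formulation: one learning automaton per variable chooses a value, and all automata are reinforced when the (single, invented) constraint is satisfied.

```python
import random

# Toy constraint: the two variables must take different values
# (a two-node graph-colouring flavour of a DCOP; entirely invented).
p = {"x": [1/3] * 3, "y": [1/3] * 3}   # one automaton per variable
rate = 0.1

for _ in range(2000):
    choice = {v: random.choices(range(3), weights=p[v])[0] for v in p}
    satisfied = choice["x"] != choice["y"]
    if satisfied:                      # reward-inaction: update only on success
        for v, a in choice.items():
            p[v] = [q + rate * (1 - q) if i == a else q * (1 - rate)
                    for i, q in enumerate(p[v])]

print({v: [round(q, 2) for q in p[v]] for v in p})   # near-deterministic, x != y
```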
... In fact, John Harsanyi's seminal work on incomplete information games [8] was one of the first significant contributions to this field. The incomplete information model has been used to study multi-agent systems in a variety of contexts, including network games [9], team decision making [10] and pursuit/evasion [11]. While this body of important work is relevant to our problem setting, our analysis departs from the incomplete information framework as, in our setting, agents do not possess probabilistic models of the system state and have only limited knowledge on the other agents' beliefs. ...
Preprint
In recent years, a significant research effort has been devoted to the design of distributed protocols for the control of multi-agent systems, as the scale and limited communication bandwidth characteristic of such systems render centralized control impossible. Given the strict operating conditions, it is unlikely that every agent in a multi-agent system will have local information that is consistent with the true system state. Yet, the majority of works in the literature assume that agents share perfect knowledge of their environment. This paper focuses on understanding the impact that inconsistencies in agents' local information can have on the performance of multi-agent systems. More specifically, we consider the design of multi-agent operations under a game theoretic lens where individual agents are assigned utilities that guide their local decision making. We provide a tractable procedure for designing utilities that optimize the efficiency of the resulting collective behavior (i.e., price of anarchy) for classes of set covering games where the extent of the information inconsistencies is known. In the setting where the extent of the informational inconsistencies is not known, we show -- perhaps surprisingly -- that underestimating the level of uncertainty leads to better price of anarchy than overestimating it.
... The objective of a learning automaton is to find the optimal action from the action set so that the average penalty received from the environment is minimized. Learning automata have been found useful in systems where incomplete information about the environment exists [60]. Learning automata have also been shown to perform well in complex, dynamic, and random environments with a large amount of uncertainty. ...
Article
During the last decades, a host of efficient algorithms have been developed for solving the minimum spanning tree problem in deterministic graphs, where the weight associated with each graph edge is assumed to be fixed. Though the edge weights clearly vary with time in realistic applications, making such an assumption wrong, finding the minimum spanning tree of a stochastic graph has not received the attention it merits. This is due to the fact that the minimum spanning tree problem becomes incredibly hard to solve when the edge weight is assumed to be a random variable. It becomes more difficult still if we assume that the probability distribution function of the edge weight is unknown. In this paper, we propose a learning automata-based heuristic algorithm to solve the minimum spanning tree problem in stochastic graphs wherein the probability distribution function of the edge weight is unknown. The proposed algorithm, taking advantage of learning automata, determines which edges must be sampled at each stage. As the presented algorithm proceeds, the sampling process is concentrated on the edges that constitute the spanning tree with the minimum expected weight. The proposed learning automata-based sampling method decreases the number of samples that need to be taken from the graph by reducing the rate of unnecessary samples. Experimental results show the superiority of the proposed algorithm over well-known existing methods, both in terms of the number of samples and the running time of the algorithm.
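A bare-bones sketch of the problem setting, with invented edge-weight means: sample noisy edge weights, maintain running means, and run Kruskal's algorithm on the estimates. The cited paper's contribution is to replace the naive uniform sampling below with learning automata that concentrate samples on promising edges.

```python
import random

def kruskal(n, edges):
    """Minimum spanning tree by Kruskal's algorithm; edges are (w, u, v)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree

# Invented stochastic graph: each edge weight is a random variable with an
# unknown mean; we only ever observe samples.
true_mean = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 2.5, (2, 3): 1.5, (1, 3): 3.0}
est = {e: 0.0 for e in true_mean}
count = {e: 0 for e in true_mean}

for _ in range(500):
    e = random.choice(list(true_mean))        # naive uniform sampling; the
    count[e] += 1                             # paper's automata would instead
    sample = random.gauss(true_mean[e], 0.3)  # concentrate on promising edges
    est[e] += (sample - est[e]) / count[e]    # incremental running mean

print(kruskal(4, [(est[(u, v)], u, v) for u, v in est]))
```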
... In a bilevel game such as that introduced in [7], the upper level represents the choice of the game environment, whereas the lower level represents the choice of action. A possible consequence of the work presented in this paper is that the upper level players are in a position to recognize the game played at the lower level. ...
Article
Many complex systems, whether biological, sociological, or physical, can be represented using networks. In these networks, a node represents an entity, and an arc represents a relationship/constraint between two entities. In discrete dynamics, one can construct a series of networks, each representing a time snapshot of the interaction among the different components in the system. Understanding these networks is a key to understanding the dynamics of real and artificial systems. Network motifs are small graphs, usually three to four nodes, representing local structures. They have been widely used in studying complex systems and in characterizing features at the system level by analyzing locally how the substructures are formed. Frequencies of different network motifs have been shown in the literature to vary from one network to another, and it has been hypothesized that these variations are due to the evolution/dynamics of the system. In this paper, we show for the first time that in strategy games, each game (i.e., type of dynamism) has its own signature of motifs and that this signature is maintained during the evolution of the game. We reveal that deterministic strategy games have unique footprints (motifs' counts) that can be used to recognize and classify a game's type and that these footprints are consistent along the evolutionary path of the game. The findings of this paper have significance for a wide range of fields in cybernetics.
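A simplified illustration of a motif signature: a census of connected three-node subgraphs (open paths versus triangles) in an undirected graph. The paper works with richer motif sets over evolving game networks; the graph below is invented.

```python
from itertools import combinations

def three_node_census(nodes, edges):
    """Count connected 3-node motifs in an undirected graph:
    open paths (2 internal edges) versus triangles (3 internal edges)."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    census = {"path": 0, "triangle": 0}
    for trio in combinations(nodes, 3):
        k = sum(1 for u, v in combinations(trio, 2) if v in adj[u])
        if k == 2:
            census["path"] += 1
        elif k == 3:
            census["triangle"] += 1
    return census

# Invented snapshot of an interaction network: a triangle plus a tail.
print(three_node_census(range(5), [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]))
# {'path': 3, 'triangle': 1} -- this count vector is the motif signature
```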
... Cooperation and symbiosis [16], [27], as well as the efficiency [37], [40], of adaptive multi-agent systems have been studied in the context of simple games. In [40] no verifiable definition of efficiency is given, whereas in [37] the system is considered to be in an efficient market phase when all information that can be used by the agents' strategies has been traded away, and no agent can accumulate more points than an agent making random guesses would. ...
Article
We investigate knowledge exchange among commercial organisations, the rationale behind it, and its effects on the market. Knowledge exchange is known to be beneficial for industry, but in order to explain it, authors have used high-level concepts like network effects, reputation, and trust. We attempt to formalise a plausible and elegant explanation of how and why companies adopt information exchange and why it benefits the market as a whole when this happens. This explanation is based on a multi-agent model that simulates a market of software providers. Even though the model does not include any high-level concepts, information exchange naturally emerges during simulations as a successful, profitable behaviour. The conclusions reached by this agent-based analysis are twofold: (1) a straightforward set of assumptions is enough to give rise to exchange in a software market; (2) knowledge exchange is shown to increase the efficiency of the market.
... Autonomous, interacting agents have widespread applicability, for example, in distributed artificial intelligence [4], knowledge acquisition [5], and conflict detection [6]. Learning automata have been used to model distributed agents in environments with delays in communication and with decisions regarding group formation [7]–[9]. A variation of design patterns in robotics has been examined [10]; these few examples give a flavor of the large literature on autonomous agents. ...
... A UML presentation of user interfaces very similar to the MVC pattern is presented in [34], along with a hierarchical building application resembling the chain-of-responsibility example in Section III. Finally, a study by this author [7] might have benefited from the use of the MVC and observable patterns. The expected value of game actions is the "model", which is observed by decision-making agents. ...
Article
A use case map (UCM) presents, in general, an abstract description of a complex system and, as such, is a good candidate for representing scenarios of autonomous agents interacting with other autonomous agents. The "gang of four" design patterns are intended for object-oriented software development but at least eight of the patterns illustrate structure, or architecture, that is appropriate for interacting agents, independent of software development. This study presents these particular patterns in the form of UCMs to describe abstract scenarios of agent interaction. Seven of the patterns attempt to balance the decentralized nature of interacting agents with an organized structure that makes for better, cleaner interactions. An example performance analysis is provided for one of the patterns, illustrating the benefit of an early abstraction of complex agent behavior. The original contribution here is a UCM presentation of the causal paths in agent behavior as suggested by software design patterns.
... By applying the periodical-policy technique to a hierarchy of learning agents, we try to create a solution technique that solves both conflicting-interest single-stage and multi-stage games. For now, our algorithm is designed for stochastic tree-structured multi-stage games as defined by E. Billard (Billard & Lakshmivarahan 1999; Zhou, Billard, & Lakshmivarahan 1999). He views multi-stage (or multilevel) games as games where decision makers at the top level decide which game to play at the lower levels. ...
... However, the MMDP formalization leaves out games where the agents have different or competing interests. E. Billard introduced a multi-stage game where the agents have purely conflicting interests (Billard & Lakshmivarahan 1999; Zhou, Billard, & Lakshmivarahan 1999). Since the pay-offs of the agents are not shared in conflicting games, for each joint action we have an element in the game matrix of the form (p_h, p_k), where p_h is the probability for Agent h to receive a positive reward of +1 and p_k is the corresponding probability for Agent k. ...
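To make the (p_h, p_k) notation concrete, here is a small sketch with invented reward probabilities: each cell of the game matrix stores the two agents' independent chances of receiving a +1 reward for that joint action.

```python
import random

# Each cell holds (p_h, p_k): the probabilities that Agent h and Agent k,
# respectively, receive a reward of +1 for that joint action. Values are
# invented; in a purely conflicting game the agents' preferred cells differ.
game = [[(0.9, 0.1), (0.2, 0.5)],
        [(0.5, 0.2), (0.1, 0.9)]]

def play(action_h, action_k):
    """Sample one round: each agent's binary reward is drawn independently."""
    p_h, p_k = game[action_h][action_k]
    return (1 if random.random() < p_h else 0,
            1 if random.random() < p_k else 0)

print(play(0, 0))  # e.g. (1, 0): h is usually rewarded here, k rarely
```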
Article
Coordination to some equilibrium point is an interesting problem in multi-agent reinforcement learning. In common-interest single-stage settings this problem has been studied profoundly and efficient solution techniques have been found. For particular multi-stage games, some experiments also show good results. However, for a large class of problems the agents do not share a common pay-off function. Again, for single-stage problems, a solution technique exists that finds a fair solution for all agents. In this paper we report on a technique that is based on learning automata theory and periodical policies. Letting pseudo-independent agents play periodical policies enables them to behave socially in purely conflicting multi-stage games as defined by E. Billard (Billard & Lakshmivarahan 1999; Zhou, Billard, & Lakshmivarahan 1999). We experimented with this technique on games where simple learning automata have the tendency not to cooperate or to show oscillating behavior, resulting in a suboptimal pay-off. Simulation results illustrate that our technique overcomes these problems and our agents find a fair solution for both agents.
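A schematic sketch of the periodical-policy idea as described above: agents alternate, period by period, between the joint actions each agent prefers, so the long-run payoffs even out. The equilibria and payoff probabilities here are invented; the paper couples this alternation with learning automata rather than fixing it by hand.

```python
# Joint payoffs (reward probability for agent h, for agent k) at the two
# pure equilibria of a hypothetical conflicting-interest game.
equilibria = [((0, 0), (0.9, 0.1)),   # h's preferred joint action
              ((1, 1), (0.1, 0.9))]   # k's preferred joint action

period, horizon = 100, 1000
totals = [0.0, 0.0]
for t in range(horizon):
    # Switch the agreed-upon equilibrium every `period` steps.
    _, (r_h, r_k) = equilibria[(t // period) % 2]
    totals[0] += r_h
    totals[1] += r_k
print([tot / horizon for tot in totals])   # ~[0.5, 0.5]: a fair split
```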
... However, with sufficient communication, the agents adapt to a coordinated policy. In [22], LAs are used for playing stochastic games at multiple levels. To conclude, we believe that the similarities between ACO and LAs mean that the theory of LA can serve as a good theoretical tool for analyzing ACO algorithms and learning in MAS in general. ...
... Analysis of the collective behavior of agents distributed in a network has received considerable attention in the literature (refer to [1], [2], and the references therein). In [1] and [2], distributed decision makers are modeled as players in a two-level game. High-level decisions concern the game environment and determine the willingness of the players to form a coalition. ...
... There are several choices for the games A and B (zero-sum games or nonzero-sum games) and there are several choices for the learning algorithms used by the decision makers. In [1] and [2], the decision makers exclusively use the classical linear reward-penalty algorithm (L_R-P). The model considered in these papers is a natural extension of the one-level games analyzed in [5]. ...
Article
Multilevel games are abstractions of situations where decision makers are distributed in a network environment. In Part I of this paper, the authors present several of the challenging problems that arise in the analysis of multilevel games. In this paper, a specific setup is considered where the two games being played are zero-sum games and where the decision makers use the linear reward-inaction algorithm of stochastic learning automata. It is shown that the effective game matrix is decided by the willingness and the ability to cooperate, and that it is a convex combination of two zero-sum game matrices. Analysis of the properties of this effective game matrix and the convergence of the decision process shows that players tend toward noncooperation in these specific environments. Simulation results illustrate this noncooperative behavior.
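The convex-combination statement can be illustrated directly. The matrices below reuse the invented A and B from the sketch under the figure caption, and the mixing weight lam stands in for the cooperation level that the paper derives from the players' willingness and ability to cooperate.

```python
import numpy as np

# Zero-sum payoff matrices for the two candidate games (illustrative values,
# the same invented A and B as in the saddle-point sketch above).
A = np.array([[0.9, 1.0],
              [0.2, 0.3]])
B = np.array([[0.6, 0.8],
              [0.1, 0.4]])

lam = 0.7                      # stands in for the derived cooperation level
C = lam * A + (1 - lam) * B    # effective game matrix

# With a pure saddle point, the row player's maximin equals the column
# player's minimax, so the effective game has a well-defined value.
print(C)
print(C.min(axis=1).max(), C.max(axis=0).min())   # both 0.81: saddle at (0, 0)
```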