Gürdal Arslan
Department of Electrical Engineering,
University of Hawaii at Manoa,
Honolulu, HI 96822
e-mail: gurdal@hawaii.edu
Jason R. Marden
e-mail: marden@ucla.edu
Jeff S. Shamma
e-mail: shamma@ucla.edu
Department of Mechanical and Aerospace
Engineering,
University of California, Los Angeles,
Los Angeles, CA 90095
Autonomous Vehicle-Target
Assignment: A Game-Theoretical
Formulation
We consider an autonomous vehicle-target assignment problem where a group of vehicles
are expected to optimally assign themselves to a set of targets. We introduce a game-
theoretical formulation of the problem in which the vehicles are viewed as self-interested
decision makers. Thus, we seek the optimization of a global utility function through
autonomous vehicles that are capable of making individually rational decisions to opti-
mize their own utility functions. The first important aspect of the problem is to choose the
utility functions of the vehicles in such a way that the objectives of the vehicles are
localized to each vehicle yet aligned with a global utility function. The second important
aspect of the problem is to equip the vehicles with an appropriate negotiation mechanism
by which each vehicle pursues the optimization of its own utility function. We present
several design procedures and accompanying caveats for vehicle utility design. We
present two new negotiation mechanisms, namely, “generalized regret monitoring with
fading memory and inertia” and “selective spatial adaptive play,” and provide accom-
panying proofs of their convergence. Finally, we present simulations that illustrate how
vehicle negotiations can consistently lead to near-optimal assignments provided that the
utilities of the vehicles are designed appropriately. [DOI: 10.1115/1.2766722]
1 Introduction
Designing autonomous vehicles with intelligent and coordi-
nated action capabilities to achieve an overall objective is a major
part of the recent theme of “cooperative control,” which has re-
ceived significant attention in recent years. Whereas much of the
work in this area focuses on "kinetic" coordination, e.g., multivehicle trajectory generation (e.g., [1], and references therein), the focus here is on strategic coordination. In particular, we consider an autonomous vehicle-target assignment problem (illustrated in Fig. 1), where a group of vehicles are expected to assign themselves to a set of targets to optimize a global utility function.
When viewed as a combinatorial optimization problem, the
vehicle-target assignment problem considered in this paper is a
generalization of the well-known weapon-target assignment problem [2] to the case where the global utility is a general function of the vehicle-target assignments. In its full generality, the weapon-target assignment problem is known to be NP-complete [2], and the existing literature on the weapon-target assignment problem is concentrated on heuristic methods to quickly obtain near-optimal assignments in relatively large instances of the problem, very often with no guarantees on the degree of suboptimality (cf. [3], and references therein).
Therefore, from an optimization viewpoint, the vehicle-target as-
signment problem considered in this paper is, in general, a hard
problem, even though optimal assignments can be obtained quite
efficiently in very special cases.
Our viewpoint in this paper deviates from that of direct optimi-
zation. Rather, we emphasize the design of vehicles that are indi-
vidually capable of making coordination decisions to optimize
their own utilities, which then indirectly translates to the optimi-
zation of a global utility function. The main potential benefit of
this approach is to enable autonomous vehicles that are individu-
ally capable of operating in uncertain and adversarial environ-
ments, with limited information, communication, and computa-
tion, to autonomously optimize a global utility. The optimization
methods available in the literature are not suitable for our pur-
poses because even a distributed implementation of such optimi-
zation algorithms need not induce “individually rational” behav-
ior, which is the key to realize the expected benefits of our
approach. Furthermore, an optimization approach would typically
require constant dissemination of global information throughout
the network of the vehicles as well as increased communication
and computation.
Accordingly, in this paper we formulate our autonomous vehicle-target assignment problem as a multiplayer game [4,5],
where each vehicle is interested in optimizing its own utility. We
use the notion of pure Nash equilibrium to represent the assign-
ments that are agreeable to the rational vehicles, i.e., the assign-
ments at which there is no incentive for any vehicle to unilaterally
deviate. We use algorithms for multiplayer learning in games as
negotiation mechanisms by which the vehicles seek to optimize
their utilities. The problem of optimizing a global utility function
by the autonomous vehicles then reduces to the proper design of
(i) the vehicle utilities and (ii) the negotiation mechanisms.
Designing vehicle utilities is essential to obtaining desirable collective behavior through self-interested vehicles (cf. [6]). An important consideration in designing the vehicle utilities is that the vehicle utility functions should be "aligned" with the global utility function in the sense that agreeable assignments (i.e., Nash equilibria) should lead to high, ideally maximal, global utility.
There are multiple ways that such alignment can be achieved. An
obvious instance is to set the vehicle utilities equal to the global utility. This choice is not desirable in the case of a large number of interacting vehicles, because another consideration in designing
the vehicle utilities is that the vehicle utilities should be “local-
ized,” i.e., a vehicle’s utility should depend only on the local
information available to the vehicle. For example, in a large
vehicle-target assignment problem, the vehicles may have range
restrictions and a vehicle may not even be aware of the targets
and/or the vehicles outside its range. In such a case, a vehicle
whose utility is set to the global utility would not have sufficient
information to compute its own utility. Therefore, a vehicle’s util-
ity should be localized to its range while maintaining the alignment with the global utility. More generally, we will discuss the properties of being aligned and localized for several utility design procedures in Sec. 3.
Contributed by the Dynamic Systems, Measurement, and Control Division of ASME for publication in the JOURNAL OF DYNAMIC SYSTEMS, MEASUREMENT, AND CONTROL. Manuscript received March 31, 2006; final manuscript received April 1, 2007. Review conducted by Tal Shima.
584 / Vol. 129, SEPTEMBER 2007 Copyright © 2007 by ASME Transactions of the ASME
Obtaining optimal assignments using the approach presented in
this paper also requires that the vehicles use a negotiation mecha-
nism that is convergent in the multiplayer game induced by the
vehicle utilities. We will show that when vehicle utilities are
aligned with the global utility, they always lead to a class of games known as "ordinal potential games" [7]. The significance of this connection is that certain multiplayer learning algorithms, such as fictitious play (FP) [8], are known to converge in potential games, and hence can be used as vehicle negotiation mechanisms. However, FP has an intensive informational requirement. Spatial adaptive play (SAP) [9] is another such algorithm, which leads to an optimizer of the potential function in potential games with arbitrarily high probability. Although SAP reduces the information
requirement, there can be a high implementation cost when ve-
hicles have a large number of possible actions.
This paper goes beyond existing work in the area through the
introduction of new negotiating mechanisms that alleviate the in-
formational and implementation requirement, namely, “general-
ized regret monitoring with fading memory and inertia” and “se-
lective spatial adaptive play.” We establish new convergence
results for both algorithms and simulate their performance on an
illustrative weapon-target assignment problem.
The remainder of this paper is organized as follows. Section 2
sets up an autonomous vehicle-target assignment problem as a
multiplayer game. Section 3 discusses the issue of designing the
utility functions of the vehicles that are localized to each vehicle
yet aligned with a given global utility function. Section 4 reviews
selected learning algorithms available in the literature and pre-
sents two new algorithms, along with convergence results, that
offer some advantages over existing algorithms. Section 5 presents
some simulation results to illustrate the possibility of obtaining
near optimal assignments through vehicle negotiations. Finally,
Section 6 contains some concluding remarks.
2 Game-Theoretical Formulation of an Autonomous
Vehicle-Target Assignment Problem
We begin by considering an optimal assignment problem where n_v vehicles are to be assigned to n_t targets. Each entity, whether a vehicle or a target, may have different characteristics. The vehicles are labeled as V_1, ..., V_nv, and the targets are labeled as T_0, T_1, ..., T_nt, where a fictitious target T_0 represents the "null target" or "no target." Let V := {V_1, ..., V_nv} and T := {T_0, T_1, ..., T_nt}. A vehicle can be assigned to any target in its range, denoted by A_i ⊂ T for vehicle V_i ∈ V. The null target always satisfies T_0 ∈ A_i. Let A := A_1 × ⋯ × A_nv. The assignment of vehicle V_i is denoted by a_i ∈ A_i, and the collection of vehicle assignments (a_1, ..., a_nv), called the assignment profile, is denoted by a. Each assignment profile, a ∈ A, corresponds to a global utility, U_g(a), that can be interpreted as the objective of a global planner.
We view the vehicles as "autonomous" decision makers, and accordingly, each vehicle, e.g., vehicle V_i ∈ V, is assumed to select its own target assignment, a_i ∈ A_i, to maximize its own utility function, U_Vi(a). In general, vehicle utility functions may be different and each of them may depend on the whole assignment profile a. Hence, the vehicles do not necessarily face an optimization problem, but rather, they face a (finite) multiplayer game. In such a setting, the vehicles are to negotiate an assignment profile that is mutually agreeable. The autonomous target assignment problem is to design the utilities, U_Vi(a), as well as appropriate negotiation procedures, so that the vehicles can negotiate a mutually agreeable target assignment that yields maximal global utility, U_g(a).
To be able to deal with the intricacies of our autonomous target assignment problem, we adopt some concepts and methods from the theory of games [4,5]. We start with the concept of equilibrium to characterize the target assignments that are agreeable to the vehicles. A well-known equilibrium concept for multiplayer games is the notion of Nash equilibrium. In the context of an autonomous target assignment problem, a Nash equilibrium is an assignment profile a* = (a_1*, ..., a_nv*) such that no vehicle could improve its utility by unilaterally deviating from a*. Before introducing the notion of Nash equilibrium in more precise terms, we will introduce some notation. Let a_-i denote the collection of the target assignments of the vehicles other than vehicle V_i, i.e.,

a_-i = (a_1, ..., a_{i-1}, a_{i+1}, ..., a_nv)

and let

A_-i := A_1 × ⋯ × A_{i-1} × A_{i+1} × ⋯ × A_nv

With this notation, we will sometimes write an assignment profile a as (a_i, a_-i). Similarly, we may write U_Vi(a) as U_Vi(a_i, a_-i). Using the above notation, an assignment profile a* is called a pure Nash equilibrium if, for all vehicles V_i ∈ V,

U_Vi(a_i*, a_-i*) = max_{a_i ∈ A_i} U_Vi(a_i, a_-i*)    (1)
In this paper, we will represent the agreeable target assignment
profiles by the set of pure Nash equilibria even though in the
literature some non-Nash solution concepts for multiplayer games
are also available. We will introduce one such concept called ef-
ficiency for future reference. An assignment profile is called effi-
cient if there is no other assignment that yields higher utilities to
all vehicles. For given vehicle utilities, a Nash equilibrium assign-
ment may or may not be efficient. Our justification of a pure Nash
equilibrium as an agreeable assignment is based on the autonomous and self-interested nature of the vehicles. Clearly, an efficient pure Nash equilibrium should be more appealing to the vehicles than an inefficient pure Nash equilibrium.
Fig. 1 Illustration of vehicle-target assignment
In general, a pure Nash equilibrium may not exist for an arbi-
trary set of vehicle utilities. However, as will be seen in Sec. 3,
any reasonable set of vehicle utilities tailored to the autonomous
vehicle-target problem would have at least one pure Nash equilib-
rium.
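Condition (1) lends itself to direct verification by enumeration in small instances. The following sketch is our own illustration (the function names and the toy utilities in the test are not from the paper); it checks whether a profile is a pure Nash equilibrium and enumerates all such profiles of a finite game:

```python
from itertools import product

def is_pure_nash(profile, action_sets, utilities):
    """Check condition (1): no vehicle can gain by unilaterally deviating."""
    for i, utility in enumerate(utilities):
        current = utility(profile)
        for alt in action_sets[i]:
            deviated = profile[:i] + (alt,) + profile[i + 1:]
            if utility(deviated) > current:
                return False  # vehicle i would switch to `alt`
    return True

def pure_nash_set(action_sets, utilities):
    """Enumerate every pure Nash equilibrium by exhaustive search over A."""
    return [a for a in product(*action_sets)
            if is_pure_nash(a, action_sets, utilities)]
```

For example, with two vehicles sharing an identical-interest utility equal to the total value of the distinct targets covered, enumeration recovers exactly the two profiles in which the vehicles split the targets.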
We conclude this section with the definition of potential games
and ordinal potential games 关7兴. These games form an important
class of games because of their relevance to autonomous vehicle-
target assignment as well as their desirable convergence properties
mentioned earlier.
DEFINITION 2.1 ([ORDINAL] POTENTIAL GAMES). A potential game consists of vehicle utilities, U_Vi(a), V_i ∈ V, and a potential function, Φ(a): A → R, such that, for every vehicle, V_i ∈ V, for every a_-i ∈ A_-i, and for every a_i', a_i'' ∈ A_i,

U_Vi(a_i', a_-i) − U_Vi(a_i'', a_-i) = Φ(a_i', a_-i) − Φ(a_i'', a_-i)

An ordinal potential game consists of vehicle utilities U_Vi(a), V_i ∈ V, and a potential function Φ(a): A → R such that, for every vehicle V_i ∈ V, for every a_-i ∈ A_-i, and for every a_i', a_i'' ∈ A_i,

U_Vi(a_i', a_-i) − U_Vi(a_i'', a_-i) > 0 ⇔ Φ(a_i', a_-i) − Φ(a_i'', a_-i) > 0
In a potential game, the difference in utility received by any one
vehicle for its two different target choices, when the assignments
of other vehicles are fixed, can be measured by a potential func-
tion that only depends on the assignment profile and not on the
label of any vehicle.
In an ordinal potential game, an improvement in utility received
by any one vehicle for its two different target choices, when the
assignments of other vehicles are fixed, always results in an im-
provement of a potential function that, again, only depends on the
assignment profile and not on the label of any vehicle. Clearly,
ordinal potential games form a broader class than potential games.
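For finite games, the ordinal potential property of Definition 2.1 can be verified by brute force. A minimal sketch (our own illustration; names are not from the paper):

```python
from itertools import product

def is_ordinal_potential(action_sets, utilities, potential):
    """Brute-force check of Definition 2.1 (ordinal case): for every profile
    and every unilateral switch, the deviating vehicle's utility difference
    is positive exactly when the potential difference is positive."""
    for a in product(*action_sets):
        for i in range(len(action_sets)):
            for alt in action_sets[i]:
                b = a[:i] + (alt,) + a[i + 1:]
                du = utilities[i](a) - utilities[i](b)
                dphi = potential(a) - potential(b)
                if (du > 0) != (dphi > 0):
                    return False  # signs disagree: not an ordinal potential
    return True
```

An identical-interest game passes this check with the common utility as the potential; a game whose utilities are misaligned with a candidate potential fails it.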
3 Utility Design
In this section, we discuss various important aspects of design-
ing the vehicle utilities to achieve a high global utility. We cite
[7,10] as the key references for this section, since we freely use some of the terminology and the ideas presented in them. To make the discussion more concrete and relevant, we assume a certain structure for the global utility, even though it is possible to present the ideas at a more abstract level. We assume that all vehicles that assign themselves to a particular target form a team and engage their common target in a coordinated manner. An engagement with target T_j ∈ T generates some utility denoted by U_Tj(a); U_T0(a) = 0 for any a.
It is important to distinguish between a target utility, U_Tj(a), and a vehicle utility, U_Vi(a). The realized target utility represents the overall value for engaging target T_j, whereas a vehicle utility partly reflects vehicle V_i's share of that value. Furthermore, it may be that vehicle V_i shares this reward even if it did not engage target T_j. This will depend on the final specification of vehicle utilities.
We will assume that the utility generated by an engagement with target T_j depends only on the characteristics of target T_j and the vehicles engaging target T_j. This is stated more precisely in the following assumption.
ASSUMPTION 3.1. Let a and ã be two action profiles in A, and for any target, T_j ∈ T, define the sets

S_j = {V_i ∈ V | a_i = T_j} and S̃_j = {V_i ∈ V | ã_i = T_j}

Then,

S_j = S̃_j ⇒ U_Tj(a) = U_Tj(ã)
We now define the global utility to be the total sum of the utilities generated by all engagements, i.e.,

U_g(a) = Σ_{T_j ∈ T} U_Tj(a)    (2)

This summation is only one approach to aggregate the target utility functions. See [11] for a more general discussion from the perspective of multiobjective optimization.
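The aggregation in Eq. (2), together with Assumption 3.1, can be sketched in a few lines. This is our own illustration (names and the toy per-target utility in the test are assumptions, not from the paper); the point is that each target's utility receives only the set of vehicles engaging that target:

```python
def global_utility(profile, targets, target_utility):
    """Eq. (2): U_g(a) = sum of U_Tj(a) over all targets. Under Assumption
    3.1, each U_Tj depends only on the set of vehicles engaging T_j, so the
    per-target utility function receives exactly that set."""
    total = 0.0
    for tj in targets:
        if tj == "T0":
            continue  # the null target generates no utility: U_T0(a) = 0
        engaging = frozenset(i for i, ai in enumerate(profile) if ai == tj)
        total += target_utility(tj, engaging)
    return total
```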
It will be convenient to model an engagement with a target as a
random event that is assumed to be independent of the other target
engagements. At the end of an engagement, the target and some of
the engaging vehicles are destroyed with certain probability. The
statistics of the outcome of an engagement depend on the charac-
teristics of the target as well as the composition of the engaging
vehicles. As an example, it may be the case that only a particular
team of vehicles may destroy a particular target with reasonable
probability. In this case, the utility generated by an engagement is
taken to be the expected difference between the value of a de-
stroyed target and the total value of the destroyed vehicles. These
issues are discussed further for the well-known weapon-target as-
signment problem in Sec. 5.
An important consideration in specifying the vehicle utilities, U_Vi(a), i = 1, ..., n_v, is to make them "aligned" with the global utility, U_g(a). Ideally, this means that the vehicles can only agree on an optimal assignment profile, i.e., an assignment profile that maximizes the global utility. Because it is not always straightforward to achieve the alignment of the vehicle utilities with the global utility in this ideal sense (without first calculating an optimal assignment), we adopt a more relaxed notion of alignment from [10]. That is, a vehicle can improve its own utility by unilateral action if and only if the same unilateral action also improves the global utility.
DEFINITION 3.1 (ALIGNMENT). We will say that a set of vehicle utilities U_Vi(a), V_i ∈ V, is aligned¹ with the global utility U_g(a) when the following condition is satisfied. For every vehicle, V_i ∈ V, for every a_-i ∈ A_-i, and for every a_i', a_i'' ∈ A_i,

U_Vi(a_i', a_-i) − U_Vi(a_i'', a_-i) > 0 ⇔ U_g(a_i', a_-i) − U_g(a_i'', a_-i) > 0    (3)

We see that the notion of alignment coincides with the notion of ordinal potential games in Definition 2.1.
It turns out that alignment does not rule out pure Nash equilib-
ria that may be suboptimal from the global utility perspective.
Moreover, such suboptimal pure Nash equilibria may even yield
the highest utilities to all vehicles and hence may be efficient.
Nevertheless, alignment also guarantees that the optimal assign-
ment profiles are always included in the set of pure Nash equilib-
ria; hence, they are agreeable to the vehicles even though they
may be inefficient.
The above discussion on alignment is summarized by the fol-
lowing proposition, whose proof is straightforward.
PROPOSITION 3.1. Let a_opt denote an optimal assignment profile, i.e.,

a_opt ∈ arg max_{a ∈ A} U_g(a)

Under the alignment condition (3), the resulting game is an ordinal potential game that has a_opt as a (possibly nonunique) pure Nash equilibrium.
3.1 Identical Interest Utility (IIU). One obvious, but ulti-
mately ineffective, way of making the vehicle utilities aligned
with the global utility is to set all vehicle utilities to the global
utility. In game-theory terminology, setting

U_Vi(a) = U_g(a), for all vehicles V_i ∈ V    (4)

results in an identical interest game. Obviously, an identical interest game with U_Vi(a) = U_g(a), for all vehicles V_i ∈ V, is also a potential game with the potential U_g(a), and hence, the vehicle utilities (4) are aligned with the global utility. In fact, optimal assignments in this case yield the highest vehicle utilities and therefore are efficient. However, suboptimal Nash equilibria may still exist.
As will be seen later, the vehicles negotiate by proposing targets and responding to the previous target assignment proposals that are exchanged among the vehicles. Each vehicle whose utility is set to the global utility needs to know (i) the proposals made by all other vehicles as well as (ii) the characteristics of all the vehicles and the targets to be able to generate a new proposal. The reason for this is that vehicle V_i's utility would depend on all engagements with all targets, including those that are not in A_i. Therefore, when the vehicle utilities are set to the global utility, continuous dissemination of global information is required among the vehicles.

¹The notion of alignment we adopt here is called factoredness in [10].
3.2 Range-Restricted Utility (RRU). A possible way of making the vehicle utilities more localized than IIU would be to set the utility of vehicle V_i equal to the sum of the utilities generated by the engagements with the targets that belong to vehicle V_i's target set A_i, i.e.,

U_Vi(a) = Σ_{T_j ∈ A_i} U_Tj(a), for all vehicles V_i ∈ V    (5)

Note that in this case the global information requirement on the vehicles is alleviated. Moreover, the vehicle utilities (5) are still aligned with the global utility. This guarantees that the optimal assignments are agreeable to the vehicles, but they may be inefficient; see Example 3.3. In fact, the vehicle utilities lead to a potential game; see [7]. The following proposition is an immediate consequence of Assumption 3.1.
PROPOSITION 3.2. Vehicle utilities that satisfy (5) form a potential game with the global utility U_g(a) serving as a potential function.
Note that when all vehicles have the same set of available targets, i.e., A_1 = ⋯ = A_nv, then (5) leads to an identical interest game.
A concern regarding vehicle utilities (4) (and possibly (5)) stems from the so-called learnability issue introduced in [10]. That
is, a vehicle may not be able to influence its own utility in a
significant way when a large number of vehicles can assign them-
selves to the same large set of targets. In this case, since the utility
of a vehicle is the total sum of the utilities generated by a large
number of engagements involving a large number of targets and
vehicles, the proposals made by an individual vehicle may not
have any significant effect on its own utility. Hence, a negotiating
vehicle may find itself approximately indifferent to the available
target choices if the negotiation mechanism employed is utility
based, i.e., the vehicle proposes targets in response to the actual
utilities corresponding to its past proposals, as in reinforcement
learning.
3.3 Equally Shared Utility (ESU). One way to limit the influence of other vehicles on vehicle V_i's utility is to set

U_Vi(a) = U_Tj(a) / n_Tj(a), if a_i = T_j    (6)

where n_Tj(a) is the total number of vehicles engaging target T_j. The rationale behind (6) is to distribute the utility generated by an engagement equally among the engaging vehicles. Note that in this case vehicle V_i's utility is independent of the engagements in which vehicle V_i does not participate.
Even though the total sum of vehicle utilities (6) equals the global utility, it turns out that (6) need not be exactly aligned with the global utility.
Example 3.1. Consider two targets T_1 and T_2 with values 2 and 10, respectively, and two anonymous vehicles V_1 and V_2, i.e., V_1 and V_2 have identical characteristics. Assume that each vehicle is individually capable of destroying any one of the targets with probability 1, while the targets in no case have any chance of destroying any of the vehicles. The vehicle utilities in this example can be represented in the matrix form shown in Fig. 2, where if vehicle V_i ∈ {V_1, V_2} chooses target a_i ∈ {T_0, T_1, T_2}, then the first number (respectively, the second number) in the entry (a_1, a_2) represents the utility to the first vehicle (respectively, to the second vehicle). The global planner would of course prefer each vehicle to engage a different target, since this would yield a maximal global utility of 12. However, such an optimal assignment profile might leave the vehicle engaging the low-value target unsatisfied with a utility of 2, and this unsatisfied vehicle might be able to improve its utility to 5 by unilaterally switching to the high-value target at the expense of lowering the global utility to 10. Because of the misalignment of (6) with the global utility in this example, an optimal assignment profile may not be agreeable to all vehicles, whereas the vehicles may find the suboptimal Nash equilibrium assignment (a_1, a_2) = (T_2, T_2) agreeable.
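The deviation incentive in Example 3.1 can be reproduced numerically. A minimal sketch of Eq. (6) under the example's target values (the function name is our own illustration):

```python
def esu(profile, i, values):
    """Equally shared utility, Eq. (6): a target's generated value is split
    evenly among the vehicles engaging it; the null target T0 yields 0."""
    ai = profile[i]
    if ai == "T0":
        return 0.0
    return values[ai] / profile.count(ai)

# Example 3.1: targets worth 2 and 10, two identical vehicles.
values = {"T1": 2.0, "T2": 10.0}
```

At the optimal profile (T_1, T_2), the vehicle on T_1 earns 2 but can earn 10/2 = 5 by joining the other vehicle on T_2, which is exactly the unilateral switch that makes (T_2, T_2) agreeable.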
However, in the case of anonymous vehicles, (6) does lead to a potential game.
DEFINITION 3.2 (ANONYMITY). Vehicles are anonymous if for any permutation

σ: {1, 2, ..., n_v} → {1, 2, ..., n_v}

and for any two assignments, a and ã, related by

ã_i = a_σ(i), ∀ i ∈ {1, 2, ..., n_v}

the equality

U_Tj(a) = U_Tj(ã)

holds for any target T_j.
As the terminology implies, the utility generated by an engagement with a target does not depend on the identities of the vehicles engaging the target, but only on the number of vehicles engaging the target.
PROPOSITION 3.3. Anonymous vehicles with utilities that satisfy (6) form a potential game with potential function

Φ(a) = Σ_{T_j ∈ T} Σ_{ℓ=1}^{n_Tj(a)} U_Tj(ℓ) / ℓ

where n_Tj(a) is the total number of vehicles assigned to target T_j and U_Tj(ℓ) is the utility generated by an engagement of ℓ anonymous vehicles with target T_j.
Hence, in the case of anonymous vehicles, (6) is aligned with the above potential function, which is the same potential function introduced in [12] in the context of so-called congestion games, but different from the global utility function U_g(a). The significance of this observation is that the existence of a potential function associated with the vehicle utilities guarantees the existence of agreeable (possibly suboptimal) assignment profiles in the form of pure Nash equilibria. Furthermore, there exist learning algorithms that are known to converge in potential games, and these convergent learning algorithms can be used by the vehicles as negotiation mechanisms always leading to a settlement on an assignment profile. If the vehicles are not anonymous, then the misalignment of the vehicle utilities (6) with the global utility can be even more severe.
Fig. 2 Misaligned vehicle utilities
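The congestion-game potential of Proposition 3.3 is easy to evaluate, and the exact potential property can be checked on a single unilateral switch. This is our own sketch, assuming a toy team-utility function in the test (not from the paper):

```python
def congestion_potential(profile, team_utility):
    """Potential function of Proposition 3.3 for anonymous vehicles:
    Phi(a) = sum over targets T_j of sum_{l=1}^{n_Tj(a)} U_Tj(l) / l,
    where team_utility(T_j, l) is the utility of l vehicles engaging T_j."""
    phi = 0.0
    for tj in set(profile):
        if tj == "T0":
            continue  # the null target contributes nothing
        n = profile.count(tj)
        phi += sum(team_utility(tj, l) / l for l in range(1, n + 1))
    return phi
```

For anonymous vehicles with equally shared utilities, a deviating vehicle's utility change equals the change in this potential, which is what makes Φ an (exact) potential for the ESU game.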
Example 3.2. Consider two targets T_1 and T_2 with values 10 each, and two distinguishable vehicles, V_1 and V_2, with values 2 each. Assume that vehicle V_1 is individually capable of destroying any one of the targets with probability one, and neither of the targets is ever capable of destroying V_1. Assume further that vehicle V_2 is never capable of destroying any of the targets, and any one of the targets can destroy vehicle V_2 with probability one. This setup leads to the vehicle utilities shown in Fig. 3. In this example, the two vehicles may not be able to agree on any assignment profile, optimal or suboptimal, because while vehicle V_1 would be better off by engaging a target alone, vehicle V_2 would be better off by engaging a target together with vehicle V_1. Yet, the global planner would prefer vehicle V_1 engaging one of the targets and vehicle V_2 not engaging any target. If these two vehicles were to use a negotiation mechanism that allows settlement only on a pure Nash equilibrium, then they would not be able to agree on any assignment because a pure Nash equilibrium does not exist in this example. A mixed, but not pure, Nash equilibrium is still guaranteed to exist, but would not lead to an agreement on a particular assignment. Therefore, in the distinguishable vehicles case, the vehicle utilities (6) might lead to a situation where the vehicles are not only in conflict with the global planner but also in conflict among themselves.
3.4 Wonderful Life Utility (WLU). A solution to the problem of designing individual utility functions that are more learnable than (4) or (5) and still aligned with the global utility is offered in [10] in the form of a family of utility structures called the wonderful life utility. In our context, a particular WLU structure would be obtained by setting the utility of a vehicle to the marginal contribution made by the vehicle to the global utility, i.e.,

U_Vi(a_i, a_-i) = U_g(a_i, a_-i) − U_g(T_0, a_-i), for all vehicles V_i ∈ V    (7)

From the definition of the global utility (2), the WLU (7) can be written as

U_Vi(a_i, a_-i) = U_Tj(a_i, a_-i) − U_Tj(T_0, a_-i), if a_i = T_j

for all vehicles V_i ∈ V, which means that the utility of a vehicle is its marginal contribution to the utility generated by the engagement in which the vehicle participates. WLU is expected to make each vehicle's utility more learnable by removing the unnecessary dependencies on other vehicles' assignment decisions, while still keeping the vehicle utilities aligned with the global utility. It turns out that WLU (7) also leads to a potential game with the global utility being the potential function.
PROPOSITION 3.4. Vehicle utilities that satisfy (7) form a potential game with the global utility U_g(a) serving as a potential function.
Another interpretation of the WLU is that a vehicle is rewarded with a side payment equal to the externality it would create by not assigning itself to any target, which is the idea behind "internalizing the externalities" in economics [13].
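Eq. (7) amounts to comparing the global utility with and without the vehicle's participation. A minimal sketch (our own function names; the toy global utility in the test is an assumption):

```python
def wlu(profile, i, global_utility):
    """Wonderful life utility, Eq. (7): vehicle i's marginal contribution
    to the global utility, U_g(a_i, a_-i) - U_g(T0, a_-i)."""
    withdrawn = profile[:i] + ("T0",) + profile[i + 1:]
    return global_utility(profile) - global_utility(withdrawn)
```

Note how the WLU localizes the comparison: whatever the other vehicles do is held fixed in both terms, so their common contribution cancels out of the difference.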
3.5 Comparisons. Each of the vehicle utilities IIU (4), RRU (5), and WLU (7) leads to a potential game with the global utility function being the potential function, and hence, they are aligned with the global utility. This guarantees that the optimal assignments are in each case included in the set of pure Nash equilibria. However, in each case, there may also be suboptimal Nash equilibria that may be pure and/or mixed. There is ample evidence in the literature that a mixed equilibrium cannot emerge as a stable outcome of vehicle negotiations, particularly in potential games (e.g., [14]). However, a suboptimal pure Nash equilibrium can emerge as a stable outcome, depending on the negotiation mechanism used by the vehicles.
Example 3.3. Consider N⬎2 vehicles, V1,...,VN, and N+1
targets, T1,... ,TN+1, where Ai=兵Ti,TN+1其. Assume that any ve-
hicle Viengaging target Tigenerates 1 unit of utility. Assume also
that an engagement with target TN+1 generates 0 utility unless all
vehicles engage TN+1 in which case they generate 2 units of utility.
Clearly, the optimal assignment is given by a*=共T1,T2,...,TN兲.
The optimal assignment profile a*is a pure Nash equilibrium when
the vehicle utilities are given by any of 共4兲and 共5兲,or共7兲. How-
ever, there is another pure Nash equilibrium a**
=共TN+1 ,TN+1 ,... ,TN+1兲for any of vehicle utilities 共4兲and 共5兲,or
共7兲which is suboptimal with respect to the global utility. The
global utility and the vehicle utilities corresponding to a*and a**
are summarized as follows:
Ug共a*兲=NU
g共a**兲=2
UVi共a*兲=NU
Vi共a**兲= 2 if vehicles utilities are given by 共4兲
UVi共a*兲=1 UVi共a**兲= 2 if vehicles utilities are given by 共5兲or 共7兲
Note that the optimality gap N−2 between a*and a** can be
arbitrarily large for large N. Note also that if the vehicle utilities
are given by RRU 共5兲or WLU 共7兲the suboptimal Nash equilib-
rium a** yields higher utilities to all vehicles than the optimal
Nash equilibrium a*.
In the case of RRU or WLU, if the negotiation mechanism employed by the vehicles were to eliminate the assignment profiles that are inefficient with respect to the vehicle utilities, the vehicles would never be able to agree on the optimal assignment a*. This example illustrates the fact that the vehicle utilities cannot be designed independently of the negotiation mechanism employed by the vehicles.
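The equilibria in Example 3.3 can be checked numerically. The following is an illustrative sketch (the names Ug, wlu, and is_nash are ours, and the utilities hard-code the example's payoffs rather than the paper's general formulas (4)-(7)); it confirms that both a* and a** are pure Nash equilibria under the wonderful-life utility:

```python
N = 4                       # number of vehicles (any N > 2)
NULL = None                 # "vehicle absent" action used by the WLU baseline

def Ug(a):
    """Global utility of Example 3.3: vehicle i on T_i is worth 1;
    T_{N+1} (index N here) is worth 2 only if every vehicle engages it."""
    u = sum(1 for i, ai in enumerate(a) if ai == i)   # T_i engagements
    if all(ai == N for ai in a):                      # all on T_{N+1}
        u += 2
    return u

def wlu(i, a):
    """Wonderful-life utility: marginal contribution of vehicle i to U_g."""
    a_without = list(a)
    a_without[i] = NULL
    return Ug(a) - Ug(a_without)

def is_nash(a, utility):
    """Check that no vehicle can gain by unilaterally switching targets."""
    for i in range(N):
        for alt in (i, N):                            # A_i = {T_i, T_{N+1}}
            b = list(a)
            b[i] = alt
            if utility(i, tuple(b)) > utility(i, tuple(a)):
                return False
    return True

a_opt = tuple(range(N))       # a*  = (T_1, ..., T_N)
a_bad = (N,) * N              # a** = (T_{N+1}, ..., T_{N+1})
print(is_nash(a_opt, wlu), is_nash(a_bad, wlu))       # → True True
print(Ug(a_opt), Ug(a_bad))                           # → 4 2
print(wlu(0, a_opt), wlu(0, a_bad))                   # → 1 2
```

The last line exhibits the pathology discussed above: under WLU, every vehicle strictly prefers the suboptimal equilibrium a**.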
4 Negotiation Mechanisms
The issue of which Nash equilibrium will emerge as a stable
outcome of vehicle negotiations is studied under the topic of equi-
librium selection in game theory. In this section, we will discuss
equilibrium selection and other important properties of some ne-
gotiation mechanisms. In particular, we will present a negotiation
mechanism from the literature that leads to an optimal Nash equi-
librium in potential games with arbitrarily high probability.
We will adopt various learning algorithms available in the lit-
erature for multiplayer games as vehicle negotiation mechanisms
to make use of the theoretical and computational tools provided
by game theory. The negotiation mechanisms that will be pre-
sented in this section will provide the vehicles with strategic
decision-making capabilities. In particular, each vehicle will ne-
gotiate with other vehicles without any knowledge about the utili-
ties of the other vehicles. One of the reasons for such a require-
ment is that the vehicles may not have the same information
regarding their environment. For example, a vehicle may not
know all the targets and/or the potential collaborating vehicles available to another vehicle; moreover, it may not be possible to pass on such information due to limited communication bandwidth. Another reason for the private-utilities requirement is to make the vehicles truly autonomous, in the sense that each vehicle is individually capable of making robust strategic decisions in uncertain and adversarial environments. In this case, any individual vehicle is cooperative with the other vehicles only to the extent that cooperation helps the vehicle to maximize its own utility, which is, of course, carefully designed by the global planner.

Fig. 3 Misaligned vehicle utilities with no pure Nash equilibrium

588 / Vol. 129, SEPTEMBER 2007 Transactions of the ASME
Downloaded 02 Sep 2007 to 128.171.57.189. Redistribution subject to ASME license or copyright, see http://www.asme.org/terms/Terms_Use.cfm
Accordingly, we will consider some negotiation mechanisms
that require each vehicle to know, at most, its own utility function,
the proposals made by the vehicle itself, and the proposals made
by those other vehicles that can influence the utility of the vehicle.
We will review these negotiation mechanisms in terms of conver-
gence, equilibrium selection, and computational efficiency. We
will present our review primarily in the context of potential
games, since many of the vehicle utility structures considered in
Sec. 3 fall into this category. In some cases, we will point to
existing results in the literature, while in some other cases we will
point to open problems.
4.1 Review: Selected Recursive Averaging Algorithms
4.1.1 Action-Based Fictitious Play. Action-based fictitious play, or simply FP, was originally introduced as a computational method to calculate Nash equilibria in zero-sum games [15], but was later proposed as a learning mechanism in multiplayer games (cf. [8]).

One can also think of FP as a negotiation mechanism employed by the vehicles to select their targets. At each negotiation step, k = 1, 2, ..., the vehicles simultaneously propose targets

a(k) := [a_1(k), ..., a_{n_v}(k)]

where a_i(k) ∈ A_i is the label of the target proposed by vehicle V_i. The objective is to construct a negotiation mechanism so that the proposed assignments a(k) ultimately converge for large k. FP is one such mechanism, and it is guaranteed to converge for potential games.
In FP, the target assignment proposals at stage k are functions of the past proposed assignments over the interval [1, k−1], as follows. First, enumerate the targets available to vehicle V_i as A_i = {A_i^1, ..., A_i^{|A_i|}}. For any target index j ∈ [1, |A_i|], let n_j(k; V_i) denote the total number of times vehicle V_i proposed target A_i^j up to stage k. Now define the empirical frequency vector q_i(k) ∈ R^{|A_i|} of vehicle V_i as

q_i(k) = ( n_1(k−1; V_i)/(k−1), n_2(k−1; V_i)/(k−1), ..., n_{|A_i|}(k−1; V_i)/(k−1) )

In words, q_i(k) is the histogram of the target assignments proposed by vehicle V_i over the interval [1, k−1]. Note that the elements of the empirical frequency vector are nonnegative and sum to unity; therefore, q_i(k) can be identified with a probability vector on the probability simplex of dimension |A_i|.
We are now set to define the FP process. At stage k, vehicle V_i selects its proposed assignment a_i(k) ∈ A_i to maximize its expected utility, as though all other vehicles made a simultaneous and independent random selection of their actions a_{−i} based on the product distribution defined by the empirical frequencies q_1(k), ..., q_{i−1}(k), q_{i+1}(k), ..., q_{n_v}(k), i.e.,

a_i(k) ∈ arg max_{α ∈ A_i} E_{a_{−i}} [ U_{V_i}(α, a_{−i}) ]

In case the maximizer is not unique, any maximizer will do.
One appealing property of FP is that the empirical frequencies generated by FP converge to the set of Nash equilibria in potential games [7,16]. Although the empirical frequencies may converge to a mixed Nash equilibrium while the proposals cycle (see the related churning issue in [17]), it is generally believed that convergence of the empirical frequencies to a mixed (but not pure) Nash equilibrium happens rarely when the vehicle utilities are not equivalent to a zero-sum game [18,19]. Thus, if the vehicles negotiate using FP and their utilities constitute a potential game, then in most cases we can expect them to asymptotically reach an agreement on an assignment profile. We should also mention that numerous stochastic versions of FP have similar convergence properties [20].
The main disadvantage of FP for the purposes of this paper is
its computational burden on each vehicle. The most computation-
ally intensive operation is the optimization of the utilities during
the negotiations, which effectively requires an enumeration of all
possible combined assignments by the other vehicles [21,22]. This
makes FP computationally prohibitive when there are large num-
bers of vehicles with large target sets. To make FP truly scalable,
it is clear that the vehicles need to evaluate their utilities more
directly without using the empirical frequencies.
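As an illustration of the update just described, the following is a minimal sketch of synchronous FP (all function and variable names, such as fp_step, are ours, not the paper's). The inner enumeration over the other players' joint actions is exactly the computational bottleneck discussed above; it grows exponentially with the number of vehicles:

```python
import itertools
import numpy as np

def fp_step(history, utilities, n_actions):
    """One synchronous fictitious-play step (illustrative sketch).
    history[i][j] = number of times player i proposed action j so far;
    utilities[i] maps a joint action tuple to player i's utility."""
    n = len(history)
    # Empirical frequencies q_i from the proposal counts.
    q = [h / h.sum() for h in history]
    proposals = []
    for i in range(n):
        best, best_val = None, -np.inf
        for ai in range(n_actions[i]):
            # Expected utility of ai against the product distribution of the
            # other players' empirical frequencies (full enumeration).
            val = 0.0
            others = [range(n_actions[j]) for j in range(n) if j != i]
            for a_minus in itertools.product(*others):
                a = list(a_minus)
                a.insert(i, ai)
                prob, idx = 1.0, 0
                for j in range(n):
                    if j != i:
                        prob *= q[j][a_minus[idx]]
                        idx += 1
                val += prob * utilities[i](tuple(a))
            if val > best_val:
                best, best_val = ai, val
        proposals.append(best)
    # Record the new proposals in each player's history.
    for i, ai in enumerate(proposals):
        history[i][ai] += 1
    return proposals
```

In a simple two-player coordination game this iteration locks onto a pure equilibrium after a few steps; with n_v vehicles, each candidate action requires a sum over all combined assignments of the others, which is what motivates the utility-based variants below.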
4.1.2 Utility-Based FP. The distinction between action-based and utility-based FP (see [23,24]) is that, in the latter, the vehicles predict their utilities during the negotiations based on the actual utilities corresponding to the previous proposals. Utility-based FP is, in essence, a multiagent reinforcement learning algorithm [25,26]. The difference is that, in reinforcement learning, the utility evaluation is based on experience, whereas in utility-based FP it is based on a call to a simulated utility function evaluator.

The main advantage of utility-based FP is its very low computational burden on each vehicle. In particular, the vehicles do not need to compute the empirical frequencies of the past proposals made by any vehicle and do not need to compute their expected utilities based on the empirical frequencies. Utility-based FP only requires an individual vehicle to process a (state) vector whose dimension is its number of targets and to select a (randomized) maximizer. This significantly alleviates the computational bottleneck of FP. However, the convergence of utility-based FP for potential games is still an open issue.

There are also other utility-based learning algorithms that are proven to converge in partnership games [27–29]. These algorithms are similar to multiagent reinforcement learning algorithms and have a computational burden comparable to that of utility-based FP. However, their convergence requires fine tuning of various parameters, such as the learning rates of each agent. Moreover, utility-based learning algorithms are prone to the issue of learnability and may exhibit slower convergence than action-based FP.
4.1.3 Regret Matching. The discussion on FP in Sec. 4.1.2
motivates a learning algorithm that is computationally feasible as
well as convergent in potential games, both theoretically and prac-
tically. Accordingly, we introduce regret matching, from [30],
whose main distinction is that the vehicles propose targets based
on their regret for not proposing particular targets in the past
negotiation steps.
As before, let us enumerate the targets available to vehicle V_i as A_i = {A_i^1, ..., A_i^{|A_i|}}. Vehicle V_i selects its proposed target a_i(k) according to a probability distribution p_i(k) ∈ Δ(|A_i|) that will be specified shortly. The ℓth component p_i^ℓ(k) of p_i(k) is the probability that vehicle V_i selects the ℓth target in A_i at negotiation step k, i.e., p_i^ℓ(k) = Prob{a_i(k) = A_i^ℓ}. Vehicle V_i does not know the utility U_{V_i}(a(k)) before proposing its own target a_i(k). Accordingly, before selecting a_i(k), k > 1, vehicle V_i computes its average regret

R_{V_i}^ℓ(k) := (1/(k−1)) Σ_{m=1}^{k−1} { U_{V_i}(A_i^ℓ, a_{−i}(m)) − U_{V_i}(a(m)) }

for not proposing A_i^ℓ in all past negotiation steps, under the assumption that the proposed targets of all other vehicles would have remained unaltered.
Clearly, vehicle V_i can compute R_{V_i}^ℓ(k) using the recursion
R_{V_i}^ℓ(k+1) = ((k−1)/k) R_{V_i}^ℓ(k) + (1/k) { U_{V_i}(A_i^ℓ, a_{−i}(k)) − U_{V_i}(a(k)) },   k > 1
We note that, at any step k > 1, vehicle V_i updates all entries of its average regret vector R_{V_i}(k) := [R_{V_i}^1(k), ..., R_{V_i}^{|A_i|}(k)]^T, whose dimension is |A_i|. In particular, the vehicles do not need to compute the empirical frequencies of the past proposals made by any vehicle and do not need to compute their expected utilities based on the empirical frequencies. We also note that it is sufficient for vehicle V_i, at step k > 1, to have access to a_i(k−1) and U_{V_i}(A_i^ℓ, a_{−i}(k−1)) for all ℓ ∈ {1, ..., |A_i|}. In other words, it is sufficient for vehicle V_i to have access to its proposal at step k−1 and its actual utility U_{V_i}(a(k−1)) received at step k−1, as well as its hypothetical utilities U_{V_i}(A_i^ℓ, a_{−i}(k−1)), which would have been received if it had proposed target A_i^ℓ [instead of a_i(k−1)] and all other vehicle proposals a_{−i}(k−1) had remained unchanged at step k−1.
Once vehicle V_i has computed its average regret vector R_{V_i}(k), it proposes a target a_i(k), k > 1, according to the probability distribution

p_i(k) = [R_{V_i}(k)]^+ / ( 1^T [R_{V_i}(k)]^+ )

provided that the denominator above is positive; otherwise, p_i(k) is the uniform distribution over A_i [p_i(1) ∈ Δ(|A_i|) is always arbitrary]. Roughly speaking, a vehicle using regret matching proposes a particular target at any step with probability proportional to its average regret for not having played that particular target in past negotiation steps. It turns out that the average regret of a vehicle using regret matching asymptotically vanishes (similar results hold for other regret-based adaptive dynamics); see [30–32]. Although this result characterizes the long-term behavior of regret matching in general games, it need not imply that the negotiations of vehicles using regret matching will converge to a pure equilibrium assignment profile when the vehicle utilities constitute a potential game, an objective that we will pursue in Sec. 4.2.
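A single vehicle's regret-matching update can be sketched as follows (an illustrative sketch with our own names; it implements the recursive average-regret update and the positive-part normalization described above):

```python
import numpy as np

rng = np.random.default_rng(0)

def regret_matching_step(R, k, hypothetical_utils, actual_util):
    """One regret-matching update for a single vehicle (illustrative names).
    R: current average-regret vector; hypothetical_utils[l] is the utility
    the vehicle would have received by proposing target l at step k, and
    actual_util is the utility it actually received at step k."""
    # Recursive average-regret update:
    # R(k+1) = ((k-1)/k) R(k) + (1/k) (u_hypothetical - u_actual)
    R = ((k - 1) / k) * R + (hypothetical_utils - actual_util) / k
    pos = np.maximum(R, 0.0)              # positive part [R]^+
    if pos.sum() > 0:
        p = pos / pos.sum()               # proportional to positive average regret
    else:
        p = np.ones_like(R) / len(R)      # uniform when no positive regret
    return R, int(rng.choice(len(R), p=p))
```

Note that the vehicle needs only its own hypothetical and actual utilities, never the other vehicles' utility functions or empirical frequencies.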
4.2 Generalized Regret Monitoring With Fading Memory and Inertia. To enable convergence to a pure equilibrium in potential games, we will modify regret matching in two ways. First, we will assume that each vehicle has a fading memory; that is, each vehicle exponentially discounts the influence of its past regret in the computation of its average regret vector. More precisely, each vehicle computes a discounted average regret vector according to the recursion

R̃_{V_i}^ℓ(k+1) = (1−ρ) R̃_{V_i}^ℓ(k) + ρ { U_{V_i}(A_i^ℓ, a_{−i}(k)) − U_{V_i}(a(k)) },   for all ℓ ∈ {1, ..., |A_i|}

where ρ ∈ (0,1] is a parameter, 1−ρ being the discount factor, and R̃_{V_i}^ℓ(1) = 0.
Second, we will assume that each vehicle proposes a target based on its discounted average regret, using some inertia. Therefore, each vehicle V_i proposes a target a_i(k), at step k > 1, according to the probability distribution

α_i(k) RM_i(R̃_{V_i}(k)) + [1 − α_i(k)] v_{a_i(k−1)}

where α_i(k) is a parameter representing vehicle V_i's willingness to optimize at time k, v_{a_i(k−1)} is the vertex of Δ(|A_i|) corresponding to the target a_i(k−1) proposed by vehicle V_i at step k−1, and RM_i : R^{|A_i|} → Δ(|A_i|) is any continuous function satisfying

x^ℓ > 0 ⇔ RM_i^ℓ(x) > 0   and   1^T [x]^+ = 0 ⇒ RM_i(x) = (1/|A_i|) 1      (8)

where x^ℓ and RM_i^ℓ(x) are the ℓth components of x and RM_i(x), respectively.
We will call the above dynamics generalized regret monitoring (RM) with fading memory and inertia. The reason behind the term "monitoring" is that the algorithm leaves unspecified, through the function RM_i(·), how an agent reacts to its regrets. One particular choice for the function RM_i is

RM_i(x) = [x]^+ / (1^T [x]^+)   (when 1^T [x]^+ > 0)

which leads to regret matching with fading memory and inertia. Another particular choice is

RM_i^ℓ(x) = ( e^{(1/τ) x^ℓ} / Σ_{x^m > 0} e^{(1/τ) x^m} ) I{x^ℓ > 0}   (when 1^T [x]^+ > 0)      (9)

where τ > 0 is a parameter. Note that, for small values of τ, vehicle V_i would choose, with high probability, the target corresponding to the maximum regret. This choice leads to a stochastic variant of an algorithm called joint strategy fictitious play (with fading memory and inertia); see [22]. Also note that, for large values of τ, V_i would choose any target having positive regret with nearly equal probability.
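A single-vehicle sketch of this update, using the soft-max choice (9), might look as follows (names and default parameter values are ours; the shift by the maximum regret inside the exponential is a standard numerical-stability trick that leaves the distribution unchanged):

```python
import numpy as np

rng = np.random.default_rng(1)

def generalized_rm_step(R, a_prev, hypothetical_utils, actual_util,
                        rho=0.1, alpha=0.5, tau=0.1):
    """One step of generalized regret monitoring with fading memory and
    inertia, using the soft-max choice (9) for RM_i (illustrative names
    and default parameter values)."""
    # Fading-memory regret update: R~(k+1) = (1-rho) R~(k) + rho (u_hyp - u_act)
    R = (1 - rho) * R + rho * (hypothetical_utils - actual_util)
    pos = R > 0
    if pos.any():
        # Soft-max restricted to targets with positive regret.
        w = np.where(pos, np.exp((R - R.max()) / tau), 0.0)
        p = w / w.sum()
    else:
        p = np.ones_like(R) / len(R)   # condition (8): uniform if no positive regret
    # Inertia: with probability 1 - alpha, repeat the previous proposal.
    if rng.random() < alpha:
        return R, int(rng.choice(len(R), p=p))
    return R, a_prev
```

With a small tau this behaves like the stochastic joint strategy fictitious play variant mentioned above, concentrating on the maximum-regret target; with a large tau it spreads nearly uniformly over all positive-regret targets.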
According to these rules, vehicle V_i will stay with its previous proposal a_i(k−1) with probability 1 − α_i(k), regardless of its regret. We make the following standing assumption on the vehicles' willingness to optimize.

ASSUMPTION 4.1. There exist constants ε and ε̄ such that

0 < ε < α_i(k) < ε̄ < 1

for all time k > 1 and for all i ∈ {1, ..., n_v}.
This assumption implies that the vehicles are always willing to optimize, with some nonzero inertia.² The following theorem establishes the convergence of generalized regret monitoring with fading memory and inertia to a pure equilibrium.

THEOREM 4.1. Assume that the vehicle utilities constitute an ordinal potential game³ and that no vehicle is indifferent between distinct strategies, i.e.,

U_{V_i}(a_i^1, a_{−i}) ≠ U_{V_i}(a_i^2, a_{−i}),   ∀ a_i^1, a_i^2 ∈ A_i, a_i^1 ≠ a_i^2, ∀ a_{−i} ∈ A_{−i}, ∀ i ∈ {1, ..., n_v}

Then, the target proposals a(k) generated by generalized regret monitoring with fading memory and inertia satisfying Assumption 4.1 converge to a pure Nash equilibrium almost surely.
Proof. We will state and prove a series of claims. The first claim states that if a vehicle proposes a target with positive (discounted average) regret, then all subsequent target proposals will also have positive regret.

CLAIM 4.1. Fix any k_0 > 1. Then, R̃_{V_i}^{a_i(k_0)}(k_0) > 0 ⇒ R̃_{V_i}^{a_i(k)}(k) > 0 for all k > k_0.

Proof. Suppose R̃_{V_i}^{a_i(k_0)}(k_0) > 0. If a_i(k_0+1) = a_i(k_0), then

R̃_{V_i}^{a_i(k_0+1)}(k_0+1) = (1−ρ) R̃_{V_i}^{a_i(k_0)}(k_0) > 0

If a_i(k_0+1) ≠ a_i(k_0), then R̃_{V_i}^{a_i(k_0+1)}(k_0+1) > 0, since a target different from the previous proposal can be selected only through RM_i, which, by (8), assigns positive probability only to targets with positive regret. The argument can be repeated to show that R̃_{V_i}^{a_i(k)}(k) > 0, for all k > k_0. □
²This assumption can be relaxed to hold only for sufficiently large k, as opposed to all k.
³This theorem also holds in the more general class of weakly acyclic games; see [33].

Define
M_u := max { U_{V_i}(a) : a ∈ A, V_i ∈ V }

m_u := min { U_{V_i}(a) : a ∈ A, V_i ∈ V }

δ := min { |U_{V_i}(a^1) − U_{V_i}(a^2)| : a^1, a^2 ∈ A, a_{−i}^1 = a_{−i}^2, |U_{V_i}(a^1) − U_{V_i}(a^2)| > 0, V_i ∈ V }

N := min { n ∈ {1, 2, ...} : (1 − (1−ρ)^n) δ − (1−ρ)^n (M_u − m_u) > δ/2 }

f := min { RM_i^m(x) : |x^ℓ| ≤ M_u − m_u, ∀ ℓ, x^m ≥ δ/2 for one m, ∀ V_i ∈ V }

Note that δ, f > 0, and |R̃_{V_i}^{a_i}(k)| ≤ M_u − m_u, for all V_i ∈ V, a_i ∈ A_i, k > 1.
The second claim states that if the current proposal is a strict Nash equilibrium at which every vehicle has positive regret, and if the proposal is repeated a sufficient number of times, then all subsequent proposals will also be that Nash equilibrium.

CLAIM 4.2. Fix k_0 > 1. Assume

1. a(k_0) is a strict Nash equilibrium, and
2. R̃_{V_i}^{a_i(k_0)}(k_0) > 0 for all V_i ∈ V, and
3. a(k_0) = a(k_0+1) = ⋯ = a(k_0+N−1).

Then, a(k) = a(k_0), for all k ≥ k_0.
Proof. For any V_i ∈ V and any a_i ∈ A_i, we have

R̃_{V_i}^{a_i}(k_0+N) = (1−ρ)^N R̃_{V_i}^{a_i}(k_0) + [1 − (1−ρ)^N] { U_{V_i}(a_i, a_{−i}(k_0)) − U_{V_i}(a_i(k_0), a_{−i}(k_0)) }

Since a(k_0) is a strict Nash equilibrium, for any V_i ∈ V and any a_i ∈ A_i, a_i ≠ a_i(k_0), we have

U_{V_i}(a_i, a_{−i}(k_0)) − U_{V_i}(a_i(k_0), a_{−i}(k_0)) ≤ −δ

Therefore,

R̃_{V_i}^{a_i}(k_0+N) ≤ (1−ρ)^N (M_u − m_u) − [1 − (1−ρ)^N] δ < −δ/2 < 0

We also know that, for all V_i ∈ V,

R̃_{V_i}^{a_i(k_0)}(k_0+N) = (1−ρ)^N R̃_{V_i}^{a_i(k_0)}(k_0) > 0

Hence, at step k_0+N, each vehicle has positive regret only at its current proposal; by (8) and inertia, every vehicle therefore repeats its proposal, and the same argument applies at all subsequent steps. This proves the claim. □
The third claim states that if the current proposal is not a Nash equilibrium and the proposal is repeated a sufficient number of times, then a subsequent assignment proposal will have a higher global utility with at least a fixed probability.

CLAIM 4.3. Fix k_0 > 1. Assume

1. a(k_0) is not a Nash equilibrium, and
2. a(k_0) = a(k_0+1) = ⋯ = a(k_0+N−1).

Let a* = (a_i*, a_{−i}(k_0)) be such that

U_{V_i}(a_i*, a_{−i}(k_0)) > U_{V_i}(a_i(k_0), a_{−i}(k_0))

for some V_i ∈ V and some a_i* ∈ A_i. Then, R̃_{V_i}^{a_i*}(k_0+N) > δ/2, and a* will be proposed at step k_0+N with probability at least γ := (1−ε̄)^{n_v−1} ε f.

Proof. We have

R̃_{V_i}^{a_i*}(k_0+N) ≥ −(1−ρ)^N (M_u − m_u) + [1 − (1−ρ)^N] δ > δ/2

Therefore, the probability that vehicle V_i proposes a_i* at step k_0+N is at least ε f. Because of the players' inertia, the probability that all vehicles together propose the profile a* at step k_0+N is at least (1−ε̄)^{n_v−1} ε f. □
The fourth claim specifies an event, and an associated probability, which guarantees that all vehicles eventually propose only targets with positive regret.

CLAIM 4.4. Fix k_0 > 1. We have R̃_{V_i}^{a_i(k)}(k) > 0 for all k ≥ k_0 + 2Nn_v and for all V_i ∈ V with probability at least

∏_{i=1}^{n_v} (1/|A_i|) γ (1−ε̄)^{2Nn_v}
Proof. Let a^0 := a(k_0). Suppose R̃_{V_i}^{a_i^0}(k_0) ≤ 0. Furthermore, suppose that a^0 is repeated N consecutive times, i.e., a(k_0) = ⋯ = a(k_0+N−1) = a^0, which occurs with probability at least (1−ε̄)^{n_v(N−1)}.

If there exists an a* = (a_i*, a_{−i}^0) such that U_{V_i}(a*) > U_{V_i}(a^0), then, by Claim 4.3, R̃_{V_i}^{a_i*}(k_0+N) > δ/2, and a* will be proposed at step k_0+N with probability at least γ. Conditioned on this, we know from Claim 4.1 that R̃_{V_i}^{a_i(k)}(k) > 0 for all k ≥ k_0+N.

If there does not exist such an action a*, then R̃_{V_i}^{a_i}(k_0+N) < 0 for all a_i ∈ A_i. A proposal profile (a_i^w, a_{−i}^0) with U_{V_i}(a_i^w, a_{−i}^0) < U_{V_i}(a^0) will be proposed at step k_0+N with probability at least (1/|A_i|)(1−ε̄)^{n_v−1}. If a(k_0+N) = (a_i^w, a_{−i}^0), and if, furthermore, (a_i^w, a_{−i}^0) is repeated N consecutive times, i.e., a(k_0+N) = ⋯ = a(k_0+2N−1), which happens with probability at least (1−ε̄)^{n_v(N−1)}, then, by Claim 4.3, R̃_{V_i}^{a_i^0}(k_0+2N) > δ/2, and the joint target a^0 will be proposed at step k_0+2N with probability at least γ. Conditioned on this, we know from Claim 4.1 that R̃_{V_i}^{a_i(k)}(k) > 0 for all k ≥ k_0+2N.

In summary, R̃_{V_i}^{a_i(k)}(k) > 0 for all k ≥ k_0+2N with probability at least

(1/|A_i|) γ (1−ε̄)^{2Nn_v}

We can repeat this argument for each vehicle to show that R̃_{V_i}^{a_i(k)}(k) > 0 for all times k ≥ k_0+2Nn_v and for all V_i ∈ V with probability at least

∏_{i=1}^{n_v} (1/|A_i|) γ (1−ε̄)^{2Nn_v}   □
Final Step: Establishing Convergence to a Pure Nash Equilibrium. Fix k_0 > 1, and let k_1 := k_0 + 2Nn_v. Suppose R̃_{V_i}^{a_i(k)}(k) > 0 for all k ≥ k_1 and for all V_i ∈ V, which, by Claim 4.4, occurs with probability at least

∏_{i=1}^{n_v} (1/|A_i|) γ (1−ε̄)^{2Nn_v}

Suppose further that a(k_1) = ⋯ = a(k_1+N−1), which occurs with probability at least (1−ε̄)^{n_v(N−1)}. If a(k_1) is a Nash equilibrium, then, by Claim 4.2, we are done. Otherwise, according to Claim 4.3, a proposal profile a′ = (a_i′, a_{−i}(k_1)) with U_{V_i}(a′) > U_{V_i}(a(k_1)) for some V_i ∈ V will be played at step k_1+N with probability at least γ. Note that this would imply U_g(a(k_1+N)) > U_g(a(k_1)). Suppose now that a(k_1+N) = ⋯ = a(k_1+2N−1), which occurs with probability at least (1−ε̄)^{n_v(N−1)}. If a′ is a Nash equilibrium, then, by Claim 4.2, we are done. Otherwise, according to Claim 4.3, a proposal profile a″ = (a_i″, a_{−i}′) with U_{V_i}(a″) > U_{V_i}(a(k_1+N)) for some V_i ∈ V will be played at step k_1+2N with probability at least γ. Note that this would imply U_g(a(k_1+2N)) > U_g(a(k_1+N)).

Note that this procedure can only be repeated a finite number of times, because the global utility strictly increases each time. We can repeat the above arguments until we reach a pure Nash equilibrium a* and stay at a* for N consecutive steps. This means that there exist constants ε̃ > 0 and T̃ > 0, both of which are independent of k_0, such that the following event happens with probability at least ε̃: a(k) = a* for all k ≥ k_0 + T̃. This proves Theorem 4.1. □
Note that an agreed assignment that emerges from generalized RM with fading memory and inertia can be suboptimal; characterizing its equilibrium selection properties in potential games remains an open problem. As in FP, the regret-based dynamics introduced above require communication of the proposed target assignments as part of the negotiation process. FP is guaranteed to converge for potential games, but requires an individual vehicle to process the empirical frequencies of all other vehicles that affect its utility and to use these empirical frequencies to compute the maximizer of its expected utility. Generalized RM with fading memory and inertia is guaranteed to converge to a pure equilibrium in almost all (ordinal) potential games, and its computational requirements are significantly lower: it only requires an individual vehicle to process an average regret vector whose dimension is its number of targets and to select a (randomized) target based on the positive part of its average regret vector.
4.3 Review: One-Step Memory Spatial Adaptive Play. The previous negotiation mechanisms were called recursive averaging algorithms, since they maintain a running average (or fading-memory average) of certain variables, e.g., the averaged actions of the other players (FP) or averaged regret measures (RM). These algorithms have "infinite memory" in that the long-term effect of a measured variable may diminish but is never completely eliminated.

In this section, we consider the opposite extreme, namely, a specific one-step memory algorithm called spatial adaptive play (SAP). SAP was introduced in [9] (Chap. 6, which also reviews other multistep memory algorithms) as a learning process for games played on graphs. SAP can be a very effective negotiation mechanism for our autonomous vehicle-target assignment problem, because it places a low computational burden on each vehicle and leads to an optimal solution in potential games with arbitrarily high probability.
Unlike the other negotiation mechanisms we have considered thus far, at any step of SAP negotiations one vehicle is chosen at random, each vehicle being equally likely to be chosen, and only this chosen vehicle is given the chance to update its proposed target.⁴ Let a(k−1) denote the profile of proposed targets at step k−1. At step k, the vehicle that is given the chance to update its proposed target, say vehicle V_i, proposes a target according to the probability distribution p_i(k) ∈ Δ(|A_i|) that maximizes

p_i^T(k) [ U_{V_i}(A_i^1, a_{−i}(k−1)), ..., U_{V_i}(A_i^{|A_i|}, a_{−i}(k−1)) ]^T + τ H(p_i(k))

where H(·) is the entropy function that rewards randomization (see Nomenclature) and τ > 0 is a parameter that controls the level of randomization. For any τ > 0, the maximizing probability p_i(k) is uniquely given by

p_i(k) = σ( (1/τ) [ U_{V_i}(A_i^1, a_{−i}(k−1)), ..., U_{V_i}(A_i^{|A_i|}, a_{−i}(k−1)) ]^T )

where σ(·) is the logit or soft-max function (see Nomenclature).
For any τ > 0, p_i(k) assigns positive probability to all targets in A_i. We are interested in small values of τ > 0, because then p_i(k) approximately maximizes vehicle V_i's (unperturbed) utility based on the other vehicles' proposals at the previous step. For other interpretations of the entropy term, see [35,36]; for different ways of randomizing, see [20].
The computational burden of SAP on each updating vehicle is comparable to that of RM on each vehicle. Each vehicle needs to observe and maintain the proposal profile a(k) (actually, only the relevant part of a(k)). If given the chance to update its proposal, vehicle V_i needs to call its utility function evaluator only |A_i| times. Because only one vehicle updates its proposal at a given negotiation step, the convergence of negotiations may be slow when there are large numbers of vehicles.⁵ However, if the vehicles have a relatively small number of common targets in their target sets, then multiple vehicles can be allowed to update their proposals at a given step, as long as they do not have common targets. Allowing such multiple updates may speed up the negotiations substantially. In our simulations, summarized in Sec. 5, SAP typically converged to a near-optimal assignment faster than most of the other negotiation mechanisms.
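One SAP step can be sketched as follows (a minimal illustration with our own names; τ is held fixed here, and the max-shift inside the exponential is a numerical-stability trick that leaves the soft-max distribution unchanged):

```python
import numpy as np

rng = np.random.default_rng(2)

def sap_step(a, utilities, n_actions, tau=0.1):
    """One spatial adaptive play step (illustrative names): a uniformly
    chosen vehicle soft-max best-responds to the others' previous proposals.
    a: list of current proposals; utilities[i] maps a joint profile tuple
    to vehicle i's utility."""
    i = int(rng.integers(len(a)))             # choose one vehicle uniformly
    # Hypothetical utilities for each of vehicle i's targets, a_-i fixed.
    u = np.array([utilities[i](tuple(a[:i] + [ai] + a[i+1:]))
                  for ai in range(n_actions[i])])
    w = np.exp((u - u.max()) / tau)           # logit/soft-max, temperature tau
    a[i] = int(rng.choice(n_actions[i], p=w / w.sum()))
    return a
```

With a small τ, the chosen vehicle essentially best-responds to the others' previous proposals, while still assigning positive probability to every target.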
4.4 Selective Spatial Adaptive Play. We will now introduce "selective spatial adaptive play" (sSAP) for the cases where a vehicle has a large number of targets in its target set, or where calling its utility function evaluator is computationally expensive. We will parameterize sSAP with n = (n_1, ..., n_{n_v}), where 1 ≤ n_i ≤ |A_i|−1 represents the number of times that vehicle V_i calls its utility function evaluator when it is given the chance to update its proposal. Suppose that vehicle V_i, using sSAP, is given the chance to update its proposal at step k. First, vehicle V_i sequentially selects n_i targets from A_i \ {a_i(k−1)} without replacement, where each target is selected with uniform probability over the remaining targets. Call these selected targets A_i^{ℓ_1}(k), ..., A_i^{ℓ_{n_i}}(k), and let A_i^{ℓ_0}(k) := a_i(k−1) be appended to this set of selected targets. Then, at step k, vehicle V_i proposes a target according to the probability distribution

p_i(k) = σ( (1/τ) [ U_{V_i}(A_i^{ℓ_0}(k), a_{−i}(k−1)), ..., U_{V_i}(A_i^{ℓ_{n_i}}(k), a_{−i}(k−1)) ]^T )

for some τ > 0. In other words, at step k, vehicle V_i proposes a target that approximately maximizes its own utility over the selected targets A_i^{ℓ_0}(k), ..., A_i^{ℓ_{n_i}}(k), given the other vehicles' proposals at the previous step. Thus, to compute p_i(k), vehicle V_i needs to call its utility function evaluator only n_i times, where n_i ≥ 1 could be much smaller than |A_i|. It turns out that we can characterize the
⁴We will not deal with the issue of how the autonomous vehicles can randomly choose exactly one vehicle (or multiple vehicles with no common targets) to update its proposal without centralized coordination. In actuality, such asynchronous updating may be easier to implement than the aforementioned negotiation mechanisms that require synchronous updating. One possible implementation of asynchronous updating would be similar to the implementation of the well-known Aloha protocol in multiaccess communication, where multiple transmitting nodes attempt to access a single communication channel without colliding with each other [34].

⁵If SAP is used as a centralized optimization tool, then the computational burden at each step will be very small, because only one entry of a(k) is updated at each step.
long-term behavior of sSAP quite precisely, following along lines similar to those of the proof of Theorem 6.1 in [9].
THEOREM 4.2. Assume that the vehicle utilities constitute a potential game in which the global utility U_g is a potential function. Then, the target proposals a(k) generated by sSAP satisfy

lim_{τ↓0} lim_{k→∞} Prob{ a(k) is an optimal target assignment profile } = 1
Proof. sSAP induces an irreducible Markov process whose state space is A and whose state at step k is the profile a(k) of proposed targets. The empirical frequencies of the visited states converge to the unique stationary distribution of this induced Markov process. As in Theorem 6.1 in [9], we show that this stationary distribution, denoted by μ, is given by

μ(a) = e^{(1/τ) U_g(a)} / Σ_{ā ∈ A} e^{(1/τ) U_g(ā)},   ∀ a ∈ A

by verifying the detailed balance equations

μ(a) Prob{a → b} = μ(b) Prob{b → a},   ∀ a, b ∈ A

The only nontrivial case that requires verification of the above equations is when a and b differ in exactly one position. Fix a and b such that a_i ≠ b_i and a_{−i} = b_{−i}. Then, we have

Prob{a → b} = (1/n_v) Σ_{(a^0, ..., a^{n_i}) ∈ S(a,b)} [ 1 / ((|A_i|−1) ⋯ (|A_i|−n_i)) ] e^{(1/τ) U_{V_i}(b)} / Σ_{j=0}^{n_i} e^{(1/τ) U_{V_i}(a^j)}

where

S(a,b) = { (a^0, ..., a^{n_i}) ∈ A^{n_i+1} : (a_{−i}^j = a_{−i}, ∀ j), (a^0 = a), (a^j = b, for one j), (a^j ≠ a^m, ∀ j ≠ m) }

It is now straightforward to see that

Prob{a → b} / Prob{b → a} = e^{(1/τ)[U_{V_i}(b) − U_{V_i}(a)]} = e^{(1/τ)[U_g(b) − U_g(a)]} = μ(b)/μ(a)

where the second equality uses the fact that U_g is a potential function. Therefore, μ is indeed as given above, and it can be written, in the alternative vector form, as

μ = σ( (1/τ) U_g )

where, by an abuse of notation, U_g is also used to represent a vector whose "a-th entry" equals U_g(a). Finally, the fact that the Markov process induced by sSAP with τ > 0 is irreducible and aperiodic readily leads to the desired result. □
Thus, in the setup above, μ assigns arbitrarily high probability, as τ ↓ 0, to those assignment profiles that maximize a potential function for the game. Clearly, this result indicates that, in the case of the vehicle utilities IIU (4), RRU (5), or WLU (7), sSAP negotiations would lead to an optimal target assignment with arbitrarily high probability, provided that τ > 0 is chosen sufficiently small. Of course, one can gradually decrease τ to allow initial exploration. We believe that one can obtain convergence, in probability, of the proposals a(k) to an optimal assignment if τ is decreased sufficiently slowly, as in simulated annealing [37,38]. In our simulations, choosing τ inversely proportional to k² during the negotiations typically resulted in fast convergence of the proposals to a near-optimal assignment.
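A minimal sketch of one sSAP step, including the τ = τ_0/k² annealing schedule mentioned above, might look as follows (all names and the default τ_0 are ours):

```python
import numpy as np

rng = np.random.default_rng(3)

def ssap_step(a, utilities, n_actions, n_eval, k, tau0=10.0):
    """One selective spatial adaptive play step with the annealing schedule
    tau = tau0 / k^2 (illustrative names; n_eval[i] plays the role of n_i
    in the text)."""
    tau = tau0 / k**2
    i = int(rng.integers(len(a)))
    # Sample n_i alternatives uniformly without replacement, excluding the
    # previous proposal a_i(k-1), then append the previous proposal itself.
    others = [x for x in range(n_actions[i]) if x != a[i]]
    cand = [a[i]] + [int(c) for c in rng.choice(others, size=n_eval[i],
                                                replace=False)]
    # Only n_i fresh utility evaluations are needed beyond the current one.
    u = np.array([utilities[i](tuple(a[:i] + [c] + a[i+1:])) for c in cand])
    w = np.exp((u - u.max()) / tau)        # soft-max restricted to candidates
    a[i] = cand[int(rng.choice(len(cand), p=w / w.sum()))]
    return a
```

Early on, the large τ makes the choice nearly uniform (exploration); as k grows, the vehicle nearly always keeps the best candidate it has evaluated.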
5 Simulation Results

In this section, we present numerical results to illustrate that, when the individual utility functions and the negotiation mechanisms are properly selected, the autonomous vehicles can agree on a target assignment profile that yields near-optimal global utility. We consider two scenarios. In the first scenario, we illustrate the near optimality of our approach by simulating a special case of the well-known weapon-target assignment model, for which an optimal assignment can be obtained for large numbers of weapons and targets in a short period of time [2]. In the second scenario, we simulate a general instance of the problem and compare various negotiation algorithms in terms of their performance and speed of convergence.
Scenario 1. Here, the vehicles are identical and have zero values, whereas the targets are distinct and have positive values. Each vehicle can be assigned to any of the targets.⁶ Let V_j be the value of target T_j, and let p_j be the probability that target T_j is eliminated when only a single vehicle engages it. When multiple vehicles are assigned to target T_j, each of the vehicles is assumed to engage target T_j independently. Hence, if the number of vehicles engaging target T_j is x_j, then T_j will be eliminated with probability 1 − (1−p_j)^{x_j}. Therefore, as a function of the assignment profile a, the expected utility generated by the engagement with target T_j is

U_{T_j}(a) = V_j [ 1 − (1−p_j)^{Σ_{i=1}^{n_v} I{a_i = T_j}} ]

which leads to the following global utility function:

U_g(a) = Σ_{j=1}^{n_t} V_j [ 1 − (1−p_j)^{Σ_{i=1}^{n_v} I{a_i = T_j}} ]

Given the parameters n_v, n_t, V_1, ..., V_{n_t}, and p_1, ..., p_{n_t}, an optimal vehicle-target assignment that maximizes the global utility function given above can be quickly obtained using an iterative procedure called the minimum marginal return algorithm [2].
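This global utility, together with a greedy marginal-return baseline, can be sketched as follows (an illustrative sketch with our own names; the greedy routine is in the spirit of the marginal-return procedure cited from [2], not a reproduction of it):

```python
import numpy as np

def global_utility(a, values, probs):
    """U_g(a) = sum_j V_j [1 - (1-p_j)^{x_j}], with x_j the number of
    vehicles assigned to target j; `a` lists each vehicle's target index."""
    x = np.bincount(a, minlength=len(values))
    return float(np.sum(values * (1.0 - (1.0 - probs) ** x)))

def greedy_marginal_return(n_vehicles, values, probs):
    """Greedy baseline: assign vehicles one at a time to the target with
    the largest marginal gain in U_g."""
    x = np.zeros(len(values), dtype=int)
    a = []
    for _ in range(n_vehicles):
        # Marginal gain of adding one more vehicle to target j:
        # V_j [(1-p_j)^{x_j} - (1-p_j)^{x_j+1}] = V_j p_j (1-p_j)^{x_j}
        gain = values * probs * (1.0 - probs) ** x
        j = int(np.argmax(gain))
        x[j] += 1
        a.append(j)
    return np.array(a)
```

Each greedy step exploits the closed-form marginal gain V_j p_j (1−p_j)^{x_j} of adding one more vehicle to target j, which shrinks geometrically as vehicles pile onto the same target.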
To test the effectiveness of our approach, we simulated the vehicle negotiations using the above model, with 200 vehicles and 200 targets, in MATLAB on a single personal computer with a 1.4 GHz Pentium(R) M processor and 1.1 GB of RAM. Each of the target values V_1, ..., V_200 and each of the elimination probabilities p_1, ..., p_200 was chosen once, independently, according to the uniform probability distribution on [0,1], and was thereafter kept constant throughout the simulations. We first conducted 100 runs of generalized RM negotiations (with the RM_i function as in (9), ρ = 0.1, and α = 0.5) with the WLU utilities (7), where each negotiation consisted of 100 steps. We then repeated this with 100 runs of SAP negotiations with the WLU utilities (7), where each run consisted of 1000 steps. We also conducted 100 runs of utility-based FP negotiations with the WLU utilities (7), where each negotiation consisted of 1000 steps. In all cases, the randomization level τ was decreased as 10/k², where k is the negotiation step. The evolution of the global utility during typical runs of generalized RM, SAP, and utility-based FP negotiations is shown in Fig. 4. Also, the global utility corresponding to the assignment profile at the end of each run of negotiations and the CPU time required for each run were recorded. A summary of these numerical results is provided in Table 1.
All negotiations consistently yielded near-optimal assignments. The global utility generated by SAP negotiations was almost always monotonically increasing, whereas the global utility generated by generalized RM and utility-based FP negotiations exhibited fluctuations, as seen in Fig. 4.
In any SAP negotiation step, only one vehicle calls its utility function evaluator 200 times, whereas in any generalized RM negotiation step, all vehicles call their utility function evaluators (200 times for each vehicle). As a result, although a typical generalized RM negotiation converged in 100 steps as opposed to 1000 steps in the case of SAP, a typical 100-step generalized RM negotiation took 593 s of CPU time, on average, whereas a typical 1000-step SAP negotiation took 49 s of CPU time, on average. However, it is important to note that these numbers reflect sequential
^6 Note that there is no reason to consider a null target $T_0$ here.
Journal of Dynamic Systems, Measurement, and Control SEPTEMBER 2007, Vol. 129 / 593
Downloaded 02 Sep 2007 to 128.171.57.189. Redistribution subject to ASME license or copyright, see http://www.asme.org/terms/Terms_Use.cfm
CPU time. In an actual implementation, individual vehicles would call their utility function evaluators in parallel. The "parallel" CPU time in Table 1 is the overall CPU time divided by the number of vehicles; it is a rough estimate of the running time of an actual parallel implementation. By this measure, generalized RM is actually faster than SAP. The parallel time for SAP is the same as its sequential CPU time because only one vehicle updates its strategy per iteration.
In the case of utility-based FP, all vehicles call their utility function evaluators at each negotiation step, but only once per vehicle. This can be contrasted with generalized RM, which requires a utility function evaluation for every possible target. Utility-based FP took 1000 negotiation steps to approach the optimal global utility, but used only 67 s of CPU time, on average (or 0.33 s in parallel), which is also faster than the average CPU time used by RM, despite utility-based FP requiring more iterations.
For this scenario, action-based FP would impose an enormous computational burden on each vehicle, since a vehicle using action-based FP would have to keep track of the empirical frequencies of the choices of the 199 other vehicles and compute its expected utility over a decision space of size $200^{200}$ at every negotiation step. Nevertheless, the numerical results presented above verify that autonomous vehicles can quickly negotiate and agree on an assignment profile that yields near-optimal global utility when the vehicle utilities and negotiation mechanisms are chosen properly.
Scenario 2. In this scenario, we consider a more general instance of the weapon-target assignment problem, for which we have virtually no way of computing the optimal global utility. The setup is similar to that of Scenario 1, except that the vehicles are not identical and are also range restricted. More specifically, each vehicle still has zero value, but the probability $p_{ij}$ that target $T_j$ gets eliminated when only vehicle $V_i$ engages it differs from vehicle to vehicle. Each of the elimination probabilities, $p_{ij}$, $1 \le i, j \le 200$, was chosen once, independently, according to the uniform probability distribution on [0,1] and thereafter kept constant throughout the simulations. Each vehicle $V_i$ has 20 targets in its range $A_i$; the targets in $A_i$ are chosen from the set of all targets with equal probability and independently of the other vehicles. Therefore, two vehicles may have some common as well as some distinct targets in their ranges. As in Scenario 1, the target values $V_1,\ldots,V_{200}$ are chosen independently according to the uniform probability distribution on [0,1]. Therefore, as a function of the assignment profile $a$, the utility generated by the engagement with target $T_j$ is given by
$$U_{T_j}(a) = V_j\Big[1 - \prod_{i:\,a_i = T_j} (1 - p_{ij})\Big]$$

which leads to the following global utility function:

$$U_g(a) = \sum_{j=1}^{n_t} V_j\Big[1 - \prod_{i:\,a_i = T_j} (1 - p_{ij})\Big]$$
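A minimal sketch of evaluating this global utility for a given assignment profile follows; the function name and the `None` convention for an unassigned vehicle are our own:

```python
def global_utility(values, p, assignment):
    """Global utility U_g(a) for the range-restricted model: each
    target j contributes V_j * (1 - prod over engaging vehicles i of
    (1 - p[i][j])). `assignment[i]` is the target index chosen by
    vehicle i, or None if vehicle i engages no target."""
    n_targets = len(values)
    survive = [1.0] * n_targets   # running product of (1 - p_ij) per target
    for i, j in enumerate(assignment):
        if j is not None:
            survive[j] *= 1.0 - p[i][j]
    return sum(V * (1.0 - s) for V, s in zip(values, survive))
```

Note that the survival products, and hence the utility, depend only on which vehicles actually engage each target, so a vehicle's range restriction enters simply through which assignments are feasible.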
Using the same computational resources and the same setup as in Scenario 1, we simulated the vehicle negotiations on the above model. The evolution of the global utility during typical runs of generalized RM, SAP, and utility-based FP negotiations is shown in Fig. 5. The global utility corresponding to the assignment profile at the end of each run of negotiations and the CPU time required for each run were recorded. A summary of these numerical results is provided in Table 2.
All negotiations eventually settled at assignment profiles yielding comparable global utility, as shown in Fig. 5 and Table 2. Convergence in this scenario was slower for all negotiation mechanisms. The reason is that the vehicles in this scenario are not identical and are range restricted; as a result, computing each vehicle's utility is computationally more demanding. The relative timings, in both CPU time and convergence rate, are similar to those in Scenario 1.
Action-based FP was computationally infeasible for this scenario as well, for the same reason stated earlier, namely, its enormous computational burden on each vehicle.
The numerical results presented above show that autonomous vehicles can quickly negotiate and agree on a (possibly near-optimal) assignment profile when the vehicle utilities and negotiation mechanisms are chosen properly. In all cases, vehicles only communicate with their "neighbors," i.e., those vehicles that share a common target. The algorithms differ in the number of vehicles that communicate per iteration. In SAP, only the vehicle revising its assignment must communicate with its neighbors. In generalized RM and utility-based FP, all vehicles must communicate with their neighbors in every iteration. In Scenario 1, all vehicles share the same targets, and thus all vehicles are
Fig. 4 Evolution of the global utility during typical runs of negotiations
Table 1 Summary of simulation runs

                                                   Generalized RM         SAP   Utility-based FP
Average global utility / Optimal global utility         0.99              0.99        0.98
Minimum global utility / Optimal global utility         0.99              0.99        0.96
Average CPU time (s)                              593 (≈3.0 parallel)      49   67 (≈0.33 parallel)
neighbors. In Scenario 2, the communication pattern is much sparser because of the limited vehicle ranges and the distribution of targets. SAP offers the greatest communication savings per iteration; however, SAP required more iterations to converge.
6 Conclusions
We introduced an autonomous vehicle-target assignment prob-
lem as a multiplayer game where the vehicles are self-interested
players with their own individual utility functions. We emphasized
rational decision making on the part of the vehicles to develop
autonomous operation capability in uncertain and adversarial en-
vironments. To achieve optimality with respect to a global utility
function, we discussed various aspects of the design of the vehicle
utilities, in particular, alignment with a global utility function and
localization. We reviewed selected multiplayer learning algo-
rithms available in the literature. We introduced two new algo-
rithms that address the informational and computation require-
ment of existing algorithms, namely, generalized RM with fading
memory and inertia and selective spatial adaptive play, and pro-
vided accompanying convergence proofs. Finally, we discussed
these learning algorithms in terms of convergence, equilibrium
selection, and computational efficiency, and illustrated the
achievement of a global utility in a near-optimal fashion through
autonomous vehicle negotiations.
We end by pointing to a significant extension of this work: the case where the vehicle-target assignments must be made sequentially over a time horizon [2]. In this case, the assignment decisions made by the vehicles at a given time step (probabilistically) determine the future games to be played by the vehicles. Therefore, the vehicles need to take future utilities into account in their negotiations. A natural framework for studying such problems of sequential decision making in a competitive multiplayer setting is that of Markov games [39,40]. Extending the approach taken in this paper to a Markov game setup requires significant future work.
Acknowledgment
Research supported by NSF Grant No. ECS-0501394, AFOSR/MURI Grant No. F49620-01-1-0361, and ARO Grant No. W911NF-04-1-0316.
Nomenclature
  |A| = number of elements in A, for a finite set A
  I{·} = indicator function
  R^n = n-dimensional Euclidean space, for a positive integer n
  1 = the vector (1, ..., 1)^T in R^n
  (·)^T = transpose operation
  Δ(n) = simplex in R^n, i.e., {s ∈ R^n | s ≥ 0 componentwise, and 1^T s = 1}
  Int(Δ(n)) = set of interior points of the simplex, i.e., s > 0 componentwise
  H : Int(Δ(n)) → R = entropy function, H(x) = −x^T log(x)
  σ : R^n → Δ(n) = "logit" or "soft-max" function, (σ(x))_i = e^{x_i} / (e^{x_1} + ... + e^{x_n})
  [x]^+ ∈ R^n = vector whose ith entry equals max(x_i, 0), for x ∈ R^n
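For concreteness, the soft-max map and entropy function defined above can be sketched as follows; the max-shift is a standard numerical-stability device, not part of the definition:

```python
import math

def softmax(x):
    """Logit / soft-max map from R^n into the interior of the simplex:
    (softmax(x))_i = exp(x_i) / sum_k exp(x_k)."""
    m = max(x)                        # shift for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(s):
    """Entropy H(x) = -x^T log(x), defined on the simplex interior."""
    return -sum(p * math.log(p) for p in s)
```

The entropy is maximized at the uniform point of the simplex, which is exactly the image of any constant vector under the soft-max map.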
References
[1] Olfati-Saber, R., 2006, "Flocking for Multi-Agent Dynamic Systems: Algorithms and Theory," IEEE Trans. Autom. Control, 51, pp. 401–420.
[2] Murphey, R. A., 1999, "Target-Based Weapon Target Assignment Problems," Nonlinear Assignment Problems: Algorithms and Applications, Pardalos, P. M., and Pitsoulis, L. S., eds., Kluwer, Dordrecht, pp. 39–53.
[3] Ahuja, R. K., Kumar, A., Jha, K., and Orlin, J. B., 2003, "Exact and Heuristic Methods for the Weapon-Target Assignment Problem," http://ssrn.com/abstract=489802
[4] Fudenberg, D., and Tirole, J., 1991, Game Theory, MIT Press, Cambridge, MA.
[5] Basar, T., and Olsder, G. J., 1999, Dynamic Noncooperative Game Theory, SIAM, Philadelphia.
[6] Wolpert, D. H., and Tumer, K., 2001, "Optimal Payoff Functions for Members of Collectives," Adv. Complex Syst., 4(2&3), pp. 265–279.
[7] Monderer, D., and Shapley, L. S., 1996, "Potential Games," Games Econ. Behav., 14, pp. 124–143.
[8] Fudenberg, D., and Levine, D. K., 1998, The Theory of Learning in Games, MIT Press, Cambridge, MA.
[9] Young, H. P., 1998, Individual Strategy and Social Structure: An Evolutionary Theory of Institutions, Princeton University Press, Princeton, NJ.
[10] Wolpert, D., and Tumer, K., 2004, "A Survey of Collectives," Collectives and the Design of Complex Systems, Tumer, K., and Wolpert, D., eds., Springer-Verlag, New York, NY, p. 142.
[11] Miettinen, K. M., 1998, Nonlinear Multiobjective Optimization, Kluwer, Dordrecht.
[12] Rosenthal, R. W., 1973, "A Class of Games Possessing Pure-Strategy Nash Equilibria," Int. J. Game Theory, 2, pp. 65–67.
[13] Mas-Colell, A., Whinston, M. D., and Green, J. R., 1995, Microeconomic Theory, Oxford University Press, London.
[14] Benaim, M., and Hirsch, M. W., 1999, "Mixed Equilibria and Dynamical Systems Arising From Fictitious Play in Perturbed Games," Games Econ. Behav., 29, pp. 36–72.

Fig. 5 Evolution of the global utility during typical runs of negotiations

Table 2 Summary of simulation runs

                          Generalized RM           SAP   Utility-based FP
Global utility                 87.62              85.24        85.49
Average CPU time (s)    2707 (≈13.5 parallel)      382   529 (≈2.64 parallel)
[15] Brown, G. W., 1951, "Iterative Solutions of Games by Fictitious Play," Activity Analysis of Production and Allocation, Koopmans, T. C., ed., Wiley, New York, pp. 374–376.
[16] Monderer, D., and Shapley, L. S., 1996, "Fictitious Play Property for Games With Identical Interests," J. Econ. Theory, 68, pp. 258–265.
[17] Curtis, J. W., and Murphey, R., 2003, "Simultaneous Area Search and Task Assignment for a Team of Cooperative Agents," AIAA Guidance, Navigation, and Control Conference and Exhibit, Austin, TX, Aug., AIAA Paper No. 2003-5584.
[18] Hofbauer, J., 1995, "Stability for the Best Response Dynamics," University of Vienna, Vienna, Austria, http://homepage.univie.ac.at/josef.hofbauer/br.ps
[19] Krishna, V., and Sjöström, T., 1998, "On the Convergence of Fictitious Play," Math. Oper. Res., 23, pp. 479–511.
[20] Hofbauer, J., and Sandholm, W. H., 2002, "On the Global Convergence of Stochastic Fictitious Play," Econometrica, 70, pp. 2265–2294.
[21] Lambert, T. J., III, Epelman, M. A., and Smith, R. L., 2005, "A Fictitious Play Approach to Large-Scale Optimization," Oper. Res., 53(3), pp. 477–489.
[22] Marden, J. R., Arslan, G., and Shamma, J. S., 2005, "Joint Strategy Fictitious Play With Inertia for Potential Games," Proc. 44th IEEE Conference on Decision and Control, Dec., pp. 6692–6697.
[23] Fudenberg, D., and Levine, D., 1998, "Learning in Games," Eur. Econ. Rev., 42, pp. 631–639.
[24] Fudenberg, D., and Levine, D. K., 1995, "Consistency and Cautious Fictitious Play," J. Econ. Dyn. Control, 19, pp. 1065–1089.
[25] Sutton, R. S., and Barto, A. G., 1998, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
[26] Bertsekas, D. P., and Tsitsiklis, J. N., 1996, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA.
[27] Leslie, D., and Collins, E., 2003, "Convergent Multiple-Timescales Reinforcement Learning Algorithms in Normal Form Games," Ann. Appl. Probab., 13, pp. 1231–1251.
[28] Leslie, D., and Collins, E., 2005, "Individual Q-Learning in Normal Form Games," SIAM J. Control Optim., 44(2), pp. 495–514.
[29] Leslie, D. S., and Collins, E. J., 2006, "Generalised Weakened Fictitious Play," Games Econ. Behav., 56(2), pp. 285–298.
[30] Hart, S., and Mas-Colell, A., 2000, "A Simple Adaptive Procedure Leading to Correlated Equilibrium," Econometrica, 68(5), pp. 1127–1150.
[31] Hart, S., and Mas-Colell, A., 2001, "A General Class of Adaptive Strategies," J. Econ. Theory, 98, pp. 26–54.
[32] Hart, S., and Mas-Colell, A., 2003, "Regret Based Continuous-Time Dynamics," Games Econ. Behav., 45, pp. 375–394.
[33] Marden, J. R., Arslan, G., and Shamma, J. S., 2007, "Regret Based Dynamics: Convergence in Weakly Acyclic Games," Proc. 6th International Joint Conference on Autonomous Agents and Multi-Agent Systems, ACM Press, New York, NY, pp. 194–201.
[34] Bertsekas, D., and Gallager, R., 1992, Data Networks, 2nd ed., Prentice-Hall, Englewood Cliffs, NJ.
[35] Hofbauer, J., and Hopkins, E., 2005, "Learning in Perturbed Asymmetric Games," Games Econ. Behav., 52, pp. 133–152.
[36] Wolpert, D. H., 2004, "Information Theory—The Bridge Connecting Bounded Rational Game Theory and Statistical Physics," http://arxiv.org/PS-cache/cond-mat/pdf/0402/0402508.pdf
[37] Aarts, E., and Korst, J., 1989, Simulated Annealing and Boltzmann Machines, Wiley, New York.
[38] van Laarhoven, P. J. M., and Aarts, E. H. L., 1987, Simulated Annealing: Theory and Applications, Reidel, Dordrecht.
[39] Raghavan, T. E. S., and Filar, J. A., 1991, "Algorithms for Stochastic Games—A Survey," Methods Models Oper. Res., 35, pp. 437–472.
[40] Vrieze, O. J., and Tijs, S. H., 1980, "Fictitious Play Applied to Sequences of Games and Discounted Stochastic Games," Int. J. Game Theory, 11, pp. 71–85.