Gürdal Arslan
Department of Electrical Engineering,
University of Hawaii,
Manoa, Honolulu, HI 96822
e-mail: gurdal@hawaii.edu
Jason R. Marden
e-mail: marden@ucla.edu
Jeff S. Shamma
e-mail: shamma@ucla.edu
Department of Mechanical and Aerospace
Engineering,
University of California, Los Angeles,
Los Angeles, CA 90095

Contributed by the Dynamic Systems, Measurement, and Control Division of ASME for publication in the JOURNAL OF DYNAMIC SYSTEMS, MEASUREMENT, AND CONTROL. Manuscript received March 31, 2006; final manuscript received April 1, 2007. Review conducted by Tal Shima.
Autonomous Vehicle-Target
Assignment: A Game-Theoretical
Formulation
We consider an autonomous vehicle-target assignment problem where a group of vehicles
are expected to optimally assign themselves to a set of targets. We introduce a game-
theoretical formulation of the problem in which the vehicles are viewed as self-interested
decision makers. Thus, we seek the optimization of a global utility function through
autonomous vehicles that are capable of making individually rational decisions to opti-
mize their own utility functions. The first important aspect of the problem is to choose the
utility functions of the vehicles in such a way that the objectives of the vehicles are
localized to each vehicle yet aligned with a global utility function. The second important
aspect of the problem is to equip the vehicles with an appropriate negotiation mechanism
by which each vehicle pursues the optimization of its own utility function. We present
several design procedures and accompanying caveats for vehicle utility design. We
present two new negotiation mechanisms, namely, “generalized regret monitoring with
fading memory and inertia” and “selective spatial adaptive play,” and provide accom-
panying proofs of their convergence. Finally, we present simulations that illustrate how
vehicle negotiations can consistently lead to near-optimal assignments provided that the
utilities of the vehicles are designed appropriately. [DOI: 10.1115/1.2766722]
1 Introduction
Designing autonomous vehicles with intelligent and coordinated action capabilities to achieve an overall objective is a major part of the theme of "cooperative control," which has received significant attention in recent years. Whereas much of the work in this area focuses on "kinetic" coordination, e.g., multivehicle trajectory generation (see, e.g., [1] and references therein), the
focus here is on strategic coordination. In particular, we consider
an autonomous vehicle-target assignment problem illustrated in
Fig. 1, where a group of vehicles are expected to assign them-
selves to a set of targets to optimize a global utility function.
When viewed as a combinatorial optimization problem, the vehicle-target assignment problem considered in this paper is a generalization of the well-known weapon-target assignment problem [2] to the case where the global utility is a general function of the vehicle-target assignments. In its full generality, the weapon-target assignment problem is known to be NP-complete [2], and the existing literature on the weapon-target assignment problem is concentrated on heuristic methods to quickly obtain near-optimal assignments in relatively large instances of the problem, very often with no guarantees on the degree of suboptimality (cf. [3] and references therein).
Therefore, from an optimization viewpoint, the vehicle-target as-
signment problem considered in this paper is, in general, a hard
problem, even though optimal assignments can be obtained quite
efficiently in very special cases.
Our viewpoint in this paper deviates from that of direct optimi-
zation. Rather, we emphasize the design of vehicles that are indi-
vidually capable of making coordination decisions to optimize
their own utilities, which then indirectly translates to the optimi-
zation of a global utility function. The main potential benefit of
this approach is to enable autonomous vehicles that are individu-
ally capable of operating in uncertain and adversarial environ-
ments, with limited information, communication, and computa-
tion, to autonomously optimize a global utility. The optimization
methods available in the literature are not suitable for our pur-
poses because even a distributed implementation of such optimi-
zation algorithms need not induce "individually rational" behavior, which is the key to realizing the expected benefits of our
approach. Furthermore, an optimization approach would typically
require constant dissemination of global information throughout
the network of the vehicles as well as increased communication
and computation.
Accordingly, in this paper we formulate our autonomous vehicle-target assignment problem as a multiplayer game [4,5], where each vehicle is interested in optimizing its own utility. We use the notion of pure Nash equilibrium to represent the assignments that are agreeable to the rational vehicles, i.e., the assignments at which there is no incentive for any vehicle to unilaterally deviate. We use algorithms for multiplayer learning in games as negotiation mechanisms by which the vehicles seek to optimize their utilities. The problem of optimizing a global utility function by the autonomous vehicles then reduces to the proper design of (i) the vehicle utilities and (ii) the negotiation mechanisms.
Designing vehicle utilities is essential to obtaining desirable collective behavior through self-interested vehicles (cf. [6]). An important consideration in designing the vehicle utilities is that the vehicle utility functions should be "aligned" with the global utility function in the sense that agreeable assignments (i.e., Nash equilibria) should lead to high, ideally maximal, global utility. There are multiple ways that such alignment can be achieved. An obvious instance is to set the vehicle utilities equal to the global utility. This choice is not desirable in the case of a large number of interacting vehicles, because another consideration in designing
the vehicle utilities is that the vehicle utilities should be “local-
ized,” i.e., a vehicle’s utility should depend only on the local
information available to the vehicle. For example, in a large
vehicle-target assignment problem, the vehicles may have range
restrictions and a vehicle may not even be aware of the targets
and/or the vehicles outside its range. In such a case, a vehicle
whose utility is set to the global utility would not have sufficient
information to compute its own utility. Therefore, a vehicle’s util-
ity should be localized to its range while maintaining the alignment with the global utility. More generally, we will discuss the properties of being aligned and localized for several utility design procedures in Sec. 3.
Obtaining optimal assignments using the approach presented in
this paper also requires that the vehicles use a negotiation mecha-
nism that is convergent in the multiplayer game induced by the
vehicle utilities. We will show that when vehicle utilities are
aligned with the global utility, they always lead to a class of
games known as “ordinal potential games” 7. The significance
of this connection is that certain multiplayer learning algorithms,
such as fictitious play FP兲关8, are known to converge in potential
games, and hence can be used as vehicle negotiation mechanisms.
However, FP has an intensive informational requirement. Spatial
adaptive play SAP兲关9is another such algorithm, which leads to
an optimizer of the potential function in potential games with
arbitrarily high probability. Although SAP reduces the information
requirement, there can be a high implementation cost when ve-
hicles have a large number of possible actions.
This paper goes beyond existing work in the area through the
introduction of new negotiating mechanisms that alleviate the in-
formational and implementation requirement, namely, “general-
ized regret monitoring with fading memory and inertia” and “se-
lective spatial adaptive play.” We establish new convergence
results for both algorithms and simulate their performance on an
illustrative weapon-target assignment problem.
The remainder of this paper is organized as follows. Section 2
sets up an autonomous vehicle-target assignment problem as a
multiplayer game. Section 3 discusses the issue of designing the
utility functions of the vehicles that are localized to each vehicle
yet aligned with a given global utility function. Section 4 reviews
selected learning algorithms available in the literature and pre-
sents two new algorithms, along with convergence results, that
offer some advantages over existing algorithms. Section 5 presents some simulation results to illustrate the possibility of obtaining near-optimal assignments through vehicle negotiations. Finally,
Section 6 contains some concluding remarks.
2 Game-Theoretical Formulation of an Autonomous
Vehicle-Target Assignment Problem
We begin by considering an optimal assignment problem where $n_v$ vehicles are to be assigned to $n_t$ targets. Each entity, whether a vehicle or a target, may have different characteristics. The vehicles are labeled as $V_1, \dots, V_{n_v}$, and the targets are labeled as $T_0, T_1, \dots, T_{n_t}$, where a fictitious target $T_0$ represents the "null target" or "no target." Let $\mathcal{V} \triangleq \{V_1, \dots, V_{n_v}\}$ and $\mathcal{T} \triangleq \{T_0, T_1, \dots, T_{n_t}\}$. A vehicle can be assigned to any target in its range, denoted by $\mathcal{A}_i \subseteq \mathcal{T}$ for vehicle $V_i \in \mathcal{V}$. The null target always satisfies $T_0 \in \mathcal{A}_i$. Let $\mathcal{A} \triangleq \mathcal{A}_1 \times \cdots \times \mathcal{A}_{n_v}$. The assignment of vehicle $V_i$ is denoted by $a_i \in \mathcal{A}_i$, and the collection of vehicle assignments $(a_1, \dots, a_{n_v})$, called the assignment profile, is denoted by $a$. Each assignment profile, $a \in \mathcal{A}$, corresponds to a global utility, $U_g(a)$, that can be interpreted as the objective of a global planner.
We view the vehicles as "autonomous" decision makers, and accordingly, each vehicle, e.g., vehicle $V_i \in \mathcal{V}$, is assumed to select its own target assignment, $a_i \in \mathcal{A}_i$, to maximize its own utility function, $U_{V_i}(a)$. In general, vehicle utility functions may be different and each of them may depend on the whole assignment profile $a$. Hence, the vehicles do not necessarily face an optimization problem; rather, they face a (finite) multiplayer game. In such a setting, the vehicles are to negotiate an assignment profile that is mutually agreeable. The autonomous target assignment problem is to design the utilities, $U_{V_i}(a)$, as well as appropriate negotiation procedures so that the vehicles can negotiate a mutually agreeable target assignment that yields maximal global utility, $U_g(a)$.
To be able to deal with the intricacies of our autonomous target assignment problem, we adopt some concepts and methods from the theory of games [4,5]. We start with the concept of equilibrium to characterize the target assignments that are agreeable to the vehicles. A well-known equilibrium concept for multiplayer games is the notion of Nash equilibrium. In the context of an autonomous target assignment problem, a Nash equilibrium is an assignment profile $a^* = (a_1^*, \dots, a_{n_v}^*)$ such that no vehicle could improve its utility by unilaterally deviating from $a^*$. Before introducing the notion of Nash equilibrium in more precise terms, we will introduce some notation. Let $a_{-i}$ denote the collection of the target assignments of the vehicles other than vehicle $V_i$, i.e.,

$$a_{-i} = (a_1, \dots, a_{i-1}, a_{i+1}, \dots, a_{n_v})$$

and let

$$\mathcal{A}_{-i} \triangleq \mathcal{A}_1 \times \cdots \times \mathcal{A}_{i-1} \times \mathcal{A}_{i+1} \times \cdots \times \mathcal{A}_{n_v}$$

With this notation, we will sometimes write an assignment profile $a$ as $(a_i, a_{-i})$. Similarly, we may write $U_{V_i}(a)$ as $U_{V_i}(a_i, a_{-i})$. Using the above notation, an assignment profile $a^*$ is called a pure Nash equilibrium if, for all vehicles $V_i \in \mathcal{V}$,

$$U_{V_i}(a_i^*, a_{-i}^*) = \max_{a_i \in \mathcal{A}_i} U_{V_i}(a_i, a_{-i}^*) \qquad (1)$$
In this paper, we will represent the agreeable target assignment
profiles by the set of pure Nash equilibria even though in the
literature some non-Nash solution concepts for multiplayer games
are also available. We will introduce one such concept called ef-
ficiency for future reference. An assignment profile is called effi-
cient if there is no other assignment that yields higher utilities to
all vehicles. For given vehicle utilities, a Nash equilibrium assign-
ment may or may not be efficient. Our justification of a pure Nash
equilibrium as an agreeable assignment is based on the autonomous and self-interested nature of the vehicles. Clearly, an efficient pure Nash equilibrium should be more appealing to the vehicles than an inefficient pure Nash equilibrium.

Fig. 1 Illustration of vehicle-target assignment
In general, a pure Nash equilibrium may not exist for an arbi-
trary set of vehicle utilities. However, as will be seen in Sec. 3,
any reasonable set of vehicle utilities tailored to the autonomous
vehicle-target problem would have at least one pure Nash equilib-
rium.
We conclude this section with the definition of potential games and ordinal potential games [7]. These games form an important class of games because of their relevance to autonomous vehicle-target assignment as well as their desirable convergence properties mentioned earlier.
DEFINITION 2.1 ((ORDINAL) POTENTIAL GAMES). A potential game consists of vehicle utilities, $U_{V_i}(a)$, $V_i \in \mathcal{V}$, and a potential function, $\phi(a): \mathcal{A} \to \mathbb{R}$, such that, for every vehicle $V_i \in \mathcal{V}$, for every $a_{-i} \in \mathcal{A}_{-i}$, and for every $a_i', a_i'' \in \mathcal{A}_i$,

$$U_{V_i}(a_i', a_{-i}) - U_{V_i}(a_i'', a_{-i}) = \phi(a_i', a_{-i}) - \phi(a_i'', a_{-i})$$

An ordinal potential game consists of vehicle utilities $U_{V_i}(a)$, $V_i \in \mathcal{V}$, and a potential function $\phi(a): \mathcal{A} \to \mathbb{R}$ such that, for every vehicle $V_i \in \mathcal{V}$, for every $a_{-i} \in \mathcal{A}_{-i}$, and for every $a_i', a_i'' \in \mathcal{A}_i$,

$$U_{V_i}(a_i', a_{-i}) - U_{V_i}(a_i'', a_{-i}) > 0 \iff \phi(a_i', a_{-i}) - \phi(a_i'', a_{-i}) > 0$$
In a potential game, the difference in utility received by any one
vehicle for its two different target choices, when the assignments
of other vehicles are fixed, can be measured by a potential func-
tion that only depends on the assignment profile and not on the
label of any vehicle.
In an ordinal potential game, an improvement in utility received
by any one vehicle for its two different target choices, when the
assignments of other vehicles are fixed, always results in an im-
provement of a potential function that, again, only depends on the
assignment profile and not on the label of any vehicle. Clearly,
ordinal potential games form a broader class than potential games.
3 Utility Design
In this section, we discuss various important aspects of designing the vehicle utilities to achieve a high global utility. We cite [7,10] as the key references for this section, since we freely use some of the terminology and the ideas presented in them. To make the discussion more concrete and relevant, we assume a certain structure for the global utility, even though it is possible to present the ideas at a more abstract level. We assume that all vehicles that assign themselves to a particular target form a team and engage their common target in a coordinated manner. An engagement with target $T_j \in \mathcal{T}$ generates some utility denoted by $U_{T_j}(a)$; $U_{T_0}(a) = 0$ for any $a$.
It is important to distinguish between a target utility, $U_{T_j}(a)$, and a vehicle utility, $U_{V_i}(a)$. The realized target utility represents the overall value for engaging target $T_j$, whereas a vehicle utility partly reflects vehicle $V_i$'s share of that value. Furthermore, it may be that vehicle $V_i$ shares this reward even if it did not engage target $T_j$. This will depend on the final specification of vehicle utilities.

We will assume that the utility generated by an engagement with target $T_j$ depends only on the characteristics of target $T_j$ and the vehicles engaging target $T_j$. This is stated more precisely in the following assumption.
ASSUMPTION 3.1. Let $a$ and $\tilde{a}$ be two action profiles in $\mathcal{A}$, and for any target, $T_j \in \mathcal{T}$, define the sets

$$S_j = \{V_i \in \mathcal{V} : a_i = T_j\} \quad \text{and} \quad \tilde{S}_j = \{V_i \in \mathcal{V} : \tilde{a}_i = T_j\}$$

Then,

$$S_j = \tilde{S}_j \Rightarrow U_{T_j}(a) = U_{T_j}(\tilde{a})$$
We now define the global utility to be the total sum of the utilities generated by all engagements, i.e.,

$$U_g(a) = \sum_{T_j \in \mathcal{T}} U_{T_j}(a) \qquad (2)$$

This summation is only one approach to aggregating the target utility functions. See [11] for a more general discussion from the perspective of multiobjective optimization.
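Under Assumption 3.1, each target utility depends only on the set of vehicles engaging that target, so Eq. (2) can be evaluated engagement by engagement. A minimal sketch, where the evaluator `U_T(j, S)` is an assumed interface returning the utility of target $j$ when engaged by the vehicle set $S$:

```python
# Sketch of the global utility (2) under Assumption 3.1; the null target
# (index 0) contributes zero by convention.
def engaging_set(a, j):
    """S_j: indices of the vehicles assigned to target j under profile a."""
    return frozenset(i for i, a_i in enumerate(a) if a_i == j)

def global_utility(a, targets, U_T):
    return sum(U_T(j, engaging_set(a, j)) for j in targets if j != 0)
```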
It will be convenient to model an engagement with a target as a
random event that is assumed to be independent of the other target
engagements. At the end of an engagement, the target and some of
the engaging vehicles are destroyed with certain probability. The
statistics of the outcome of an engagement depend on the charac-
teristics of the target as well as the composition of the engaging
vehicles. As an example, it may be the case that only a particular
team of vehicles may destroy a particular target with reasonable
probability. In this case, the utility generated by an engagement is
taken to be the expected difference between the value of a de-
stroyed target and the total value of the destroyed vehicles. These
issues are discussed further for the well-known weapon-target as-
signment problem in Sec. 5.
An important consideration in specifying the vehicle utilities, $U_{V_i}(a)$, $i = 1, \dots, n_v$, is to make them "aligned" with the global utility, $U_g(a)$. Ideally, this means that the vehicles can only agree on an optimal assignment profile, i.e., an assignment profile that maximizes the global utility. Because it is not always straightforward to achieve the alignment of the vehicle utilities with the global utility in this ideal sense without first calculating an optimal assignment, we adopt a more relaxed notion of alignment from [10]. That is, a vehicle can improve its own utility by unilateral action if and only if the same unilateral action also improves the global utility.
DEFINITION 3.1 (ALIGNMENT). We will say that a set of vehicle utilities $U_{V_i}(a)$, $V_i \in \mathcal{V}$, is aligned¹ with the global utility $U_g(a)$ when the following condition is satisfied. For every vehicle, $V_i \in \mathcal{V}$, for every $a_{-i} \in \mathcal{A}_{-i}$, and for every $a_i', a_i'' \in \mathcal{A}_i$,

$$U_{V_i}(a_i', a_{-i}) - U_{V_i}(a_i'', a_{-i}) > 0 \iff U_g(a_i', a_{-i}) - U_g(a_i'', a_{-i}) > 0 \qquad (3)$$

We see that the notion of alignment coincides with the notion of ordinal potential games in Definition 2.1.

¹The notion of alignment we adopt here is called factoredness in [10].
It turns out that alignment does not rule out pure Nash equilib-
ria that may be suboptimal from the global utility perspective.
Moreover, such suboptimal pure Nash equilibria may even yield
the highest utilities to all vehicles and hence may be efficient.
Nevertheless, alignment also guarantees that the optimal assign-
ment profiles are always included in the set of pure Nash equilib-
ria; hence, they are agreeable to the vehicles even though they
may be inefficient.
The above discussion on alignment is summarized by the fol-
lowing proposition, whose proof is straightforward.
PROPOSITION 3.1. Let $a^{\mathrm{opt}}$ denote an optimal assignment profile, i.e.,

$$a^{\mathrm{opt}} \in \arg\max_{a \in \mathcal{A}} U_g(a)$$

Under the alignment condition (3), the resulting game is an ordinal potential game that has $a^{\mathrm{opt}}$ as a (possibly nonunique) pure Nash equilibrium.
3.1 Identical Interest Utility (IIU). One obvious, but ultimately ineffective, way of making the vehicle utilities aligned with the global utility is to set all vehicle utilities to the global utility. In game-theory terminology, setting

$$U_{V_i}(a) = U_g(a), \quad \text{for all vehicles } V_i \in \mathcal{V} \qquad (4)$$
results in an identical interest game. Obviously, an identical interest game with $U_{V_i}(a) = U_g(a)$, for all vehicles $V_i \in \mathcal{V}$, is also a potential game with the potential $U_g(a)$, and hence, the vehicle utilities (4) are aligned with the global utility. In fact, optimal assignments in this case yield the highest vehicle utilities and therefore are efficient. However, suboptimal Nash equilibria may still exist.
As will be seen later, the vehicles negotiate by proposing tar-
gets and responding to the previous target assignment proposals
that are exchanged among the vehicles. Each vehicle whose utility
is set to the global utility needs to know (i) the proposals made by all other vehicles as well as (ii) the characteristics of all the vehicles and the targets to be able to generate a new proposal. The reason for this is that vehicle $V_i$'s utility would depend on all engagements with all targets, including those that are not in $\mathcal{A}_i$.
Therefore, when the vehicle utilities are set to the global utility,
continuous dissemination of global information is required among
the vehicles.
3.2 Range-Restricted Utility (RRU). A possible way of making the vehicle utilities more localized than IIU would be to set the utility of vehicle $V_i$ equal to the sum of the utilities generated by the engagements with the targets that belong to vehicle $V_i$'s target set $\mathcal{A}_i$, i.e.,

$$U_{V_i}(a) = \sum_{T_j \in \mathcal{A}_i} U_{T_j}(a), \quad \text{for all vehicles } V_i \in \mathcal{V} \qquad (5)$$

Note that in this case the global information requirement on the vehicles is alleviated. Moreover, the vehicle utilities (5) are still aligned with the global utility. This guarantees that the optimal assignments are agreeable to the vehicles, but they may be inefficient; see Example 3.3. In fact, the vehicle utilities lead to a potential game; see [7]. The following proposition is an immediate consequence of Assumption 3.1.

PROPOSITION 3.2. Vehicle utilities that satisfy (5) form a potential game with the global utility $U_g(a)$ serving as a potential function.

Note that when all vehicles have the same set of available targets, i.e., $\mathcal{A}_1 = \cdots = \mathcal{A}_{n_v}$, then (5) leads to an identical interest game.
A concern regarding vehicle utilities (4) (and possibly (5)) stems from the so-called learnability issue introduced in [10]. That is, a vehicle may not be able to influence its own utility in a
significant way when a large number of vehicles can assign them-
selves to the same large set of targets. In this case, since the utility
of a vehicle is the total sum of the utilities generated by a large
number of engagements involving a large number of targets and
vehicles, the proposals made by an individual vehicle may not
have any significant effect on its own utility. Hence, a negotiating
vehicle may find itself approximately indifferent to the available
target choices if the negotiation mechanism employed is utility
based, i.e., the vehicle proposes targets in response to the actual
utilities corresponding to its past proposals, as in reinforcement
learning.
3.3 Equally Shared Utility (ESU). One way to limit the influence of other vehicles on vehicle $V_i$'s utility is to set

$$U_{V_i}(a) = \frac{U_{T_j}(a)}{n_{T_j}(a)}, \quad \text{if } a_i = T_j \qquad (6)$$

where $n_{T_j}(a)$ is the total number of vehicles engaging target $T_j$. The rationale behind (6) is to distribute the utility generated by an engagement equally among the engaging vehicles. Note that in this case vehicle $V_i$'s utility is independent of the engagements in which vehicle $V_i$ does not participate.

Even though the total sum of vehicle utilities (6) equals the global utility, it turns out that (6) need not be exactly aligned with the global utility.
Example 3.1. Consider two targets $T_1$ and $T_2$ with values 2 and 10, respectively, and two anonymous vehicles $V_1$ and $V_2$, i.e., $V_1$ and $V_2$ have identical characteristics. Assume that each vehicle is individually capable of destroying any one of the targets with probability 1, while the targets in no case have any chance of destroying any of the vehicles. The vehicle utilities in this example can be represented in the matrix form shown in Fig. 2, where if vehicle $V_i \in \{V_1, V_2\}$ chooses target $a_i \in \{T_0, T_1, T_2\}$, then the first number (respectively, the second number) in the entry $(a_1, a_2)$ represents the utility to the first vehicle (respectively, to the second vehicle). The global planner would of course prefer each vehicle to engage a different target, since this would yield a maximal global utility of 12. However, such an optimal assignment profile might leave the vehicle engaging the low-value target unsatisfied with a utility of 2, and this unsatisfied vehicle might be able to improve its utility to 5 by unilaterally switching to the high-value target, at the expense of lowering the global utility to 10. Because of the misalignment of (6) with the global utility in this example, an optimal assignment profile may not be agreeable to all vehicles, whereas the vehicles may find the suboptimal Nash equilibrium assignment $(a_1, a_2) = (T_2, T_2)$ agreeable.

Fig. 2 Misaligned vehicle utilities
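The numbers in this example are easy to verify programmatically; the following sketch (hypothetical code, not from the paper) reproduces both conclusions under the equally shared utility:

```python
# Numerical check of Example 3.1 under ESU: (T1, T2) is optimal but not a pure
# Nash equilibrium, while the suboptimal (T2, T2) is a pure Nash equilibrium.
values = {1: 2.0, 2: 10.0}           # target values; 0 is the null target

def esu(a, i):
    """Equally shared utility (6): each vehicle destroys its target w.p. 1."""
    if a[i] == 0:
        return 0.0
    team = sum(1 for a_m in a if a_m == a[i])
    return values[a[i]] / team

def is_nash(a):
    return all(esu(a, i) >= esu(a[:i] + (alt,) + a[i+1:], i)
               for i in range(2) for alt in (0, 1, 2))

print(is_nash((1, 2)))   # False: the vehicle on T1 prefers to join T2 (5 > 2)
print(is_nash((2, 2)))   # True: the suboptimal assignment is agreeable
```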
However, in the case of anonymous vehicles, (6) does lead to a potential game.
DEFINITION 3.2 (ANONYMITY). Vehicles are anonymous if for any permutation

$$\sigma: \{1, 2, \dots, n_v\} \to \{1, 2, \dots, n_v\}$$

and for any two assignments, $a$ and $\tilde{a}$, related by

$$\tilde{a}_i = a_{\sigma(i)}, \quad \forall i \in \{1, 2, \dots, n_v\}$$

the equality

$$U_{T_j}(a) = U_{T_j}(\tilde{a})$$

holds for any target $T_j$.

As the terminology implies, the utility generated by an engagement with a target does not depend on the identities of the vehicles engaging the target, but only on the number of vehicles engaging the target.
PROPOSITION 3.3. Anonymous vehicles with utilities that satisfy (6) form a potential game with potential function

$$\phi(a) = \sum_{T_j \in \mathcal{T}} \sum_{\ell=1}^{n_{T_j}(a)} \frac{U_{T_j}(\ell)}{\ell}$$

where $n_{T_j}(a)$ is the total number of vehicles assigned to target $T_j$ and $U_{T_j}(\ell)$ is the utility generated by an engagement of $\ell$ anonymous vehicles with target $T_j$.
Hence, in the case of anonymous vehicles, (6) is aligned with the above potential function, which is the same potential function introduced in [12] in the context of so-called congestion games, but different from the global utility function $U_g(a)$. The significance of this observation is that the existence of a potential function associated with the vehicle utilities guarantees the existence of agreeable (possibly suboptimal) assignment profiles in the form of pure Nash equilibria. Furthermore, there exist learning algo-
rithms that are known to converge in potential games and these
convergent learning algorithms can be used by the vehicles as
negotiation mechanisms always leading to a settlement on an as-
signment profile. If the vehicles are not anonymous, then the mis-
alignment of the vehicle utilities (6) with the global utility can be
even more severe.
Example 3.2. Consider two targets $T_1$ and $T_2$ with values 10 each, and two distinguishable vehicles, $V_1$ and $V_2$, with values 2 each. Assume that vehicle $V_1$ is individually capable of destroying any one of the targets with probability one, and neither of the targets is ever capable of destroying $V_1$. Assume further that vehicle $V_2$ is never capable of destroying any of the targets, and any one of the targets can destroy vehicle $V_2$ with probability one. This setup leads to the vehicle utilities shown in Fig. 3. In this example, the two vehicles may not be able to agree on any assignment profile, optimal or suboptimal, because while vehicle $V_1$ would be better off engaging a target alone, vehicle $V_2$ would be better off engaging a target together with vehicle $V_1$. Yet, the global planner would prefer vehicle $V_1$ engaging one of the targets and vehicle $V_2$ not engaging any target. If these two vehicles were to use a negotiation mechanism that allows settlement only on a pure Nash equilibrium, then they would not be able to agree on any assignment because a pure Nash equilibrium does not exist in this example. A mixed, but not pure, Nash equilibrium is still guaranteed to exist, but would not lead to an agreement on a particular assignment. Therefore, in the distinguishable-vehicles case, the vehicle utilities (6) might lead to a situation where the vehicles are not only in conflict with the global planner but also in conflict among themselves.

Fig. 3 Misaligned vehicle utilities with no pure Nash equilibrium
3.4 Wonderful Life Utility (WLU). A solution to the problem of designing individual utility functions that are more learnable than (4) or (5) and still aligned with the global utility is offered in [10] in the form of a family of utility structures called the wonderful life utility. In our context, a particular WLU structure would be obtained by setting the utility of a vehicle to the marginal contribution made by the vehicle to the global utility, i.e.,

$$U_{V_i}(a_i, a_{-i}) = U_g(a_i, a_{-i}) - U_g(T_0, a_{-i}), \quad \text{for all vehicles } V_i \in \mathcal{V} \qquad (7)$$

From the definition of the global utility (2), the WLU (7) can be written as

$$U_{V_i}(a_i, a_{-i}) = U_{T_j}(a_i, a_{-i}) - U_{T_j}(T_0, a_{-i}), \quad \text{if } a_i = T_j$$

for all vehicles $V_i \in \mathcal{V}$, which means that the utility of a vehicle is its marginal contribution to the utility generated by the engagement in which the vehicle participates. WLU is expected to make each vehicle's utility more learnable by removing the unnecessary dependencies on other vehicles' assignment decisions, while still keeping the vehicle utilities aligned with the global utility. It turns out that WLU (7) also leads to a potential game with the global utility being the potential function.

PROPOSITION 3.4. Vehicle utilities that satisfy (7) form a potential game with the global utility $U_g(a)$ serving as a potential function.

Another interpretation of the WLU is that a vehicle is rewarded with a side payment equal to the externality it may create by not assigning itself to any target, which is the idea behind "internalizing the externalities" in economics [13].
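A minimal sketch of Eq. (7) in code, assuming a per-target evaluator `target_utility(j, team)` that respects Assumption 3.1 (both names are illustrative, not from the paper):

```python
# Wonderful life utility (7): a vehicle's utility is its marginal contribution
# to the engagement it joins; opting out (the null target, index 0) yields zero.
def wlu(a, i, target_utility):
    j = a[i]
    if j == 0:
        return 0.0
    team = frozenset(m for m, a_m in enumerate(a) if a_m == j)
    with_i = target_utility(j, team)             # U_Tj(a_i, a_-i)
    without_i = target_utility(j, team - {i})    # U_Tj(T0, a_-i)
    return with_i - without_i
```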
3.5 Comparisons. Each of the vehicle utilities IIU (4), RRU (5), and WLU (7) leads to a potential game with the global utility function being the potential function, and hence, they are aligned with the global utility. This guarantees that the optimal assignments are in each case included in the set of pure Nash equilibria. However, in each case, there may also be suboptimal Nash equilibria that may be pure and/or mixed. There is ample evidence in the literature that a mixed equilibrium cannot emerge as a stable outcome of vehicle negotiations, particularly in potential games (e.g., [14]). However, a suboptimal pure Nash equilibrium can emerge as a stable outcome, depending on the negotiation mechanism used by the vehicles.
Example 3.3. Consider $N \geq 2$ vehicles, $V_1, \dots, V_N$, and $N+1$ targets, $T_1, \dots, T_{N+1}$, where $\mathcal{A}_i = \{T_i, T_{N+1}\}$. Assume that any vehicle $V_i$ engaging target $T_i$ generates 1 unit of utility. Assume also that an engagement with target $T_{N+1}$ generates 0 utility unless all vehicles engage $T_{N+1}$, in which case they generate 2 units of utility. Clearly, the optimal assignment is given by $a^* = (T_1, T_2, \dots, T_N)$. The optimal assignment profile $a^*$ is a pure Nash equilibrium when the vehicle utilities are given by any of (4), (5), or (7). However, there is another pure Nash equilibrium $a^{**} = (T_{N+1}, T_{N+1}, \dots, T_{N+1})$ for any of the vehicle utilities (4), (5), or (7), which is suboptimal with respect to the global utility. The global utility and the vehicle utilities corresponding to $a^*$ and $a^{**}$ are summarized as follows:

$$U_g(a^*) = N, \quad U_g(a^{**}) = 2$$

$$U_{V_i}(a^*) = N, \quad U_{V_i}(a^{**}) = 2 \quad \text{if vehicle utilities are given by (4)}$$

$$U_{V_i}(a^*) = 1, \quad U_{V_i}(a^{**}) = 2 \quad \text{if vehicle utilities are given by (5) or (7)}$$

Note that the optimality gap $N - 2$ between $a^*$ and $a^{**}$ can be arbitrarily large for large $N$. Note also that if the vehicle utilities are given by RRU (5) or WLU (7), the suboptimal Nash equilibrium $a^{**}$ yields higher utilities to all vehicles than the optimal Nash equilibrium $a^*$.

In the case of RRU or WLU, if the negotiation mechanism employed by the vehicles were to eliminate the inefficient assignment profiles, the vehicles would never be able to agree on the optimal assignment $a^*$. This example illustrates the fact that the vehicle utilities cannot be designed independently of the negotiation mechanism employed by the vehicles.
4 Negotiation Mechanisms
The issue of which Nash equilibrium will emerge as a stable
outcome of vehicle negotiations is studied under the topic of equi-
librium selection in game theory. In this section, we will discuss
equilibrium selection and other important properties of some ne-
gotiation mechanisms. In particular, we will present a negotiation
mechanism from the literature that leads to an optimal Nash equi-
librium in potential games with arbitrarily high probability.
We will adopt various learning algorithms available in the lit-
erature for multiplayer games as vehicle negotiation mechanisms
to make use of the theoretical and computational tools provided
by game theory. The negotiation mechanisms that will be pre-
sented in this section will provide the vehicles with strategic
decision-making capabilities. In particular, each vehicle will ne-
gotiate with other vehicles without any knowledge about the utili-
ties of the other vehicles. One of the reasons for such a require-
ment is that the vehicles may not have the same information
regarding their environment. For example, a vehicle may not
know all the targets and/or the potential collaborating vehicles
available to another vehicle and, moreover, it may not be possible
to pass on such information due to limited communication band-
width. Another reason for the private utilities requirement is to
make the vehicles truly autonomous in the sense that each vehicle
is individually capable of making robust strategic decisions in
uncertain and adversarial environments. In this case, any indi-
vidual vehicle is cooperative with the other vehicles only to the
extent that cooperation helps the vehicle to maximize its own
utility, which is, of course, carefully designed by the global plan-
ner.
Accordingly, we will consider some negotiation mechanisms
that require each vehicle to know, at most, its own utility function,
the proposals made by the vehicle itself, and the proposals made
by those other vehicles that can influence the utility of the vehicle.
We will review these negotiation mechanisms in terms of conver-
gence, equilibrium selection, and computational efficiency. We
will present our review primarily in the context of potential
games, since many of the vehicle utility structures considered in
Sec. 3 fall into this category. In some cases, we will point to
existing results in the literature, while in some other cases we will
point to open problems.
4.1 Review: Selected Recursive Averaging Algorithms
4.1.1 Action-Based Fictitious Play. Action-based fictitious play, or simply FP, was originally introduced as a computational method to calculate the Nash equilibria in zero-sum games [15], but was later proposed as a learning mechanism in multiplayer games (cf. [8]).
One can also think of FP as a negotiation mechanism employed by the vehicles to select their targets. At each negotiation step, $k = 1, 2, \dots$, the vehicles simultaneously propose targets

$$a(k) \triangleq (a_1(k), \dots, a_{n_v}(k))$$

where $a_i(k) \in \mathcal{A}_i$ is the label of the target proposed by vehicle $V_i$. The objective is to construct a negotiation mechanism so that the proposed assignments, $a(k)$, ultimately converge for large $k$. FP is one such mechanism that is guaranteed to converge for potential games.
In FP, the target assignment proposals at stage $k$ are functions of past proposed assignments over the interval $[1, k-1]$ as follows. First, enumerate the targets available to vehicle $V_i$ as $\mathcal{A}_i = \{\mathcal{A}_i^1, \dots, \mathcal{A}_i^{|\mathcal{A}_i|}\}$. For any target index $j \in [1, |\mathcal{A}_i|]$, let $n^j(k; V_i)$ denote the total number of times vehicle $V_i$ proposed target $\mathcal{A}_i^j$ up to stage $k$. Now define the empirical frequency vector, $q_i(k) \in \mathbb{R}^{|\mathcal{A}_i|}$, of vehicle $V_i$ as follows:

$$q_i(k) = \left( \frac{n^1(k-1; V_i)}{k-1}, \frac{n^2(k-1; V_i)}{k-1}, \dots, \frac{n^{|\mathcal{A}_i|}(k-1; V_i)}{k-1} \right)$$

In words, $q_i(k)$ reflects the histogram of proposed target assignments by vehicle $V_i$ over the interval $[1, k-1]$. Note that the elements of the empirical frequency vector are all nonnegative and sum to unity. Therefore, $q_i(k)$ can be identified with a probability vector on the probability simplex of dimension $|\mathcal{A}_i|$.
We are now set to define the FP process. At stage $k$, vehicle $V_i$ selects its proposed assignment $a_i(k) \in \mathcal{A}_i$ in accordance with maximizing its expected utility as though all other vehicles make a simultaneous and independent random selection of their actions, $a_{-i}$, based on the product distribution defined by the empirical frequencies, $q_1(k), \dots, q_{i-1}(k), q_{i+1}(k), \dots, q_{n_v}(k)$, i.e.,

$$a_i(k) \in \arg\max_{\alpha \in \mathcal{A}_i} E_{a_{-i}}[U_{V_i}(\alpha, a_{-i})]$$

In case the maximizer is not unique, any maximizer will do.

One appealing property of FP is that the empirical frequencies generated by FP converge to the set of Nash equilibria in potential games [7,16]. Although the empirical frequencies may converge to a mixed Nash equilibrium while the proposals are cycling (see the related churning issue in [17]), it is generally believed that convergence of empirical frequencies to a mixed but not pure Nash equilibrium happens rarely when vehicle utilities are not equivalent to a zero-sum game [18,19]. Thus, if the vehicles negotiate using FP and their utilities constitute a potential game, then in most cases we can expect them to asymptotically reach an agreement on an assignment profile. We should also mention numerous stochastic versions of FP with similar convergence properties [20].
The main disadvantage of FP for the purposes of this paper is
its computational burden on each vehicle. The most computation-
ally intensive operation is the optimization of the utilities during
the negotiations, which effectively requires an enumeration of all possible combined assignments by other vehicles [21,22]. This
makes FP computationally prohibitive when there are large num-
bers of vehicles with large target sets. To make FP truly scalable,
it is clear that the vehicles need to evaluate their utilities more
directly without using the empirical frequencies.
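A sketch of one FP proposal step makes the computational burden concrete: the expected utility enumerates every joint assignment of the other vehicles. The `counts` and `U_i` interfaces are assumptions for illustration (valid for steps $k \geq 2$).

```python
# One action-based FP step for vehicle V_i (illustrative sketch): best-respond
# to the product of the other vehicles' empirical frequencies.
from itertools import product

def fp_propose(i, ranges, counts, k, U_i):
    # counts[m][t]: times vehicle m proposed target t over steps 1..k-1
    others = [m for m in range(len(ranges)) if m != i]
    freqs = [{t: counts[m][t] / (k - 1) for t in ranges[m]} for m in others]

    def expected_utility(alpha):
        total = 0.0
        # the costly part: enumerate all combined assignments of the others
        for joint in product(*[sorted(ranges[m]) for m in others]):
            prob = 1.0
            for q_m, t in zip(freqs, joint):
                prob *= q_m[t]
            profile = list(joint)
            profile.insert(i, alpha)          # splice V_i's candidate back in
            total += prob * U_i(tuple(profile))
        return total

    return max(ranges[i], key=expected_utility)
```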
4.1.2 Utility-Based FP. The distinction between action-based and utility-based FP (see [23,24]) is that the vehicles predict their utilities during the negotiations based on the actual utilities corresponding to the previous proposals. Utility-based FP is in essence a multiagent reinforcement learning algorithm [25,26]. The difference is that in reinforcement learning, the utility evaluation is based on experience, whereas in utility-based FP, it is based on a call to a simulated utility function evaluator.
The main advantage of utility-based FP is its very low compu-
tational burden on each vehicle. In particular, the vehicles do not
need to compute the empirical frequencies of the past proposals
made by any vehicle and do not need to compute their expected
utilities based on the empirical frequencies. It only requires an individual vehicle to process a (state) vector whose dimension is its number of targets and to select a (randomized) maximizer. This
significantly alleviates the computational bottleneck of FP. How-
ever, the convergence of utility-based FP for potential games is
still an open issue.
There are also other utility-based learning algorithms that are proven to converge in partnership games [27–29]. These algorithms are similar to multiagent reinforcement learning algorithms and have a computational burden comparable to that of utility-based FP. However, convergence requires fine tuning of various parameters, such as the learning rates of each agent. Moreover, utility-based learning algorithms are prone to the learnability issue and may exhibit slower convergence than action-based FP.
4.1.3 Regret Matching. The discussion on FP in Sec. 4.1.2 motivates a learning algorithm that is computationally feasible as well as convergent in potential games, both theoretically and practically. Accordingly, we introduce regret matching, from [30], whose main distinction is that the vehicles propose targets based on their regret for not proposing particular targets in the past negotiation steps.
As before, let us enumerate the targets available to vehicle $V_i$ as $\mathcal{A}_i = \{\mathcal{A}_i^1, \dots, \mathcal{A}_i^{|\mathcal{A}_i|}\}$. Vehicle $V_i$ selects its proposed target, $a_i(k)$, according to a probability distribution, $p_i(k) \in \Delta(|\mathcal{A}_i|)$, that will be specified shortly. The $\ell$th component, $p_i^{\ell}(k)$, of $p_i(k)$ is the probability that vehicle $V_i$ selects the $\ell$th target in $\mathcal{A}_i$ at the negotiation step $k$, i.e., $p_i^{\ell}(k) = \mathrm{Prob}[a_i(k) = \mathcal{A}_i^{\ell}]$. Vehicle $V_i$ does not know the utility $U_{V_i}(a(k))$ before proposing its own target $a_i(k)$. Accordingly, before selecting $a_i(k)$, $k > 1$, vehicle $V_i$ computes its average regret

$$R_{V_i}^{\ell}(k) \triangleq \frac{1}{k-1} \sum_{m=1}^{k-1} \left[ U_{V_i}(\mathcal{A}_i^{\ell}, a_{-i}(m)) - U_{V_i}(a(m)) \right]$$

for not proposing $\mathcal{A}_i^{\ell}$ in all past negotiation steps, assuming that the proposed targets of all other vehicles remain unaltered. Clearly, vehicle $V_i$ can compute $R_{V_i}^{\ell}(k)$ using the recursion
$$R_{V_i}^{\ell}(k+1) = \frac{k-1}{k} R_{V_i}^{\ell}(k) + \frac{1}{k} \left[ U_{V_i}(\mathcal{A}_i^{\ell}, a_{-i}(k)) - U_{V_i}(a(k)) \right], \quad k \geq 1$$

We note that, at any step $k > 1$, vehicle $V_i$ updates all entries in its average regret vector $R_{V_i}(k) \triangleq (R_{V_i}^1(k), \dots, R_{V_i}^{|\mathcal{A}_i|}(k))^T$, whose dimension is $|\mathcal{A}_i|$. In particular, the vehicles do not need to compute the empirical frequencies of the past proposals made by any vehicle and do not need to compute their expected utilities based on the empirical frequencies. We also note that it is sufficient for vehicle $V_i$, at step $k > 1$, to have access to $a_i(k-1)$ and $U_{V_i}(\mathcal{A}_i^{\ell}, a_{-i}(k-1))$ for all $\ell \in \{1, \dots, |\mathcal{A}_i|\}$. In other words, it is sufficient for vehicle $V_i$ to have access to its proposal at step $k-1$ and its actual utility $U_{V_i}(a(k-1))$ received at step $k-1$, as well as its hypothetical utilities $U_{V_i}(\mathcal{A}_i^{\ell}, a_{-i}(k-1))$, which would have been received if it had proposed target $\mathcal{A}_i^{\ell}$ [instead of $a_i(k-1)$] and all other vehicle proposals $a_{-i}(k-1)$ had remained unchanged at step $k-1$.

Once vehicle $V_i$ computes its average regret vector, $R_{V_i}(k)$, it proposes a target $a_i(k)$, $k > 1$, according to the probability distribution

$$p_i(k) = \frac{[R_{V_i}(k)]^+}{\mathbf{1}^T [R_{V_i}(k)]^+}$$

provided that the denominator above is positive; otherwise, $p_i(k)$ is the uniform distribution over $\mathcal{A}_i$ ($p_i(1) \in \Delta(|\mathcal{A}_i|)$ is always arbitrary). Roughly speaking, a vehicle using regret matching proposes a particular target at any step with probability proportional to the average regret for not playing that particular target in the past negotiation steps. It turns out that the average regret of a vehicle using regret matching would asymptotically vanish (similar results hold for different regret-based adaptive dynamics; see [30–32]). Although this result characterizes the long-term behavior of regret matching in general games, it need not imply that the negotiations of vehicles using regret matching will converge to a pure equilibrium assignment profile when vehicle utilities constitute a potential game, an objective which we will pursue in Sec. 4.2.
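A compact sketch of one regret-matching step follows. It assumes vehicle $V_i$ can evaluate hypothetical utilities of the form $U_{V_i}(\mathcal{A}_i^{\ell}, a_{-i}(k-1))$; the `U_i(alt, a)` interface below (utility of profile `a` with $V_i$'s entry replaced by `alt`) is an assumption for illustration.

```python
# One regret-matching step (Sec. 4.1.3): update the average regret vector R
# recursively, then sample the next proposal proportionally to positive regrets.
import random

def rm_step(R, k, a_last, my_targets, i, U_i):
    realized = U_i(a_last[i], a_last)
    for idx, alt in enumerate(my_targets):        # update every regret entry
        hypothetical = U_i(alt, a_last)
        R[idx] = ((k - 1) / k) * R[idx] + (hypothetical - realized) / k
    positive = [max(r, 0.0) for r in R]
    if sum(positive) > 0:
        weights = positive
    else:
        weights = [1.0] * len(my_targets)         # uniform if no positive regret
    return random.choices(my_targets, weights=weights)[0]
```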
4.2 Generalized Regret Monitoring With Fading Memory and Inertia. To enable convergence to a pure equilibrium in potential games, we will modify regret matching in two ways. First, we will assume that each vehicle has a fading memory; that is, each vehicle exponentially discounts the influence of its past regret in the computation of its average regret vector. More precisely, each vehicle computes a discounted average regret vector according to the recursion

$$\tilde{R}_{V_i}^{\ell}(k+1) = (1-\rho) \tilde{R}_{V_i}^{\ell}(k) + \rho \left[ U_{V_i}(\mathcal{A}_i^{\ell}, a_{-i}(k)) - U_{V_i}(a(k)) \right], \quad \text{for all } \ell \in \{1, \dots, |\mathcal{A}_i|\}$$

where $\rho \in (0,1]$ is a parameter with $1-\rho$ being the discount factor, and $\tilde{R}_{V_i}^{\ell}(1) = 0$.
Second, we will assume that each vehicle proposes a target based on its discounted average regret using some inertia. Therefore, each vehicle $V_i$ proposes a target $a_i(k)$, at step $k > 1$, according to the probability distribution

$$p_i(k) = \epsilon_i(k) \, \mathrm{RM}_i(\tilde{R}_{V_i}(k)) + (1 - \epsilon_i(k)) \, v_{a_i(k-1)}$$

where $\epsilon_i(k)$ is a parameter representing vehicle $V_i$'s willingness to optimize at time $k$, $v_{a_i(k-1)}$ is the vertex of $\Delta(|\mathcal{A}_i|)$ corresponding to the target $a_i(k-1)$ proposed by vehicle $V_i$ at step $k-1$, and $\mathrm{RM}_i: \mathbb{R}^{|\mathcal{A}_i|} \to \Delta(|\mathcal{A}_i|)$ is any continuous function satisfying

$$x^{\ell} > 0 \Leftrightarrow \mathrm{RM}_i^{\ell}(x) > 0, \quad \text{and} \quad \mathbf{1}^T x^+ = 0 \Rightarrow \mathrm{RM}_i(x) = \frac{1}{|\mathcal{A}_i|}\mathbf{1} \qquad (8)$$

where $x^{\ell}$ and $\mathrm{RM}_i^{\ell}(x)$ are the $\ell$th components of $x$ and $\mathrm{RM}_i(x)$, respectively.
We will call the above dynamics generalized regret monitoring (RM) with fading memory and inertia. The reason behind the term "monitoring" is that the algorithm leaves unspecified how an agent reacts to regrets through the function $\mathrm{RM}_i(\cdot)$. One particular choice for the function $\mathrm{RM}_i$ is

$$\mathrm{RM}_i(x) = \frac{x^+}{\mathbf{1}^T x^+} \quad (\text{when } \mathbf{1}^T x^+ > 0)$$

which leads to regret matching with fading memory and inertia. Another particular choice is

$$\mathrm{RM}_i^{\ell}(x) = \frac{e^{(1/\tau) x^{\ell}}}{\sum_{m : x^m > 0} e^{(1/\tau) x^m}} \, I\{x^{\ell} > 0\} \quad (\text{when } \mathbf{1}^T x^+ > 0) \qquad (9)$$

where $\tau > 0$ is a parameter. Note that, for small values of $\tau$, vehicle $V_i$ would choose, with high probability, the target corresponding to the maximum regret. This choice leads to a stochastic variant of an algorithm called joint strategy fictitious play with fading memory and inertia; see [22]. Also, note that, for large values of $\tau$, $V_i$ would choose any target having positive regret with approximately equal probability.
According to these rules, vehicle $V_i$ will stay with its previous proposal $a_i(k-1)$ with probability $1 - \epsilon_i(k)$ regardless of its regret. We make the following standing assumption on the vehicles' willingness to optimize.

ASSUMPTION 4.1. There exist constants $\underline{\epsilon}$ and $\bar{\epsilon}$ such that

$$0 < \underline{\epsilon} < \epsilon_i(k) < \bar{\epsilon} < 1$$

for all time $k \geq 1$ and for all $i \in \{1, \dots, n_v\}$.

This assumption implies that vehicles are always willing to optimize with some nonzero inertia.² The following theorem establishes the convergence of generalized regret monitoring with fading memory and inertia to a pure equilibrium.

²This assumption can be relaxed to holding for sufficiently large $k$, as opposed to all $k$.
THEOREM 4.1. Assume that vehicle utilities constitute an ordinal potential game³ and no vehicle is indifferent between distinct strategies, i.e.,

$$U_{V_i}(a_i^1, a_{-i}) \neq U_{V_i}(a_i^2, a_{-i}), \quad \forall a_i^1, a_i^2 \in \mathcal{A}_i, \; a_i^1 \neq a_i^2, \; \forall a_{-i} \in \mathcal{A}_{-i}, \; \forall i \in \{1, \dots, n_v\}$$

Then, the target proposals $a(t)$ generated by generalized regret monitoring with fading memory and inertia satisfying Assumption 4.1 converge to a pure Nash equilibrium almost surely.

³This theorem also holds in the more general class of weakly acyclic games; see [33].
Proof. We will state and prove a series of claims. The first claim states that if a vehicle proposes a target with positive (discounted) average regret, then all subsequent target proposals will also have positive regret.

CLAIM 4.1. Fix any $k_0 \geq 1$. Then, $\tilde{R}_{V_i}^{a_i(k_0)}(k_0) > 0 \Rightarrow \tilde{R}_{V_i}^{a_i(k)}(k) > 0$ for all $k \geq k_0$.

Proof. Suppose $\tilde{R}_{V_i}^{a_i(k_0)}(k_0) > 0$. If $a_i(k_0+1) = a_i(k_0)$, then

$$\tilde{R}_{V_i}^{a_i(k_0+1)}(k_0+1) = (1-\rho) \tilde{R}_{V_i}^{a_i(k_0)}(k_0) > 0$$

If $a_i(k_0+1) \neq a_i(k_0)$, then $a_i(k_0+1)$ must have been selected through $\mathrm{RM}_i$, which assigns positive probability only to targets with positive regret, so

$$\tilde{R}_{V_i}^{a_i(k_0+1)}(k_0+1) > 0$$

The argument can be repeated to show that $\tilde{R}_{V_i}^{a_i(k)}(k) > 0$, for all $k \geq k_0$.
Define

$$M_u \triangleq \max\{U_{V_i}(a) : a \in \mathcal{A}, \, V_i \in \mathcal{V}\}$$

$$m_u \triangleq \min\{U_{V_i}(a) : a \in \mathcal{A}, \, V_i \in \mathcal{V}\}$$

$$\delta \triangleq \min\{|U_{V_i}(a^1) - U_{V_i}(a^2)| : a^1, a^2 \in \mathcal{A}, \, a_{-i}^1 = a_{-i}^2, \, |U_{V_i}(a^1) - U_{V_i}(a^2)| > 0, \, V_i \in \mathcal{V}\}$$

$$N \triangleq \min\left\{ n \in \{1, 2, \dots\} : [1-(1-\rho)^n]\delta - (1-\rho)^n (M_u - m_u) \geq \frac{\delta}{2} \right\}$$

$$f \triangleq \min\left\{ \mathrm{RM}_i^m(x) : \|x\|_{\infty} \leq M_u - m_u, \, x^m \geq \frac{\delta}{2} \text{ for some } m, \, V_i \in \mathcal{V} \right\}$$

Note that $\delta, f > 0$, and $|\tilde{R}_{V_i}^{a_i}(k)| \leq M_u - m_u$, for all $V_i \in \mathcal{V}$, $a_i \in \mathcal{A}_i$, $k \geq 1$.
The second claim states that if the current proposal is a strict Nash equilibrium and if the proposal is repeated a sufficient number of times, then all subsequent proposals will also be that Nash equilibrium.

CLAIM 4.2. Fix $k_0 \geq 1$. Assume

1. $a(k_0)$ is a strict Nash equilibrium, and
2. $\tilde{R}_{V_i}^{a_i(k_0)}(k_0) > 0$ for all $V_i \in \mathcal{V}$, and
3. $a(k_0) = a(k_0+1) = \cdots = a(k_0+N-1)$.

Then, $a(k) = a(k_0)$, for all $k \geq k_0$.

Proof. For any $V_i \in \mathcal{V}$ and any $a_i \in \mathcal{A}_i$, we have

$$\tilde{R}_{V_i}^{a_i}(k_0+N) = (1-\rho)^N \tilde{R}_{V_i}^{a_i}(k_0) + [1-(1-\rho)^N]\{U_{V_i}(a_i, a_{-i}(k_0)) - U_{V_i}(a_i(k_0), a_{-i}(k_0))\}$$

Since $a(k_0)$ is a strict Nash equilibrium, for any $V_i \in \mathcal{V}$ and any $a_i \in \mathcal{A}_i$, $a_i \neq a_i(k_0)$, we have

$$U_{V_i}(a_i, a_{-i}(k_0)) < U_{V_i}(a_i(k_0), a_{-i}(k_0))$$

Therefore,

$$\tilde{R}_{V_i}^{a_i}(k_0+N) \leq (1-\rho)^N (M_u - m_u) - [1-(1-\rho)^N]\delta \leq -\frac{\delta}{2} < 0$$

We also know that, for all $V_i \in \mathcal{V}$,

$$\tilde{R}_{V_i}^{a_i(k_0)}(k_0+N) = (1-\rho)^N \tilde{R}_{V_i}^{a_i(k_0)}(k_0) > 0$$

This proves the claim.
The third claim states that if the current proposal is not a Nash equilibrium and if the proposal is repeated a sufficient number of times, then a subsequent assignment proposal will have a higher global utility with at least a fixed probability.

CLAIM 4.3. Fix $k_0 \geq 1$. Assume

1. $a(k_0)$ is not a Nash equilibrium, and
2. $a(k_0) = a(k_0+1) = \cdots = a(k_0+N-1)$.

Let $a^* = (a_i^*, a_{-i}(k_0))$ be such that

$$U_{V_i}(a_i^*, a_{-i}(k_0)) > U_{V_i}(a_i(k_0), a_{-i}(k_0))$$

for some $V_i \in \mathcal{V}$ and some $a_i^* \in \mathcal{A}_i$. Then, $\tilde{R}_{V_i}^{a_i^*}(k_0+N) \geq \delta/2$, and $a^*$ will be proposed at step $k_0+N$ with at least probability $\lambda \triangleq (1-\bar{\epsilon})^{n_v-1} \underline{\epsilon} f$.

Proof. We have

$$\tilde{R}_{V_i}^{a_i^*}(k_0+N) \geq -(1-\rho)^N (M_u - m_u) + [1-(1-\rho)^N]\delta \geq \frac{\delta}{2}$$

Therefore, the probability of vehicle $V_i$ proposing $a_i^*$ at step $k_0+N$ is at least $\underline{\epsilon} f$. Because of the players' inertia, the probability that all vehicles will propose the action profile $a^*$ at step $k_0+N$ is at least $(1-\bar{\epsilon})^{n_v-1} \underline{\epsilon} f$.
The fourth claim specifies an event, and an associated probability, that guarantees that all vehicles will subsequently propose only targets with positive regret.

CLAIM 4.4. Fix $k_0 \geq 1$. We have $\tilde{R}_{V_i}^{a_i(k)}(k) > 0$ for all $k \geq k_0 + 2Nn_v$ and for all $V_i \in \mathcal{V}$ with probability at least

$$\prod_{i=1}^{n_v} \frac{1}{|\mathcal{A}_i|} \, \underline{\epsilon} \, \lambda \, (1-\bar{\epsilon})^{2Nn_v}$$

Proof. Let $a^0 \triangleq a(k_0)$. Suppose $\tilde{R}_{V_i}^{a_i^0}(k_0) \leq 0$ (the complementary case is covered by Claim 4.1). Furthermore, suppose that $a^0$ is repeated $N$ consecutive times, i.e., $a(k_0) = \cdots = a(k_0+N-1) = a^0$, which occurs with probability at least $[(1-\bar{\epsilon})^{n_v}]^{N-1}$.

If there exists an $a^* = (a_i^*, a_{-i}^0)$ such that $U_{V_i}(a^*) > U_{V_i}(a^0)$, then, by Claim 4.3, $\tilde{R}_{V_i}^{a_i^*}(k_0+N) \geq \delta/2$ and $a^*$ will be proposed at step $k_0+N$ with at least probability $\lambda$. Conditioned on this, we know from Claim 4.1 that $\tilde{R}_{V_i}^{a_i(k)}(k) > 0$ for all $k > k_0+N$.

If there does not exist such an action $a^*$, then $\tilde{R}_{V_i}^{a_i}(k_0+N) \leq 0$ for all $a_i \in \mathcal{A}_i$. A proposal profile $(a_i^w, a_{-i}^0)$ with $U_{V_i}(a_i^w, a_{-i}^0) < U_{V_i}(a^0)$ will be proposed at step $k_0+N$ with at least probability $(1/|\mathcal{A}_i|)\underline{\epsilon}(1-\bar{\epsilon})^{n_v-1}$. If $a(k_0+N) = (a_i^w, a_{-i}^0)$, and if, furthermore, $(a_i^w, a_{-i}^0)$ is repeated $N$ consecutive times, i.e., $a(k_0+N) = \cdots = a(k_0+2N-1)$, which happens with probability at least $[(1-\bar{\epsilon})^{n_v}]^{N-1}$, then, by Claim 4.3, $\tilde{R}_{V_i}^{a_i^0}(k_0+2N) \geq \delta/2$ and the joint target $a^0$ will be proposed at step $k_0+2N$ with at least probability $\lambda$. Conditioned on this, we know from Claim 4.1 that $\tilde{R}_{V_i}^{a_i(k)}(k) > 0$ for all $k > k_0+2N$.

In summary, $\tilde{R}_{V_i}^{a_i(k)}(k) > 0$ for all $k \geq k_0+2N$ with probability at least

$$\frac{1}{|\mathcal{A}_i|} \, \underline{\epsilon} \, \lambda \, (1-\bar{\epsilon})^{2Nn_v}$$

We can repeat this argument for each vehicle to show that $\tilde{R}_{V_i}^{a_i(k)}(k) > 0$ for all times $k \geq k_0 + 2Nn_v$ and for all $V_i \in \mathcal{V}$ with probability at least

$$\prod_{i=1}^{n_v} \frac{1}{|\mathcal{A}_i|} \, \underline{\epsilon} \, \lambda \, (1-\bar{\epsilon})^{2Nn_v}$$
Final Step: Establishing Convergence to a Pure Nash Equilibrium. Fix $k_0 \geq 1$. Let $k_1 \triangleq k_0 + 2Nn_v$. Suppose $\tilde{R}_{V_i}^{a_i(k)}(k) > 0$ for all $k \geq k_1$ and for all $V_i \in \mathcal{V}$, which, by Claim 4.4, occurs with probability at least

$$\prod_{i=1}^{n_v} \frac{1}{|\mathcal{A}_i|} \, \underline{\epsilon} \, \lambda \, (1-\bar{\epsilon})^{2Nn_v}$$

Suppose further that $a(k_1) = \cdots = a(k_1+N-1)$, which occurs with at least probability $[(1-\bar{\epsilon})^{n_v}]^{N-1}$. If $a(k_1)$ is a Nash equilibrium, then, by Claim 4.2, we are done. Otherwise, according to Claim 4.3, a proposal profile $a' = (a_i', a_{-i}(k_1))$ with $U_{V_i}(a') > U_{V_i}(a(k_1))$ for some $V_i \in \mathcal{V}$ will be played at step $k_1+N$ with at least probability $\lambda$. Note that this would imply $U_g(a(k_1+N)) > U_g(a(k_1))$. Suppose now $a(k_1+N) = \cdots = a(k_1+2N-1)$, which occurs with at least probability $[(1-\bar{\epsilon})^{n_v}]^{N-1}$. If $a'$ is a Nash equilibrium, then, by Claim 4.2, we are done. Otherwise, according to Claim 4.3, a proposal profile $a'' = (a_i'', a_{-i}')$ with $U_{V_i}(a'') > U_{V_i}(a(k_1+N))$ for some $V_i \in \mathcal{V}$ will be played at step $k_1+2N$ with at least probability $\lambda$. Note that this would imply $U_g(a(k_1+2N)) > U_g(a(k_1+N))$.

Note that this procedure can only be repeated a finite number of times because the global utility is strictly increasing each time. We can repeat the above arguments until we reach a pure Nash equilibrium $a^*$ and stay at $a^*$ for $N$ consecutive steps. This means that there exist constants $\tilde{\epsilon} > 0$ and $\tilde{T} > 0$, both of which are independent of $k_0$, such that the following event happens with at least probability $\tilde{\epsilon}$: $a(k) = a^*$ for all $k \geq k_0 + \tilde{T}$. Since this lower bound is uniform in $k_0$, the event must eventually occur with probability one. This proves Theorem 4.1.
Note that an agreed assignment that emerges from generalized RM with fading memory and inertia can be suboptimal. Characterizing the equilibrium selection properties in potential games still remains an open problem. As in FP, the regret-based dynamics introduced above would require communication of proposed target assignments as part of a negotiation process. FP is guaranteed to converge for potential games but requires an individual vehicle to process the empirical frequencies of all other vehicles that affect its utility and to use these empirical frequencies to compute the maximizer of its expected utility. Generalized RM with fading memory and inertia is guaranteed to converge to a pure equilibrium in almost all (ordinal) potential games; however, its computational requirements are significantly lower. It only requires an individual vehicle to process an average regret vector whose dimension is its number of targets and to select a (randomized) target based on the positive part of its average regret vector.
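For concreteness, here is a sketch of one step of generalized RM with fading memory and inertia, instantiated with the regret-matching choice of $\mathrm{RM}_i$; the parameter values and the `U_i(alt, a)` interface are illustrative assumptions, not prescriptions from the paper.

```python
# One step of generalized RM with fading memory and inertia (Sec. 4.2):
# rho is the fading parameter, eps the willingness to optimize (Assumption 4.1).
import random

def grm_step(R_tilde, a_last, my_targets, i, U_i, rho=0.1, eps=0.3):
    realized = U_i(a_last[i], a_last)
    for idx, alt in enumerate(my_targets):        # discounted average regret
        R_tilde[idx] = (1 - rho) * R_tilde[idx] + \
                       rho * (U_i(alt, a_last) - realized)
    if random.random() > eps:                     # inertia: repeat last proposal
        return a_last[i]
    positive = [max(r, 0.0) for r in R_tilde]
    if sum(positive) == 0:
        return random.choice(my_targets)          # condition (8): uniform fallback
    return random.choices(my_targets, weights=positive)[0]
```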
4.3 Review: One-Step Memory Spatial Adaptive Play. The previous negotiation mechanisms were called recursive averaging algorithms since they maintained a running average (or fading-memory average) of certain variables, e.g., averaged actions of other players (FP) or averaged regret measures (RM). These algorithms have "infinite memory" in that the long-term effect of a measured variable may diminish but is never completely eliminated.

In this section, we will consider an opposite extreme, namely, a specific one-step memory algorithm called spatial adaptive play (SAP). SAP was introduced in [9] (Chap. 6, which also reviews other multistep memory algorithms) as a learning process for games played on graphs. SAP can be a very effective negotiation mechanism in our autonomous vehicle-target assignment problem because it would impose a low computational burden on each vehicle and it would lead to an optimal solution in potential games with arbitrarily high probability.
Unlike the other negotiation mechanisms we considered thus far, at any step of SAP negotiations, one vehicle is randomly chosen, where each vehicle is equally likely to be chosen, and only this chosen vehicle is given the chance to update its proposed target.⁴ Let $a(k-1)$ denote the profile of proposed targets at step $k-1$. At step $k$, the vehicle that is given the chance to update its proposed target, say vehicle $V_i$, proposes a target according to a probability distribution $p_i(k) \in \Delta(|\mathcal{A}_i|)$ that maximizes

$$p_i^T(k) \begin{bmatrix} U_{V_i}(\mathcal{A}_i^1, a_{-i}(k-1)) \\ \vdots \\ U_{V_i}(\mathcal{A}_i^{|\mathcal{A}_i|}, a_{-i}(k-1)) \end{bmatrix} + \tau H(p_i(k))$$

where $H(\cdot)$ is the entropy function that rewards randomization (see Nomenclature) and $\tau > 0$ is a parameter that controls the level of randomization. For any $\tau > 0$, the maximizing probability $p_i(k)$ is uniquely given by

$$p_i(k) = \sigma\!\left( \frac{1}{\tau} \begin{bmatrix} U_{V_i}(\mathcal{A}_i^1, a_{-i}(k-1)) \\ \vdots \\ U_{V_i}(\mathcal{A}_i^{|\mathcal{A}_i|}, a_{-i}(k-1)) \end{bmatrix} \right)$$

where $\sigma(\cdot)$ is the logit or soft-max function (see Nomenclature). For any $\tau > 0$, $p_i(k)$ assigns positive probability to all targets in $\mathcal{A}_i$. We are interested in small values of $\tau > 0$ because then $p_i(k)$ approximately maximizes vehicle $V_i$'s (unperturbed) utility based on other vehicles' proposals at the previous step. For other interpretations of the entropy term, see [35,36]; and for different ways of randomization, see [20].
The computational burden of SAP on each updating vehicle is comparable to that of RM on each vehicle. Each vehicle needs to observe and maintain the proposal profile $a(k)$ (actually, only the relevant part of $a(k)$). If given the chance to update its proposal, vehicle $V_i$ needs to call its utility function evaluator only $|\mathcal{A}_i|$ times. Because only one vehicle updates its proposal at a given negotiation step, the convergence of negotiations may be slow when there are a large number of vehicles.⁵ However, if the vehicles have a relatively small number of common targets in their target sets, then multiple vehicles can be allowed to update their proposals at a given step as long as they do not have common targets. Allowing such multiple updates may potentially speed up the negotiations substantially. In our simulations summarized in Sec. 5, SAP typically provided convergence to a near-optimal assignment faster than most other negotiation mechanisms.
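One SAP update reduces to a logit (soft-max) choice over the updating vehicle's utilities against the previous proposal profile. A minimal sketch under the same assumed `U_i(alt, a)` interface as before:

```python
# One SAP update (Sec. 4.3) for the single chosen vehicle; tau controls the
# level of randomization (small tau approximates a best response).
import math
import random

def sap_update(i, a_prev, my_targets, U_i, tau=0.05):
    utilities = [U_i(alt, a_prev) for alt in my_targets]
    m = max(utilities)                 # subtract the max for numerical stability
    weights = [math.exp((u - m) / tau) for u in utilities]
    return random.choices(my_targets, weights=weights)[0]
```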
4.4 Selective Spatial Adaptive Play. We will now introduce "selective spatial adaptive play" (sSAP) for the cases where a vehicle has a large number of targets in its target set or calling its utility function evaluator is computationally expensive. We will parameterize sSAP with $n = (n_1, \dots, n_{n_v})$, where $1 \leq n_i \leq |\mathcal{A}_i| - 1$ represents the number of times that vehicle $V_i$ calls its utility function evaluator when it is given the chance to update its proposal. Let us say that vehicle $V_i$, using sSAP, is given the chance to update its proposal at step $k$. First, vehicle $V_i$ sequentially selects $n_i$ targets from $\mathcal{A}_i \setminus \{a_i(k-1)\}$ without replacement, where each target is selected independently and with uniform probability over the remaining targets. Call these selected targets $\mathcal{A}_i^1(k), \dots, \mathcal{A}_i^{n_i}(k)$, and let $\mathcal{A}_i^0(k) \triangleq a_i(k-1)$ be appended to this set of selected targets. Then, at step $k$, vehicle $V_i$ proposes a target according to the probability distribution

$$p_i(k) = \sigma\!\left( \frac{1}{\tau} \begin{bmatrix} U_{V_i}(\mathcal{A}_i^0(k), a_{-i}(k-1)) \\ \vdots \\ U_{V_i}(\mathcal{A}_i^{n_i}(k), a_{-i}(k-1)) \end{bmatrix} \right)$$

for some $\tau > 0$. In other words, at step $k$, vehicle $V_i$ proposes a target to approximately maximize its own utility based on the selected targets $\mathcal{A}_i^0(k), \dots, \mathcal{A}_i^{n_i}(k)$ and other vehicles' proposals at the previous step. Thus, to compute $p_i(k)$, vehicle $V_i$ needs to call its utility function evaluator only $n_i$ times, where $n_i \geq 1$ could be much smaller than $|\mathcal{A}_i|$.
⁴ We will not deal with the issue of how the autonomous vehicles can randomly choose exactly one vehicle (or multiple vehicles with no common targets) to update its proposal without centralized coordination. In actuality, such asynchronous updating may be easier to implement than the aforementioned negotiation mechanisms that require synchronous updating. One possible implementation of asynchronous updating would be similar to the implementation of the well-known Aloha protocol in multiaccess communication, where multiple transmitting nodes attempt to access a single communication channel without colliding with each other [34].
⁵ If SAP is used as a centralized optimization tool, then the computational burden at each step will be very small because only one entry of $a(k)$ will be updated at each step.
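Before stating the convergence result, here is a minimal sketch of the sSAP update, reusing the `softmax` helper and the `utility` callback convention from the SAP sketch above (again our illustration, not the paper's code):

```python
def ssap_proposal(utility, targets, a_prev, i, n_i, tau, rng):
    """One sSAP update for vehicle i, with 1 <= n_i <= |A_i| - 1.

    Samples n_i candidate targets without replacement from A_i excluding
    the current proposal a_i(k-1), prepends A_i^0(k) = a_i(k-1), and
    soft-maxes over the utilities of these n_i + 1 candidates only.
    """
    current = a_prev[i]
    others = [t for t in targets if t != current]
    picks = rng.choice(len(others), size=n_i, replace=False)
    candidates = [current] + [others[j] for j in picks]
    u = np.array([utility(i, t, a_prev) for t in candidates])
    p = softmax(u / tau)
    return candidates[rng.choice(len(candidates), p=p)]
```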
THEOREM 4.2. Assume that the vehicle utilities constitute a potential game where the global utility $U_g$ is a potential function. Then, the target proposals $a(k)$ generated by sSAP satisfy

$$ \lim_{\tau \downarrow 0} \; \lim_{k \to \infty} \operatorname{Prob}\big[ a(k) \text{ is an optimal target assignment profile} \big] = 1 $$
Proof. sSAP induces an irreducible Markov process whose state space is $\mathcal{A}$ and whose state at step $k$ is the profile $a(k)$ of proposed targets. The empirical frequencies of the visited states converge to the unique stationary distribution of this induced Markov process. As in Theorem 6.1 in [9], we show that this stationary distribution, denoted by $\mu$, is given as

$$ \mu(a) = \frac{e^{(1/\tau) U_g(a)}}{\sum_{\bar{a} \in \mathcal{A}} e^{(1/\tau) U_g(\bar{a})}}, \qquad a \in \mathcal{A} $$

by verifying the detailed balance equations

$$ \mu(a)\operatorname{Prob}[a \to b] = \mu(b)\operatorname{Prob}[b \to a], \qquad a, b \in \mathcal{A} $$

The only nontrivial case that requires verification of the above equations is when $a$ and $b$ differ in exactly one position. Fix $a$ and $b$ such that $a_i \ne b_i$ and $a_{-i} = b_{-i}$. Then, we have

$$ \operatorname{Prob}[a \to b] = \frac{1}{n_v} \sum_{(a^0, \ldots, a^{n_i}) \in S(a,b)} \frac{1}{(|\mathcal{A}_i| - 1) \cdots (|\mathcal{A}_i| - n_i)} \, \frac{e^{(1/\tau) U_{V_i}(b)}}{\sum_{j=0}^{n_i} e^{(1/\tau) U_{V_i}(a^j)}} $$

where

$$ S(a,b) = \Big\{ (a^0, \ldots, a^{n_i}) \in \mathcal{A}^{n_i+1} : \big(a_{-i}^j = a_{-i}, \forall j\big), \; \big(a^0 = a\big), \; \big(a^j = b \text{ for exactly one } j\big), \; \big(a^j \ne a^m, \forall j \ne m\big) \Big\} $$

It is now straightforward to see (by matching each tuple in $S(a,b)$ with the tuple in $S(b,a)$ that contains the same set of candidate profiles) that

$$ \frac{\operatorname{Prob}[a \to b]}{\operatorname{Prob}[b \to a]} = e^{(1/\tau)[U_{V_i}(b) - U_{V_i}(a)]} = e^{(1/\tau)[U_g(b) - U_g(a)]} = \frac{\mu(b)}{\mu(a)} $$

where the second equality uses the fact that $U_g$ is a potential function for the game. Therefore, $\mu$ is indeed as given above, and it can be written, in the alternative vector form, as

$$ \mu = \sigma\!\left( \frac{1}{\tau} U_g \right) $$

where, by an abuse of notation, $U_g$ is also used to represent a vector whose "$a$th entry" equals $U_g(a)$. Finally, the fact that the Markov process induced by sSAP with $\tau > 0$ is irreducible and aperiodic readily leads to the desired result. ∎
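The closed form of $\mu$ is easy to probe numerically. The toy check below is our construction (any small potential game would do): it evaluates $\mu = \sigma((1/\tau) U_g)$ on the enumerated profile space, reusing the `softmax` helper from the SAP sketch, and shows the probability mass concentrating on the maximizers of $U_g$ as $\tau \downarrow 0$.

```python
import itertools
import numpy as np

# Toy identical-interest game: 2 vehicles, 2 targets, potential U_g.
targets = ["T1", "T2"]

def U_g(a):
    # Covering distinct targets is better than doubling up on one.
    return float(len(set(a)))

profiles = list(itertools.product(targets, repeat=2))
u = np.array([U_g(a) for a in profiles])
for tau in (1.0, 0.1, 0.01):
    mu = softmax(u / tau)  # stationary distribution mu = sigma(U_g / tau)
    print(f"tau={tau}: Prob[optimal profile] = {mu[u == u.max()].sum():.4f}")
```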
Thus, in the setup above, $\mu$ assigns arbitrarily high probability to those assignment profiles that maximize a potential function for the game as $\tau \downarrow 0$. Clearly, this result indicates that, in the case of the vehicle utilities IIU (4), RRU (5), or WLU (7), sSAP negotiations would lead to an optimal target assignment with arbitrarily high probability provided that $\tau > 0$ is chosen sufficiently small. Of course, one can gradually decrease $\tau$ to allow initial exploration. We believe that one can obtain convergence, in probability, of the proposals $a(k)$ to an optimal assignment if $\tau$ is decreased sufficiently slowly, as in simulated annealing [37,38]. In our simulations, choosing $\tau$ inversely proportional to $k^2$ during the negotiations typically resulted in fast convergence of the proposals to a near-optimal assignment.
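For instance, the schedule used in our simulations can be expressed as a one-line function (the constant 10 matches the runs reported in Sec. 5; treating it as a tunable parameter is our suggestion):

```python
def tau_schedule(k, c=10.0):
    """Randomization level at negotiation step k, decreasing as c / k**2."""
    return c / float(k) ** 2
```

Passing `tau_schedule(k)` as the `tau` argument of the SAP/sSAP sketches above yields the annealed negotiations used in the simulations.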
5 Simulation Results
In this section, we present numerical results to illustrate that, when the individual utility functions and the negotiation mechanisms are properly selected, the autonomous vehicles can agree on a target assignment profile that yields near-optimal global utility. We consider two scenarios. In the first scenario, we illustrate the near optimality of our approach by simulating a special case of the well-known weapon-target assignment model, for which an optimal assignment can be obtained in a short period of time even for large numbers of weapons and targets [2]. In the second scenario, we simulate a general instance of the problem and compare various negotiation algorithms in terms of their performance and speed of convergence.
Scenario 1. Here, the vehicles are identical and have zero values, whereas the targets are different and have positive values. Each vehicle can be assigned to any of the targets.⁶ Let $V_j$ be the value of target $T_j$ and $p_j$ be the probability that target $T_j$ gets eliminated when only a single vehicle engages it. When multiple vehicles are assigned to target $T_j$, each of the vehicles is assumed to engage target $T_j$ independently. Hence, if the number of vehicles engaging target $T_j$ is $x_j$, then $T_j$ will be eliminated with probability $1 - (1 - p_j)^{x_j}$. Therefore, as a function of the assignment profile $a$, the utility generated by the engagement with target $T_j$ is given by

$$ U_{T_j}(a) = V_j \left( 1 - (1 - p_j)^{\sum_{i=1}^{n_v} I(a_i = T_j)} \right) $$

which leads to the following global utility function:

$$ U_g(a) = \sum_{j=1}^{n_t} V_j \left( 1 - (1 - p_j)^{\sum_{i=1}^{n_v} I(a_i = T_j)} \right) $$

Given the parameters $n_v$, $n_t$, $V_1, \ldots, V_{n_t}$, and $p_1, \ldots, p_{n_t}$, an optimal vehicle-target assignment that maximizes the global utility function given above can be quickly obtained using an iterative procedure called the minimum marginal return algorithm [2].
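For reference, the following is a sketch of the greedy minimum-marginal-return procedure along the lines of [2]; the optimality claim for this identical-vehicle special case is from [2], while the code itself is our reconstruction:

```python
import heapq

def minimum_marginal_return(V, p, n_v):
    """Greedily assign n_v identical vehicles to targets.

    V[j]: value of target T_j; p[j]: single-vehicle elimination probability.
    Each new vehicle goes to the target with the largest marginal gain
    V[j] * p[j] * (1 - p[j])**x[j], where x[j] vehicles are already on T_j.
    Returns x, the number of vehicles assigned to each target.
    """
    x = [0] * len(V)
    heap = [(-V[j] * p[j], j) for j in range(len(V))]  # max-heap via negation
    heapq.heapify(heap)
    for _ in range(n_v):
        neg_gain, j = heapq.heappop(heap)
        x[j] += 1
        # The next vehicle on T_j yields a gain smaller by a factor (1 - p[j]).
        heapq.heappush(heap, (neg_gain * (1 - p[j]), j))
    return x
```

The resulting $U_g = \sum_j V_j (1 - (1 - p_j)^{x_j})$ serves as the optimal benchmark against which the negotiated assignments in Table 1 are normalized.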
To test the effectiveness of our approach, we simulated the vehicle negotiations using the above model with 200 vehicles and 200 targets in MATLAB on a single personal computer with a 1.4 GHz Pentium M processor and 1.1 GB of RAM. Each of the target values, $V_1, \ldots, V_{200}$, and each of the elimination probabilities, $p_1, \ldots, p_{200}$, was independently chosen once according to the uniform probability distribution on $[0, 1]$ and thereafter kept constant throughout the simulations. We first conducted 100 runs of generalized RM negotiations (the $\mathrm{RM}_i$ function as in Eq. (9), with algorithm parameters 0.1 and 0.5) with WLU utilities (7), where each negotiation consisted of 100 steps. We then repeated this with 100 runs of SAP negotiations with WLU utilities (7), where each run consisted of 1000 steps. We also conducted 100 runs of utility-based FP negotiations with WLU utilities (7), where each negotiation consisted of 1000 steps. In all cases, the randomization level $\tau$ was decreased as $10/k^2$, where $k$ is the negotiation step. The evolution of the global utility during typical runs of generalized RM, SAP, and utility-based FP negotiations is shown in Fig. 4. Also, the global utility corresponding to the assignment profile at the end of each run of negotiations and the CPU time required for each run were recorded. A summary of these numerical results is provided in Table 1.
All negotiations consistently yielded near-optimal assignments. The global utility generated by SAP negotiations was almost always monotonically increasing, whereas the global utility generated by generalized RM and utility-based FP negotiations exhibited fluctuations, as seen in Fig. 4.
In any SAP negotiation step, only one vehicle calls its utility function evaluator (200 times), whereas in any generalized RM negotiation step, all vehicles call their utility function evaluators (200 times each). As a result, although a typical generalized RM negotiation converged in 100 steps, as opposed to 1000 steps in the case of SAP, a typical 100-step generalized RM negotiation took 593 s of CPU time on average, whereas a typical 1000-step SAP negotiation took 49 s of CPU time on average. However, it is important to note that these numbers reflect sequential
⁶ Note that there is no reason to consider a null target $T_0$ here.
CPU time. In an actual implementation, individual vehicles would call their utility function evaluators in parallel. The "parallel" CPU time in Table 1 is the overall CPU time divided by the number of vehicles; it is a rough reflection of what the actual implementation time would be in a parallel implementation. We see that generalized RM is actually faster than SAP in this sense. The parallel time for SAP is the same as its sequential CPU time because only one vehicle updates its strategy per iteration.

In the case of utility-based FP, all vehicles call their utility function evaluators at each negotiation step, but only once each. This can be contrasted with generalized RM, which requires a utility function evaluation for every possible target. Utility-based FP took 1000 negotiation steps to approach the optimal global utility, but used 67 s of CPU time on average (or about 0.33 s in parallel), which is also faster than the average CPU time used by RM, despite utility-based FP requiring more iterations.
For this scenario, action-based FP would impose an enormous computational burden on each vehicle, since a vehicle using action-based FP would have to keep track of the empirical frequencies of the choices of the 199 other vehicles and compute its expected utility over a decision space of dimension $200^{200}$ at every negotiation step. However, the numerical results presented above verify that autonomous vehicles can quickly negotiate and agree on an assignment profile that yields near-optimal global utility when vehicle utilities and negotiation mechanisms are chosen properly.
Scenario 2. In this scenario, we consider a more general instance of the weapon-target assignment problem, for which we have virtually no way of computing the optimal global utility. The setup in this scenario is similar to that in Scenario 1, except that the vehicles are not identical and are also range restricted. More specifically, each vehicle still has zero value, but the probability $p_{ij}$ that target $T_j$ gets eliminated when only vehicle $V_i$ engages it differs from vehicle to vehicle. Each of the elimination probabilities $p_{ij}$, $1 \le i, j \le 200$, was independently chosen once according to the uniform probability distribution on $[0, 1]$ and thereafter kept constant throughout the simulations. Each vehicle $V_i$ has 20 targets in its range $\mathcal{A}_i$, and the targets in $\mathcal{A}_i$ are chosen from the set of all targets with equal probability and independently of the other vehicles. Therefore, a pair of vehicles may have some common as well as some distinct targets in their ranges. As in Scenario 1, the target values $V_1, \ldots, V_{200}$ are chosen independently according to the uniform probability distribution on $[0, 1]$. Therefore, as a function of the assignment profile $a$, the utility generated by the engagement with target $T_j$ is given by
$$ U_{T_j}(a) = V_j \left( 1 - \prod_{i : a_i = T_j} (1 - p_{ij}) \right) $$

which leads to the following global utility function:

$$ U_g(a) = \sum_{j=1}^{n_t} V_j \left( 1 - \prod_{i : a_i = T_j} (1 - p_{ij}) \right) $$
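A direct transcription of this global utility into code (our sketch; `a[i]` holds the index of the target proposed by $V_i$, or `None` for no engagement):

```python
def global_utility(a, V, p):
    """U_g(a) = sum_j V[j] * (1 - prod over {i : a_i = T_j} of (1 - p[i][j]))."""
    survive = [1.0] * len(V)  # running product of (1 - p_ij) per target
    for i, j in enumerate(a):
        if j is not None:
            survive[j] *= 1.0 - p[i][j]
    return sum(V[j] * (1.0 - survive[j]) for j in range(len(V)))
```

Only the factors of the targets in a vehicle's range change when that vehicle switches its proposal, which is what keeps the localized utility evaluations cheap.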
Using the same computational resources and the same setup as in Scenario 1, we simulated the vehicle negotiations on the above model. The evolution of the global utility during typical runs of generalized RM, SAP, and utility-based FP negotiations is shown in Fig. 5. The global utility corresponding to the assignment profile at the end of each run of negotiations and the CPU time required for each run were recorded. A summary of these numerical results is provided in Table 2.
All negotiations eventually settled at some assignment profiles, leading to comparable global utilities, as shown in Fig. 5 and Table 2. The convergence in this scenario was slower for all negotiation mechanisms. The reason is that the vehicles in this scenario are not identical and are range restricted, and as a result, computing each vehicle's utility is computationally more demanding. The relative timings, in both CPU time and convergence rate, are similar to those in Scenario 1.

Action-based FP was computationally infeasible for this scenario as well, for the same reason stated earlier, i.e., its enormous computational burden on each vehicle.
The numerical results presented above show that autonomous vehicles can quickly negotiate and agree on a (possibly near-optimal) assignment profile when vehicle utilities and negotiation mechanisms are chosen properly. In all cases, vehicles only communicate with their "neighbors," i.e., those vehicles that share a common target. The difference between the algorithms is in the number of vehicles that communicate per iteration. In SAP, only the vehicle revising its assignment must communicate with its neighbors. In generalized RM and utility-based FP, all vehicles must communicate with their neighbors in every iteration. In Scenario 1, all vehicles share the same targets, and thus all vehicles are neighbors.
Fig. 4  Evolution of global utility during typical runs of negotiations
Table 1  Summary of simulation runs

                                                   Generalized RM        SAP    Utility-based FP
Average global utility / Optimal global utility        0.99             0.99         0.98
Minimum global utility / Optimal global utility        0.99             0.99         0.96
Average CPU time (s)                              593 (≈3.0 parallel)    49     67 (≈0.33 parallel)
In Scenario 2, the communication pattern is much sparser because of the limited vehicle ranges and the distribution of targets. The greatest communication savings per iteration are achieved by SAP. However, SAP required more iterations to converge.
6 Conclusions
We introduced an autonomous vehicle-target assignment problem as a multiplayer game where the vehicles are self-interested players with their own individual utility functions. We emphasized rational decision making on the part of the vehicles to develop autonomous operation capability in uncertain and adversarial environments. To achieve optimality with respect to a global utility function, we discussed various aspects of the design of the vehicle utilities, in particular, alignment with a global utility function and localization. We reviewed selected multiplayer learning algorithms available in the literature. We introduced two new algorithms that address the informational and computational requirements of existing algorithms, namely, generalized RM with fading memory and inertia and selective spatial adaptive play, and provided accompanying convergence proofs. Finally, we discussed these learning algorithms in terms of convergence, equilibrium selection, and computational efficiency, and illustrated the near-optimal achievement of a global utility through autonomous vehicle negotiations.
We end by pointing to a significant extension of this work: the case where the vehicle-target assignments need to be made sequentially over a time horizon [2]. In this case, the assignment decisions made by the vehicles at a given time step (probabilistically) determine the future games to be played by the vehicles. Therefore, the vehicles need to take future utilities into account in their negotiations. A natural framework for studying such sequential decision-making problems in a competitive multiplayer setting is that of Markov games [39,40]. Extending the approach taken in this paper to a Markov game setup requires significant future work.
Acknowledgment
Research supported by NSF Grant No. ECS-0501394, AFOSR/MURI Grant No. F49620-01-1-0361, and ARO Grant No. W911NF-04-1-0316.
Nomenclature
$|A|$ = number of elements in $A$, for a finite set $A$
$I(\cdot)$ = indicator function
$\mathbb{R}^n$ = $n$-dimensional Euclidean space, for a positive integer $n$
$\mathbf{1}$ = vector $(1, \ldots, 1)^T \in \mathbb{R}^n$
$(\cdot)^T$ = transpose operation
$\Delta_n$ = simplex in $\mathbb{R}^n$, i.e., $\{ s \in \mathbb{R}^n : s \ge 0 \text{ componentwise, and } \mathbf{1}^T s = 1 \}$
$\mathrm{Int}(\Delta_n)$ = set of interior points of the simplex $\Delta_n$, i.e., $\{ s \in \Delta_n : s > 0 \text{ componentwise} \}$
$H : \mathrm{Int}(\Delta_n) \to \mathbb{R}$ = entropy function, $H(x) = -x^T \log(x)$
$\sigma : \mathbb{R}^n \to \Delta_n$ = "logit" or "soft-max" function, $(\sigma(x))_i = e^{x_i} / (e^{x_1} + \cdots + e^{x_n})$
$[x]^+$ = vector in $\mathbb{R}^n$ whose $i$th entry equals $\max\{x_i, 0\}$, for $x \in \mathbb{R}^n$
References
[1] Olfati-Saber, R., 2006, "Flocking for Multi-Agent Dynamic Systems: Algorithms and Theory," IEEE Trans. Autom. Control, 51, pp. 401–420.
[2] Murphey, R. A., 1999, "Target-Based Weapon Target Assignment Problems," Nonlinear Assignment Problems: Algorithms and Applications, P. M. Pardalos and L. S. Pitsoulis, eds., Kluwer, Dordrecht, pp. 39–53.
[3] Ahuja, R. K., Kumar, A., Jha, K., and Orlin, J. B., 2003, "Exact and Heuristic Methods for the Weapon-Target Assignment Problem," http://ssrn.com/abstract=489802
[4] Fudenberg, D., and Tirole, J., 1991, Game Theory, MIT Press, Cambridge, MA.
[5] Basar, T., and Olsder, G. J., 1999, Dynamic Noncooperative Game Theory, SIAM, Philadelphia.
[6] Wolpert, D. H., and Tumer, K., 2001, "Optimal Payoff Functions for Members of Collectives," Adv. Complex Syst., 4(2&3), pp. 265–279.
[7] Monderer, D., and Shapley, L. S., 1996, "Potential Games," Games Econ. Behav., 14, pp. 124–143.
[8] Fudenberg, D., and Levine, D. K., 1998, The Theory of Learning in Games, MIT Press, Cambridge, MA.
[9] Young, H. P., 1998, Individual Strategy and Social Structure: An Evolutionary Theory of Institutions, Princeton University Press, Princeton, NJ.
[10] Wolpert, D., and Tumer, K., 2004, "A Survey of Collectives," Collectives and the Design of Complex Systems, K. Tumer and D. Wolpert, eds., Springer-Verlag, New York, p. 142.
[11] Miettinen, K. M., 1998, Nonlinear Multiobjective Optimization, Kluwer, Dordrecht.
[12] Rosenthal, R. W., 1973, "A Class of Games Possessing Pure-Strategy Nash Equilibria," Int. J. Game Theory, 2, pp. 65–67.
[13] Mas-Colell, A., Whinston, M. D., and Green, J. R., 1995, Microeconomic Theory, Oxford University Press, London.
[14] Benaim, M., and Hirsch, M. W., 1999, "Mixed Equilibria and Dynamical Systems Arising From Fictitious Play in Perturbed Games," Games Econ. Behav., 29, pp. 36–72.
Fig. 5  Evolution of global utility during typical runs of negotiations
Table 2  Summary of simulation runs

                        Generalized RM           SAP     Utility-based FP
Global utility              87.62               85.24          85.49
Average CPU time (s)  2707 (≈13.5 parallel)      382     529 (≈2.64 parallel)
[15] Brown, G. W., 1951, "Iterative Solutions of Games by Fictitious Play," Activity Analysis of Production and Allocation, T. C. Koopmans, ed., Wiley, New York, pp. 374–376.
[16] Monderer, D., and Shapley, L. S., 1996, "Fictitious Play Property for Games With Identical Interests," J. Econ. Theory, 68, pp. 258–265.
[17] Curtis, J. W., and Murphey, R., 2003, "Simultaneous Area Search and Task Assignment for a Team of Cooperative Agents," AIAA Guidance, Navigation, and Control Conference and Exhibit, Austin, TX, August, AIAA Paper No. 2003-5584.
[18] Hofbauer, J., 1995, "Stability for the Best Response Dynamics," University of Vienna, Vienna, Austria, http://homepage.univie.ac.at/josef.hofbauer/br.ps
[19] Krishna, V., and Sjöström, T., 1998, "On the Convergence of Fictitious Play," Math. Oper. Res., 23, pp. 479–511.
[20] Hofbauer, J., and Sandholm, B., 2002, "On the Global Convergence of Stochastic Fictitious Play," Econometrica, 70, pp. 2265–2294.
[21] Lambert, T. J., III, Epelman, M. A., and Smith, R. L., 2005, "A Fictitious Play Approach to Large-Scale Optimization," Oper. Res., 53(3), pp. 477–489.
[22] Marden, J. R., Arslan, G., and Shamma, J. S., 2005, "Joint Strategy Fictitious Play With Inertia for Potential Games," Proc. 44th IEEE Conference on Decision and Control, December, pp. 6692–6697.
[23] Fudenberg, D., and Levine, D., 1998, "Learning in Games," Eur. Econ. Rev., 42, pp. 631–639.
[24] Fudenberg, D., and Levine, D. K., 1995, "Consistency and Cautious Fictitious Play," J. Econ. Dyn. Control, 19, pp. 1065–1089.
[25] Sutton, R. S., and Barto, A. G., 1998, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
[26] Bertsekas, D. P., and Tsitsiklis, J. N., 1996, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA.
[27] Leslie, D., and Collins, E., 2003, "Convergent Multiple-Timescales Reinforcement Learning Algorithms in Normal Form Games," Ann. Appl. Probab., 13, pp. 1231–1251.
[28] Leslie, D., and Collins, E., 2005, "Individual Q-Learning in Normal Form Games," SIAM J. Control Optim., 44(2), pp. 495–514.
[29] Leslie, D. S., and Collins, E. J., 2006, "Generalised Weakened Fictitious Play," Games Econ. Behav., 56(2), pp. 285–298.
[30] Hart, S., and Mas-Colell, A., 2000, "A Simple Adaptive Procedure Leading to Correlated Equilibrium," Econometrica, 68(5), pp. 1127–1150.
[31] Hart, S., and Mas-Colell, A., 2001, "A General Class of Adaptive Strategies," J. Econ. Theory, 98, pp. 26–54.
[32] Hart, S., and Mas-Colell, A., 2003, "Regret Based Continuous-Time Dynamics," Games Econ. Behav., 45, pp. 375–394.
[33] Marden, J. R., Arslan, G., and Shamma, J. S., 2007, "Regret Based Dynamics: Convergence in Weakly Acyclic Games," Proc. 6th International Joint Conference on Autonomous Agents and Multi-Agent Systems, ACM Press, New York, pp. 194–201.
[34] Bertsekas, D., and Gallager, R., 1992, Data Networks, 2nd ed., Prentice-Hall, Englewood Cliffs, NJ.
[35] Hofbauer, J., and Hopkins, E., 2005, "Learning in Perturbed Asymmetric Games," Games Econ. Behav., 52, pp. 133–152.
[36] Wolpert, D. H., 2004, "Information Theory—The Bridge Connecting Bounded Rational Game Theory and Statistical Physics," http://arxiv.org/PS-cache/cond-mat/pdf/0402/0402508.pdf
[37] Aarts, E., and Korst, J., 1989, Simulated Annealing and Boltzmann Machines, Wiley, New York.
[38] van Laarhoven, P. J. M., and Aarts, E. H. L., 1987, Simulated Annealing: Theory and Applications, Reidel, Dordrecht.
[39] Raghavan, T. E. S., and Filar, J. A., 1991, "Algorithms for Stochastic Games—A Survey," Methods Models Oper. Res., 35, pp. 437–472.
[40] Vrieze, O. J., and Tijs, S. H., 1980, "Fictitious Play Applied to Sequences of Games and Discounted Stochastic Games," Int. J. Game Theory, 11, pp. 71–85.