Convergence and Finite-Time Behavior of Simulated Annealing
Debasis Mitra, Fabio Romeo and Alberto Sangiovanni-Vincentelli
Advances in Applied Probability, Vol. 18, No. 3 (Sep. 1986), pp. 747-771
Published by: Applied Probability Trust
Stable URL: http://www.jstor.org/stable/1427186
Adv. Appl. Prob. 18, 747-771 (1986)
Printed in N. Ireland
© Applied Probability Trust 1986

CONVERGENCE AND FINITE-TIME BEHAVIOR OF SIMULATED ANNEALING

DEBASIS MITRA,* AT&T Bell Laboratories
FABIO ROMEO,** University of California, Berkeley
ALBERTO SANGIOVANNI-VINCENTELLI,** University of California, Berkeley
Abstract

Simulated annealing is a randomized algorithm which has been proposed for finding globally optimum least-cost configurations in large NP-complete problems with cost functions which may have many local minima. A theoretical analysis of simulated annealing based on its precise model, a time-inhomogeneous Markov chain, is presented. An annealing schedule is given for which the Markov chain is strongly ergodic and the algorithm converges to a global optimum. The finite-time behavior of simulated annealing is also analyzed and a bound obtained on the departure of the probability distribution of the state at finite time from the optimum. This bound gives an estimate of the rate of convergence and insights into the conditions on the annealing schedule which give optimum performance.

GLOBAL OPTIMIZATION; RANDOMIZED ALGORITHMS; TIME-INHOMOGENEOUS MARKOV CHAINS
1. Introduction

Many combinatorial optimization problems belong to a class of problems which are difficult to solve, i.e., the class of NP-complete problems [3]. For these problems, there is no known algorithm whose worst-case complexity is bounded by a polynomial in the size of the input. Heuristic algorithms are used to solve NP-complete problems approximately, i.e. to find 'good' solutions which are 'close' to the optimum. These algorithms explore a discrete space of admissible configurations, S, in a deterministic fashion. Often the search terminates at a local minimum due to the fact that heuristic algorithms are 'greedy'. To avoid this behavior, a class of randomized algorithms (e.g. [17]) have been devised which generate the next configuration randomly, and which can 'climb hills', i.e., moves that generate configurations of higher cost than the present one are accepted.
Received 24 April 1985; revision received 26 July 1985.
* Postal address: AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, USA.
** Postal address: Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720, USA.
Simulated annealing as proposed by Kirkpatrick et al. [11] allows 'hill-climbing' moves but these moves are accepted according to a certain criterion which takes the state of the search process into consideration in a manner unlike other randomized algorithms. The controlling mechanism is based on the observation that combinatorial optimization problems with a large configuration space exhibit properties similar to physical processes with many degrees of freedom.

In particular, bringing a fluid into a low-energy state, such as growing a crystal, has been considered in [11] to be similar to the process of finding an optimum solution of a combinatorial optimization problem. Annealing is a well-known process for growing crystals. It consists of melting the fluid and then lowering the temperature slowly until the crystal is formed. The rate of decrease of temperature has to be very low around the freezing temperature. The Metropolis Monte Carlo method [1], [14] can be used to simulate the annealing process. It has been proposed as an effective method for finding global minima of combinatorial optimization problems.
In applications to combinatorial optimization, this method starts from an arbitrary configuration and, given that the simulation is at configuration i at time m, m = 0, 1, 2, ···, a new configuration j is randomly generated from an admissible set N(i) and a check is made to determine whether the cost of the new configuration satisfies an acceptance criterion based on the temperature, a controlling parameter, at time m, T_m. If the cost decreases, the simulation accepts the move. Otherwise, a random number uniformly distributed over [0, 1] is picked and compared with exp(-{c(j) - c(i)}/T_m), where c(.) is the cost function on configurations. If the random number is smaller, the simulation accepts the move, otherwise it discards the move. In any case, time is incremented. Note that the higher the temperature, the more likely it is that a 'hill-climbing' move is accepted. The initial temperature, the number of moves generated at each temperature and the rate of decrease of temperature are all important parameters that affect the speed of the algorithm and the quality of the final configuration.
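The acceptance rule just described can be made concrete in a few lines. The following Python sketch performs a single move at temperature T; the names cost and neighbors, and the uniform choice over N(i), are illustrative assumptions rather than anything prescribed by the paper.

import math
import random

def metropolis_step(i, cost, neighbors, T, rng=random):
    # Propose a random neighbor j of the current configuration i.
    j = rng.choice(neighbors(i))
    delta = cost(j) - cost(i)
    # Accept a cost decrease outright; accept an increase with
    # probability exp(-delta / T), so hill-climbing moves become
    # rarer as the temperature T is lowered.
    if delta <= 0 or rng.random() < math.exp(-delta / T):
        return j
    return i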
Experimental results [9], [11], [16], [19] show that simulated annealing produces very good results when compared to other techniques for the solution of combinatorial optimization problems such as those arising from the layout of integrated circuits, at the expense, however, of large computation time (a 1500 standard cell placement problem can take as much as 24 hours of a VAX 11/780 [16]). This has emphasized the need for a better theoretical understanding of simulated annealing.

Early analyses using time-homogeneous Markov chains [7], [8], [18] were based on certain (unrealistic) assumptions on the number of iterations taken at each temperature. It was shown [12], [15] that simulated annealing, and even generalizations of it called 'probabilistic hill-climbing algorithms' [15], give asymptotically the optimum solution with probability 1.
The analysis in this paper is based on time-inhomogeneous Markov chains. We prove that for an arbitrary but bounded cost function, for annealing schedules of the form

(1.1)  T_m = γ / log(m + m_0 + 1),  m = 0, 1, 2, ···,

where m_0 is any parameter satisfying 1 ≤ m_0 < ∞, the Markov chain is strongly ergodic if γ ≥ rL, where r is the radius of the graph underlying the chain and L is a Lipschitz-like constant of the cost function. Strong ergodicity implies that, for any starting probability vector, the state probability vector converges component-wise to a constant vector e*. Furthermore, we show that e* is the optimum vector, i.e., the vector in which all elements are zero except those with the indices of the global least-cost configurations. Our other main result is on finite-time behavior and rate of convergence. We give a bound on the departure of the state vector from the optimum vector after a finite number of iterations. This bound indicates how the annealing schedule must be balanced between contrary requirements for optimum performance. A simple corollary to this result states that for a large number of iterations k, the L_1-norm of the difference of the state vector from the optimum vector is O(1/k^min(a,b)), where a and b respectively increase and decrease with increasing γ.
We also obtain a set of results on distributions which we call quasi-stationary. These constructs are the equilibrium distributions of time-homogeneous Markov chains obtained from simulated annealing by holding the temperature fixed at various values. The dependence of the quasi-stationary distributions on temperature is shown to have a number of desirable properties. These properties are essential for our analysis of the time-inhomogeneous Markov chains obtained from annealing schedules given in (1.1). In addition, they are of independent interest since they hold for annealing schedules considerably more general than (1.1). This may be important in the future if, as we expect, it becomes possible to design schedules matched to special properties of the cost function.

In an important work Geman and Geman [4] have proved in the context of Markov fields, used to model image-processing models, that simulated annealing converges to the least-cost configurations for a particular annealing schedule.† Our results are stronger in the following respects: (i) there is no result in [4] on finite-time behavior and rate of convergence, results which are most useful to obtain a practical annealing schedule; (ii) our conditions on the annealing schedule are substantially weaker; (iii) the proof of convergence is simpler since it makes use of powerful, known results in the theory of time-inhomogeneous Markov chains; (iv) the graph underlying the Markov chain is arbitrary and well-matched to combinatorial optimization, and there is no suggestion of the need of structural constraints such as those that exist in image processing.

† The recent works of Gidas [5] and Hajek [6] are also noteworthy.
The paper is organized as follows. In Section 2, the structure of simulated annealing and the Markov-chain model are briefly recalled. In Section 3, the quasi-stationary probabilities of the Markov chain are introduced and their properties established. In Section 4, the basic results of the theory of time-inhomogeneous Markov chains useful to us are recalled. In Section 5, the annealing schedule that guarantees convergence of simulated annealing to the optimum vector is presented and the basic convergence theorem proven. In Section 6, the finite-time behavior of the Markov chain and the rate of convergence of simulated annealing are investigated. In Section 7, some concluding remarks and future research directions are offered.
2. Preliminaries

In this section, we describe the basic structure of the simulated annealing algorithm and we introduce a Markov chain model for it.

Simulated annealing algorithm structure (j_0, T_0)
{
    /* Given an initial state j_0 and an initial value for the parameter T, T_0. */
    X = j_0;
    m = 0;
    while ('stopping criterion' is not satisfied) {
        while ('inner loop criterion' is not satisfied) {
            j = generate(X);
            if (accept(c(j), c(X), T_m))
                X = j;
        }
        T_{m+1} = update(T_m);
        m = m + 1;
    }
}
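As a companion to the structure above, here is a self-contained Python sketch of the whole loop, specialized to one move per temperature and to the logarithmic schedule T_m = γ/log(m + m_0 + 1) analyzed in Section 5. All function and parameter names (cost, neighbors, num_steps) are illustrative assumptions, and the stopping criterion is simply a fixed number of iterations.

import math
import random

def simulated_annealing(j0, cost, neighbors, gamma, m0=1, num_steps=10000,
                        rng=random):
    X = j0                 # current configuration
    best = j0              # best configuration seen so far
    for m in range(num_steps):
        T = gamma / math.log(m + m0 + 1)      # update function, cf. (5.10)
        j = rng.choice(neighbors(X))          # generate
        delta = cost(j) - cost(X)
        if delta <= 0 or rng.random() < math.exp(-delta / T):   # accept
            X = j
        if cost(X) < cost(best):
            best = X
    return best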
The 'inner loop criterion' determines how many steps are taken by the algorithm at a given temperature. In the analysis here we have emphasized the case in which only one step is taken at each temperature. However, as we observe later, it is easy to extend the results to the case of more than one step at each temperature.

In the algorithm structure three functions play a fundamental role: accept, generate and update. While several accept functions can be used [15], in this paper we restrict our attention to the one proposed in [11].
accept(c(j), c(i), T)
{
    /* Returns 1 if the cost variation passes a test. T is the control parameter. */
    Δc = c(j) - c(i);
    y = min[1, exp(-Δc/T)];
    r = random(0, 1);
    /* random is a function which returns a pseudo-random number uniformly
       distributed on the interval [0, 1]. */
    if (r ≤ y)
        return(1);
    else
        return(0);
}
The generate function selects a new configuration. In simulated annealing, a new configuration is generated randomly from a set of possible configurations. To specify this function completely, a set of configurations accessible from a given configuration and the probability of generating one of these has to be given.

The update function, also called the annealing schedule or cooling schedule, produces a new value for the temperature. This function is most important to determine the convergence properties of the algorithm. We focus on update functions which return monotonically decreasing values of T, i.e. ∀m ≥ 0, T_{m+1} < T_m, and lim_{m→∞} T_m = 0. The function is completely specified when the explicit dependency of T on m is given. This paper is devoted to the study of update functions that guarantee convergence of the algorithm to the optimum vector.
It is easy to see that simulated annealing can be represented by a Markov chain, whose connectivity is fully specified by the generate function and whose transition probabilities are determined by the accept and by the generate functions.

The underlying directed graph, which we denote by G, is determined as follows. There is a bijective correspondence between the elements of S, the set of all the possible configurations of the optimization problem, and the nodes of the graph. Given two different elements, say i and j, of S, there is an arc from i to j if j can be generated starting from i. The two nodes are said to be neighbors.
We define N(i) to be the set of all the neighbors of i. We assume that i ∉ N(i). In several applications of the simulated annealing algorithm, the probability of generating a particular neighboring configuration starting from i is simply given by 1/|N(i)|, where |N(i)| is the cardinality of N(i). However, in certain applications such as placement of integrated circuits [16], it is important to generate certain neighbors with higher probability. For this reason, we assume that the probability of generating j from i is given by

(2.1)  g(i, j)/g(i),

where g(i, j) gives the 'weights' for each of the neighbors of i and g(i) is a normalizing function which ensures that

(1/g(i)) Σ_{j∈N(i)} g(i, j) = 1.

The directed graph G is assumed to be connected.
The one-step transition probabilities of the Markov chain are represented as weights on the edges of the directed graph G defined above and are determined by the product of the probability of generating a given configuration and the probability of accepting it. We define first a one-parameter family of transition probabilities:

(2.2)  P_ij(T) = 0                                                    if j ∉ N(i) and j ≠ i,
       P_ij(T) = [g(i, j)/g(i)] min[1, exp(-{c(j) - c(i)}/T)]         if j ∈ N(i),

and

(2.3)  P_ii(T) = 1 - Σ_{j∈N(i)} P_ij(T).
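For a small instance the one-parameter family (2.2)-(2.3) can be assembled explicitly, which is convenient for checking the results of Sections 3-6 numerically. In the Python sketch below, costs[i] plays the role of c(i) and g[i][j] the role of the generation weight g(i, j), with g[i][j] = 0 whenever j ∉ N(i); these data structures are illustrative assumptions.

import math
import numpy as np

def transition_matrix(costs, g, T):
    s = len(costs)
    P = np.zeros((s, s))
    for i in range(s):
        gi = sum(g[i])                        # normalizer g(i)
        for j in range(s):
            if j != i and g[i][j] > 0:        # j is a neighbor of i
                P[i, j] = (g[i][j] / gi) * min(1.0, math.exp(-(costs[j] - costs[i]) / T))
        P[i, i] = 1.0 - P[i].sum()            # diagonal element, cf. (2.3)
    return P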
The transition probabilities of the time-inhomogeneous Markov chain in which m denotes discrete time are obtained from the above and the annealing schedule which specifies T = T_m, m = 0, 1, 2, ···.
3. Quasi-stationary probability distributions and their properties

We have shown that a mathematical model for simulated annealing is a time-inhomogeneous Markov chain. However, if the temperature is frozen at a particular value T, then we obtain a time-homogeneous Markov chain. To prove the convergence of simulated annealing, it is important to study this Markov chain. In particular, we show here that this chain has a stationary probability distribution, which we call the quasi-stationary probability distribution of the time-inhomogeneous Markov chain.† In addition, we show that the stationary probability distributions have a limit when T goes to 0, i.e. when m goes to ∞, and that this limit is the optimum vector e*.

† The reader is warned that the term 'quasi-stationary distribution' is used for another concept in [8].
3.1. The quasi-stationary probabilities. For i ∈ S define

(3.1)  π_i(T) ≜ g(i) exp(-c(i)/T) / G(T),

where G(T) is a scaling factor such that ||π(T)|| = 1, with

||x|| ≜ Σ_{i=1}^{s} |x_i|

and s = |S|. The role of G(T) is similar to that of the partition function in statistical mechanics and stochastic networks [10].
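The quasi-stationary distribution of (3.1) is straightforward to evaluate for a small instance, and doing so gives a direct numerical check of Proposition 3.1 below. In this sketch g_node[i] stands for g(i) (the row sums of the generation weights); the names are assumptions made for illustration.

import math
import numpy as np

def quasi_stationary(costs, g_node, T):
    # Unnormalized weights g(i) exp(-c(i)/T); dividing by their sum
    # plays the role of the scaling factor G(T).
    w = np.array([g_node[i] * math.exp(-costs[i] / T) for i in range(len(costs))])
    return w / w.sum()

Together with the transition_matrix sketch of Section 2, one can check numerically that this vector is left-invariant under P(T) when the weights g(i, j) are symmetric, as Proposition 3.1 asserts.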
We now show that π(T) is the stationary probability distribution for the time-homogeneous Markov chain. For this to be true we need to assume that the function g(i, j) is symmetric, i.e.,

(3.2)  g(i, j) = g(j, i)  ∀i, j ∈ S.

This is a mild restriction which is easy to satisfy in implementations of simulated annealing. In particular, symmetry exists in the case where all neighbours of each configuration are given equal weights.

Proposition 3.1. If (3.2) holds, then {π(T)}, defined by (3.1), satisfies

(3.3)  π(T_m)P(T_m) = π(T_m),  m = 0, 1, ···,

where P(T) is the one-step transition probability matrix of the Markov chain defined in (2.2) and (2.3).
Proof. By (3.1), (2.2), and (3.2), we have that for every i and j neighbors in S, and also for j = i,

π_i(T)/π_j(T) = [g(i)/g(j)] exp({c(j) - c(i)}/T) = P_ji(T)/P_ij(T).

Note that this is true regardless of the sign of {c(j) - c(i)}. Hence, detailed balance holds:

(3.4)  π_i(T)P_ij(T) = π_j(T)P_ji(T).

Equation (3.4) obviously holds also for those i and j that are not neighbors in S under the given topology since then each side is 0. By adding, with respect to i, both sides of (3.4) and recalling (2.3), (3.3) is obtained.

It is of some interest to note that detailed balance, see (3.4), is equivalent to the time-reversibility [10] of the time-homogeneous Markov chain.
3.2. Asymptotic quasi-stationary probabilities. The results in this section and Sections 3.3-3.4 hold for any update function in which

(3.5a)  T_m > T_{m+1},  ∀m ≥ 0,
(3.5b)  lim_{m→∞} T_m = 0.

It should be emphasized that here and in Sections 3.3-3.4 we are investigating the dependence of π(T_m) on T_m where {T_m} behaves as in (3.5), and that π(T_m) is a construct and not the distribution obtained from simulated annealing. It is possible to show the following result.
Proposition 3.2. If the update function satisfies (3.5b), then the quasi-stationary probability vector π(T_m) defined in (3.1) converges, as m → ∞, to the optimal vector e*:

(3.6)  e*_i = g(i)/g(*) for i ∈ S*, and e*_i = 0 for i ∉ S*,

where S* is the set of indices of global least-cost configurations, i.e.

S* ≜ {i ∈ S | c(i) ≤ c(j) ∀j ∈ S},

and

g(*) ≜ Σ_{j∈S*} g(j).
All use subject to JSTOR Terms and Conditions
Convergence
and
finite-time
behavior
of simulated
annealing 755
The proof of this proposition is straightforward and hence omitted here. A proof for a more general class of algorithms, probabilistic hill-climbing algorithms, can be found in [15].

Note that this result can be interpreted as the convergence of the algorithm to the optimum vector provided that an infinite number of iterations are taken at each value of m, so that the equilibrium distribution is reached.
3.3. Monotonicity of the quasi-stationary probabilities. The convergence of the quasi-stationary distributions to e* displays remarkable monotonicity properties. This property is insightful and also an essential element of the analysis of the asymptotic and finite-time behavior of simulated annealing.

We will need to identify the 'weighted mean cost', to be denoted by C̄ and defined thus:

(3.7)  C̄ ≜ Σ_{j∈S} g(j)c(j) / Σ_{j∈S} g(j).

Proposition 3.3.
(i) For each i ∈ S*,
    π_i(T_{m+1}) - π_i(T_m) > 0  ∀m ≥ 0.
(ii) For each i ∉ S*, there exists a unique integer m̂_i, 0 ≤ m̂_i < ∞, such that
    π_i(T_{m+1}) - π_i(T_m) > 0 for 0 ≤ m ≤ m̂_i - 1,
    π_i(T_{m+1}) - π_i(T_m) < 0 for m ≥ m̂_i.
Proof. Consider π as a continuous function of the parameter T and differentiate π_i(T) in (3.1) with respect to T:

(3.8)  [T² G(T)² exp(2c(i)/T)/g(i)] (d/dT) π_i(T)
         = -Σ_{j} g(j){c(j) - c(i)} exp(-{c(j) - c(i)}/T)
         = [Σ_{j: c(j)<c(i)} g(j){c(i) - c(j)} exp({c(i) - c(j)}/T)
            - Σ_{j: c(j)>c(i)} g(j){c(j) - c(i)} exp(-{c(j) - c(i)}/T)].

The sign properties of (d/dT)π_i(T) will be deduced from the relative magnitudes of the two bracketed terms on the right-hand side.
If configuration i is least-cost, i.e. i ∈ S*, then the first term is null and (d/dT)π_i(T) < 0 for T > 0. Statement (i) then follows from (3.5).

The terms, if not null, are respectively monotonically decreasing and monotonically increasing with increasing T, and the value of the right-hand side of (3.8) evaluated at T = 0 is positive. Hence the right-hand side either has a finite-valued zero or not, depending on the sign of its value evaluated at T = ∞. We conclude that if c(i) is not least-cost and c(i) < C̄ then a unique zero exists at T_i, say, where 0 < T_i < ∞, and also that

(3.9)  (d/dT) π_i(T) > 0 for 0 < T < T_i,
       (d/dT) π_i(T) = 0 for T = T_i,
       (d/dT) π_i(T) < 0 for T_i < T < ∞.

Thus for c(i) < C̄, the weighted mean cost, we may use (3.5) to identify m̂_i in statement (ii) with the smallest integer such that T_{m̂_i} ≤ T_i.

If on the other hand c(i) ≥ C̄, then

(d/dT) π_i(T) > 0,  0 < T < ∞,

and m̂_i = 0 in statement (ii).
An immediate corollary to Proposition 3.3 is the existence of m̂, m̂ < ∞, such that for all i ∉ S*,

(3.10)  π_i(T_{m+1}) - π_i(T_m) < 0,  ∀m ≥ m̂.

In fact

(3.11)  m̂ = max_{i∉S*} m̂_i.
3.4. Uniform monotonicity of the quasi-stationary probabilities. The analysis in Section 6 on finite-time behavior requires knowledge of m̂, which marks the onset of monotonic decrease of the quasi-stationary probabilities of all but the least-cost configurations. We show here how it may be identified. This is done by considering T_i, for i ∉ S* and c(i) < C̄, as functions of the cost associated to each state {c(j)}.

Proposition 3.4. For all i such that i ∉ S* and c(i) < C̄, the critical temperatures T_i are monotonic, strictly increasing with increasing c(i).

Proof. A little algebra shows that for any pair (i_1, i_2), where i_1 ∈ S and i_2 ∈ S and c(i_1) - c(i_2) = ε > 0,

(3.12)  [1/g(i_1)] (d/dT) π_{i_1}(T) = exp(-ε/T) (ε/T²) [π_{i_2}(T)/g(i_2)] + exp(-ε/T) [1/g(i_2)] (d/dT) π_{i_2}(T).

For the case of interest here c(i_2) < c(i_1) < C̄, so that from the definition of T_{i_2}, see (3.9), (d/dT)π_{i_2}(T_{i_2}) = 0. Now if (3.12) is evaluated at T = T_{i_2}, then the second term on the right-hand side is 0, while the first term is positive. Again noting (3.9) it follows that T_{i_2} < T_{i_1}.
Note that configurations with common cost have common values of T_i and m̂_i.

To calculate m̂ it is helpful to identify the least-cost and the next-to-least-cost of all the configurations. Let

(3.13)  c(*) ≜ min_{j∈S} c(j)

and

(3.14)  δ ≜ {min_{j∉S*} c(j)} - c(*),

so that c(*) and {δ + c(*)} are respectively the least- and the next-to-least cost. Note that δ is an important global characteristic of the cost function.

The monotonicity property in Proposition 3.4 allows (3.11) to be sharpened: m̂ = m̂_î, where î is any configuration with next-to-least cost.

Setting (3.8) equal to 0 for i = î, let T̂ be the unique positive solution of the equation

(3.15)  δ g(*) - Σ_{j: c(j) > δ + c(*)} g(j){c(j) - c(*) - δ} exp(-{c(j) - c(*)}/T) = 0,

where g(*) is given in Proposition 3.2. Then m̂ is the smallest integer such that T_{m̂} ≤ T̂.
We conclude this section by a summary. The quasi-stationary probability distribution converges with decreasing temperature (i.e. increasing time) to the optimum vector. The quasi-stationary probabilities of least-cost configurations monotonically increase with decreasing temperature. For configurations with costs not less than the weighted mean cost, the opposite is true. Each configuration i with cost between least-cost and weighted mean cost has an associated 'critical temperature' T_i; while the temperature is greater than T_i, the configuration's quasi-stationary probability increases with decreasing temperature, and for temperatures less than T_i the opposite is true. Furthermore, the critical temperature is an increasing function of cost. All of the above properties hold for any update function satisfying (3.5).
4. Time-inhomogeneous Markov chains

In this section a number of well-known properties of time-inhomogeneous Markov chains are presented. These results will be used in Section 5 to prove the convergence properties of the simulated annealing algorithm and to determine the influence of the annealing schedule on the rate of convergence to the optimal solution of the combinatorial optimization problem.
All theorems and propositions are given without proof. The interested reader can find these proofs in [7], [8] and [18].

4.1. Notation. For the sake of notational simplicity, from now on all vectors, matrices and functions depending on T_m will be denoted as depending on m. Let P(m, m) be the identity matrix, and

P(m, n + m) ≜ Π_{i=0}^{n-1} P(m + i),  m ≥ 0, n ≥ 1,

be the n-step transition matrix. Furthermore let

v(m) ≜ [v_1(m), v_2(m), ···, v_s(m)]

denote the state probability vector after m transitions of the Markov chain, so that v(m + n) = v(m)P(m, m + n). We also let v(m, n) = v(0)P(m, n).
4.2. Basic results from the theory of time-inhomogeneous Markov chains. We need the following definition.

Definition 4.1. A time-inhomogeneous Markov chain is weakly ergodic if, for all m,

(4.1)  lim_{n→∞} sup_{v_1(0), v_2(0)} ||v_1(m, n) - v_2(m, n)|| = 0,

where v_1(0) and v_2(0) are two arbitrary initial state probability vectors and

v_1(m, n) = v_1(0)P(m, n),
v_2(m, n) = v_2(0)P(m, n).

Note that weak ergodicity does not imply the existence of limits of vectors v_1(m, n) and v_2(m, n) but only a tendency towards equality of the rows of P(m, n). Thus weak ergodicity implies only a 'loss of memory' of the initial conditions, but not convergence.
The investigation of conditions under which weak ergodicity holds is aided by the introduction of the following coefficient of ergodicity.†

Definition 4.2. Given a stochastic matrix P, its coefficient of ergodicity τ_1 is

(4.2)  τ_1(P) = (1/2) max_{i,j} Σ_{k=1}^{s} |P_ik - P_jk| = 1 - min_{i,j} Σ_{k=1}^{s} min(P_ik, P_jk).

† We are following Seneta [18]; Isaacson and Madsen [7], following Dobrushin [2], call (1 - τ_1) the ergodic coefficient.
With the above definition of the coefficient of ergodicity the following result can be proved [7], [8], [18].

Theorem 4.1. The time-inhomogeneous Markov chain is weakly ergodic if and only if there is a strictly increasing sequence of positive integers {k_i}, i = 0, 1, ···, such that

(4.3)  Σ_{i=0}^{∞} [1 - τ_1(P(k_i, k_{i+1}))] = ∞.
Strong ergodicity is defined as follows.

Definition 4.3. The time-inhomogeneous Markov chain is strongly ergodic if there exists a vector q, ||q|| = 1 and q_i ≥ 0, i ∈ S, such that for all m

(4.4)  lim_{n→∞} sup_{v(0)} ||v(m, n) - q|| = 0.

Strong ergodicity is obtained only with convergence in addition to loss of memory. Note that since the Markov chain is finite, the convergence in norm used to define weak and strong ergodicity is equivalent to coordinate-wise convergence.
We shall need the following result due to Madsen and Isaacson [13], [7].

Theorem 4.2. If for every m there exists a π(m) such that π(m) = π(m)P(m), ||π(m)|| = 1 and

Σ_{m=0}^{∞} ||π(m) - π(m + 1)|| < ∞,

and the time-inhomogeneous Markov chain is weakly ergodic, then it is also strongly ergodic. Moreover, if

e* = lim_{m→∞} π(m),

then for all m,

lim_{n→∞} sup_{v(0)} ||v(m, n) - e*|| = 0.
5. Strong ergodicity of simulated annealing

To establish weak ergodicity we use Theorem 4.1. In particular, we first determine a bound on the coefficient of ergodicity and then we determine the update function such that (4.3) is satisfied. Next we show that weak ergodicity together with the existence of π(T_m), as defined in (3.1), are sufficient conditions to ensure strong ergodicity.
5.1. Radius of G and Lipschitz constant. We need a few definitions related to the structure of the graph underlying the Markov chain and to the slope of the cost function.

Let S_M be the set of all the points that are local maxima for the cost function, i.e.,

S_M ≜ {i ∈ S | c(j) ≤ c(i) ∀j ∈ N(i)}.

Let

(5.1)  r ≜ min_{i∈(S−S_M)} max_{j∈S} d(i, j)

be the radius of the graph, where d(i, j) is the distance of j from i measured by the length (number of edges) of the minimum length path from i to j in G. Let l, the index of a node where the minimum in (5.1) is attained, be the center of the graph.

We will show that at any time the radius r represents an upper bound on the number of transitions of the Markov chain that are required for the probability transition matrix to have all the elements in at least one column, namely the one indexed by l, different from 0. Note that the radius is well defined since we assumed G is connected and, because of the symmetry of g(i, j), it is also strongly connected.

A Lipschitz-like constant bounding the local slope of the cost function is given by

(5.2)  L = max_{i∈S} max_{j∈N(i)} |c(j) - c(i)|.

Finally we define a lower bound on the generation function:

(5.3)  w ≜ min_{i∈S} min_{j∈N(i)} g(i, j)/g(i).

An important assumption is that w > 0.
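For a small instance, r, L and w can be computed directly from the cost function and the generation weights, which makes the condition of Theorem 5.1 and the constants of Section 6 explicit. The Python sketch below assumes, as in the earlier sketches, that g[i][j] > 0 marks j as a neighbor of i; the breadth-first search computes graph distances.

from collections import deque

def graph_parameters(costs, g):
    s = len(costs)
    nbrs = [[j for j in range(s) if j != i and g[i][j] > 0] for i in range(s)]

    def eccentricity(i):
        # Largest distance (in edges) from node i to any other node.
        dist = {i: 0}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())

    local_max = {i for i in range(s)
                 if all(costs[j] <= costs[i] for j in nbrs[i])}           # the set S_M
    r = min(eccentricity(i) for i in range(s) if i not in local_max)      # radius, cf. (5.1)
    L = max(abs(costs[j] - costs[i]) for i in range(s) for j in nbrs[i])  # cf. (5.2)
    w = min(g[i][j] / sum(g[i]) for i in range(s) for j in nbrs[i])       # cf. (5.3)
    return r, L, w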
5.2. Coefficient of ergodicity. If i and j are neighbors in G, i.e. j ∈ N(i), then from (2.2), (5.2) and (5.3),

(5.4)  P_ij(m) ≥ w exp(-L/T_m),  m = 0, 1, ···.

Now the diagonal elements P_ii(m), i ∈ (S - S_M), may be quite small initially, but these terms are monotonic, increasing with increasing m. This is because the probabilities of transition from node i to neighboring nodes with lower cost are constant with respect to m, while the probabilities of transition to neighboring nodes with higher cost are monotonically decreasing with increasing m. Hence there exists some k_0, k_0 < ∞, such that for all i ∈ S - S_M

(5.5)  P_ii(m) ≥ w exp(-L/T_m),  m ≥ (k_0 - 1)r,

since the left-hand side monotonically increases and the right-hand side monotonically decreases with increasing m.

We can use (5.1) and (5.5) to bound P_il(m - r, m) for every i ∈ S and m ≥ k_0 r:

(5.6)  P_il(m - r, m) ≥ Π_{n=m-r}^{m-1} {w exp(-L/T_n)} ≥ w^r exp(-rL/T_{m-1}).

Hence the coefficient of ergodicity τ_1 defined in (4.2) satisfies

(5.7)  τ_1(P(kr - r, kr)) ≤ 1 - min_{i,j} min(P_il(kr - r, kr), P_jl(kr - r, kr))
(5.8)                     ≤ 1 - w^r exp(-rL/T_{kr-1}),  k ≥ k_0.

From now on, for convenience we shall abbreviate τ_1(P(n, m)) to τ_1(n, m).
5.3. Weak ergodicity. By Theorem 4.1 and (5.8), we have that the Markov chain associated with simulated annealing is weakly ergodic if

(5.9)  Σ_{k=k_0}^{∞} w^r exp(-rL/T_{kr-1}) = ∞.

Note that up to now, we have only assumed that the sequence of parameters {T_m} is monotonically decreasing and lim_{m→∞} T_m = 0; in particular, the dependency of T_m on m has not been specified. We give now an update function which ensures that the Markov chain is weakly ergodic.
Theorem 5.1. The Markov chain associated with simulated annealing with the following update function:

(5.10)  T_m = γ / log(m + m_0 + 1),  m = 0, 1, 2, ···,

where m_0 is any parameter satisfying 1 ≤ m_0 < ∞, is weakly ergodic if

(5.11)  γ ≥ rL.

Proof. Replacing T_m in (5.8) with the formula given in (5.10) we obtain

(5.12a)  τ_1(kr - r, kr) ≤ 1 - a/(k + m_0/r)^θ,  k ≥ k_0,

where

(5.12b)  θ ≜ rL/γ

and

(5.12c)  a ≜ w^r / r^{rL/γ}.

It is obvious that, for any l ≥ 1,

Σ_{k=l}^{∞} {1 - τ_1(kr - r, kr)} = ∞

if θ ≤ 1. Using Theorem 4.1, the proposition is proved.
It is clear that weak ergodicity is preserved even if the annealing schedule in (5.10) is modified to keep the temperature unchanged at various (finitely many) time steps.
5.4. Strong ergodicity. In Section 3 we have shown that there exists for every m, m ≥ 0, a vector π(m) of quasi-stationary probabilities that has unit norm, satisfies (3.3) and, as shown in Proposition 3.2, converges to the optimum vector e* defined in (3.6).

Hence, to prove the strong ergodicity of the Markov chain associated with simulated annealing using Theorem 4.2, we only have to prove the following proposition. Interestingly, the proposition holds more generally than for the update function in (5.10).
Proposition 5.1. For update functions satisfying (3.5) the corresponding quasi-stationary probabilities are such that

(5.13)  Σ_{m=0}^{∞} ||π(m + 1) - π(m)|| ≤ 2(m̂ + 1) < ∞,

where m̂ is given in (3.10) and (3.11).

Proof. From statement (i) of Proposition 3.3, and (3.10), for m ≥ m̂,

(5.14)  ||π(m + 1) - π(m)|| = Σ_{i∈S*} {π_i(m + 1) - π_i(m)} - Σ_{i∉S*} {π_i(m + 1) - π_i(m)}.

Since

Σ_{i∈S*} π_i(m) + Σ_{i∉S*} π_i(m) = 1,  ∀m ≥ 0,

we have

(5.15)  ||π(m + 1) - π(m)|| = 2{π*(m + 1) - π*(m)},  m ≥ m̂,

where

(5.16)  π*(m) ≜ Σ_{i∈S*} π_i(m),  m ≥ 0.

By (5.15), we have

(5.17)  Σ_{m=m̂}^{∞} ||π(m + 1) - π(m)|| ≤ 2.

In view of (5.17) the proposition is proven.
Using Theorem 4.2 and Theorem 5.1, we can prove the fundamental result of this section.

Theorem 5.2. The time-inhomogeneous Markov chain associated with simulated annealing is strongly ergodic if it is weakly ergodic and the annealing schedule satisfies (3.5). In this case, for all m,

(5.18)  lim_{n→∞} sup_{v(0)} ||v(m, n) - e*|| = 0.

In particular, the annealing schedule in (5.10) with γ ≥ rL gives a strongly ergodic Markov chain for which (5.18) holds.
6. Finite-time behavior and rate of convergence

We obtain an estimate of the departure of the state of the Markov chain at finite time m from the optimum vector e*. The results in Theorem 6.2 below give important insights into the factors affecting the rate of convergence and their implications in the design of optimum annealing schedules.
6.1. Components of finite-time behavior. The following decomposition is basic:

(6.1)  v(m) - e* = {v(m) - π(0)P(0, m)} + {π(0)P(0, m) - π(m)} + {π(m) - e*}.

Observe that the sum of the first two terms in braces on the right-hand side measures the departure at time m of the state distribution from the quasi-stationary distribution. We have chosen to decompose this quantity further so that the first term measures the extent to which at time m the Markov chain has lost memory of the difference between v(0) and π(0).

From (6.1) we obtain

(6.2)  ||v(m) - e*|| ≤ ||v(m) - π(0)P(0, m)|| + ||π(0)P(0, m) - π(m)|| + ||π(m) - e*||.
In the next subsections, each of the three terms on the right-hand side is bounded independently.

6.1.1. Bound for the first term of (6.2). To determine a bound for the first term on the right-hand side of (6.2), we need the following fundamental result due to Dobrushin [2], [7], [18].

Theorem 6.1. If P is any stochastic matrix and y is any row vector with Σ_i y_i = 0, then

||yP|| ≤ τ_1(P) ||y||.

In view of Theorem 6.1, for the first term of the right-hand side of (6.2),

(6.3)  ||v(kr) - π(0)P(0, kr)|| = ||{v(0) - π(0)}P(0, kr)|| ≤ ||v(0) - π(0)|| τ_1(0, kr).
To complete the bound of the first term of (6.2) we need to bound τ_1(0, kr). To this end the following proposition is necessary.

Proposition 6.1. If γ ≥ rL and the annealing schedule (5.10) is applied, so that τ_1 satisfies (5.12a), then

(6.4a)  τ_1(lr - r, kr) ≤ [(k_0 + m_0/r)/(k + m_0/r)]^a,  for 1 ≤ l ≤ k_0 ≤ k,

(6.4b)  τ_1(lr - r, kr) ≤ [(l + m_0/r)/(k + m_0/r)]^a,  for k_0 ≤ l ≤ k,

where a is defined by (5.12c), r by (5.1), k_0 is such that (5.5) holds and m_0 is the parameter that controls the initial value of the temperature.

Proof. Let Q and R be two stochastic matrices; then [7]

τ_1(QR) ≤ τ_1(Q) τ_1(R).

By the above property, we have from (5.12a), for k_0 ≤ l ≤ k,

τ_1(lr - r, kr) ≤ Π_{m=l}^{k} τ_1(mr - r, mr)
              ≤ Π_{m=l}^{k} [1 - a/(m + m_0/r)^θ]
              ≤ exp(-a Σ_{m=l}^{k} 1/(m + m_0/r)^θ)
              ≤ exp(-a Σ_{m=l}^{k} 1/(m + m_0/r))  (since θ ≤ 1)
              ≤ [(l + m_0/r)/(k + m_0/r)]^a.

A similar bound can be derived for 1 ≤ l ≤ k_0 ≤ k.
The bound in (6.4) on the coefficient of ergodicity is fundamental to the finite-time analysis of simulated annealing. Substituting the above bound in (6.3) yields

(6.5)  ||v(kr) - π(0)P(0, kr)|| ≤ ||v(0) - π(0)|| [(k_0 + m_0/r)/(k + m_0/r)]^a,  k ≥ k_0.
6.1.2. Bound for the second term of (6.2). Let

(6.6)  ρ(m) ≜ π(0)P(0, m) - π(m),  m = 0, 1, ···.

Note that ρ(0) = 0 and that {ρ(m)} satisfy the recursion

(6.7)  ρ(m + r) = ρ(m)P(m, m + r) + Σ_{s=1}^{r} {π(m + s - 1) - π(m + s)}P(m + s, m + r).

The recursion is solved to give

(6.8a)  ρ(kr) = Σ_{l=1}^{k} ε(lr)P(lr, kr),

where

(6.8b)  ε(lr) ≜ Σ_{s=1}^{r} {π(lr - s) - π(lr - s + 1)}P(lr - s + 1, lr).

Applying Theorem 6.1 twice to obtain bounds for ||ε(lr)P(lr, kr)|| and ||ε(lr)|| from (6.8a) and (6.8b) respectively, we obtain

(6.9)  ||ρ(kr)|| ≤ Σ_{l=1}^{k} τ_1(lr, kr) Σ_{s=1}^{r} ||π(lr + 1 - s) - π(lr - s)||,  k ≥ 1.
Now making use of (5.15) and (6.4) we obtain

(6.10)  ||ρ(kr)|| ≤ [(k_0 + m_0/r)/(k + m_0/r)]^a Σ_{l=1}^{l_0} Σ_{s=1}^{r} ||π(lr + 1 - s) - π(lr - s)||
                  + [2/(k + m_0/r)^a] Σ_{l=l_0+1}^{k} (l + 1 + m_0/r)^a {π*(lr) - π*(lr - r)}

for k > l_0 ≜ max{m̂/r, k_0 - 2}. Now writing π̂*(n) for {1 - π*(n)}, with π* as in (5.16), we have

(6.11)  Σ_{l=l_0+1}^{k} (l + 1 + m_0/r)^a {π*(lr) - π*(lr - r)}
        ≤ Σ_{l=l_0+1}^{k} {(l + 1 + m_0/r)^a - (l + m_0/r)^a} π̂*(lr - r) + (l_0 + 1 + m_0/r)^a π̂*(l_0 r)
        ≤ a Σ_{l=l_0+1}^{k} π̂*(lr - r)/(l + m_0/r)^{1-a} + (l_0 + 1 + m_0/r)^a π̂*(l_0 r),

where in the last step we have used the relation a ≤ 1.

On substituting (6.11) in (6.10) we obtain, for k > l_0,

(6.12a)  ||ρ(kr)|| ≤ D_1/(k + m_0/r)^a + [2a/(k + m_0/r)^a] Σ_{l=l_0+1}^{k} π̂*(lr - r)/(l + m_0/r)^{1-a},

where

(6.12b)  D_1 ≜ (k_0 + m_0/r)^a Σ_{l=1}^{l_0} Σ_{s=1}^{r} ||π(lr + 1 - s) - π(lr - s)|| + 2(l_0 + 1 + m_0/r)^a π̂*(l_0 r).
To proceed further it is necessary to estimate {π̂*(m)}, and this is undertaken in the following proposition.

Proposition 6.2.

(6.13)  π̂*(m) = 1 - π*(m) = (1/2) ||π(m) - e*|| ≤ Σ_{j∉S*} [g(j)/g(*)] (m + m_0 + 1)^{-b(j)},  m = 0, 1, ···,

where {b(j)} is given by

b(j) ≜ {c(j) - c(*)}/γ,  j ∈ S,

c(*), see (3.13), is the minimum of the cost function and g(*), see Proposition 3.2, is Σ_{j∈S*} g(j).
Proof. By the definition of π(m) given in (3.1) and that of π*(m) given in (5.16) we have

1 - π*(m) = 1 - Σ_{i∈S*} g(i) exp(-c(i)/T_m)/G(m)
          = [Σ_{j∉S*} {g(j)/g(*)} (m + m_0 + 1)^{-b(j)}] / [1 + Σ_{j∉S*} {g(j)/g(*)} (m + m_0 + 1)^{-b(j)}]
          ≤ Σ_{j∉S*} {g(j)/g(*)} (m + m_0 + 1)^{-b(j)}.

Observe that the bound given in (6.13) is asymptotically (i.e. as m → ∞) tight.

We can now say that

(6.14a)  π̂*(lr - r) ≤ Σ_{j∉S*} ĝ(j)/(l - 1 + m_0/r)^{b(j)},  l = 1, 2, ···,

where

(6.14b)  ĝ(j) ≜ [g(j)/g(*)] / r^{b(j)},  j ∉ S*.

By substituting (6.14a) in (6.12a) and then bounding the resulting expression we obtain

(6.15)  ||ρ(kr)|| ≤ D_1/(k + m_0/r)^a + Σ_{j∉S*} [2a ĝ(j)/(a - b(j))] [1/(k + m_0/r)^{b(j)} - E^{a-b(j)}/(k + m_0/r)^a],

where E ≜ (l_0 - 1 + m_0/r).

This bound in (6.15) has been obtained for a ≠ b(j), j ∉ S*; if this is not true, then for the terms corresponding to values of j for which a = b(j), a related expression is obtained by a slightly different bounding procedure.
6.1.3. Bound for the third term of (6.2). This bound comes directly from Proposition 6.2.
6.2. Final results. Combining the results given in Sections 6.1.1-6.1.3 we obtain the following final theorem.

Theorem 6.2. For every k > l_0, the following relation holds:

(6.16)  ||v(kr) - e*|| ≤ D/(k + m_0/r)^a
        + Σ_{j∉S*} [2a ĝ(j)/(a - b(j))] [1/(k + m_0/r)^{b(j)} - E^{a-b(j)}/(k + m_0/r)^a]
        + Σ_{j∉S*} 2 ĝ(j)/(k + m_0/r)^{b(j)},

where

D = D_1 + ||v(0) - π(0)|| (k_0 + m_0/r)^a.

Also, a, {b(j)} and {ĝ(j)} are given in (5.12c), (6.13) and (6.14b) respectively.
Equation (6.16) can be further simplified if we observe that the dominant term of

1/(k + m_0/r)^{b(j)},  j ∉ S*,

is given by

1/(k + m_0/r)^{b},  where b ≜ min_{j∉S*} b(j) = δ/γ,

and δ, which has been defined in (3.14), is the difference between next-to-least cost and least cost.
A simple corollary to Theorem 6.2 is the following.

Proposition 6.3. The simulated annealing algorithm with the annealing schedule given by (5.10) has the following estimate for its rate of convergence:

(6.17)  ||v(kr) - e*|| = O(1/k^{min(a,b)}).
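Proposition 6.3 can be observed numerically on a small instance by evolving the state distribution exactly, v(m + 1) = v(m)P(T_m), under the schedule (5.10) and recording its L1 distance from e*. The following self-contained Python sketch does this; the data structures costs and g are the same illustrative ones used in the earlier sketches.

import math
import numpy as np

def distance_to_optimum(costs, g, gamma, m0=1, num_steps=2000):
    s = len(costs)
    c_star = min(costs)
    g_node = [sum(g[i]) for i in range(s)]
    # Optimum vector e* of (3.6): mass g(i)/g(*) on the least-cost states.
    e_star = np.array([g_node[i] if costs[i] == c_star else 0.0 for i in range(s)])
    e_star /= e_star.sum()
    v = np.full(s, 1.0 / s)                  # arbitrary starting distribution
    errors = []
    for m in range(num_steps):
        T = gamma / math.log(m + m0 + 1)     # schedule (5.10)
        P = np.zeros((s, s))
        for i in range(s):
            for j in range(s):
                if j != i and g[i][j] > 0:
                    P[i, j] = (g[i][j] / g_node[i]) * min(1.0, math.exp(-(costs[j] - costs[i]) / T))
            P[i, i] = 1.0 - P[i].sum()
        v = v @ P                            # one exact step of the inhomogeneous chain
        errors.append(float(np.abs(v - e_star).sum()))
    return errors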
6.3. Discussion. We can see from (6.17) that the bound on the asymptotic rate of convergence is limited by min(a, b). Both a and b depend on δ and L derived from the cost function, w and r from the connectivity properties of the graph underlying the Markov chain, and γ from the annealing schedule. Note that with all other parameters and time held fixed, higher γ corresponds to higher temperature and thus, in this sense, to slower cooling. Now γ has to satisfy a condition that gives weak ergodicity, i.e. γ ≥ γ_WE, wherein by our analysis γ_WE = rL, but otherwise it is a free parameter. It is therefore of some interest to investigate the value of γ which maximizes min(a, b).

Recall the definition of a in (5.12c) and that b = δ/γ. Hence a(γ) and b(γ) are respectively increasing and decreasing with increasing γ, and it is easy to see that there exists a unique γ̂ such that a(γ̂) = b(γ̂). Furthermore, the problem

max_{γ: γ ≥ γ_WE} {min(a, b)}

has the solution

γ = max(γ_WE, γ̂).
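Under the assumption that estimates of r, L, w and δ are available, this choice of γ can be computed numerically: a(γ) = w^r/r^(rL/γ) from (5.12c) is increasing in γ and b(γ) = δ/γ is decreasing, so their crossing point γ̂ can be found by bisection. A Python sketch (the bracketing constants are illustrative assumptions):

import math

def optimal_gamma(r, L, w, delta, tol=1e-9):
    gamma_we = r * L                                  # weak-ergodicity threshold (5.11)

    def gap(gamma):
        # a(gamma) = w**r / r**(r*L/gamma), computed via logs to avoid overflow.
        a = math.exp(r * math.log(w) - (r * L / gamma) * math.log(r))
        return a - delta / gamma                      # a(gamma) - b(gamma)

    lo, hi = 1e-6, max(gamma_we, 1.0)
    while gap(hi) < 0:                                # expand until a(hi) >= b(hi)
        hi *= 2.0
        if hi > 1e15:                                 # degenerate instance; fall back
            return gamma_we
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    gamma_hat = 0.5 * (lo + hi)                       # a(gamma_hat) = b(gamma_hat)
    return max(gamma_we, gamma_hat)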
The above procedure for optimizing the algorithm is often feasible since for many combinatorial optimization problems, graph partitioning problems in particular, estimates of r, L and δ are available.

The above discussion has been on the effect of γ (from the annealing schedule) on the bound on the rate of convergence at finite, but large, time. For behavior at smaller time, the more detailed relation (6.16) has to be considered. Observe that on the right-hand side of this equation, the only factors which depend on the time kr are 1/(k + m_0/r)^a and 1/(k + m_0/r)^{b(j)}, j ∉ S*. We may glean qualitative information on the dependence of the rate of convergence on γ by investigating the dependence of a and {b(j)} on γ. Now, smaller γ gives larger b(j) for each j and, as already noted, smaller a. Hence, reducing γ has the effect of reducing the third term and increasing the first term on the right-hand side of (6.16). The dependence of the middle term is more involved since it has features of both other terms reflected in it. Roughly, it is small only when both the first and third terms are small, i.e. in the mid-range of γ.
With the benefit of analysis we can even go back to (6.2) and deduce qualitatively the effect of γ on each of the three terms there. The first term measures how effectively the difference between v(0) and π(0) is forgotten at step m of the algorithm. The bound in (6.5) corroborates our intuitive understanding that this rate of memory loss is aided by having higher γ, i.e. higher temperatures and slower cooling. The third term, for which we have the most explicit information (see Proposition 6.2), depends on the rate at which the quasi-stationary distribution approaches its asymptotic value, the optimum distribution. This term benefits from small γ. The middle term benefits from a matching of the two rates. The point in the analysis where this is most explicitly manifest is in (6.12a). The two rates are matched and the term minimized in the mid-range of γ. In all, the above discussion illuminates the balancing of opposite mechanisms that an optimal annealing schedule must reflect.

The analysis can be brought to bear on an important question (for which we are indebted to H. S. Witsenhausen): to what extent does simulated annealing exploit the connectivity of the configurations in a particular case? The comparison is therefore between a given partially-connected graph and a construct in which the connectivity is artificially increased. A first observation is that the artificial increase of connectivity leads to a deteriorating component in performance, insofar as the departure of the quasi-stationary distribution at a particular temperature from the optimum distribution (see the third term in (6.2)) is greater. This is easily seen by tracing the effect of increased connectivity on g(j)/g(*) in Proposition 6.2. On the other hand, the effect on the coefficient of ergodicity and, in particular, on the parameter a in the bound for it given in Proposition 6.1, depends on the particulars of the case being considered. To see this, observe that the parameter a depends on w, r and L, and typically the first two decrease while the last increases with the increase of connectivity in the construct.
7. Concluding remarks

We have proven a number of results on the behavior of simulated annealing. In particular, we have introduced an annealing schedule which guarantees that the individual state probabilities converge either to a positive value or to 0 depending upon whether the configuration corresponding to the state is globally least-cost or not. Also we have analyzed finite-time behavior in terms of a decomposition of the distance of the state probability vector from the optimum. Each of the three terms of the decomposition reflects an important component of the behavior of the algorithm. Each term has an independent bound and this allows the trade-offs in the design of the algorithm to be quantified.

We give below a selection of three directions in which the present analysis may be extended:
1. An analysis more closely attached to the evolution with time of mean cost rather than the distance of the state distribution from the optimal.
2. An analysis of schedules in which temperature is lowered at a faster rate than that allowed here by (1.1).
3. The exploitation of special properties of the cost function to design matched annealing schedules with a provable improvement in performance.
References

[1] Binder, K. (1978) Monte Carlo Methods in Statistical Physics. Springer-Verlag, Berlin.
[2] Dobrushin, R. L. (1956) Central limit theorem for nonstationary Markov chains, I, II. Theory Prob. Appl. 1, 65-80; 329-383.
[3] Garey, M. R. and Johnson, D. S. (1979) Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, San Francisco.
[4] Geman, S. and Geman, D. (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Analysis and Machine Intelligence 6, 721-741.
[5] Gidas, B. (1985) Non-stationary Markov chains and convergence of the annealing algorithm. J. Statist. Phys. 39, 73-131.
[6] Hajek, B. (1985) Cooling schedules for optimal annealing. Preprint.
[7] Isaacson, D. L. and Madsen, R. W. (1976) Markov Chains: Theory and Applications. Wiley, New York.
[8] Iosifescu, M. (1980) Finite Markov Processes and their Applications. Wiley, New York.
[9] Johnson, D. S. (1984) Simulated annealing performance studies. Presented at the Simulated Annealing Workshop, Yorktown Heights.
[10] Kelly, F. P. (1980) Reversibility and Stochastic Networks. Wiley, New York.
[11] Kirkpatrick, S., Gelatt, C. D. and Vecchi, M. P. (1983) Optimization by simulated annealing. Science 220, 671-680.
[12] Lundy, M. and Mees, A. (1984) Convergence of the annealing algorithm. Presented at the Simulated Annealing Workshop, Yorktown Heights.
[13] Madsen, R. W. and Isaacson, D. L. (1973) Strongly ergodic behavior for non-stationary Markov processes. Ann. Prob. 1, 329-335.
[14] Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N. and Teller, A. H. (1953) Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087-1091.
[15] Romeo, F. and Sangiovanni-Vincentelli, A. (1984) Probabilistic hill climbing algorithms: properties and applications. ERL Memo, University of California, Berkeley.
[16] Sechen, C. and Sangiovanni-Vincentelli, A. (1984) The TimberWolf placement and routing package. Proc. 1984 Custom Integrated Circuit Conference, Rochester.
[17] Schwartz, J. (1980) Fast probabilistic algorithms for verification of polynomial identities. J. Assoc. Comput. Mach. 27, 701-717.
[18] Seneta, E. (1980) Non-negative Matrices and Markov Chains, 2nd edn. Springer-Verlag, New York.
[19] Vecchi, M. P. and Kirkpatrick, S. (1983) Global wiring by simulated annealing. IEEE Trans. Computer-Aided Design 2, 215-222.
This content downloaded from 128.59.46.225 on Wed, 2 Jul 2014 16:01:23 PM
All use subject to JSTOR Terms and Conditions
... It uses a parameter called temperature , which is progressively lowered during the search to transition from a diversification behaviour to an intensification one ( Kirkpatrick, Gelatt, & Vecchi, 1983;Č erný, 1985 ). One alternative is to use the same temperature value throughout the whole search ( Mitra, Romeo, & Sangiovanni-Vincentelli, 1985 ). This SA variant appears in the literature under different names ( Cohn & Fielding, 1999;Jerrum, Sinclair, & Hochbaum, 1996;Johnson & Jacobson, 2002;Orosz & Jacobson, 2002 ) and, for consistency, throughout this work we will refer to it as Fixed Temperature algorithm (FTA). ...
... The first work that mentioned a SA with a fixed temperature in the optimization literature is probably the one of Mitra et al. in 1985( Mitra et al., 1985. In 1989, Hayek and Sasaki studied a SA for the polynomial-time matching problem and gave examples for which any monotone decreasing temperature sequence is not optimal ( Hajek & Sasaki, 1989 ). ...
... The first work that mentioned a SA with a fixed temperature in the optimization literature is probably the one of Mitra et al. in 1985( Mitra et al., 1985. In 1989, Hayek and Sasaki studied a SA for the polynomial-time matching problem and gave examples for which any monotone decreasing temperature sequence is not optimal ( Hajek & Sasaki, 1989 ). ...
Article
Since the introduction of Simulated Annealing (SA), researchers have considered variants that keep the same temperature value throughout the whole search and tried to determine whether this strategy can be more effective than the original cooling scheme. Several studied have tried to answer this question without a conclusive answer and without providing indications that could be useful for a practical implementation. In this work, we address this question following an experimental approach, relating the characteristics of the algorithms with the characteristics of the landscapes they encounter. We use problem-independent landscape features to study the algorithmic behaviour across different problems. We consider three different objective functions and various instance classes and determine the conditions under which the fixed-temperature variant of SA can outperform its original counterpart and when SA is instead a better choice.
... Since each random walk takes a different path towards optimality, nodes that share the greatest amount of topological similarity have the greatest chance of becoming aligned across independent paths taken towards a near-optimal solution. Our random walk through search space is generated using simulated annealing, which has a rich history of success in optimizing NP-complete problems [25][26][27][28][29][30][31][32][33][34][35][36][37] . Its randomness is key: each run of our Simulated Annealing Network Aligner, or SANA 38,39 , follows a different, randomized path towards an alignment that uncovers close to the maximum amount of common topology that can be discovered between two networks 40 . ...
... npj Systems Biology and Applications (2022)25 Published in partnership with the Systems Biology Institute ...
Article
Full-text available
Topological network alignment aims to align two networks node-wise in order to maximize the observed common connection (edge) topology between them. The topological alignment of two protein–protein interaction (PPI) networks should thus expose protein pairs with similar interaction partners allowing, for example, the prediction of common Gene Ontology (GO) terms. Unfortunately, no network alignment algorithm based on topology alone has been able to achieve this aim, though those that include sequence similarity have seen some success. We argue that this failure of topology alone is due to the sparsity and incompleteness of the PPI network data of almost all species, which provides the network topology with a small signal-to-noise ratio that is effectively swamped when sequence information is added to the mix. Here we show that the weak signal can be detected using multiple stochastic samples of “good” topological network alignments, which allows us to observe regions of the two networks that are robustly aligned across multiple samples. The resulting network alignment frequency (NAF) strongly correlates with GO-based Resnik semantic similarity and enables the first successful cross-species predictions of GO terms based on topology-only network alignments. Our best predictions have an AUPR of about 0.4, which is competitive with state-of-the-art algorithms, even when there is no observable sequence similarity and no known homology relationship. While our results provide only a “proof of concept” on existing network data, we hypothesize that predicting GO terms from topology-only network alignments will become increasingly practical as the volume and quality of PPI network data increase.
... Still, under the analogy with physical systems, it is well-known that if the cooling is too rapid, the system cannot achieve thermal equilibrium for each temperature value, which may result in a configuration with defects in the form of high-energy, metastable, locally optimal structures. A treatment of this topic from a theoretical point of view based on the Markov chain theory is present in [1,12,15,22,26,32], where specifically in [15] was derived a necessary and sufficient condition on the cooling speed that guarantees asymptotic convergence of the SA to the ground states. ...
Article
Full-text available
The Digital Annealer is a CMOS hardware designed by Fujitsu Laboratories for high-speed solving of Quadratic Unconstrained Binary Optimization (QUBO) problems that could be difficult to solve by means of existing general-purpose computers. In this paper, we present a mathematical description of the first-generation Digital Annealer’s Algorithm from the Markov chain theory perspective, establish a relationship between its stationary distribution with the Gibbs-Boltzmann distribution, and provide a necessary and sufficient condition on its cooling schedule that ensures asymptotic convergence to the ground states.
... Thus, following the work (Mitra, Romeo, and Vincentelli 1985), for the stationary distribution of the Markov Chain ...
Article
The problem of influence maximization, i.e., mining top-k influential nodes from a social network such that the spread of influence in the network is maximized, is NP-hard. Most of the existing algorithms for the problem are based on the greedy algorithm. Although the greedy algorithm can achieve a good approximation, it is computationally expensive. In this paper, we propose a totally different approach based on Simulated Annealing (SA) for the influence maximization problem. This is the first SA-based algorithm for the problem. Additionally, we propose two heuristic methods to accelerate the convergence process of SA, and a new method of computing influence to speed up the proposed algorithm. Experimental results on four real networks show that the proposed algorithms run faster than the state-of-the-art greedy algorithm by 2-3 orders of magnitude while improving on its accuracy.
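To give a sense of the general shape of such a search (a minimal sketch only: the swap move, the geometric cooling schedule, and the toy one-hop "influence" surrogate below are our own illustrative assumptions, not the heuristics or spread model proposed in the paper):

    import math
    import random

    def one_hop_coverage(graph, seeds):
        """Toy influence surrogate: seeds plus their direct neighbors."""
        covered = set(seeds)
        for s in seeds:
            covered.update(graph.get(s, ()))
        return len(covered)

    def sa_seed_selection(graph, k, steps=10000, t0=1.0, alpha=0.999):
        nodes = list(graph)
        current = set(random.sample(nodes, k))
        best, best_val = set(current), one_hop_coverage(graph, current)
        cur_val, t = best_val, t0
        for _ in range(steps):
            # Neighbor move: swap one seed for a random non-seed node.
            out_node = random.choice(tuple(current))
            in_node = random.choice([n for n in nodes if n not in current])
            candidate = (current - {out_node}) | {in_node}
            cand_val = one_hop_coverage(graph, candidate)
            # Metropolis acceptance for a maximization objective.
            if cand_val >= cur_val or random.random() < math.exp((cand_val - cur_val) / t):
                current, cur_val = candidate, cand_val
                if cur_val > best_val:
                    best, best_val = set(current), cur_val
            t *= alpha  # geometric cooling
        return best, best_val

    # Tiny example graph as an adjacency dict:
    g = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"], "e": ["d"]}
    print(sa_seed_selection(g, k=2, steps=2000))

The acceptance rule lets the search occasionally take worse seed sets early on (high temperature) and becomes effectively greedy as the temperature decays.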
... (SANA is available on GitHub at https://github.com/waynebhayes/SANA.) SA has a rich history of successful application to NP-complete problems across a wide array of application domains [63][64][65][66][67][68][69][70][71][72] . One important aspect of SA is the choice of temperature schedule; SANA automatically determines effective temperature limits using an algorithm detailed elsewhere 73 . ...
Preprint
Full-text available
The function of a protein is defined by its interaction partners. Thus, topology-driven network alignment of the protein-protein interaction (PPI) networks of two species should uncover similar interaction patterns and allow identification of functionally similar proteins. However, few of the fifty or more algorithms for PPI network alignment have demonstrated a significant link between network topology and functional similarity, and none have recovered orthologs using network topology alone. We find that the major contributing factors to this failure are: (i) edge densities in current PPI networks are too low to expect topological network alignment to succeed; (ii) when edge densities are high enough, some measures of topological similarity easily uncover functionally similar proteins while others do not; and (iii) most network alignment algorithms fail to optimize their own topological objective functions, hampering their ability to use topology effectively. We demonstrate that SANA (the Simulated Annealing Network Aligner) significantly outperforms existing aligners at optimizing their own objective functions, even achieving near-optimal solutions when the optimal solution is known. We offer the first demonstration of global network alignments based on topology alone that align functionally similar proteins with p-values in some cases below 1e-300. We predict that topological network alignment has a bright future as edge densities increase towards the value where good alignments become possible. We demonstrate that when enough common topology is present at high enough edge densities (for example, in the recent, partly synthetic networks of the Integrated Interaction Database), topological network alignment easily recovers most orthologs, paving the way towards high-throughput functional prediction based on topology-driven network alignment.
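Regarding the temperature-limit point raised in the excerpt above: one generic heuristic for choosing an initial temperature from a target acceptance probability is to sample some uphill moves and solve the Metropolis acceptance formula for T0. This is a common rule of thumb, not necessarily the automatic method SANA itself uses (which is described in its own reference 73):

    import math

    def initial_temperature(uphill_deltas, p0=0.95):
        """Pick T0 so that an average uphill move is accepted with
        probability roughly p0 under Metropolis acceptance:
        exp(-mean_delta / T0) = p0  =>  T0 = -mean_delta / ln(p0)."""
        mean_delta = sum(uphill_deltas) / len(uphill_deltas)
        return -mean_delta / math.log(p0)

    # e.g. if sampled uphill moves cost about 3.0 on average:
    t0 = initial_temperature([2.5, 3.1, 3.4], p0=0.95)  # ~58.5

A final temperature is chosen analogously with a small target acceptance probability, so that the run ends in an effectively greedy phase.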
Article
Flow shop scheduling deals with determining the optimal sequence in which jobs are processed on machines in a fixed order, with the main objective of minimizing the completion time of all jobs (makespan). This type of scheduling problem appears in many industrial and production planning applications. This study proposes a new bi-objective mixed-integer programming model for solving synchronous flow shop scheduling problems with completion-time objectives. The objective functions are the total makespan and the sum of tardiness and earliness costs of blocks. Jobs are moved among machines through a synchronous transportation system with synchronized processing cycles. In each cycle, the existing jobs begin simultaneously, each on one of the machines, and after completion wait until the last job is completed. Subsequently, all the jobs are moved concurrently to the next machine. Four algorithms, including the non-dominated sorting genetic algorithm (NSGA II), multi-objective simulated annealing (MOSA), multi-objective particle swarm optimization (MOPSO), and multi-objective hybrid vibration-damping optimization (MOHVDO), are used to find a near-optimal solution for this NP-hard problem. In particular, the proposed hybrid VDO algorithm is based on the imperialist competitive algorithm (ICA) and the integration of a neighborhood creation technique. MOHVDO and MOSA show the best performance among the algorithms with respect to objective-function values and CPU time, respectively. Thus, the results from running small-scale and medium-scale problems in MOHVDO and MOSA are compared with the solutions obtained from the epsilon-constraint method. In particular, the error percentage of MOHVDO's objective functions is less than 2% compared to the epsilon-constraint method for all solved problems. Besides the specific results obtained in terms of performance and, hence, practical applicability, the proposed approach fills a considerable gap in the literature. Indeed, even though variants of the aforementioned meta-heuristic algorithms have been widely introduced in multi-objective environments, a simultaneous implementation of these algorithms, as well as a comparative study of their performance when solving flow shop scheduling problems, has so far been overlooked.
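To make the synchronous movement rule concrete, here is a minimal sketch of computing the makespan of one job sequence under that rule (our own illustration, assuming one job enters the line per cycle in sequence order; this is not the paper's bi-objective MIP): in every cycle each job occupies one machine, all jobs wait for the slowest operation of that cycle, and then everything shifts forward one machine.

    def synchronous_makespan(proc, sequence):
        """proc[j][m] = processing time of job j on machine m; `sequence` is a job order.
        In cycle t, the job at sequence position i occupies machine t - i (when valid);
        each cycle lasts as long as its slowest operation, and the makespan is the sum
        of all cycle lengths."""
        n, m = len(sequence), len(proc[sequence[0]])
        makespan = 0
        for t in range(n + m - 1):
            cycle = 0
            for i, job in enumerate(sequence):
                machine = t - i
                if 0 <= machine < m:
                    cycle = max(cycle, proc[job][machine])
            makespan += cycle
        return makespan

    # Two jobs, two machines: job 0 takes (3, 2), job 1 takes (1, 4).
    proc = [(3, 2), (1, 4)]
    print(synchronous_makespan(proc, [0, 1]))  # cycles of length 3, 2, 4 -> 9

The waiting built into each cycle is exactly why synchronous flow shops need objective functions beyond the classical makespan recursions.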
Chapter
Since the function of a protein is defined by its interaction partners, and since we expect similar interaction patterns across species, the alignment of protein-protein interaction (PPI) networks between species, based on network topology alone, should uncover functionally related proteins across species. Surprisingly, despite the publication of more than fifty algorithms aimed at performing PPI network alignment, few have demonstrated a statistically significant link between network topology and functional similarity, and none have demonstrated that orthologs can be recovered using network topology alone. We find that the major contributing factors to this surprising failure are: (i) edge densities in most currently available experimental PPI networks are demonstrably too low to expect topological network alignment to succeed; (ii) in the few cases where the edge densities are high enough, some measures of topological similarity easily uncover functionally similar proteins while others do not; and (iii) most network alignment algorithms to date perform poorly at optimizing even their own topological objective functions, hampering their ability to use topology effectively. We demonstrate that SANA—the Simulated Annealing Network Aligner—significantly outperforms existing aligners at optimizing their own objective functions, even achieving near-optimal solutions when the optimal solution is known. We offer the first demonstration of global network alignments based on topology alone that align functionally similar proteins with p-values in some cases below 10⁻³⁰⁰. We predict that topological network alignment has a bright future as edge densities increase toward the value where good alignments become possible. We demonstrate that when enough common topology is present at high enough edge densities—for example in the recent, partly synthetic networks of the Integrated Interaction Database—topological network alignment easily recovers most orthologs, paving the way toward high-throughput functional prediction based on topology-driven network alignment.
Article
Simulated annealing, proposed by Kirkpatrick et al., has proven to be an effective technique for solving general combinatorial optimization problems. Markov chains are proposed as mathematical models of the Simulated Annealing algorithm. Using these models, it has been possible to prove that, under certain assumptions on the rules used by the algorithm to generate the configurations of the problem and on the time spent at each temperature, the Simulated Annealing algorithm generates a globally optimal solution with probability one. This result has made possible the definition of a general class of algorithms with the same statistical properties: the class of probabilistic hill-climbing methods. The mathematical properties of this class are presented, and rules on the selection of annealing schedules are obtained from these properties.