A comparison of some parallel game-tree search algorithms
(Revised version)
Jaleh Rezaie (jrezaie@ms.uky.edu)
Raphael Finkel (raphael@ms.uky.edu)
Department of Computer Science
University of Kentucky
Lexington, KY 40506-0027
Abstract
This paper experimentally compares several sequential and parallel game-tree
search methods: alpha-beta, mandatory work first, principal-variation splitting, tree split-
ting, ER, and delay splitting. All have been implemented in a common environment pro-
vided by the DIB package.
Key words: game trees, heuristic search, alpha-beta
1. Introduction
In this paper we compare some of the parallel methods for searching large game
trees. These trees arise in the area of artificial intelligence and are closely related to trees
searched in other application areas. Exhaustive search of a tree is prohibitively expen-
sive. There are several ways to ameliorate the problem.
- Search only to a given depth.
- Apply heuristics, such as the alpha-beta method, to cut off fruitless search.
- Apply many computers simultaneously in pursuing the search.
We concentrate on distributed variants of the alpha-beta heuristic that try to avoid search-
ing unnecessary parts of the tree while keeping many processors fruitfully busy.
The algorithms we compare are alpha-beta, mandatory work first, principal-
variation splitting, tree splitting, ER, and delay splitting. To be able to make a fair com-
parison between the above algorithms, we have extended the DIB package [1] to use it as
a framework for implementing all the algorithms we compare.
Section 2 describes the DIB package. Section 3 introduces the alpha-beta pruning
and briefly describes the algorithms used in the experiment. Section 4 presents experi-
mental results that compare the algorithms. Section 5 compares the effects of several
sorting strategies on the above algorithms. Section 6 illustrates the new results achieved
by adjustments made to the MWF algorithm. Section 7 summarizes the results and details
the remaining parts of this experiment.
2. DIB: A distributed implementation of backtracking
In this section we describe how DIB works and how we use it to implement dif-
ferent tree-search algorithms.
2.1. Description of DIB
DIB is a multi-purpose package developed by Finkel and Manber for tree-traversal
problems [1]. It allows applications such as backtrack and branch-and-bound to be
implemented on a multicomputer. DIB’s requirements from the distributed operating
system are minimal. The machines must be connected by a network that supports a
message-passing mechanism; each machine must be able to communicate, not neces-
sarily directly, with all other machines. Our implementation of DIB is programmed in C
and runs in the Unix environment across machines connected by an internet or on a Unix
multiprocessor.
The application program must specify the root of the problem tree, how to generate
children, and calculations needed at each node. It can also optionally specify how to
generate values of a tree node from combining its children’s values and how to spread
information either globally or locally throughout the tree.
DIB divides the problem into subproblems and assigns them to any number of pro-
cessors (potentially nonhomogeneous machines in a network) dynamically. Each proces-
sor maintains a table of explicit work, recording all the problems that have been sent to
the processor, have been generated by the processor itself, and/or have been sent to other
processors. Each processor is responsible for the work in its table. Each item of work
(represented by a node in the backtrack tree, which stands as well for all its descendants)
is labeled by which processor, if any, has been assigned that work.
When a processor A is finished with a problem and has reported its result to the pro-
cessor that gave it that problem, it will take the first (in an inorder traversal of the tree)
unassigned problem from its table. If no unassigned problem is available, A sends a
work request message to another processor (or processors), selected at random from A’s
peers, repeatedly (with some delay) until new work arrives.
A processor B that receives a work request message interrupts its own search and
tries to respond by sending some work to the requesting processor from its table. If no
unassigned problem is available in the table, then the problem B is working on is subdi-
vided and its children are put in the table. Until work is subdivided, DIB maintains a fast
representation of the current search (just a recursion stack; we call it the ‘‘implicit’’
representation); subdivided work is explicit in the table. After subdivision, B can usually
send some unassigned work to the requesting processor. Subdivision may have to be
repeated several times before an unassigned problem is generated, but if it reaches a
trivial problem (not worth subdividing), or if it reaches the depth at which B itself is
searching, the request is not granted. B resumes its search after dealing with the incom-
ing request.
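The request-handling step described above can be sketched as follows. This is a toy
illustration with our own names (`subdivide`, `handle_request`) and integer ranges standing
for subtrees; it is not DIB's real C interface:

```python
def subdivide(item):
    """Split a work item (here: a half-open index range) into two children."""
    lo, hi = item
    mid = (lo + hi) // 2
    return (lo, mid), (mid, hi)

def handle_request(table, implicit):
    """What processor B does on a work-request message.

    Returns (work_for_requester, table, implicit).  B prefers unassigned
    explicit work from its table; otherwise it subdivides the problem it is
    currently working on (its implicit work), keeping one child and handing
    over the other.  Trivial problems are not granted.
    """
    if table:
        return table.pop(0), table, implicit
    lo, hi = implicit
    if hi - lo <= 1:                 # trivial: not worth subdividing
        return None, table, implicit
    keep, give = subdivide(implicit)
    return give, table, keep
```

For example, a request arriving while B holds the range (0, 8) and an empty table yields
(4, 8) for the requester, with B keeping (0, 4).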
DIB is fault tolerant, in that work that B has given to A can still be accomplished
by B if there is nothing else worth doing and if A has not yet reported the result of that
work. This mechanism does not need timeouts or ‘‘heartbeats’’ to detect failure.
We have enhanced the DIB package so that it can achieve high efficiency for game
tree search. The principal enhancement is added flexibility given to the application level
for delaying evaluation of a game-tree node. That is, the application can refuse to gen-
erate additional children for a node but indicate that in the future it may again be willing
to do so. DIB does not attempt to generate children of such a node again until some
other child of that node has completed or a data update message has arrived at that node.
To experiment with game playing, we have designed a two-level application struc-
ture. The game level is game-specific, knowing the rules for tic-tac-toe, Othello, or
checkers. The control level communicates both with DIB and the game level. It knows
the pattern of evaluation for one of the algorithms we compared, namely, alpha-beta,
mandatory work first, principal-variation splitting, tree splitting, ER, or delay splitting.
Any of the game modules we implemented can be used with any of the control modules;
any such combination can be used with our enhanced DIB.
Since DIB distributes work, collects and reports results, and passes messages
between processors in a similar way for all the combinations, we can compare different
control modules in a fairly implementation-independent fashion. Previous comparisons
are questionable because each algorithm was implemented in a different parallel environ-
ment.
3. Parallel tree search algorithms
The best way to evaluate a parallel algorithm for a given problem is to measure the
extent to which it takes advantage of the available processors. This idea can be
formulated as follows:

    speedup S = (time required by best sequential algorithm) /
                (time required by parallel algorithm)

    efficiency E = S / (number of processors used)
It is not easy to achieve a ‘‘perfect’’ efficiency of 1.0. For a problem of a given size,
efficiency tends to decrease as the number of processors increases. This relationship is
explained by Kumar and Rao [2] as resulting from an increase in the communication time
(sum of the time spent by all processors in communicating with neighboring processors,
waiting for messages, time in starvation, and so forth), while there is no change in com-
putation time (sum of the time spent by all the processors in useful computation). The
relationship between communication time (Tcm), computation time (Tcp), and
efficiency (E) is described as follows:

    E = Tcp / (Tcp + Tcm)
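In code, these relations are just ratios; a minimal sketch with illustrative numbers (the
function names are ours):

```python
def speedup(t_sequential, t_parallel):
    """S: best sequential time divided by parallel time."""
    return t_sequential / t_parallel

def efficiency_from_speedup(s, processors):
    """E = S / p."""
    return s / processors

def efficiency_from_times(t_cp, t_cm):
    """E = Tcp / (Tcp + Tcm): the fraction of total processor time
    spent in useful computation rather than communication."""
    return t_cp / (t_cp + t_cm)

# 100 s sequentially and 25 s on 8 processors gives S = 4 and E = 0.5.
```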
Kumar and Rao [2] define an isoefficiency function that shows how the problem
must grow with the number of processors to achieve the same efficiency. They also mention
that since most problems have a sequential component (in depth-first search, it is one
node expansion), problem size must grow at least linearly to maintain a particular
efficiency.
Steinberg and Solomon [3] blame the failure to achieve perfect efficiency on three
types of ‘‘loss’’.
- Starvation loss: processors sitting idle while awaiting work to be given to them.
- Interference loss: time spent waiting for access to shared resources such as the set
  of unfinished subproblems.
- Speculative loss: time spent performing unnecessary work, such as that performed by
  a parallel algorithm before it is possible to determine that the work is necessary.
  Because a parallel algorithm must evaluate different nodes simultaneously,
  information gained by evaluating one node may come too late to cut off the
  evaluation of other nodes.
3.1. Alpha-beta
The alpha-beta algorithm is a sequential technique used to evaluate a game tree
efficiently. The nodes corresponding to the first player’s moves are called max nodes,
and the other nodes are called min nodes. The value of a max node is the maximum of
the values of its children, whereas the value of a min node is the minimum of the values
of its children. The value of a leaf is determined by a game-specific static evaluator.
Alpha-beta ignores branches that are certain not to contribute to the value of the current
node. Figure 1 shows a sample game tree with a cutoff. In this figure, node z, which is a
max node, has two children, and its first child is evaluated to 9. Therefore,
value(z) = max{9, value(y)}
where y is the other child of z. Now if the first child (we will often call it the eldest
child) of y is evaluated to 7 then
value(y) = min{7, ...}
so the value of z is 9 regardless of the value of y. It follows that the remaining children
of the node y need not be evaluated. Ignoring those children is called shallow cutoff.
Figure 2 illustrates another type of cutoff. After the eldest child of node z is
evaluated, we see that z’s value will be greater than or equal to 9. This value is the
current lower bound in the alpha-beta algorithm. The value of a min node in the subtree
rooted at node y must be greater than 9 in order for the lower bound to change. There-
fore, when the algorithm reaches node w (a min node) and its first child is evaluated to 7,
the evaluation of the remaining children can be avoided. This cutoff is called a deep cut-
off because the node w is more than one ply below the node z.
Following Fishburn [4], we present this Pascal-like code for the alpha-beta
algorithm, adapted from Knuth and Moore [5]:
Figure 1: Shallow cutoff
Figure 2: Deep cutoff
function alphabeta(z : position; α, β : integer) : integer;
var
    Answer, Child, t, d : integer;
begin
    determine the child positions z1, ..., zd;
    if d = 0 then
        return(StaticValue(z))
    else begin
        Answer := α;
        for Child := 1 to d do
        begin
            t := -alphabeta(zChild, -β, -Answer);
            if t > Answer then
                Answer := t;
            if Answer ≥ β then
                return(Answer); { cutoff }
        end;
        return(Answer);
    end;
end.
The alpha-beta algorithm satisfies the following conditions [5]:

    if negamax(z) ≤ α then alphabeta(z, α, β) ≤ α,
    if α < negamax(z) < β then alphabeta(z, α, β) = negamax(z),
    if negamax(z) ≥ β then alphabeta(z, α, β) ≥ β.

These conditions imply that

    alphabeta(z, −∞, +∞) = negamax(z),

which means that if the initial window [α, β] is (−∞, +∞), then the alpha-beta
algorithm returns the same value as the negamax algorithm (a straightforward
tree-evaluation algorithm that never cuts off work) [5].
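A direct Python transcription of the Pascal-like code above, as an executable sketch; the
tree representation (leaves as integers carrying their static value, interior nodes as
lists of children) is our own:

```python
import math

def alphabeta(z, alpha, beta):
    """Negamax alpha-beta, mirroring the Pascal-like code."""
    if isinstance(z, int):
        return z                       # StaticValue(z) for a leaf
    answer = alpha
    for child in z:
        t = -alphabeta(child, -beta, -answer)
        if t > answer:
            answer = t
        if answer >= beta:
            return answer              # cutoff
    return answer

def negamax(z):
    """The straightforward evaluator that never cuts off, for comparison."""
    return z if isinstance(z, int) else max(-negamax(c) for c in z)

# With the full initial window, alpha-beta matches negamax.
tree = [[3, -2], [5, 1], [0, 4]]
assert alphabeta(tree, -math.inf, math.inf) == negamax(tree)
```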
The performance of the alpha-beta algorithm depends a great deal on the order in
which children of a node are expanded. If the children of each node in the game tree are
expanded in increasing order of their negamax values, then the largest number of cutoffs
will occur.
Knuth and Moore [5] introduced the idea of critical nodes in their analysis of the
best case of the alpha-beta algorithm, and Steinberg and Solomon [3] use the following
rules to determine the critical nodes:
- The root of the game tree is a type-1 node.
- The eldest child of a type-1 node is also type-1. The remaining children are type-2.
- The eldest child of a type-2 node is a type-3 node.
- All children of a type-3 node are type-2.

A node is critical iff it is assigned a number by the above rules.
The critical nodes form a minimal subtree [3] of the game tree which, regardless of
the values of the terminal nodes, will always be examined by the alpha-beta algo-
rithm [5]. The number of terminal nodes in the minimal subtree of a complete d-ary tree
of height h is
    d^⌈h/2⌉ + d^⌊h/2⌋ − 1

If the tree is examined in increasing order of value, the alpha-beta procedure examines
precisely the minimal subtree of the game tree. In short, alpha-beta examines about
2n^(1/2) nodes, where negamax would examine n nodes.
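The count can be checked against the type-numbering rules directly; a sketch with our own
function name, counting the critical leaves of a complete d-ary tree:

```python
import math

def critical_leaves(d, h, node_type=1):
    """Count leaves of a complete d-ary tree of height h that are critical
    under the type rules: the eldest child of a type-1 node is type-1 and its
    siblings are type-2; only the eldest child of a type-2 node (type-3) is
    critical; all children of a type-3 node are type-2."""
    if h == 0:
        return 1
    if node_type == 1:
        return (critical_leaves(d, h - 1, 1)
                + (d - 1) * critical_leaves(d, h - 1, 2))
    if node_type == 2:
        return critical_leaves(d, h - 1, 3)     # only the eldest child
    return d * critical_leaves(d, h - 1, 2)     # type 3: all children

# Agrees with the closed form d^ceil(h/2) + d^floor(h/2) - 1.
for d, h in [(2, 6), (3, 4), (4, 5)]:
    assert critical_leaves(d, h) == d**math.ceil(h/2) + d**math.floor(h/2) - 1
```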
3.2. Mandatory work first (MWF)
This algorithm was proposed by Akl, Barnard, and Doran as a parallel implementa-
tion of alpha-beta without deep cutoffs. The name MWF was coined by Fishburn and
Finkel. MWF evaluates critical nodes concurrently and then returns to evaluate other
nodes if needed [6, 7]. When deep cutoffs are not considered in the search algorithm,
only type-1 and type-2 nodes are critical, as shown in Figure 3.
MWF evaluates type-1 nodes completely, but only evaluates type-2 nodes partially.
After the eldest child of a type-1 node (also type-1) has been evaluated, the remaining
children (all type-2) are completely evaluated only if the result of the partial evaluation is
not sufficient to cut them off. All evaluations currently allowed by MWF may be under-
taken simultaneously.
Akl, Barnard, and Doran [6] tested MWF with game trees of depth 4 and branching
factors of 5, 10, 15 and 20. They noticed that MWF has a better efficiency when the
game tree has a larger fanout, but found that the speedup reaches a plateau around six.
The total number of nodes visited as well as the number of terminal nodes examined
showed an increase with increasing number of processors, but the plateau was reached
much faster.
Fishburn [4] analyzed MWF for best-first and worst-first ordering of the game tree.
In the best-first ordering, MWF is almost optimal, since its efficiency is very close to 1
Figure 3: Minimal subtree when deep cutoffs are not considered
when a large number of processors is used. For the worst-first ordering, Fishburn used an
example game tree of degree 38 and processor tree of fanout 2 to predict that speedup for
MWF will satisfy
    p^0.93 ≤ S ≤ p^0.96

where p is the number of processors. This result is almost as good as that of tree
splitting.
3.3. The tree-splitting algorithm
Fishburn proposed this method as a natural parallel way to implement the alpha-
beta algorithm. The tree-splitting algorithm splits the game tree into its subtrees at the
root node, and each subtree is assigned to a pool of processors for evaluation [4]. The
pool will evaluate the subtree in parallel if it has more than one processor. In other
words, the game tree is mapped to a processor tree, as shown in Figure 4. Here we have
a binary tree of processors with height two (connected by heavy arcs) mapped onto a ter-
nary game tree. When there are more branches in the game tree than there are in the pro-
cessor tree, the extra game tree branches are queued and assigned to a processor when
one becomes available.
All interior processors in the tree of processors are both masters and slaves except
the root processor, which is only a master. All the leaf processors are slaves. When a
slave processor finishes the search of its assigned subtree, it reports the value computed
to its master. When a master processor receives a response from one of its slaves, it
updates its alpha-beta window and informs the other working slaves of this new window.
The new window may allow the remaining work under the master to be cut off. When all
the slaves have finished, either by cutoff or by reporting their values, the master proces-
sor can compute the value of its own position.
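Root-level splitting can be sketched with a thread pool. In this toy version (our own
representation: leaves are integers, interior nodes lists), each ‘‘slave’’ searches its
root subtree with the full window, so there are no cutoffs and the combined result is
exact; the real algorithm also narrows the [α, β] window as slave results arrive:

```python
from concurrent.futures import ThreadPoolExecutor

def negamax(z):
    """Plain tree evaluation: a leaf is an int, an interior node a list."""
    return z if isinstance(z, int) else max(-negamax(c) for c in z)

def tree_split_root(tree, workers=4):
    """Evaluate each root subtree in its own worker; combine at the master."""
    if isinstance(tree, int):
        return tree
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(negamax, tree))
    return max(-v for v in values)

tree = [[3, -2], [5, 1], [0, 4]]
assert tree_split_root(tree) == negamax(tree)
```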
Fishburn [4] calculates the speedup for the tree-splitting algorithm for two different
orderings of the game tree. Worst-first ordering produces no alpha-beta cutoffs. It is
achieved by sorting all children of all nodes so that whenever the call alphabeta(z, α, β)
is made, the following relation holds among the children z1, ..., zd:

    α < negamax(z1) < ... < negamax(zd) < β

Figure 4: Processor tree mapped onto game tree
Since there are no cutoffs, there is no speculative loss, so tree splitting achieves practi-
cally perfect speedup.
Best-first ordering produces the maximum number of alpha-beta cutoffs. It is
achieved by sorting all children of all nodes so that

    negamax(z) = negamax(z1) for all nodes z in the game tree.

With best-first ordering of the game tree [4], the tree-splitting algorithm gives
S = O(p^(1/2)) with p processors.
3.4. Principal-variation (PV) splitting
PV splitting is a refinement of the tree-splitting algorithm [8]. It assumes that the
search tree is mapped onto an underlying tree of processors and that the game tree is
strongly ordered, that is, that the first branch of each node is the best branch at least
70 percent of the time and that the best move is in the first quarter of the branches
being searched 90 percent of the time.
The type-1 nodes are recursively evaluated until a given ply is reached, at which
point tree splitting is used. After the value of the principal variation (type-1) node is
backed up the tree, tree splitting is used to evaluate the remaining siblings if they
cannot be cut off.
There are two differences between PV splitting and MWF. First, PV splitting
requires a particular underlying processor structure, in contrast with the pool of proces-
sors used in MWF. Second, it waits for the search of type-1 nodes to end before it starts
evaluating the other nodes. This aspect of PV splitting ensures that the best available
value of α is passed to the other nodes of the tree.
PV splitting was compared experimentally with the tree splitting algorithm using
trees of depth 4 and width 24. Experimental results show that PV splitting outperforms
tree splitting, especially when a wider processor tree is used [8]. For example, when a
processor tree with both depth and width of 2 was used, tree-splitting examined 912
nodes, and PV splitting examined 648 nodes. But when the width of the processor tree
was changed to 8, tree-splitting and PV splitting examined 772 and 277 nodes respec-
tively.
3.5. The ER algorithm
This algorithm was developed by Steinberg and Solomon for parallel evaluation of
game trees. It is a sequential algorithm with a parallel implementation [3]. The nodes in
the game tree are divided into two groups, e-nodes and r-nodes. E-nodes will be fully
evaluated, and r-nodes will be refuted, that is, will have an estimated value. All children
of an e-node are evaluated, but as few as one child of an r-node needs to be examined.
Therefore e-nodes are more ‘‘costly’’ than r-nodes. Every internal node has exactly one
e-node child (e-child).
Any child of a node can be chosen as the e-child, but the child with the lowest
negamax value is the best choice [3].
To choose the e-child of a node z, ER evaluates the elder grandchildren (eldest
children of z's children) in parallel, and chooses the child whose elder child has the
largest value. The e-child is then evaluated while avoiding re-evaluation of its eldest
child, since we just obtained its value. The remaining children are refuted in order.
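The e-child selection step can be sketched as follows (toy representation and names, ours:
each child is a list whose first element is its eldest child, and leaves are integers
carrying their value):

```python
def choose_e_child(children_of_z):
    """ER's e-child choice: look at the elder grandchildren (the eldest
    child of each child of z) and pick the child whose elder child has the
    largest value.  In the parallel implementation these elder-grandchild
    evaluations run concurrently, since they are mandatory work."""
    def elder_value(child):
        # A leaf has no children; it stands for its own value.
        return child[0] if isinstance(child, list) else child
    return max(children_of_z, key=elder_value)

# The child whose eldest grandchild scores highest is chosen as the e-child.
assert choose_e_child([[5, 2], [9, 1], [3, 7]]) == [9, 1]
```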
In the parallel implementation of ER, the elder grandchildren can be evaluated in
parallel because they represent mandatory work. Since these grandchildren are them-
selves e-nodes, their elder grandchildren can also be recursively evaluated in parallel.
These parallel evaluations are mandatory work, but if ER is to perform only the manda-
tory work, the remaining siblings of an e-node must be examined sequentially. To avoid
these sequential evaluations and thus starvation loss, ER uses the following two methods:
Parallel refutation: After an e-child y of an e-node z is evaluated, refute y's siblings
in parallel. This parallel evaluation is likely to incur considerable speculative loss.
This work is similar to the speculative work performed by MWF and PV splitting
algorithms.
Multiple e-children: After an e-child of an e-node z is evaluated, choose the next best
child of z as a second e-child. If it happens that the first e-child is not actually the
best child of z (other children cannot be immediately refuted), we will have another
e-child that will hopefully help us cut off z’s other children. In general, after the first
e-child has been evaluated, ensure that z always has one active e-child.
Steinberg and Solomon compared ER to PV splitting. Sequential ER evaluates
more nodes than alpha-beta, but sequential PV splitting is identical to alpha-beta. For
this reason Steinberg and Solomon [3] used relative efficiency and relative speedup as
shown below to compare the two algorithms.
    relative speedup = (time required by parallel algorithm with 1 processor) /
                       (time required by parallel algorithm)

    relative efficiency = (relative speedup) / (no. of processors used)
Experiments show that ER achieves twice the efficiency and speedup of the PV-splitting
algorithm when used on sufficiently deep trees [3]. The average efficiencies achieved by
ER using 16 processors on 7-, 8-, 9-, and 10-ply trees are 0.44, 0.52, 0.68, and 0.58,
respectively. The corresponding efficiencies for PV splitting are 0.28, 0.31, 0.31, and
0.31. The speedups for 7- to 9-ply trees using 16 processors range from 7.1 to 10.9 for
the ER algorithm and from 4.5 to 5.0 for the PV-splitting algorithm. Steinberg and
Solomon attribute ER's higher efficiency to low starvation loss.
3.6. Delay splitting
This algorithm delays the evaluation of each node until its eldest sibling is com-
pletely evaluated. Starvation loss is accepted in order to increase the number of cutoffs.
Evaluation delay occurs at every level for each node, thus making delay splitting
different from PV splitting, in which delay of evaluation occurs only along the
principal variation route.
The following is Pascal-like code for delay splitting:
function DelaySplit(z : position; α, β : integer) : integer;
var
    d, i : integer;
    value : array[1..MAXWIDTH] of integer;
begin
    if (I am a leaf processor) then
        return(alphabeta(z, α, β));
    determine the child positions z1, ..., zd;
    α := -DelaySplit(z1, -β, -α);
    if α ≥ β then
        return(α);
    for i := 2 to d do in parallel
    begin
        value[i] := -DelaySplit(zi, -β, -α);
        begincrit { critical region }
            if value[i] > α then
                α := value[i];
        endcrit;
        if α ≥ β then
            return(α); { cutoff }
    end;
end. { DelaySplit }
4. Experimental results
We have tested the above algorithms on a Sequent Symmetry with 26 CPUs using an
unsorted tree of depth 9 and fanout of at most 15 (the fanout decreases by one at each
level) generated by the game tic-tac-toe. The algorithms tend to have more cutoffs with
a sorted tree, but are likely to have a higher efficiency with a worst-first sorted tree. Not
sorting at all gives us a comparison with a reasonable amount of cutoff and a reasonable
amount of parallelism. The relative efficiency comparisons are shown in Figure 5.
(Relative efficiency compares the parallel execution time to the sequential execution time
of the same algorithm in the same environment. The sequential execution times of all
algorithms we tested are quite similar.) For this particular test, the MWF algorithm
achieved an almost perfect efficiency, followed by the delay-splitting and ER algorithms
with speedups of over 12 and 9, respectively, using 20 processors. We attribute our ability
to exceed a speedup of 6 with MWF to DIB’s parallelization environment and the tree
structure used in this experiment.
Figure 6 shows the ratio of the number of nodes examined by the algorithms using
different numbers of machines versus using one machine (the sequential algorithm).
Figure 5: Efficiency vs. number of machines (curves for alphabeta, MWF, delaysplit,
TreeSplit10, ER, and PVS10)
Figure 6: Relative no. of nodes examined vs. number of machines (alphabeta, delaysplit,
TreeSplit10, PVS10, MWF, ER)
We have also tested the algorithms using Othello with a 6×6 board. All algorithms
have almost the same relative efficiency, with delay splitting leading when fewer than six
processors are used. In this experiment, the speedup did not exceed 7 (Figure 7).
Figure 7: Efficiency vs. number of machines (alphabeta, MWF, delaysplit, TreeSplit10,
ER, PVS10)
5. Sorting methods
We have compared the effects of four different sorting strategies on the search algo-
rithms. In all the strategies, an expansion depth parameter specifies how many levels
below a node are expanded in order to sort that node’s children. We set the expansion
depth parameter to 3 for the experiments reported here. That is, to sort a node z, we
expand the tree to three levels below z, evaluate the leaves statically, apply alpha-beta to
those values to back them up to node z’s children, then sort those children accordingly.
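The expansion-depth sort can be sketched as follows (our own toy representation: leaves
are integers, interior nodes lists, and an unexpanded interior position estimates to 0):

```python
def shallow_value(z, depth):
    """Back up static values from `depth` plies below z (negamax form)."""
    if isinstance(z, int):
        return z
    if depth == 0:
        return 0        # toy static estimate for an unexpanded interior node
    return max(-shallow_value(c, depth - 1) for c in z)

def sort_children(children, expansion_depth=3):
    """Order a node's children best-first.  From the parent's viewpoint the
    best child is the one with the lowest backed-up value (the parent takes
    the maximum of the negated child values), so sort ascending."""
    return sorted(children, key=lambda c: shallow_value(c, expansion_depth))

# Backed-up values here are 2, -1, and 4, so the middle child sorts first.
children = [[3, -2], [5, 1], 4]
assert sort_children(children) == [[5, 1], [3, -2], 4]
```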
Another adjustable parameter is the sorting depth parameter, which specifies the
maximum depth of a node to which sorting may be applied. For some applications, like
tic-tac-toe with a 4×4 board, every level of the search tree may be profitably sorted, but
in other applications, sorting nodes beyond some level increases the total computation
time; sorting time outweighs cutoff benefits. For example, in Othello with a 6×6 board
and a search tree of nine levels, the best overall results are achieved when the sorting
depth parameter is set to six levels.
Our first attempt at sorting was to sort only the children of the top node. This
regime, called ‘‘TN (top-node) sorting’’, applies to all the algorithms except ER, which
has its own internal sorting mechanism. There is no noticeable difference in the number
of nodes, total time, or efficiency for any of the algorithms between TN sorting and no
sorting at all.
Next we extended the sorting to include all nodes on the principal variation route.
We call this sorting regime ‘‘PVR sorting’’. Our tests of PVR sorting for PV splitting,
MWF, and delay splitting show no significant improvement in total computation time.
In the third sorting regime, we sort at the top node and at every eldest child. We
call this sorting regime ‘‘EC (eldest child) sorting’’. EC sorting applies to MWF and
delay splitting, since only in these two algorithms do we suspend evaluation of all nodes
that are not eldest children until their eldest sibling is fully evaluated. Therefore having
the best child evaluated first should result in more cutoffs.
As expected, the results are much better for EC sorting than for PVR sorting, as
evidenced by tests with our tic-tac-toe and Othello applications. The total computation
time is reduced by almost half when fewer machines are used. The reduction in efficiency,
expected because of the improvement in the sequential performance of the algorithms, is
not too bad. Figures 8, 9, and 10 show the number of nodes examined (in multiples of
1000), total time, and efficiency comparisons for MWF using the 4×4 tic-tac-toe game.
Figure 8: Nodes examined vs. number of machines (MWF with EC sorting vs. MWF with no
sorting)
Figure 9: Time vs. number of machines (MWF with no sorting, MWF with EC sorting, and EC
sorting time)
Figure 10: Efficiency vs. number of machines (MWF with EC sorting vs. MWF with no
sorting)
The last sorting regime is motivated by considering the type-2 nodes in MWF. The
eldest child of a type-2 node is evaluated before its other children are generated. There-
fore, if we also sort the children of a type-2 node there should be even more cutoffs. In
this sorting regime, in addition to sorting children of every eldest child, we also sort at
every type-2 node. We call this regime ‘‘TT (type-two) sorting’’.
TT sorting resulted in a vast improvement in total computation time and the number
of nodes generated. Unfortunately, a subtle bug in our implementation (since fixed)
renders all our conclusions about this sorting method inaccurate; previous versions
of this report should not be trusted in this regard. In fact, what we implemented did
not evaluate type-2 nodes as deeply as type-1 nodes, so far fewer nodes were
evaluated. So TT sorting (in this report) implies a different evaluation strategy as
well. Computation using TT sorting is almost 30 times faster than with EC sorting for a
4×4 tic-tac-toe game, even though about 95% of the time is spent on sorting in the
1-machine (sequential) case. These results are shown in Figures 11 and 12.
The efficiency is drastically reduced because the work does not seem to be divided
evenly between the machines. Our assumption is that a better distribution of work can be
achieved by some adjustments to DIB itself so that this great speed can be accompanied
by a better efficiency.
Figure 11: Time vs. number of machines (MWF and sorting time under TT sorting and EC
sorting)
Figure 12: Nodes examined vs. number of machines (MWF with EC sorting vs. MWF with TT
sorting)
We also used a three-level sort for TT sorting. The great speed of this method
allowed us to build a complete game tree for our experiments. In previous experiments
with tic-tac-toe, we built a search tree with nine levels; the complete search tree for a
4×4 board has 16 levels.
6. New results
We have new results based on adjustments made to MWF concerning the selection
of type-1 nodes. In dynamic MWF (DMWF), we decide which children of a type-1 node
to investigate fully, not by taking the first (as in MWF), but by taking all those whose
static value is above the 90th percentile of their siblings. To our knowledge, no one has
tried algorithms that dynamically adjust their width of full evaluation based on evidence
provided in the tree. This adjustment can also be made to other algorithms like delay
splitting and ER.
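The selection rule just described can be sketched as follows. This is our own minimal illustration, not the actual DIB code; the `static_value` field name is hypothetical, and ties at the cutoff are broken by including every child that reaches it:

```python
def select_type1_children(children, percentile=90):
    """DMWF-style selection: fully investigate every child whose static
    evaluation reaches the given percentile of its siblings' values.
    (MWF would instead take only the first child.)"""
    values = sorted(c["static_value"] for c in children)
    # Index of the percentile cutoff among the sorted sibling values.
    cutoff_index = min(int(len(values) * percentile / 100), len(values) - 1)
    cutoff = values[cutoff_index]
    return [c for c in children if c["static_value"] >= cutoff]

# Ten children with distinct static values 1..10: only the top value survives.
children = [{"move": i, "static_value": v}
            for i, v in enumerate([3, 8, 1, 9, 5, 7, 2, 6, 4, 10])]
chosen = select_type1_children(children)
```

With many siblings of similar quality, the 90th-percentile rule widens the set of fully investigated children; with one dominant child, it narrows to that child alone.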
We compared MWF and DMWF under EC sorting for a tic-tac-toe tree with a 4×4
board. Figure 13 compares the total time (including communication and idle time) of
MWF and DMWF. DMWF is about one-third faster than MWF, and it achieves this
speed with even less sorting time, because it generates fewer nodes, as Figure 14
demonstrates (node counts are in multiples of 1000). There is also an improvement in
efficiency (Figure 15).
Figure 13: Time vs. number of machines, for 1 to 21 machines (curves: total time and sort time, for both DMWF and MWF)
Figure 14: Nodes examined vs. number of machines, for 1 to 21 machines (curves: MWF, DMWF)
Figure 15: Efficiency vs. number of machines, for 1 to 21 machines (curves: MWF, DMWF)
We also compared MWF and DMWF under EC sorting for Othello with a 6×6
board. Since Othello is a more complicated game than tic-tac-toe, the sorting-depth
parameter was set to 6 levels, as mentioned above. The results of this experiment, shown
in Figures 16, 17, and 18, are similar to those for tic-tac-toe.
Figure 16: Time vs. number of machines, for 1 to 21 machines (curves: total time and sort time, for both DMWF and MWF)
Figure 17: Nodes examined vs. number of machines, for 1 to 21 machines (curves: MWF, DMWF)
Figure 18: Efficiency vs. number of machines, for 1 to 21 machines (curves: MWF, DMWF)
7. Future work
With the enhancements made to DIB for achieving high efficiency in game-tree
search, we have developed an environment in which we can test different algorithms in
a consistent way. These algorithms have been examined using a test suite of problems
taken from game trees for checkers, tic-tac-toe, and Othello. The games are coded
independently of the search algorithms, which contributes to the consistency of the
experiments.
We are currently testing these algorithms on a 26-processor Sequent Symmetry
machine. Our future plans include using a KSR multicomputer with 64 CPUs to
examine the search algorithms with a larger number of processors.
We will also experiment with worst-case sorting, not because it would be used in
practice, but to see how sensitive each algorithm is to sorting.
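One way such a sensitivity experiment could be realized is to generate a good ordering and then reverse it. This sketch is our own illustration (the `static_value` field name is hypothetical), not a description of the planned experiment:

```python
def order_children(children, scheme):
    """Order a node's children before search.  'best' approximates a good
    sort (highest static value first); 'worst' reverses that ordering to
    probe an algorithm's sensitivity to sort order."""
    ranked = sorted(children, key=lambda c: c["static_value"], reverse=True)
    if scheme == "worst":
        ranked.reverse()
    return ranked

kids = [{"static_value": v} for v in (4, 9, 1)]
best = order_children(kids, "best")    # static values 9, 4, 1
worst = order_children(kids, "worst")  # static values 1, 4, 9
```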
References
1. Raphael Finkel and Udi Manber, ‘‘DIB - A Distributed Implementation of Back-
tracking,’’ ACM Transactions on Programming Languages and Systems 9(2) pp.
235-256 (April 1987).
2. Vipin Kumar and V. Nageshwara Rao, Scalable Parallel Formulation of Depth-First
Search.
3. Igor Steinberg and Marvin Solomon, Searching Game Trees in Parallel.
4. John Philip Fishburn, ‘‘Analysis of Speedup in Distributed Algorithms,’’ Ph.D.
Thesis, Department of Computer Science, University of Wisconsin-Madison
(1981).
5. D. E. Knuth and R. W. Moore, ‘‘An analysis of alpha-beta pruning,’’ Artificial
Intelligence 6 pp. 293-326 (1975).
6. Selim G. Akl, David T. Barnard, and Ralph J. Doran, ‘‘Design, Analysis, and
Implementation of a Parallel Tree Search Algorithm,’’ IEEE Transactions on
Pattern Analysis and Machine Intelligence PAMI-4(2) (March 1982).
7. R. A. Finkel and J. P. Fishburn, ‘‘Parallelism in alpha-beta search,’’ Artificial Intel-
ligence 19 pp. 89-106 (1982).
8. T. A. Marsland and M. Campbell, ‘‘Parallel Search of Strongly Ordered Game
Trees,’’ Computing Surveys 14(4) pp. 533-551 (December 1982).