ArticlePDF Available

Quicksort for Equal Keys

April 1985
IEEE Transactions on Computers 34:362-367

April 1985
34:362-367

Source
DBLP

Authors:

Universität Kassel

When sorting a multiset of N elements with n < N distinct values, considerable savings can be obtained from an algorithm which sorts in time O(N log n) rather than O(N log N). In a previous paper, two Quicksort derivatives operating on linked lists were introduced which are stable, i.e., maintain the relative order of equal keys, and which achieve the previously unattainable lower bound in partition exchange sorting. Here, six more algorithms are presented which are unstable but operate on arrays. One of the algorithms is also designed for a speedup on presorted input. The algorithms are analyzed and compared to potential contenders in multiset sorting.

Content uploaded by Lutz Wegner

Content may be subject to copyright.

IEEE |RANS,\C-TIONS ON COUPUTIRS, VOL. C-3.1. NO. 4. ,\rRn_ l{)8-5

Correspondence

Quicksort fo. Equal Keys

LUTZ üT. WEGNER

Ärrlrddt -lvhen sorling a multiset of rV elementl with n < rV distinct

values, considerable savings can be obtained from an algorithm which

sorls in time O(N log n ) rather than O(N log N). In a previous paper, rwo

Quicksort derivalives operating on linked lists $ere introdüced lehich are

stable, i.e., mrinlain the relative order of equal kels, and which achie!e

the previously un:rttäinable lower bound in p:rrtition exchange sorting.

Here, six more l]lgorithms are presented which are unstable bu! operrte

on arrays. One of the algorithms is also designed for a speedup on pre-

sorted input. The algorithms are analyzed änd compared to potential

contenders in multiset sorting.

Inder Terns -A,nalysis of algorithms, arrays, multisets, presorted

files, Quicksort, sorting.

I. SNtoorH, STABLE, /v sfiu SoRTt\c

Consider an accounts rcceivable fiie which is sorted nccording to

customers to show which invoices, respectivelv. how many, are

outstanding from each customer. If a sort can take advantage from

the muLtiset structure of the input. considerable savings wiil result.

Naturally, any such algorithm is even more useful if it combines the

speedup for true multisets with no sisni[icxnt penalty for the case of

distinct keys since no d priorl knowledge of the nature ol the data

is assumed.

ln the tbllorving, algorirhms will be called -r/noor,4 iithey can sort

N distinct keys in time O(rV log N). on the average, and sort N

equal keys in time O(N) with a smooth transirion in between. This

term was introduced in I2l with respect to presoned lists.

The second desirable property in sorting multisets is stability. A

sorting algorithm is cailed srnble [5] if .ecords with equal keys

retain their original relative order. For the example above, assume

that the file is originally in ascending o.der according to invoice

number. Stable sorting according to customer ID will then leave rhe

invoices per customer in ascending order, an often needed property_

The third property to be mentioned concerns the space require-

ments. We say a sort is ili ri4r (in piace) if only ,((log rV)r) bits of

memory space are Lrsed for its variables besides space to store rhe N

records. Thus, Quicksort, whose recursion stack can be limited to

logarithmic size by sorting the shorter sublile lirst. is i/i .rit, but the

mergesoft, in general, is not. We extend the definition to inciude N

link fields if the input is given as a linked lisr.

The concept of a stable iinked iist Quicksort was discovered

independently by Motzkin l7l and by the aurhor Il I ], It2], who atso

extended the concept to include smoothness, demonstraled that

there are two algorithms as principal alternatives, and analyzed

them. The first algorithm, cailed TRISORT, features the elimination

of all keys equal to the pivot element during the parririoning phase.

It is therefore lermed a three-way splr algorithm. Using bina.y

Pascal compadsons as underlying cost measure, TRIsoRt is showü to

Manuscript r€ceived Aprit 16, 198,1; revised Oc(ober li, 198,1. An earlier

version ol this correspondence was presented at the Sevenreenrh Annuai Con-

ference on Informarion Sciences and Syslems, Johns Hopkins Universily.

Baltimore, MD, March 1983.

The author was with the Universität Karlsrühe, KäJlsruhe, Wesr Cermany.

He is now with the Fachhochschule Fulda, FB Angewandte Informatik und

Mathematik, D-6400 Fulda, Wesr Cemany.

require, on the average, 3(N + M)H\4! - ,lN key compdrisons to

sort a random multiset with N elements where each elemcnt occurs

M times (,ry. denores the Jirh harmonic number, ryt = ln(Ä) + I ti)r

all [ ? l). The generai pertbrmance, where each of thc n valucs

occurs ri times for I 5 -rr 5 n,-rr *.rr + .. +.r^ = N. is eivcn

rs well and is shorvn ro epproach rhe lower bound in prrirrir-,n

exchange sorting within a constant ftctor.

The second algorithm. callcd LtNKsoRT. fe:rtüres lhe eliminltion

of tll unrry subfiles in a sort. This can be achieved by applying lhc

following strategy. In the beginning of rhe partitioning phrsc, rhc

sublile is scanned from lcft to right üntil a key is discovercd which

is unequal to the pivot elemenr. [f no such key is found, thc subfilc

is unary and need not be treated any furthe.. Otherwise, tll pre

viously scanned keys are considered ls a prefix of the Icit subiile

which contains all keys S the pivot element, and the scan is con-

tinued for a nvo-*'a:- split. The cmcial point here is that for Iinked

list representations, the cost from simple scanning to partitioning is

one extra key comparison per treated subfile. Using again binary

Pascal comparisons as underlying cost measure, LTNKSORT is then

shown to require, on the average, less than 2.tvHf,.!l + rV key com-

parisons to sort a random multiset with rV elements wherc each

element occurs M times.

Here, we continue the discussion of Quickso.t for equil keys by

investigating several possibilities for array representations. A sum-

mary of the properties of the algorithms mentioned in this paper is

given in Fig. l.

ll. SrroorH QutcKsoFrT DERtvAnvEs FoR ARRAys

The question here is whethe. rhere is any way ofchanging euick-

sort to fealure a three-way splir or the elimination of unarv subfiles

*hen:pplied ro lntls. Con5t(lering Sedsewick \ wornini rem.rrL.

with respect to the elimination of all equal keys, namely thüt..no

efficient method for doing so has yet been devised" [9,p. 2,14], ir

seemed ar first that there was little hope.

The obvious problem for a three-way split is where to store keys

which have been detecred as being equal to the pivot element. Since

we insist on an in slrr soiutiol and since we do not know the split

position beforehand, we must consider intermediate storare ofequal

keys in some designated part of the array. Here, four different ways

of keeping track of equal keys are explored, namely.

. keeping them in the head and tail part of the anay and swirch,

ing them afterwards into the middle:

. keeping them only in the head part of rhe anay;

. moving them as a growing block through the ai.ay until they

reach their proper posirion;

. keeping them scatrered in the upper'.half" of the array and

collecting them in a second sweep.

- In the.followins, we depict the general siruation during the split

from which the invariant for the algorithm may be drawn. Note that

fl is the pivor element, e. g., first or last key, swap(.t, r) exchanges

the values of a and b, and circ(a,ü, c) is a trianguiar exchangJol

values which must have the following semantics to work prope;1y in

the extreme cases below: aux := c; c := b: b .= o; n := arx.

HEAD & TArL-soRT: Form a prefix and postfix of equal keys by

scanning the file f.om rhe leti and from the right sropping on keys

= 11, respectively, 5 H. Keys equai to,y are switched inro middle

after the split position is known (invoives no key comparisons).

ulirsoRT: Sca[ rhe ille from left to right stopping on keys - H;

collect keys equal to Il at front end and switch them lnto middle

afte rrv ards -

0018 -93,10/85/0400-03 6250 l 00 0 1985 IEEE

IE€€ TRj\NS^CIIONS ON COMPI]TERS, VOL, C_J.I. NO. .1, .\tRtL lgg5 l6,l

stable in situ

al qori thft

straight merge

quicksort

partitlon merge

Cook and Ki ni

spli t-sort-merge

no pdrtia I ly

yes yes

I iiked list Quick

sort (rRIs0RT/

LIN(SORT yes pdrtiälly

Yes pa.tial ly

Fig. l. Prope(ies ofdiscussed soding algorirhms

condition

atil > H. alil < H

aLil > H. a[j] = H

a[i) = H, a[jl < H

alil = H. aljl - H

swap(diil,rltjl): i :: j + t:j :=j - t

circ(dtji.z?[.r + l].alll):/ .- i + t..j :=j - lr.r

ci(r.rlrl.rit - l'.ollir, i .. t t:j .J - 1...l

swap(d[-r + l],d[i]);.r :=.r + li/ = i + I

lwap(a[) - ll,dtj]).] j= v - lij :=j _ r

condition

atil = H

a[i] > H

Fig. 3. Partitioning for uNlsoRT.

Fig. 2. Panirioning for HE,\D&TAtL-soR-r.

condition

ali) < H

a[i] > H. aljl < H

atil > H. alj) - H,t,üt.rt,!lr l/,.r :- _r r li I :- I

til,nUl);i := i + l.j :=j I

sLrDEsoRT: Scan the file from the left stoppins on keys + ,ry; then

scan the file from the right until a partner S H is found: keys equal

to Fl are afterwards in proper position.

DOUBLESoRT: Scan the file from the left stopping on keys = A

and from the righr stopping or keys <,y ; alier pointeri meer.

partition righr subfile again with the same method sropping rhis rime

on kevs + H and: H.

^ More sorts mav be creJted b1 combrnine .ome of the strJtesres

fromabore We nole thJt.rll iour algorirhm. u:e rhe.eme kei to

pcrtltron the llie- ."y rhe lir.r or median irom iirqt, mrddle rno ltsr

key, and all elimin;te all equal keys. Therefore. rhey achieve rhe

same spLit and may be compared on the basis of key comparisons

and/or exchanges per partitioning phase (see Section lll).

Here $ e close with .r rwo. ,Jy ,plir Jlgorithm. called L )l\RysORT.

whlch ellmrnales un:rr) \ubliles. The ba.ic ideJ is rhe,ame its lor

LINKSORT and the reader should have no trouble following through

the algorithm below (Fig. 6). As prelude to the following analysis,

we note that UN_ARYSORT eliminates exactly one key from"the recur,

sion Jnd need..V - 2 comp.rrr.ons to splii.V r I elemcnr. in.r lile

wh lch l\ not rnJl y. Note rl.o thar u e do * irhour tn ..eno ol_lile re.r.

in the iirsl r\ hile-roop whrch can cruse redundrnr le) comprri:un.

but.houid. in general. be more efficrenr rh.ln a re.t or rhe rn.errion

of a sentinel.

llL ANALysts oF SrrooTH euicKsoRT DERIVATIvES FoR ARRAys

We qt.rrr_$rth the cu.tomJr) cosl mcüsure, ncmely key cun_

pln5oni. lt ge d55ume three-$d) comptri:Ons r\llX j. lhen rhe

algofithm\ HE^D&TAtL-SORT. U),JISOR I, and SLIDESORT need N com_

parisons to partition N elements and eliminate all keys equal to the

pivo{ elemen(. Sorting ilgorirhm,. like eurch,ort. $nicn use thir

panluonrng to reduce .r glven lile ro t\\o or three subfil"s !Lhlch crn

Fig. .1. Panitioning lor slrDEsoRT

36+ IEEE rR^NSACTIONS ON COIIPUTERS. vol-. C-1.1. fto. .1. ^r'R lqtl5

L Scan abovc. 2- Scan below

condition

alil1 H. a[j] < H

aL(l * H, a[\) = Hsw.rp(.rttl,dtjl)l i:=l + I;j r=/ - I {ijrsr scan}

y3!qql!{-vl):.. :=.r + l:.r := r'- I {second scan}

p!öcedule UNARYSoRT (1, r: tntege!) I

{solt reculsivery an ärray dtr,.rl ellntnarr.s u.d.y subfttes

but exMlntnq €ach k€y exacrly once except !o! one key,

assue3 älN+1I > atlt fo. the ortglnal flre !^ att..NJl

w l, j: lncegelj

H, aux: keytype {el4ehls to be sorted conslst of keys only),

bEqh {INARYSORT }

H :- alllr 1 :- I + 1,

{a unä.y subfll€? }

!i!-l-l-g ä(11 'lr do ! :- 1+ r'

{.t.r'l ri H)

If1<!{notu.a!y}

!!-e!

beq!n {p.!tlrlon}

1:-1-1, l:-.+l;

!!1&i.j99

lepeat 1 :, ! + r unrit atll, Il

lepear j ,= j -rsg! a(il : H,

5'ap (ät11, atjl)

end {rhire },

{511tch bacr atiJ, at11, atjl)

circ tat i l, atU, atjt);

1f r < j - l !!9! u aRysoRr {1, j - 1),

!l I < ! !!S! uNÄRysom (1, !)

csl {tiien)

end IuNARlsoRr i;

Fig. 6. uNARysoRT-Quicksod wirh elimination of unary subfiles_

HEA.D&IArL-SORT v :[i] > H r. the lef! ro lighr scan

v aljl < s iD the liqht to left scan

UNISORT

SLIDESORT

V a{ll > ll in lhe left !o right scan

v alil I H in the left lo iighr scan

v a(il < H in the riqht to left scan

oouBLEsoRT v atil I H resuttiDg froF rhe seconai scan.

Fig. 7. Exra key comparisons obse!.r'ed fo. three way split.

be.orted recu15ively :rre Iermed prnitron-exchange so(s. Now the

following Iemma lollow s trrviaily.

Lemna J-l: If three-way comparisons are the underiying cost

measure, HEAD&TAIL-SoRT, uNIsoRT, and sLIoEsoRT are lower

bound pilrtition-exchange softs.

. .Proof.: The position ofeach key examined during the scan can

be determined by exactly one key comparison. The rJsult then fol_

lows fre6 5s6t"*1.k's lower bound theorem [9]. o

The lower bound result does not hold for the fourth alternative

(DouBLEsoRT) because the final pos ition of keys = H is determined

in a separote, second scan. If binary comparisons are the cost

measure, the analysis becomes more interesting. The central ques_

tlon nere is: Which keys Jre compared t*ice.,' fig. I giue, a

partral rnswet.

Assumingarandom inputof themultiset{,tr . 1,,..,_r" . nlwith

.rr + ' + -r" = 1y', we cao state the following result about the

average number of binan key compari:ons.

Theorem J.2: fo splir d random multiset wirh.r1 I ... i .1, =

N elements, on the average. uNrsoRT and DouBLEsoRT require

s -i- ,..., -,

r=i=,

key comparisons. HEAD&TAIL-SoRT requires Iess than

N + -l- t .l

N,+"

-; ) Zt.'.r, + -.r.-rrr.r... - .-.("1/(,V - r.

/v =,="-:

key comparisons. tnd 5LIDESORT require\

N ' 1,/ t ,.) " t 1.,, . - - .r.)rr.r, - ..- r")

+ 2(.r1 + ... i -r,))

key comparisons.

Proof: The first value is straighrforward. For N = n (N dis_

N +1

Fig. 5. Pilnitioning in two scans for DouBLEsoRr

o! co\rp! rERs. \nL. c-J{, N'). ,1, \tRtL l'lb5

tinct valucs) wc obtain N + (N + l)/2. For the sccond value,

which is only tn uppcr bound, we assumc that each key > H tbund

in thc lclt lo right scxn is exchanged against a key < H in the .ight

to lcti scirn :rnd thut.r; pivot keys account for.!i extra comparisons.

Tlris is not tme bccause keys > H, respcctivcly, kcys < H, can be

exchangcd rgirinst keys = H. Howevcr, the assumption gives a

relsonably close lpproximirtion and leads to the following lormuia:

kc,or,(.r',".rJ:N -l: (.;*, ) :,.p,,')

.v,:.'

where

p'(t)

(r,-. -.\.t t .r - ... - .'.., t /l N - r, \

-\ , /1.,, ...._.(,_, _,)/\r,_...-.,,/

is the probability that r keys > fl are encountered in the lef! to right

scan. Applying Vandermonde's convolution leads to the stated re-

sult. For N = n the bound is sharp and yields 4N/3.

In the case of sLtDEsoRT. we note that all keys + H in the left to

right scan and all keys 5 ,1 in the right to lelt scan require extra

comparisons. Since the number of keys > H on the left equiLls the

number of keys = .FI on the right, the total number of binary com-

parisons is N piüs the number of keys ( fI plus twice the number

ofkeys > Il in the lefr,!, + . . + -r, positions. From

",i ),"(".-.. ) . , r,,,, -"_,,,.?. ,.-:r'prrr)

with p(s), cespectivelv, p,(t), similar to above, the formula is de-

rived using again Vandermonde's convolution. For N = n we ob-

tain here 5N/3 - 2/3N which is ef,siiy checked using Sedservick's

well-known result that, on the average, (N - 2)/3 keys are out of

order which brrngs us to N + N/2 + lV/6: l0N/6 key com-

parisons ignoring the small additive constants. tr

This leaves ui with the following ranking for N : ri:

HEAD&TAIL-soRT uNrsoRT/DouBLEsoRT sLTDESoRT

Iess rhan 8N/b 9N/o lON b

From the general formüla above, the special ciLses for -r, = M

tl ! i l,nrrnda = I. i.e.. forJcon\tsntrepetrtron faclor'Lnd lor

unary files. can be denved whi.h yield no new insrght' our conlirm

the observation that uNrsoRT, DoUBLESoRT, and HEAD&TAIL,soRT

need 2N comparisons and 5LIDESORT only N comparisons ro sort"

N equal keys.

Next, we briefly glance at the number of exchanges needed in

each split, where we assign the triangular exchanges the same cost

as swaps even though they cost l/3 more.

Theorem 3.3: The average number of pairwise and triangular

exchanges inclüding the movemenr of pivot keys into the middle in

each partition phase for the multiset {r, 1,...,r" . n} is

5 l/2 the number of extra key comparisons + Zr,

for the HEAD&TArr--soRT,

fo. uNrsoRl,

,u,l ]=,tf' -, + -+ r^) ((.r, + + .!,) + (-r, + + .!,))

ior SLTDESoRT, and

''' + .rr-r) (,r, + ... + x" - l)

.; ) r,(rr + +,r,)

165

Proof Similar to Theo.cm 3.2. o

For the intcrcsting casc N = ll we nolv hilve =(N/6) ibr HEr\D&-

TAIL-SoRT and DoUBLESoRI. but -(N/2) fo. UNIsoRT nnd sLTDESORTI

It therefore scems that the HE,\D&T^tL-soRT is thc most tttrüctivc

candidatc for its lowest number of comparisons and rcasonublc

number ol exchungcs. What the exlra cost of the control of thc

swapping of the head and tiil pivot keys into thc middle conccrns -

implementcd, e.g., by two for loops-we should be wilrned th it

can be substantial fo( the large number of smoll subfiles to!vxrds thc

"end" of the sort.

In all likelihood, DoUBLEsoRT is the most practical cirndiditc. As

a pleasant side effect we also know the exact numbcr of kcy cont-

pirisons needed to sort a random permutation with oouaissonr

Corollary 3.1: To sort a random permutttion of a multisct in an

array, DoUBLESoRT requires, on the average, the same numbcr ol kcv

compJnson\:Is TRISORT in a Iinked li:t.

For the two-way spLit algorirhm uN,\RysoRT we observcd,V + l

comparisons to split an arbit.ary mLlltiset file of1y' kcys. Thc number

of exchanges here is easily scen to be the same as for DouBLEsoRT

minus the exchanges in the second scan. Moreover, the rcsulting

split happens to be the same as for LINKSORT [12] and therelore a

result similar to the one ior DouBLEsoRT holds.

Corollary 3.5: To sort a random permutation of a multisct in tn

array, UNARYSORT requires, on the avcrage. the same numbcr ol kc1

comparisons as LINKSoRT in a linked list.

Thus, DOUBLESORT needs 3(N + M)H. - 41y' binary key com-

parisons and UNARysoRT needs less than 2NH, , + N binary kcy

comparisons to soat a random multisel of size ly' with const:lnt

repetition factorM : N/n.

In conclusion. the newly introduced smooth Quicksort three-way

split algorithms for arrays achieve the lower bound in partition

exchange sorring. Due to their binary comparisons implementation.

however, they are -just like their linked lis t relatives - inferior to

their two-way split competitor by a considerable marsin.

lV. QurcKSoRT FoR NEARLy SoRTED FrLEs

We close the presentation of new algorithms with a sort which

smoothl)/ handles presorted input, where the majority of keys is in

their proper place and the others are permuted at random. The

algorithm is called RUNSoRT for its ability to recognize runs. i.e..

sequences of keys in ascending (descending) order_ It was derived

from L:NARysoRT, i.e., it hopes for a run and in abandoning thar hope

performs a two-way split.

The finer point here is that we may not simpLy scan rhe subfile

from left to right until we find a pair ofkeys (a[i],dli - ll) for

which a[i] > ali + ll because ar rhat poinr neither d[i] nor

a[i + l] will do as a pivot element unless we backtrack.

As can be seen from the while,loop in Fig.8, the suggested

soiution is to scan the file from the leit and from the right with 3 key

comparisons and I index comparison tor each pair ofkeys. Thus, in

the prelude to an actual split, 3N'/2 key comparisons and N'/2

index comparisons are performed forN' keys which, taken withou!

the middle piece, rvould form a run. The prelude is follorved bv an

ordinary split for which we chose a so-called fictitious pivor element

computed from (d {i l + aljl div 2 (see [l2] for a discussion ol rhis

rder,. As Ihr trming re.ulrs of lhe nex.t \ecrion \how thrs is nur

a really fast Quicksort version and we do not explore this idea

any further.

V. How S\1ooTH IS SNIOOTH?

For all but the last algorithm we have given the result of their

multiset analysis or compared them analytically on the bnsis of

key comparisons and key exchanges per partitioning phase. For

Dijkstra's previously mentioned smoothsort, Hertel [3] has shown

that the sort is opiimal to within a constant factor both tbr totally

sorted sequences and for random sequences but not very smooth

+ .I. ! 'i for DOUBLESORT

N ,.-*"

IEEE tRi\NSr\CIIONS ON CO\lptTERS. VOt_. c_J{, No. 4, ^r,Rlt. lgll_5

procedure RUNSORT(I, r: thteger) ;

var l.J:lnleger, H: keyrype,

Frocdure rrap(var a, b! keytype),

lg*E (RuNsoRa )

4i rt11 > dlrj qs! sHap(atlt, atrl),

1 :- l, I :, r,

4ffs (1+r < J) 9gl (a(11 : ai1+lt) and

(alj-rt : atil) grg (at1+1 J : a{i_jl) 99

beqln

I :'l + r, j ': j - I

9!q;

1!1.J-l{nora!un}

then

&g1! (parrltloni

F :- (ai1t + atjt) 4E 2;

IS!93! !,= r - 1 unrrl atil : r;

repeat j := j - 1 glgl atit : s;

1-€ 1. r !-!s! s*aP(arU, atjJ)

unr1l 1 > i.

fl 1 . r - I ther RrjNsoRTlt, 1_ t);

1-E J + r . r rhs puNsoRr{j + r, r)

end il < j-l )

3!q (RuNsoRr );

Fig. 8. RUNsoRr - Quickso.t for multisets aFd presoded files.

in between when measured against the number of inversions in a

:"e-luence:nd is inferior ro. e.8.. Vehthorn., "tgorirt. 1bl 'ln

tnat resnect

ns lör rhe Cook and Kim 3lgonrhm JIl. no anjl)rical re\ulr

r5 I.noun excepr the obvious facr rhcr it is a OrlV ioe Nr rime

algorithm.

_ Jo bring the dozen algorithms mentioned in this correspondence

rnto some perspective. a modest competrtion on a number ol multi.

sers ano presorted tlles rras arranged. For rhdt purpose all \ons \r.ere

coded by rhe Jurhor in pascai and run on r ringle user pDp ll. To

k€ep rhe-effort reasonable, rhe file size *"". ""if".*fy ,"i i"

N = 1000 and the avelage time in seconds from 20 random oer-

mutatjons fo. each type of input was reported.

The tlpe: of inp.rt which were considered dre random oer

mutalion5 of a multi\et wilh consrtnt repeUtion iacror .V = .'V ,

where the n distinct values are 1.2.. . . ,'l (note thar M = 500

denotes x bin3ry lile Jnd .t1 - 1000 r un.rry iitc for.v = 100ü,

nearly \oled,files of N inregers I.1......V rvrrh p p.r."nr oirh"

Ke)c femoved fiom the Jscending run. permured. ,rnd in"ened rnro

tne vacated poSluon\ ff = 0 denOtes .r completely .orreO filc). rnd

r sequence ol r\!o nrns of length r. respectively.,V _.."tar"rt"

r fl: ,1" randoml) picked from the ordered sequence.

. labie IspeJks for i(seif Jnd !re refrain fromcoÄmen!sotherthirn

that Quicksort proved again its name.

al gori thm

quicksort

(tertbook )

Q!icksort

(sedgeNi ck)

TRISORT

llflKs0RT

HEAO&TIIL

uIrs0Rt

5L]DESORT

DOUELTSORT

UNARYSORT

gDooth5ort

(Di lkstra)

Kim sort i

Rufisofi |

F'l lH.2 H=t0 HrlO0Hr5C0v=1000 P.0X p= 5l p= loX p= 20! p - 50! ry"

0.91

2.52

5.43

0.54

r.31

1.08

0.99

1.03

0.90

I .01 t -00 1.03 I .08 I .r)5

0.95 0.90 0.89 0.95 LO2

0.86 O.57 0.26 0-12 0.lo

0,85 0.66 0.38 0.18 0.06

0.50 0.51 0.54 0.62 0.82

1.38

1.96

1.44

r.23

2.41

6.75

I .41

r.3t

t.2r 0.87 0.51 0.31 o.?6

2.05 1.50 0.69 0.38 0.25

1.68 1.39 0.45 0.15 0.05

1.28 0.85 0.40 o.t7 0.10

l.l4 0.82 0.42 0.18 0.05

2.4O 2.40 2.25 1.43 0.64

6.75 6.64 5.62 2.80 0.9t

1.39 1.31 t.32 1.15 0.64

I.2S 1 .09 O.t4 0.42 o.r?

2.52 2.50 2.5A 2.48 2.45

0.96 L42 1.80 2.62 4.65

0.25 0.44 0.58 0.79 !.21

0.ll 0.s4 0.56 O.7g 0.94

rlEe TRANS,\C ONS ON COTVIPUTERS. VOL, C-1.1, NO. 4, .\PRIL l9s5

V[. CoNcLUSIoN

ln this correspondencc it is proven thit Quicksort cxn eft'cctively

be lurned into a smooth multiset sort which achieves thc previously

unaftain:rble lowcr bound in prrtilion exchlngc sorting. illorcovcr.

a two-way split algorithm with climination of unir.y subfiles and r

Quicksort version for presorted files are introduced. Ho$ever, un-

like their linked list relatives. none of the six sorts is stable. The

efficiedcy of the new algorithms is demonstrated in a run time

comparison which also involves four contenders from the literature

rnrl r$o it:rble linkcd lrrr Qurcksort vrrirnts.

ACKNOWLEDGTIENT

The author would like to thank one of the .eferees tbr pointing oul

a mistake in thc analysis and for several helpful comments.

REFERENcES

ll C.R. Cook and D.J. Kim, Besl so.ting aigorithm for neirly soned

lists,' Cdnmr?r. ACM, vol. ?3, pp. 620-624, Nov. 1980.

[2] E.W Dijkstra, "Smoothsort. An altemrlive for soning ifl situ. .sci.

Comput. Progranning, vol. I, pp. 223-233, 1982i see also. 8n'rta.

Sci. Conput. Progrunning, vol- 2, p. 85, 1982.

[3] S. HedeL. "Smoothso( s behavior on p.esoned sequences. tfrrn. Pr.r-

cessing Leu., vol. 16, no. 4, pp. 165-170, lvlay 198J.

I.ll C.A.R. Hoare, 'Quickso(. Conput. J.. vol.5, pp. 10-15, 1961.

I5l D. E. Knuth, The An of Cornputer Programning, Vol. 3: So ing atkl

Searciirg. Reading, MA: Addison-wesley, 1973.

f6l K. Mehlhorn, "Sorting presorted files,' in Theoretical Conputer

Science. 4th GI Conference, Lecture Notes in Conputer Scietrce,

K. Weihrauch. Ed. Berlin: Springer, 1979, vol. 67. pp. 199-212.

[7] D. lvtolzkin, "A slable quicksod,'Softvare Pract. tver., vol- ll,

pp. 607-611, June l98l.

[8] L. Trabb Pardo, 'Stable sol1ing and mergihg with oplimal space and lime

bounds,'S1ÄM l. Comput., vol.6. no.2. pp.351-172. June 1977.

[9] R. Sedgewick. *Quickson with equal keys," SldV J. Conput., vol. 6.

no.2, pp.2,10-267, June 1977.

uol -. 'lnplementing quicksort programs," Cammln. ÄCu vol 2l

no. 10,pp.8,17 857, 1978: see also, Corrigendum." Camnun. ACM.

vol. 21, no. 6, p. 368, 1979.

I t] L. Wegner, "So(ing a linked list wilh equal keys," lrfo',l. Prccessilg

,e0., vol. 15, no. 5, pp. 205-208, Dec. 1982.

ll2l -,

-The L'nksod family-Design and analysis of fast, stable Quick-

sort deriva!ives, dissertation. Inst- Angewdndte inibrm., Univ.

KarLsruhe. Karlsruhe, West Cermany, Rep. 123, Feb. 198i.

An Adaptive Method for Unknown Distributions in Distributive

Partitioned Sorting

PHILIP J. JANUS AND EDMUND A. LAMACNA

Ärslracl -Distributive Partitioned Sort lDPS) is a fast internal sorting

algorithm which runs in O(ll) expected time on unilorinly dist but€d data.

Unfortunately, tde method i3 biased td;ard such inputs, and its per-

formance worsens as the data become incressingly nonuniform. such a5

\rith highly skerved distributions. An adaptation of DPS, which estimates

l\4anus.npI reier!ed \,1ry L lo8-l: rc!rsed \dvember tr. t'r8-t I1,,. oo,L

was supponed in pah by Air Force Sysrllng Coinmend. Rome Air Dcvel-

opment Center. under Contract F3060?-79-C-012,1-

P J. ldnus *rr wrlh rhe Dep.rnmenl oi Cbnpu(er s. ence rno S,.rrr\r,c,.

L nr\er 'rr) or RhJJe I.l-no. Kinesron. Rl0:881 He rrnu*q.th neSult*are

Engineenng Facility, Digilai Equipment Corporailon. Nashua. NH 01062.

E. A. Lanragna is with the Depanment of Computer Scieoce and Statistics,

University of Rhode Island. Kingston, Rl 02881.

367

the cumullrtive dislrihution function of the input datr from a randomly

s€lected sample, wns derehped rnd tested. The method runs onl!

2-.1 percent slower than DPS in the uniform case, but outpcrforms

DPS by t2-ll percent on erponentially distributed data for sufficiently

lÄrge files.

lnder lems -Anrlysis of algorithms, cumulative distribution func-

tion, distributive partitioning, Quicksort, sorting.

[. INTRoDUCTIoN

Distributive Partitioned So.t, or DPS, is an internal sorting tech-

nique described in a paper by Dobosiewicz I I l. Empirical evidence

presented there suggests that the algorithm runs considerably faster

rhan several cornmoniy used sorts including Quicksort [21. []1. Thc

expected running time of DPS on uniform data is O(n) rvhere l is the

number of items ro be sorted. This is possible because the proccdure

is not based upon key comparisons like most conventional so.ts

(e.g., Quicksort, Heapsort, Bubblesort). A comparison based sort

is not crpable of O(n) behavior. Furthermore, DPS s worst case

running time is O(n log z), which is no worse than lhat of any

comparison based method.

Untbrtunately, DPS is inherently biased toward uoiformly dis-

tributed datn. Its running time degrades on highly skerved inputs.

such as an exponentiaL distribution. The primary goal of this corre-

spondence is to develop an adaptation of DPS which will sort any

unknown distribution equally well, yet whose run time remains

competitive wi!h the.t of the o.iginal algorithm in the uniform case.

II. DISTRIBUTIVE PARTITIoNED SORTING (DPS)

DPS is a distributive soft resembling Knuth's lvlultiple List In-

sertion [4] in which the number of partitions, or buckets, created

equals the number of items to be sorted. DPS sorts 4 items as

foLlows.

SELECT: Determine the maximum. minimum. and median ele-

ments. all of which can be found in O(n) time [5]-[7].

PARTITIoN: Us'.ng these valües, divide the range of data between

the maximum and minimum ioto , buckets rvith r/2 equal length

intervals on one side of the median, and another n/2 equai intcrvals

on the other side.

DISTRIBUTET For each item, determine to which bucket it belongs.

RECURSE: For each bucket with more than one item, sort the items

in that pariition using DPS.

It is easy to see tha! in the best case this algorithm is O(n). For

equally spaced data, each bucket contains one ilem after the first

pass thereby producing the sorted set. In his original paper. Dob-

osiewicz shorved that the algorithm is O(n) on the average for

uniform distributions I ll.

The worse case occurs when for each n/2 buckets divided by the

median, all elements but one are distributed into the same bucke!.

The time I(n) is then governed by the recurrence

TG) : ,n + 2T(n/2), I(l) : co

T(n) : ,n log;n + O(n)

which is O(n log n).

There are many problems with DPS as originally pubLished in [1 ].

These range from lundamentaldifficulties with the algorithm itself,

to large time and storage overheads, to lheoretically bad worst case

running times. These problems are detailed in [8i, on which this

correrpondence i. based. fhe most imponanl improrements are

discussed below.

One aspect of DPS which contributes substantially to its running

time is the median location stage. Although the median of n items

0018-9340/85/0400-0367501 .00 0 1985 IEEE

Nearly-Optimal Mergesorts: Fast, Practical Sorting Methods That Optimally Adapt to Existing Runs

Preprint

Full-text available

May 2018

We present two stable mergesort variants, "peeksort" and "powersort", that exploit existing runs and find nearly-optimal merging orders with practically negligible overhead. Previous methods either require substantial effort for determining the merging order (Takaoka 2009; Barbay & Navarro 2013) or do not have a constant-factor optimal worst-case guarantee (Peters 2001; Auger, Nicaud & Pivoteau 2015; Buss & Knop 2018). We demonstrate that our methods are competitive in terms of running time with state-of-the-art implementations of stable sorting methods.

Bibliography

Chapter

Oct 2011

Hosam M. Mahmoud

Sortieralgorithmen

Chapter

Jan 1992

Ralf Hartmut Güting

Das Sortieren einer Menge von Werten über einem geordneten Wertebereich (z.B. int,real, string), das heißt, die Berechnung einer geordneten Folge aus einer ungeordneten Folge dieser Werte, ist ein zentrales und intensiv studiertes algorithmisches Problem. Sortieralgorithmen haben viele direkte Anwendungen in der Praxis, finden aber auch häufig Einsatz als Teilschritte in Algorithmen, die ganz andere Probleme lösen. So betrachtet z.B. der Algorithmus Kruskal aus Abschnitt 5.2 Kanten in der Reihenfolge aufsteigender Kosten; implizit haben wir dazu bereits einen Sortieralgorithmus eingesetzt, nämlich Heapsort. Auch für die Plane-Sweep- und Divide-and-Conquer-Algorithmen im nächsten Kapitel ist Sortieren eine wesentliche Voraussetzung.

A Statistical Peek into Average Case Complexity

Article

Full-text available

Dec 2013

The present paper gives a statistical adventure towards exploring the average case complexity behavior of computer algorithms. Rather than following the traditional count based analytical (pen and paper) approach, we instead talk in terms of the weight based analysis that permits mixing of distinct operations into a conceptual bound called the statistical bound and its empirical estimate, the so called "empirical O". Based on careful analysis of the results obtained, we have introduced two new conjectures in the domain of algorithmic analysis. The analytical way of average case analysis falls flat when it comes to a data model for which the expectation does not exist (e.g. Cauchy distribution for continuous input data and certain discrete distribution inputs as those studied in the paper). The empirical side of our approach, with a thrust in computer experiments and applied statistics in its paradigm, lends a helping hand by complimenting and supplementing its theoretical counterpart. Computer science is or at least has aspects of an experimental science as well, and hence hopefully, our statistical findings will be equally recognized among theoretical scientists as well.

Revisiting Aggregation for Data Intensive Applications: A Performance Study

Article

Oct 2013

Aggregation has been an important operation since the early days of relational databases. Today's Big Data applications bring further challenges when processing aggregation queries, demanding adaptive aggregation algorithms that can process large volumes of data relative to a potentially limited memory budget (especially in multiuser settings). Despite its importance, the design and evaluation of aggregation algorithms has not received the same attention that other basic operators, such as joins, have received in the literature. As a result, when considering which aggregation algorithm(s) to implement in a new parallel Big Data processing platform (AsterixDB), we faced a lack of "off the shelf" answers that we could simply read about and then implement based on prior performance studies. In this paper we revisit the engineering of efficient local aggregation algorithms for use in Big Data platforms. We discuss the salient implementation details of several candidate algorithms and present an in-depth experimental performance study to guide future Big Data engine developers. We show that the efficient implementation of the aggregation operator for a Big Data platform is non-trivial and that many factors, including memory usage, spilling strategy, and I/O and CPU cost, should be considered. Further, we introduce precise cost models that can help in choosing an appropriate algorithm based on input parameters including memory budget, grouping key cardinality, and data skew.

Average Case Analysis of Java 7's Dual Pivot Quicksort

Conference Paper

Full-text available

Sep 2012

Recently, a new Quicksort variant due to Yaroslavskiy was chosen as standard sorting method for Oracle's Java 7 runtime library. The decision for the change was based on empirical studies showing that on average, the new algorithm is faster than the formerly used classic Quicksort. Surprisingly, the improvement was achieved by using a dual pivot approach, an idea that was considered not promising by several theoretical studies in the past. In this paper, we identify the reason for this unexpected success. Moreover, we present the first precise average case analysis of the new algorithm showing e. g. that a random permutation of length n is sorted using 1.9n ln n − 2.46n + O(ln n) key comparisons and 0.6n ln n + 0.08n + O(ln n) swaps.

Quicksort for Equal Keys

Article

May 1985

Lutz Wegner

When sorting a multiset of N elements with n ≪ N distinct values, considerable savings can be obtained from an algorithm which sorts in time O(N log n) rather than O(N log N). In a previous paper, two Quicksort derivatives operating on linked lists were introduced which are stable, i.e., maintain the relative order of equal keys, and which achieve the previously unattainable lower bound in partition exchange sorting. Here, six more algorithms are presented which are unstable but operate on arrays. One of the algorithms is also designed for a speedup on presorted input. The algorithms are analyzed and compared to potential contenders in multiset sorting.

Duplicate Detection and Deletion in the Extended NF² Data Model.

Conference Paper

Full-text available

Jun 1989

A current research topic in the area of relational databases is the design of systems based on the Non First Normal Form (NF2) data model. One particular development, the so-called extended NF2 data model, even permits structured values like lists and tuples to be included as attributes in relations. It is thus well suited to represent complex objects for non-standard database applications. A DBMS which uses this model, called the Advanced Information Management Prototype, is currently being implemen- ted at the IBM Heidelberg Scientific Center. In this paper we examine the problem of detec ting and deleting duplicates within this data model. Several alternative approaches are evaluated and a new me- thod, based on sorting complex objects, is proposed, which is both time- and space-efficient.

Sorting Algorithms in MOQA

Article

Full-text available

Jan 2009
Electron Notes Theor Comput Sci

A high-level overview of the MOQA programming language, which is based on the notion of randomness preservation, is described here. The representation of a series-parallel partial order data structure in this lan- guage is shown, along with some of the randomness preserving functions allowed upon it, which capture the required calculus for obtaining the average case timing of algorithms statically. The algorithm quicksort in MOQA is presented and its average case time is critically compared against the average case time of standard quicksort.

Sorting Presorted Files.

Conference Paper

Full-text available

Jan 1979

Kurt Mehlhorn

A new sorting algorithm is presented. Its running time is O(n(1+log(F/n)) where F=|{(i,j); iij}| is the total number of inversions in the input sequence xn xn–1 xn–2 ... x2 x1. In other words, presorted sequences are sorted quickly, and completely unsorted sequences are sorted in O(n log n) steps. Note that F2/2 always. Furthermore, the constant of proportionality is fairly small and hence the sorting method is competitive with existing methods.

Sorting Presorted Files

Article

Full-text available

Jan 1978

Internal sorting methods illustrated with pl/1 programs

Article

R. P. Rich

The Design and Analysis Computer Algorithms

Book

Jan 1974

Some Combinatorial Properties of Certain Trees With Applications to Searching and Sorting

Article

Jan 1962

Thomas N. Hibbard

Algorithm 63: Partition

Article

Jul 1961

C. A. R. Hoare

Many algebraic translators provide the programmer with a limited ability to allocate storage. Of course one of the most desirable features of these translators is the extent to which they remove the burden of storage allocation from the programmer. Nevertheless, ...

Implementing Quicksort Programs

Article

Oct 1978

Robert Sedgewick

This paper is a practical study of how to implement the Quicksort sorting algorithm and its best variants on real computers, including how to apply various code optimization techniques. A detailed implementation combining the most effective improvements to Quicksort is given, along with a discussion of how to implement it in assembly language. Analytic results describing the performance of the programs are summarized. A variety of special situations are considered from a practical standpoint to illustrate Quicksort's wide applicability as an internal sorting method which requires negligible extra storage.

Sort of a section of the elements of an array by determining the rank of each element (Algorithm 25)

Jan 1967
308-310

J Boothroyd

J. BOOTHROYD, Sort of a section of the elements of an array by determining the rank of each element (Algorithm 25), Computer J., 10 (1967), pp. 308-310. (See notes by R. Scowen, Computer J., 12 (1969), pp. 408-409, and by A. Woodall, Computer J., 13 (1970), pp. 295-296.)

A fast stable minimum-storage sorting algorithm

Jan 1973

R Rivest

R. RIVEST, A fast stable minimum-storage sorting algorithm, Institut de Recherche d'Informa-tique et d'Automatique Rapport 43, 1973.

Stable sol1ing and mergihg with oplimal space and lime bounds

Jun 1977
351-172

Trabb Pardo

L. Trabb Pardo, 'Stable sol1ing and mergihg with oplimal space and lime bounds,'S1ÄM l. Comput., vol.6. no.2. pp.351-172. June 1977.

Quicksort for Equal Keys

Abstract

Recommended publications

Quicksort for Equal Keys

Sorting a distributed file in a network

Sorting a distributed file in a network

Sorting a Linked List with Equal Keys.