Skip site navigation (1) Skip section navigation (2)

Re: qsort again (was Re: [PERFORM] Strange Create

From: Ron <rjpeace(at)earthlink(dot)net>
To: "Dann Corbit" <DCorbit(at)connx(dot)com>,<pgsql-performance(at)postgresql(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: qsort again (was Re: [PERFORM] Strange Create
Date: 2006-02-18 17:01:10
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackerspgsql-performance
At 08:37 PM 2/15/2006, Dann Corbit wrote:

>Adding some randomness to the selection of the pivot is a known 
>technique to fix the oddball partitions problem.

True, but it makes QuickSort slower than say MergeSort because of the 
expense of the PRNG being called ~O(lgN) times during a sort.

>However, Bentley and Sedgewick proved that every quick sort 
>algorithm has some input set that makes it go quadratic

Yep.  OTOH, that input set can be so specific and so unusual as to 
require astronomically unlikely bad luck or hostile hacking in order 
for it to actually occur.

>  (hence the recent popularity of introspective sort, which switches 
> to heapsort if quadratic behavior is detected.  The C++ template I 
> submitted was an example of introspective sort, but PostgreSQL does 
> not use C++ so it was not helpful).
...and there are other QuickSort+Other hybrids that address the issue 
as well.  MergeSort, RadixExchangeSort, and BucketSort all come to 
mind.  See Gonnet and Baeza-Yates, etc.

>Here are some cases known to make qsort go quadratic:
>1. Data already sorted

Only if one element is used to choose the pivot; _and_ only if the 
pivot is the first or last element of each pass.
Even just always using the middle element as the pivot avoids this 
problem.  See Sedgewick or Knuth.

>2. Data reverse sorted

Ditto above.

>3. Data organ-pipe sorted or ramp

Not sure what this means?  Regardless, median of n partitioning that 
includes samples from each of the 1st 1/3, 2nd 1/3, and final 3rd of 
the data is usually enough to guarantee O(NlgN) behavior unless the 
_specific_ distribution known to be pessimal to that sampling 
algorithm is encountered.  The only times I've ever seen it ITRW was 
as a result of hostile activity: purposely arranging the data in such 
a manner is essentially a DoS attack.

>4. Almost all data of the same value

Well known fixes to inner loop available to avoid this problem.

>There are probably other cases.  Randomizing the pivot helps some, 
>as does check for in-order or reverse order partitions.
Randomizing the choice of pivot essentially guarantees O(NlgN) 
behavior no matter what the distribution of the data at the price of 
increasing the cost of each pass by a constant factor (the generation 
of a random number or numbers).

In sum, QuickSort gets all sorts of bad press that is far more FUD 
than fact ITRW.

In response to

pgsql-performance by date

Next:From: Tom LaneDate: 2006-02-18 18:36:51
Subject: Re: Force another plan.
Previous:From: Fredrik OlssonDate: 2006-02-18 16:04:34
Subject: Force another plan.

pgsql-hackers by date

Next:From: Michael FuhrDate: 2006-02-18 17:16:49
Subject: Re: Updated email signature
Previous:From: Luke LonerganDate: 2006-02-18 15:41:07
Subject: Re: Blog post on EnterpriseDB...maybe off topic

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group