Re: Randomisation for ensuring nlogn complexity in quicksort

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Randomisation for ensuring nlogn complexity in quicksort
Date: 2013-07-01 20:12:16
Message-ID: CA+Tgmoa+4jZf28DQvzqVMWNA5+xeWgcqMcw+5LcDkXwdANWJ=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 1, 2013 at 3:54 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Mon, Jul 1, 2013 at 4:32 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> This shouldn't be too complex, and should give us a fixed nlogn complexity even for wild data sets, without affecting existing normal data sets that are present in every day transactions. I even believe that those data sets will also benefit from the above optimisation.
>>
>> The only method of selecting a pivot for quicksort that obtain O(n lg
>> n) run time with 100% certainty is have a magical oracle inside the
>> computer that tells you in fixed time and with perfect accuracy which
>> pivot you should select.
>
> Doesn't a linear median algorithm (say median of medians) get you O(n lg n)?
>
> Granted, with a huge constant (I think 4)... but it should still be O(n lg n).

No. Thinking about this a little more, I believe the way it works
out is that any algorithm for picking the median that guarantees that
a certain *percentage* of the tuples will be in the smaller partition
will have O(n lg n) complexity, but any algorithm that only guarantees
that a fixed *number* of tuples in the smaller partition is still
quadratic in complexity. In the case of a median algorithm, you're
only guaranteed to have 2 elements in the smaller partition, which is
a constant. If you take a median of medians, you're guaranteed to
have 8 elements in the smaller partition, which is bigger, but still a
constant.

The reason why this doesn't matter much in practice is because the
data distribution that causes quadratic behavior for median-of-medians
is not one which is likely to occur in the real world and will
probably only come up if chosen by an adversary who is attempting to
make your life miserable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2013-07-01 20:20:18 Re: "pg_ctl promote" exit status
Previous Message Joshua D. Drake 2013-07-01 20:10:23 Re: Documentation/help for materialized and recursive views