Re: Re: Which qsort is used

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Dann Corbit" <DCorbit(at)connx(dot)com>
Cc: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Neil Conway" <neilc(at)samurai(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Which qsort is used
Date: 2005-12-17 05:03:25
Message-ID: 3148.1134795805@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Dann Corbit" <DCorbit(at)connx(dot)com> writes:
> Both of them are modified versions of Bentley's sort algorithm.

Jon Bentley of Bell Labs? Small world ... he was my thesis adviser
for awhile when he was at CMU. He's a good algorithms man, for sure.

> I just added the in-order check to both of them, and the reversed
> partition check for the second method (in the case of the subfiles
> because insertion sort is allergic to reversed partitions).

I've still got a problem with these checks; I think they are a net
waste of cycles on average. They would only be a win if you expected a
nontrivial percentage of perfectly-in-order or perfectly-reverse-order
inputs. What I think is much more probable in the Postgres environment
is almost-but-not-quite-ordered inputs --- eg, a table that was
perfectly ordered by key when filled, but some of the tuples have since
been moved by UPDATEs. This is the worst possible case for the in-order
checks, because they can then grovel over large percentages of the file
before failing ... and when they fail, those cycles are entirely wasted;
you have not advanced the state of the sort at all.

For the "usual" case of randomly ordered input, of course it doesn't
matter much at all because the in-order checks will fail after examining
not too many items. But to argue that the checks are worth making, you
have to assume that perfectly-ordered inputs are more common than
almost-ordered inputs, and I think that is exactly the wrong assumption
for the Postgres environment. I sure haven't seen any evidence that
it's a good assumption.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dann Corbit 2005-12-17 05:23:07 Re: Re: Which qsort is used
Previous Message Bruce Momjian 2005-12-17 04:10:43 Re: Automatic function replanning