Re: Re: Which qsort is used

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Neil Conway" <neilc(at)samurai(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Which qsort is used
Date: 2005-12-17 05:23:07
Message-ID: D425483C2C5C9F49B5B7A41F8944154757D388@postal.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: Friday, December 16, 2005 9:03 PM
> To: Dann Corbit
> Cc: Qingqing Zhou; Bruce Momjian; Luke Lonergan; Neil Conway; pgsql-
> hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Re: Which qsort is used
>
> "Dann Corbit" <DCorbit(at)connx(dot)com> writes:
> > Both of them are modified versions of Bentley's sort algorithm.
>
> Jon Bentley of Bell Labs? Small world ... he was my thesis adviser
> for awhile when he was at CMU. He's a good algorithms man, for sure.
>
> > I just added the in-order check to both of them, and the reversed
> > partition check for the second method (in the case of the subfiles
> > because insertion sort is allergic to reversed partitions).
>
> I've still got a problem with these checks; I think they are a net
> waste of cycles on average. They would only be a win if you expected
a
> nontrivial percentage of perfectly-in-order or perfectly-reverse-order
> inputs. What I think is much more probable in the Postgres
environment
> is almost-but-not-quite-ordered inputs --- eg, a table that was
> perfectly ordered by key when filled, but some of the tuples have
since
> been moved by UPDATEs. This is the worst possible case for the
in-order
> checks, because they can then grovel over large percentages of the
file
> before failing ... and when they fail, those cycles are entirely
wasted;
> you have not advanced the state of the sort at all.
>
> For the "usual" case of randomly ordered input, of course it doesn't
> matter much at all because the in-order checks will fail after
examining
> not too many items. But to argue that the checks are worth making,
you
> have to assume that perfectly-ordered inputs are more common than
> almost-ordered inputs, and I think that is exactly the wrong
assumption
> for the Postgres environment. I sure haven't seen any evidence that
> it's a good assumption.

The benchmarks say that they (order checks) are a good idea on average
for ordered data, random data, and partly ordered data.

It does not require perfectly ordered data for the checks to be useful.
On mostly ordered data, it is likely that some partitions are perfectly
ordered.

If you trace the algorithms in a debugger you will be surprised at how
often the partitions are ordered, even with random sets as input.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Qingqing Zhou 2005-12-17 06:13:07 Re: Re: Which qsort is used
Previous Message Tom Lane 2005-12-17 05:03:25 Re: Re: Which qsort is used