Re: qsort, once again

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Dann Corbit" <DCorbit(at)connx(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, "Jerry Sievers" <jerry(at)jerrysievers(dot)com>
Subject: Re: qsort, once again
Date: 2006-03-17 01:12:35
Message-ID: D425483C2C5C9F49B5B7A41F8944154757D68B@postal.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Dann Corbit
> Sent: Thursday, March 16, 2006 4:42 PM
> To: Tom Lane
> Cc: Jonah H. Harris; pgsql-hackers(at)postgresql(dot)org; Jerry Sievers
> Subject: Re: [HACKERS] qsort, once again
>
> > So my feeling is we should just remove the swap_cnt code and return
to
> > the original B&M algorithm. Being much faster than expected for
> > presorted input doesn't justify being far slower than expected for
> > other inputs, IMHO. In the context of Postgres I doubt that
perfectly
> > sorted input shows up very often anyway.
> >
> > Comments?
>
> Checking for presorted input is O(n).
> If the input is random, an average of 3 elements will be tested.
> So adding an in-order check of the data should not be too expensive.
>
> I would benchmark several approaches and see which one is best when
used
> in-place.

Even if "hunks" of the input are sorted, the test is a very good idea.

Recall that we are sorting recursively and so we divide the data into
chunks.

Consider an example...

Quicksort of a field that contains Sex as 'M' for male, 'F' for female,
or NULL for unknown.

The median selection is going to pick one of 'M', 'F', or NULL.
After pass 1 of qsort we will have two partitions. One partition will
have all of one type and the other partition will have the other two
types.

An in-order check will tell us that the monotone partition is sorted and
we are done with it.

Imagine also a table that was clustered but for which we have not
updated statistics. Perhaps it is 98% sorted. Checking for order in
our partitions is probably a good idea.

I think you could also get a good optimization if you are checking for
partitions and find a big section of the partition is not ordered (even
though the whole thing is not). If you could perk the ordered size up
the tree, you could just add another partition to the merge list and
sort the unordered part.

In "C Unleashed" I call this idea partition discovery mergesort.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message William ZHANG 2006-03-17 01:33:03 Re: Bug report form: locale/encoding
Previous Message Dann Corbit 2006-03-17 00:42:20 Re: qsort, once again