Skip site navigation (1) Skip section navigation (2)

Re: Progress on fast path sorting, btree index creation time

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Progress on fast path sorting, btree index creation time
Date: 2012-01-06 17:10:09
Message-ID: CAEYLb_UdPe7zxWxGvjsqV7z05u5AA9q8h5dQAYdFpx6U4Z_gOA@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-hackers
On 5 January 2012 20:23, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I don't have a problem with the idea of a pg_always_inline, but I'm
> wondering what sort of fallback mechanism you propose.  It seems to me
> that if the performance improvement is conditioned on inlining be done
> and we're not confident that we can force the compiler to inline, we
> ought to fall back to not making the specialization available, rather
> than to making something available that may not work well.  Of course
> if it's only a question of a smaller performance gain, rather than an
> actual loss, then that's not as much of an issue...

pg_always_inline is more a less a neat adjunct to what I've already
done. You can take it or leave it, but it would be irrational to
dismiss it out of hand, because it is demonstrably effective in this
case. Using non-portable things like function attributes is fairly
well precedented - we use __attribute__((format(*) )) frequently. We
don't need a fallback mechanism other than "#else #define
pg_always_inline inline #endif". I think these facilities are
available to well over 99% of our users, who use GCC, GCC compatible
compiler and MSVC builds.

I have basically no sympathy for anyone who uses a compiler that can't
inline functions. Do such people actually exist?

>> ISTM that protecting
>> against that outside the qsort implementation (but only for index
>> sorts) is wrong-headed.
>
> Assuming you mean that qsort_arg() will protect against this stuff so
> that callers don't need to do it separately, +1.

That's exactly what I mean. Going quadratic in the face of lot of
duplicates is something that, remarkably, was a problem well into the
1990s, apparently because some guy noticed it one day, though I think
that lots of popular implementations happened to be unaffected anyway.

As you know, all queries tested have lots and lots of duplicates (a
~1.5GB table that contains the same number of distinct elements as a
~10MB table once did), and removing the "duplicate protection" for
btrees, on top of everything else, sees the qsort do an awful lot
better than HEAD does with the psuedo-protection. As I've said, we
already lack any such "protection" for heap tuples. We are unaffected
now anyway, in particular, because we do this, as apparently
recommended by both Sedgewick and Knuth:

		while (pb <= pc && (r = cmp(pb, a)) <= 0)
		{
			if (r == 0)
			{
				swap(pa, pb);
				pa += es;
			}
			pb += es;
		}

Note that the partition swap occurs *because* the pivot and element
are equal. You might imagine that this is superfluous. It turns out
that it protects against the duplicates, resulting in sub-partitions
about the same size (though it occurs to me that it does so at the
expense of precluding parallelism). In short, Sedgewick would approve
of our qsort implementation.

As for the "compare a tuple to itself" thing, that's just silly, any
was probably only ever seen in some homespun implementation at some
point a relatively long time ago, but I've asserted against it anyway.

> I can't find any description of how this actually works... obviously,
> we'd like to find a close-to-median element, but how do we actually do
> that cheaply?

Uh, it works whatever way you want it to work. The implementation has
to figure out how to get the median ahead of time.

> I doubt that this would be worthwhile -- the pivot that we pick at the
> toplevel doesn't really matter much; even if it's bad, we can recover
> just fine if the pivot we pick one level down is good, or the level
> after that.  We only really have a problem if we keep systematically
> picking bad pivots every time.  And if we do have that problem,
> improving the toplevel choice of pivot is only going to help modestly,
> because we'll still be O(n^2) on each partition.
>
>> Can I get a straw poll on how much of a problem worst-case performance
>> of qsort is believed to be?
>
> I'd be reluctant to remove any of the protections we currently have,
> because I bet they were all put in to fix problems that people hit in
> the field.  But I don't know that we need more.  Of course,
> consolidating them into qsort() itself rather than its callers
> probably makes sense, as noted above.

I was merely proposing something that would compliment the med3 method
and our existing protections.

>> In a perfect world, if it were somehow possible to know the perfect
>> pivots ahead of time from a histogram or something, we'd have a quick
>> sort variant with worst-case performance of O(n log(n)). That, or the
>> limited approximation that I've sketched would perhaps be worthwhile,
>> even if it was subject to a number of caveats. Wikipedia claims that
>> the worst case for quick sort is O(n log(n)) with the refinements
>> recommended by Sedgewick's 1978 paper, but this seems like nonsense to
>> me - the med3 technique is a good heuristic in practice, but it's
>> perfectly possible in theory for it's sampling to always get things
>> completely wrong (this is not an unfamiliar problem).
>
> I agree.  After reading
> http://nanxkurniawan.wordpress.com/2010/05/31/quicksort/ I am inclined
> to think that things are still O(n lg n) even if we always partition
> so that any fixed percentage of the data -- is on one side of the
> partition element and the remainder is on the other side.  So for
> example if all of our partition elements are greater than 1% of the
> elements in their partition and less than the other 99%, we'll still
> be O(n lg n), though with a considerably higher constant factor, I
> think.  However, if our partition elements are always a fixed NUMBER
> of elements from the end - whether it's 1 or 100 - then we'll be
> O(n^2), though again the constant factor will depend on how many
> elements you knock off each time.  In practice I'm not sure whether an
> algorithmic analysis matters much, because in real life the constant
> factors are probably pretty important, hence the median-of-medians
> approach with n > 40.

Yeah, it's true that you have to be exceptionally unlucky to actually
see worst-case performance, but I imagined that it might be enough of
a difference to be worth pursuing. Experimentally, I can certainly see
the difference (I simulated it), but it is underwhelming next to
everything else.

Might be worth simplifying the swap logic, given as we only ever have
to swap SortTuple structs in qsort_arg, and given that it would make
the meta qsort_arg that much less ugly.

>> While I'm thinking out loud, here's another idea: Have a GUC which,
>> when enabled (perhaps potentially at various different granularities),
>> makes Timsort the in-memory sorting algorithm, as it now is by default
>> for Python, Java SE7 (for arrays) and the Android platform; for
>> certain applications, this could be a real win, and I don't think that
>> it would have to be invasive: it could be dropped in.
>
> I gather from a quick read that this is supposed to be especially good
> when the data is already somewhat sorted.  Would there be any merit in
> trying to guess when that's likely to be the case (e.g. based on the
> correlation statistic)?   That seems like a stretch, but OTOH a GUC
> doesn't feel great either: what is the poor user to do with a query
> that does 2 sorts, one of which is faster with QS and the other of
> which is faster with Timsort?  Ugh.

I'd imagined that it might work a bit like default_statistics_target.
Of course, I was just thinking out loud. Also, we do a very good job
on *perfectly* pre-sorted input because we check for pre-sorted input.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Responses

pgsql-hackers by date

Next:From: Dave CramerDate: 2012-01-06 17:18:01
Subject: pgsphere
Previous:From: Simon RiggsDate: 2012-01-06 16:50:45
Subject: Re: CLOG contention

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group