Progress on fast path sorting, btree index creation time

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Progress on fast path sorting, btree index creation time
Date: 2011-12-30 02:03:14
Message-ID: CAEYLb_VkoXJkG4Ehb=ah2yK6cqSCZLb-YHUCOuFNxgT7hENrrA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'll try and keep this terse. I've promised to justify the number of
specialisations that are in my fast-path sorting patch, and I may yet
conclude that a reduction is appropriate. Not today though - there are
quite a few ideas in the patch (even more now), and not all have been
exhaustively evaluated. Maybe that isn't what some had hoped for right
now, but deferring publicly and definitively answering those questions
to focus on tweaking the patch has brought additional benefits. I felt
that I went too long without checking in with the community though.
The purpose of this e-mail is to:

1. Report back a further improvement in the performance benefits seen
when sorting with multiple sortkeys. The performance benefits seen for
that case were previously relatively modest; they're better now.

2. Report improvements from applying these techniques to btree index
tuplesorting (short version: they're also quite good). The data used
is exactly the same as it was in my previous benchmark; orderlines is
1538MB, and has lots of duplicates. The environment is also identical.

3. Resolve two anticipated controversies that are, respectively,
somewhat orthogonal and completely orthogonal to the binary bloat
controversy. The first (controversy A) is that I have added a new
piece of infrastructure, pg_always_inline, which, as the name
suggests, is a portable way of insisting that a function should be
invariably inlined. Portable libraries like libc have been doing this
for a long time now, and I actually use the libc macro (that expands
to __attribute__((always_inline)) ) where possible. The second
possible point of contention (controversy B) is that I have jettisoned
various protections against bad qsort implementations that I believe
are a legacy of when we used the system qsort pre-2006, that can no
longer be justified. For example, quick sort performing badly in the
face of lots of duplicates is a well understood problem today
(partitioning should stop on equal keys), and ISTM that protecting
against that outside the qsort implementation (but only for index
sorts) is wrong-headed.

The first possibly controversial adjustment (controversy A) is clearly
also the reason for (1) - that has been well isolated. I haven't got
around to quantifying the performance improvement seen due to the
second possibly controversial adjustment (controversy B), but I
believe that it can be justified as a case of effectively removing
redundant code anyway. After all, we now assert against comparing a
tuple to itself anyway, and the "cheap insurance" that existed in
comparetup_index_btree was never present in any form in
comparetup_index_heap, and we heard no complaints, AFAIK.

Attached are:

* A detailing of libc's use of __always_inline /
__attribute__((always_inline)). Note that it often appears in code
that is built for all platforms.

* The WIP patch itself, rebased to integrate with the new SortSupport
infrastructure. I've gone a bit crazy with btree specialisations, but
I suspect that's where it matters least and makes most sense.

* A new Python script for bench marking index creation, that is
similar to the other one I previously posted for bench marking sorting
heap tuples.

* A spreadsheet that shows the results of re-running my earlier heap
tuple sorting benchmark with this new patch. The improvement in the
query that orders by 2 columns is all that is pertinent there, when
considering the value of (1) and the sense in standing still for
controversy A.

* A spreadsheet that shows the difference in index creation times,
generated with the help of the new python script.

Thoughts?

I had another idea when writing this patch that I haven't developed at
all but I'll share anyway. That is, it might be a good idea to use a
balance quicksort:

http://xlinux.nist.gov/dads//HTML/balancedqsrt.html

It might be possible to get a reasonable approximation of the actual
median value of a given column with existing statistics, which could
be hinted to qsort_arg. This would do a better job of guessing an
appropriate initial pivot value for qsort than the med3 sampling
technique (what we do now, advocated by Sedgewick: use the median of
the first, middle and last elements of the current partition), though
we'd still use that med3 technique to select all other pivots, and
perhaps signal to qsort_arg "you're on your own, fall back of med3 for
the initial pivot" with a null ptr, according to some heuristic.
There's obviously a not inconsiderable impedance mismatch to resolve
if we're to do that though, so that Tuplesortstate has a pointer to
the median SortTuple.

Can I get a straw poll on how much of a problem worst-case performance
of qsort is believed to be?

In a perfect world, if it were somehow possible to know the perfect
pivots ahead of time from a histogram or something, we'd have a quick
sort variant with worst-case performance of O(n log(n)). That, or the
limited approximation that I've sketched would perhaps be worthwhile,
even if it was subject to a number of caveats. Wikipedia claims that
the worst case for quick sort is O(n log(n)) with the refinements
recommended by Sedgewick's 1978 paper, but this seems like nonsense to
me - the med3 technique is a good heuristic in practice, but it's
perfectly possible in theory for it's sampling to always get things
completely wrong (this is not an unfamiliar problem). How often it
messes up in the real world and how much it matters is something that
I wouldn't like to speculate on, though it is the main factor that
will determine if the idea is worth pursuing. I can tell you that I
have heard one person observe that it had an unpredictable runtime.
However, they might not have noticed if this patch was applied,
because in general the worse quicksort does the better this patch
does. Also, the fact that we use a median of "medians" when n > 40
(Dr. Sedgewick again, if I'm not mistaken) makes me a little skeptical
of that claim. Come to think of it, it might not be a bad idea to add
a bunch of comments to qsort_arg while I have this swapped into my
head, as it currently has no comments at all.

While I'm thinking out loud, here's another idea: Have a GUC which,
when enabled (perhaps potentially at various different granularities),
makes Timsort the in-memory sorting algorithm, as it now is by default
for Python, Java SE7 (for arrays) and the Android platform; for
certain applications, this could be a real win, and I don't think that
it would have to be invasive: it could be dropped in.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachment Content-Type Size
libc_always_inlined_use.txt text/plain 21.3 KB
fastpath_sort_btree_rev.patch text/x-patch 47.6 KB
results_server_btree.ods application/vnd.oasis.opendocument.spreadsheet 11.7 KB
tuplesort_btree_test.py text/x-python 2.8 KB
new_results_server_heap.ods application/vnd.oasis.opendocument.spreadsheet 11.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2011-12-30 02:11:22 Re: [RFC] grants vs. inherited tables
Previous Message Daniel Farina 2011-12-30 00:40:53 Re: backup_label during crash recovery: do we know how to solve it?