Quick Links

B-Tree index builds, CLUSTER, and sortsupport

From:	Peter Geoghegan <pg(at)heroku(dot)com>
To:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	B-Tree index builds, CLUSTER, and sortsupport
Date:	2014-10-11 00:26:48
Message-ID:	CAM3SWZTfKZHTUiWDdHg+6tcYuMsdHoC=bMuAiVgMP9AThj42Gw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Both Robert and Heikki expressed dissatisfaction with the fact that
B-Tree index builds don't use sortsupport. Because B-Tree index builds
cannot really use the "onlyKey" optimization, the historic oversight
of not supporting B-Tree builds (and CLUSTER) might have been at least
a little understandable [1]. But with the recent work on sortsupport
for text, it's likely that B-Tree index builds will be missing out on
rather a lot by not taking advantage of sortsupport.

Attached patch modifies tuplesort to correct this oversight. What's
really nice about it is that there is actually a net negative code
footprint:

src/backend/access/nbtree/nbtsort.c | 63 +++---
src/backend/utils/sort/sortsupport.c | 77 ++++++--
src/backend/utils/sort/tuplesort.c | 274 +++++++++++----------------
src/include/utils/sortsupport.h | 3 +
4 files changed, 203 insertions(+), 214 deletions(-)

I've been able to throw out a lot of code, including two virtually
identical inlinable comparators (that have logic for NULL-wise
comparisons). This patch pretty much justifies itself as a refactoring
exercise, without performance improvements entering into it, which is
nice. I haven't benchmarked it yet, but it seems reasonable to assume
that much the same benefits will be seen as were seen for the
MinimalTuple case (for multiple-attribute sorts, that similarly cannot
use the "onlyKey" optimization).

With this patch, all callers of _bt_mkscankey_nodata() now use
sortsupport. This raises the question: Why not just have
_bt_mkscankey_nodata() do it all for us? I think that continuing to
have B-Tree provide a scanKey, and working off of that is a better
abstraction. It would save a few cycles to have the
index_getprocinfo() call currently within _bt_mkscankey_nodata() look
for BTSORTSUPPORT_PROC rather than BTORDER_PROC and be done with it,
but I'd call that a modularity violation. Tuplesort decides a strategy
(BTGreaterStrategyNumber or BTLessStrategyNumber) based on the scanKey
B-Tree private flag SK_BT_DESC, and it's appropriate for that to live
in tuplesort's head if possible, because tuplesort/sortsupport tracks
sort "direction" in a fairly intimate way. Besides, I think that
sortsupport already has enough clients for what it is.

I imagine it makes sense to directly access a sortsupport-installed
comparator through a scanKey, for actual index scans (think
abbreviated keys in internal B-Tree pages), but I still want
uniformity with the other cases within tuplesort (i.e. the
MinimalTuple and Datum cases), so I'm not about to change scanKeys to
have a comparator that doesn't need a fmgr trampoline for the benefit
of this patch. Note that I don't even store a scanKey within
Tuplesortstate anymore. That uniformity is what allowed to to throw
out the per-tuplesort-case reversedirection() logic, and use generic
reversedirection() logic instead (as anticipated by current comments
within Tuplesortstate).

Thoughts?

[1] http://www.postgresql.org/message-id/CAM3SWZQLg8nx2YEb+67xx0giG+-fOLfbtQJKg+jL15_zhi1V7w@mail.gmail.com
--
Peter Geoghegan

Attachment	Content-Type	Size
sortsupport-btree.2014_10_10.patch	text/x-patch	29.2 KB

Responses

Re: B-Tree index builds, CLUSTER, and sortsupport at 2014-11-05 16:36:02 from Jim Nasby
Re: B-Tree index builds, CLUSTER, and sortsupport at 2014-11-06 00:33:29 from Andreas Karlsson

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Eisentraut	2014-10-11 00:34:24	Re: Materialized views don't show up in information_schema
Previous Message	Noah Misch	2014-10-11 00:24:33	Re: orangutan seizes up during isolation-check