Re: The case for removing replacement selection sort

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: The case for removing replacement selection sort
Date: 2017-08-30 23:59:56
Message-ID: CAH2-WznnfLMYo1cdnpojau3H4aPkcEe-7++sjWRR=kwbgb2yyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 30, 2017 at 3:14 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> This is significantly faster, in a way that's clearly reproducible and
> consistent, despite the fact that we need about 10 merge passes
> without replacement selection, and only have enough memory for 7
> tapes. I think that I could find a case that makes replacement
> selection look much worse, if I tried.

Just to see where we end up, I quickly wrote a patch to remove
replacement selection + replacement_sort_tuples. The LOC impact worked
out like this:

$ git diff master --stat
doc/src/sgml/config.sgml | 39 ----
src/backend/utils/init/globals.c | 1 -
src/backend/utils/misc/guc.c | 10 -
src/backend/utils/misc/postgresql.conf.sample | 1 -
src/backend/utils/sort/tuplesort.c | 403
+++++------------------------------
src/include/miscadmin.h | 1 -
src/test/regress/expected/cluster.out | 17 +-
src/test/regress/sql/cluster.sql | 14 +-
8 files changed, 52 insertions(+), 434 deletions(-)

It was nice to be able to remove comments that make certain
distinctions that replacement selection cares about. These were
tedious to read. I managed to remove several paragraphs within
tuplesort.c's header comments.

Another advantage of the changes made that I noticed in passing is
that SortTuple.tupindex is now only used while merging. It would be
quite possible to remove SortTuple.tupindex entirely, as an additional
piece of work, by providing some other indirection to get that
information when merging. From there, we could replace SortTuple.tuple
with a bitfield, that has one bit for NULLness, and treats other bits
as a 63-bit or 31-bit integer. This integer would be used an offset
into a buffer that we repalloc(), thus removing all retail palloc()s,
even during run generation. Using one big buffer for tuples would make
tuplesort.c quite a bit more memory efficient (SortTuple will only be
16 bytes, we won't waste memory on palloc() headers, and sequential
access is made more cache efficient in presorted pass-by-reference
cases).

I'm not planning to work on this myself. It is similar to something
that Heikki described last year, but goes a bit further by eliminating
all palloc() header overhead, while also not playing tricks with
reclaiming bits from pointers (I recall that Tom didn't like that
aspect of Heikki's proposal at all). There would be new infrastructure
required to make this work -- we'd have to be able to ask routines
like ExecCopySlotMinimalTuple() and heap_copytuple() to copy into our
own large tuple buffer, rather than doing their own palloc() on
tuplesort's behalf. But that seems like a good idea anyway.

I may submit the simple patch to remove replacement selection, if
other contributors are receptive. Apart from everything else, the
"incrementalism" of replacement selection works against cleverer batch
memory management of the type I just outlined, which seems like a
useful direction to take tuplesort in.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-08-31 00:37:08 Re: Update low-level backup documentation to match actual behavior
Previous Message Michael Paquier 2017-08-30 23:52:34 Re: [Proposal] Allow users to specify multiple tables in VACUUM commands