Re: Using quicksort for every external sort run

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2015-12-24 03:48:16
Message-ID: CAM3SWZQkgxVKbqZ91BjfCgKO9AkAjf-wJxsrgGWEgMQvAcdy7Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 23, 2015 at 1:03 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> The regression I found when building an index on a column of
> 400,000,000 md5(random()::text) with 64MB maintenance_work_mem was not
> hard to find at all. I still don't understand what is going on with
> it, but it is reproducible. Perhaps it is very unlikely and I just
> got very lucky in finding it immediately after switching to that
> data-type for my tests, but I wouldn't assume that on current
> evidence.

Well, that is a lot of tuples to sort with such a small amount of memory.

I have a new theory. Maybe part of the problem here is that in very
low memory conditions, the tape overhead really is kind of wasteful,
and we're back to having to worry about per-tape overhead (6 tapes may
have been far too miserly as a universal number back before that was
fixed [1], but that doesn't mean that the per-tape overhead is
literally zero). You get a kind of thrashing, perhaps. Also, more
tapes result in more random I/O, and that's an added cost, too; the
cure may be worse than the disease.

I also think that this might be a problem in your case:

* In this calculation we assume that each tape will cost us about 3 blocks
* worth of buffer space (which is an underestimate for very large data
* volumes, but it's probably close enough --- see logtape.c).

I wonder, what's the situation here like with the attached patch
applied on top of what you were testing? I think that we might be
better off with more merge steps when under enormous memory pressure
at the low end, in order to be able to store more tuples per tape (and
do more sorting using quicksort). I also think that under conditions
such as those you describe, this code may play havoc with memory accounting:

/*
* Decrease availMem to reflect the space needed for tape buffers; but
* don't decrease it to the point that we have no room for tuples. (That
* case is only likely to occur if sorting pass-by-value Datums; in all
* other scenarios the memtuples[] array is unlikely to occupy more than
* half of allowedMem. In the pass-by-value case it's not important to
* account for tuple space, so we don't care if LACKMEM becomes
* inaccurate.)
*/
tapeSpace = (int64) maxTapes * TAPE_BUFFER_OVERHEAD;

if (tapeSpace + GetMemoryChunkSpace(state->memtuples) < state->allowedMem)
USEMEM(state, tapeSpace);

Remember, this is after the final grow_memtuples() call that uses your
intelligent resizing logic [2], so we'll USEMEM() in a way that
effectively makes some non-trivial proportion of our optimal memtuples
sizing unusable. Again, that could be really bad for cases like yours,
with very little memory relative to the data volume.

Thanks

[1] Commit df700e6b4
[2] Commit 8ae35e918
--
Peter Geoghegan

Attachment Content-Type Size
maxorder_theory.patch text/x-patch 1.1 KB
