Re: Using quicksort for every external sort run

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2016-03-16 22:31:10
Message-ID: CA+TgmoZEo029yf8A65970dFTCwKdaAqiUefxZRVndiXCaeZq_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 10, 2016 at 9:54 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> 1. Creates a separate memory context for tuplesort's copies of
> caller's tuples, which can be reset at key points, avoiding
> fragmentation. Every SortTuple.tuple is allocated there (with trivial
> exception); *everything else*, including the memtuples array, is
> allocated in the existing tuplesort context, which becomes the parent
> of this new "caller's tuples" context. Roughly speaking, that means
> that about half of total memory for the sort is managed by each
> context in common cases. Even with a high work_mem memory budget,
> memory fragmentation could previously get so bad that tuplesort would
> in effect claim a share of memory from the OS that is *significantly*
> higher than the work_mem budget allotted to its sort. And with low
> work_mem settings, fragmentation previously made palloc() thrash the
> sort, especially during non-final merging. In this latest revision,
> tuplesort now almost gets to use 100% of the memory that was requested
> from the OS by palloc() is cases tested.

I spent some time looking at this part of the patch yesterday and
today. This is not a full review yet, but here are some things I
noticed:

- I think that batchmemtuples() is somewhat weird. Normally,
grow_memtuples() doubles the size of the array each time it's called.
So if you somehow called this function when you still had lots of
memory available, it would just double the size of the array.
However, I think the expectation is that it's only going to be called
when availMem is less than half of allowedMem, in which case we're
going to get the special "last increment of memtupsize" behavior,
where we expand the memtuples array by some multiple between 1.0 and
2.0 based on allowedMem/memNowUsed. And after staring at this for a
long time ... yeah, I think this does the right thing. But it
certainly is hairy.

- It's not exactly clear what you mean when you say that the tuplesort
context contains "everything else". I don't understand why that only
ends up containing half the memory ... what, other than the memtuples
array, ends up there?

- If I understand correctly, the point of the MemoryContextReset call
is: there wouldn't be any tuples in memory at that point anyway. But
the OS-allocated chunks might be divided up into a bunch of small
chunks that then got stuck on freelists, and those chunks might not be
the right size for the next merge pass. Resetting the context avoids
that problem by blowing up the freslists. Right? Clever.

- I haven't yet figured out why we use batching only for the final
on-the-fly merge pass, instead of doing it for all merges. I expect
you have a reason. I just don't know what it is.

- I have also not yet figured out why you chose to replace
state->datumTypByVal with state->tuples and reverse the sense. I bet
there's a reason for this, too. I don't know what it is, either.

That's as far as I've gotten thus far.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2016-03-16 23:17:12 Re: Patch: Implement failover on libpq connect level.
Previous Message Jim Nasby 2016-03-16 22:16:14 Re: fd.c doesn't remove files on a crash-restart