Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeremy Harris <jgh(at)wizmail(dot)org>
Subject: Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"
Date: 2015-07-30 11:35:13
Message-ID: CANP8+jL1Z_x4nvXZ58L7X2Pbx1PekaWzfpwNgavDH=-K+haiow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 30 July 2015 at 12:26, Greg Stark <stark(at)mit(dot)edu> wrote:

>
> On Thu, Jul 30, 2015 at 12:09 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
> wrote:
>
>>
>> True, you need a heap to hold the next tuple from each tape in the merge
>> step. To avoid exceeding work_mem, you'd need to push some tuples from the
>> in-memory array to the tape to make room for that. In practice, though, the
>> memory needed for the merge step's heap is tiny. Even if you merge 1000
>> tapes, you only need memory for 1000 tuples in the heap. But yeah, you'll
>> need some logic to share/divide the in-memory array between the heap and
>> the "in-memory tail" of the last tape.
>
>
> It's a bit worse than that because we buffer up a significant chunk of the
> tape to avoid randomly seeking between tapes after every tuple read. But I
> think in today's era of large memory we don't need anywhere near the entire
> work_mem just to buffer to avoid random access. Something simple like a
> fixed buffer size per tape probably much less than 1MB/tape.
>

MERGE_BUFFER_SIZE is currently 0.25 MB, but there was benefit seen above
that. I'd say we should scale that up to 1 MB if memory allows.

Yes, that could be tiny for small number of runs. I mention it because
Heikki's comment that we could use this in the general case would not be
true for larger numbers of runs. Number of runs decreases quickly with more
memory anyway, so the technique is mostly good for larger memory but
certainly not for small memory allocations.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Anastasia Lubennikova 2015-07-30 11:51:16 [PATCH] Microvacuum for gist.
Previous Message Greg Stark 2015-07-30 11:26:38 Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"