Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeremy Harris <jgh(at)wizmail(dot)org>
Subject: Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"
Date: 2015-07-30 11:09:54
Message-ID: 55BA0602.50605@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 07/30/2015 01:47 PM, Simon Riggs wrote:
> On 30 July 2015 at 08:00, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> So here's a shorter/different explanation of this optimization: When it's
>> time to perform the sort, instead of draining the in-memory heap one tuple
>> at a time to the last tape, you sort the heap with quicksort, and pretend
>> that the sorted heap belongs to the last tape, after all the other tuples
>> in the tape.
>>
>> Some questions/thoughts on that:
>>
>> Isn't that optimization applicable even when you have multiple runs?
>> Quicksorting the heap and keeping it as an array in memory is surely always
>> faster than heapsorting and pushing it to the tape.
>
> It's about use of memory. If you have multiple runs on tape, then they will
> need to be merged and you need memory to do that efficiently. If there are
> tuples in the last batch still in memory then it can work, but it depends
> upon how full memory is from the last batch and how many batches there are.

True, you need a heap to hold the next tuple from each tape in the merge
step. To avoid exceeding work_mem, you'd need to push some tuples from
the in-memory array to the tape to make room for that. In practice,
though, the memory needed for the merge step's heap is tiny. Even if you
merge 1000 tapes, you only need memory for 1000 tuples in the heap. But
yeah, you'll need some logic to share/divide the in-memory array between
the heap and the "in-memory tail" of the last tape.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2015-07-30 11:26:35 Re: multivariate statistics / patch v7
Previous Message Simon Riggs 2015-07-30 10:47:46 Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"