Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeremy Harris <jgh(at)wizmail(dot)org>
Subject: Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"
Date: 2015-07-31 06:56:23
Message-ID: CAM3SWZRH+vN1NxVz5tpdj=9N3dqhBCsW3aGPxNCOrPRJuer2zQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 30, 2015 at 4:01 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Thu, Jul 30, 2015 at 12:00 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> Hmm. You don't really need to merge the in-memory array into the tape, as
>> you know that all the tuples in the in-memory must come after the tuples
>> already on the tape. You can just return all the tuples from the tape first,
>> and then all the tuples from the array.
>
> It's more complicated than it appears, I think. Tuples may be variable
> sized. WRITETUP() performs a pfree(), and gives us back a variable
> amount of availMem. What if we dumped a single, massive, outlier tuple
> out when a caller passes it and it goes to the root of the heap? We'd
> dump that massive tuple in one go (this would be an incremental
> dumptuples() call, which we still do in the patch), making things
> !LACKMEM() again, but by an usually comfortable margin. We read in a
> few more regular tuples, but we're done consuming tuples before things
> ever get LACKMEM() again (no more dumping needed, at least with this
> patch applied).
>
> What prevents the tuple at the top of the in-memory heap at the point
> of tuplesort_performsort() (say, one of the ones added to the heap as
> our glut of memory was *partially* consumed) being less than the
> last/greatest tuple on tape? If the answer is "nothing", a merge step
> is clearly required.

It's simple to prove this with the attached rough patch, intended to
be applied on top of Postgres with my patch. It hacks
tuplesort_gettuple_common() to always return tape tuples first, before
returning memtuples only when tape tuples have been totally exhausted.
If you run my cursory regression test suite with this, you'll see
serious regressions. I also attach a regression test diff file from my
development system, to save you the trouble of trying this yourself.
Note how the "count(distinct(s))" numbers get closer to being correct
(lower) as work_mem increases make tuplesort approach an internal
sort.

It's possible that we can get away with something cheaper than a merge
step, but my impression right now is that it isn't terribly expensive.
OTOH, if we can make this work with the randomAccess case by being
more clever about merging, that could be worthwhile.
--
Peter Geoghegan

Attachment Content-Type Size
no_merge_step_problematic.patch text/x-patch 909 bytes
regression.diffs application/octet-stream 4.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2015-07-31 07:59:07 Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"
Previous Message Atri Sharma 2015-07-31 06:45:19 Re: Updatable view?