Re: Using quicksort for every external sort run

From: Greg Stark <stark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2016-02-07 18:51:53
Message-ID: CAM-w4HPOTjeUqjahY+Dh2JzaiNRz_VU4Z9-9wVbeiJsFN4rmvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Feb 7, 2016 at 8:21 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Feb 7, 2016 at 11:00 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> > It was my understanding, based on your emphasis on producing only a
> > single run, as well as your recent remarks on this thread about the
> > first run being special, that you are really only interested in the
> > presorted case, where one run is produced. That is, you are basically
> > not interested in preserving the general ability of replacement
> > selection to double run size in the event of a uniform distribution.
>...
> > You don't want to change the behavior of the current patch for the
> > second or subsequent run; that should remain a quicksort, pure and
> > simple. Do I have that right?
>
> Yes.

I'm not even sure this is necessary. The idea of missing out on
producing a single sorted run sounds bad but in practice since we
normally do the final merge on the fly there doesn't seem like there's
really any difference between reading one tape or reading two or three
tapes when outputing the final results. There will be the same amount
of I/O happening and a 2-way or 3-way merge for most data types should
be basically free.

On Sun, Feb 7, 2016 at 8:21 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> 3. At this point, we have one sorted tape per worker. Perform a final
> merge pass to get the final result.

I don't even think you have to merge until you get one tape per
worker. You can statically decide how many tapes you can buffer in
memory based on work_mem and merge until you get N/workers tapes so
that a single merge in the gather node suffices. I would expect that
to nearly always mean the workers are only responsible for generating
the initial sorted runs and the single merge pass is done in the
gather node on the fly as the tuples are read.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-02-07 19:17:24 Re: First-draft release notes for next week's back-branch releases
Previous Message Robert Haas 2016-02-07 17:21:10 Re: Using quicksort for every external sort run