Re: parallelism and sorting

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelism and sorting
Date: 2015-11-24 14:23:00
Message-ID: CA+TgmoZ2B5_DA+N3oCuwN0F4LrbOPQx5xhrE5AdMbKETZmHAeg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 24, 2015 at 7:59 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Tue, Nov 24, 2015 at 8:59 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> One idea about parallel sort is that perhaps if multiple workers feed
>> data into the sort, they can each just sort what they have and then
>> merge the results.
>
> Sounds like a good approach for parallel sorting, however small extension
> to it that could avoid merging the final results is that workers allocated
> for sort will perform range-based sorting. A simple example to sort integers
> from 1-100 will be, worker-1 will be responsible for sorting any integer
> between 1-50 and worker-2 will be responsible for sorting integers from
> 51-100 and then master backend just needs to ensure that it first returns
> the tuples from worker-1 and then from worker-2. I think it has some
> similarity to your idea-5 (use of repartition), but not exactly same.

This is not so easy to accomplish for a couple of reasons. First, how
would you know where to partition the range? That would work fine if
you had all the data in sorted order to begin with, but of course if
you had that you wouldn't be sorting it. Second, remember that the
data is probably arriving in separate streams in each worker, e.g.
the sort may be fed by a parallel sequential scan. If you do
what I'm proposing, those workers don't need to communicate with each
other except for the final merge at the end; but to do what you're
proposing, you'd need to move each tuple from the worker that got it
originally to the correct worker. I would guess that would be at
least as expensive as the final merge pass you are hoping to avoid,
and maybe significantly more so.
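
To make the comparison concrete, here is a rough sketch in Python rather
than anything resembling backend code. It assumes two workers that each
receive an arbitrary slice of the input, as they would from a parallel
sequential scan. The names worker_sort, leader_merge and tuples_to_move
are made up for illustration, and the split point of 50 is simply the
one from the 1-100 example, which a real system would not know in
advance.

```python
import bisect
import heapq
from typing import Iterable, List, Sequence


def worker_sort(stream: Iterable[int]) -> List[int]:
    """Each worker sorts only the tuples it happened to receive."""
    return sorted(stream)


def leader_merge(sorted_runs: Sequence[List[int]]) -> List[int]:
    """The leader does a single k-way merge of the already-sorted runs."""
    return list(heapq.merge(*sorted_runs))


def tuples_to_move(streams: Sequence[Iterable[int]],
                   boundaries: List[int]) -> int:
    """Count tuples that range partitioning would have to ship elsewhere.

    boundaries[i] is the inclusive upper bound of worker i's range; the
    last worker takes everything above the final boundary. The boundaries
    are assumed to be known, which is the hard part in practice.
    """
    moved = 0
    for worker_id, stream in enumerate(streams):
        for value in stream:
            target = bisect.bisect_left(boundaries, value)
            if target != worker_id:
                moved += 1
    return moved


# Hypothetical input: what each worker got from its share of the scan.
streams = [[73, 12, 58, 5], [41, 99, 3, 64]]

# Sort locally, then merge once at the end; no worker-to-worker traffic.
merged = leader_merge([worker_sort(s) for s in streams])

# Range partitioning at 50 would first have to relocate these tuples.
moved = tuples_to_move(streams, boundaries=[50])
```

With this input, half of the tuples start out on the wrong worker for a
split at 50, so they would all have to be shipped between workers before
any sorting could begin, whereas the merge approach only touches each
tuple once more in the leader.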

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
