Re: Parallel Append implementation

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append implementation
Date: 2017-04-04 05:28:32
Message-ID: CAJ3gD9crnBW=apd7n=RynX08EzrLSnyzgfAordEuHHufDfTKhA@mail.gmail.com
Lists: pgsql-hackers

Thanks, Andres, for your review comments. I will get back to you on the
other comments, but meanwhile I have some queries about the particular
comment below ...

On 4 April 2017 at 10:17, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2017-04-03 22:13:18 -0400, Robert Haas wrote:
>> On Mon, Apr 3, 2017 at 4:17 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> > Hm. I'm not really convinced by the logic here. Wouldn't it be better
>> > to try to compute the minimum total cost across all workers for
>> > 1..#max_workers for the plans in an iterative manner? I.e. try to map
>> > each of the subplans to 1 (if non-partial) or N workers (partial) using
>> > some fitting algorithm (e.g. always choosing the worker(s) that currently
>> > have the least work assigned). I think the current algorithm doesn't
>> > lead to useful #workers for e.g. cases with a lot of non-partial,
>> > high-startup plans - imo a quite reasonable scenario.

I think I might not have understood this part exactly. Are you saying
we need to consider per-subplan parallel_workers to calculate the total
number of workers for the Append ? I also didn't follow the point about
non-partial subplans. Can you please explain how many workers you think
should be expected with, say, 7 subplans out of which 3 are non-partial ?
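
Just to check my understanding, below is a rough sketch (plain
standalone C, not actual planner code; the structure names, the
fit_workers/simulate_cost helpers and the cost model are made up purely
for illustration) of the greedy "least-loaded worker" fitting I think
you are describing:

#include <stdio.h>
#include <stdbool.h>

#define MAX_SIM_WORKERS 64      /* arbitrary cap, just for the sketch */

typedef struct SubplanCost
{
    double  total_cost;         /* estimated cost of the whole subplan */
    bool    is_partial;         /* can several workers share this subplan? */
} SubplanCost;

/* Return the index of the worker with the least work assigned so far. */
static int
least_loaded(const double *load, int nworkers)
{
    int     i,
            best = 0;

    for (i = 1; i < nworkers; i++)
        if (load[i] < load[best])
            best = i;
    return best;
}

/*
 * Simulate running the subplans on 'nworkers' workers; the resulting
 * cost is taken to be the largest per-worker load, i.e. the time until
 * the slowest worker finishes.
 */
static double
simulate_cost(const SubplanCost *plans, int nplans, int nworkers)
{
    double  load[MAX_SIM_WORKERS] = {0};
    double  max_load = 0;
    int     i,
            w;

    for (i = 0; i < nplans; i++)
    {
        if (plans[i].is_partial)
        {
            /* Partial subplan: assume all workers share it evenly. */
            for (w = 0; w < nworkers; w++)
                load[w] += plans[i].total_cost / nworkers;
        }
        else
        {
            /* Non-partial subplan: goes entirely to the least-loaded worker. */
            load[least_loaded(load, nworkers)] += plans[i].total_cost;
        }
    }

    for (w = 0; w < nworkers; w++)
        if (load[w] > max_load)
            max_load = load[w];
    return max_load;
}

/* Try 1..max_workers iteratively; keep the count with the lowest cost. */
static int
fit_workers(const SubplanCost *plans, int nplans, int max_workers)
{
    int     w,
            best_workers = 1;
    double  best_cost = simulate_cost(plans, nplans, 1);

    for (w = 2; w <= max_workers; w++)
    {
        double  cost = simulate_cost(plans, nplans, w);

        if (cost < best_cost)
        {
            best_cost = cost;
            best_workers = w;
        }
    }
    return best_workers;
}

int
main(void)
{
    /* 7 subplans, 3 of them non-partial; the costs are made up. */
    SubplanCost plans[] = {
        {100, false}, {80, false}, {60, false},
        {40, true}, {40, true}, {20, true}, {20, true}
    };

    printf("suggested workers: %d\n", fit_workers(plans, 7, 8));
    return 0;
}

One thing I notice with this naive model is that adding workers never
makes the estimated cost worse, which seems to run into exactly the
problem Robert mentions below about a larger number of workers always
looking better under the current costing.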

>>
>> Well, that'd be totally unlike what we do in any other case. We only
>> generate a Parallel Seq Scan plan for a given table with one # of
>> workers, and we cost it based on that. We have no way to re-cost it
>> if we changed our mind later about how many workers to use.
>> Eventually, we should probably have something like what you're
>> describing here, but in general, not just for this specific case. One
>> problem, of course, is to avoid having a larger number of workers
>> always look better than a smaller number, which with the current
>> costing model would probably happen a lot.
>
> I don't think the parallel seqscan is comparable in complexity with the
> parallel append case. Each worker there does the same kind of work, and
> if one of them is behind, it'll just do less. But correct sizing will
> be more important with parallel-append, because with non-partial
> subplans the work is absolutely *not* uniform.
>
> Greetings,
>
> Andres Freund

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
