Re: Parallel Append implementation

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append implementation
Date: 2017-10-09 10:33:02
Message-ID: CAA4eK1LfMmnNAY8Gjed37kJvbDL7Oz=FAe8wthYJONMiEWcm7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 6, 2017 at 12:03 PM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> On 6 October 2017 at 08:49, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> Okay, but why not cheapest partial path?
>
> I gave some thought on this point. Overall I feel it does not matter
> which partial path it should pick up. RIght now the partial paths are
> not ordered. But for non-partial paths sake, we are just choosing the
> very last path. So in case of mixed paths, leader will get a partial
> path, but that partial path would not be the cheapest path. But if we
> also order the partial paths, the same logic would then pick up
> cheapest partial path. The question is, should we also order the
> partial paths for the leader ?
>
> The only scenario I see where leader choosing cheapest partial path
> *might* show some benefit, is if there are some partial paths that
> need to do some startup work using only one worker. I think currently,
> parallel hash join is one case where it builds the hash table, but I
> guess here also, we support parallel hash build, but not sure about
> the status.
>

You also need to consider how merge join is currently work in parallel
(each worker need to perform the whole of work of right side). I
think there could be more scenario's where the startup cost is high
and parallel worker needs to do that work independently.

For such plan, if leader starts it, it would be slow, and
> no other worker would be able to help it, so its actual startup cost
> would be drastically increased. (Another path is parallel bitmap heap
> scan where the leader has to do something and the other workers wait.
> But here, I think it's not much work for the leader to do). So
> overall, to handle such cases, it's better for leader to choose a
> cheapest path, or may be, a path with cheapest startup cost. We can
> also consider sorting partial paths with decreasing startup cost.
>

Yeah, that sounds reasonable.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2017-10-09 10:35:31 Re: Proposal: Local indexes for partitioned table
Previous Message Amit Kapila 2017-10-09 09:56:16 Re: parallelize queries containing initplans