Re: Parallel Append implementation

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append implementation
Date: 2017-10-05 10:29:00
Message-ID: CAA4eK1LFZurZFywJ+5y9GC6e5t2sUuaFti8jRQVbsx0jyd08rQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 2, 2017 at 6:21 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Oct 1, 2017 at 9:55 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> Isn't it for both? I mean it is about comparing the non-partial paths
>> for child relations of the same relation and also when there are
>> different relations involved as in Union All kind of query. In any
>> case, the point I was trying to say is that generally non-partial
>> relations will have relatively smaller scan size, so probably should
>> take lesser time to complete.
>
> I don't think that's a valid inference. It's true that a relation
> could fail to have a partial path because it's small, but that's only
> one reason among very many. The optimal index type could be one that
> doesn't support parallel index scans, for example.
>

Valid point.

>> Sure, I think it is quite good if we can achieve that but it seems to
>> me that we will not be able to achieve that in all scenario's with the
>> patch and rather I think in some situations it can result in leader
>> ended up picking the costly plan (in case when there are all partial
>> plans or mix of partial and non-partial plans). Now, we are ignoring
>> such cases based on the assumption that other workers might help to
>> complete master backend. I think it is quite possible that the worker
>> backends picks up some plans which emit rows greater than tuple queue
>> size and they instead wait on the master backend which itself is busy
>> in completing its plan. So master backend will end up taking too much
>> time. If we want to go with a strategy of master (leader) backend and
>> workers taking a different strategy to pick paths to work on, then it
>> might be better if we should try to ensure that master backend always
>> starts from the place which has cheapest plans irrespective of whether
>> the path is partial or non-partial.
>
> I agree that it's complicated to get this right in all cases; I'm
> realizing that's probably an unattainable ideal.
>
> However, I don't think ignoring the distinction between partial and
> non-partial plans is an improvement, because the argument that other
> workers may be able to help the leader is a correct one. You are
> correct in saying that the workers might fill up their tuple queues
> while the leader is working, but once the leader returns one tuple it
> will switch to reading from the queues. Every other tuple can be
> returned by some worker. On the other hand, if the leader picks a
> non-partial plan, it must run that plan through to completion.
>
> Imagine (a) a non-partial path with a cost of 1000 returning 100
> tuples and (b) a partial path with a cost of 10,000 returning 1,000
> tuples. No matter which path the leader picks, it has about 10 units
> of work to do to return 1 tuple. However, if it picks the first path,
> it is committed to doing 990 more units of work later, regardless of
> whether the workers have filled the tuple queues or not. If it picks
> the second path, it again has about 10 units of work to do to return 1
> tuple, but it hasn't committed to doing all the rest of the work of
> that path. So that's better.
>
> Now it's hard to get all of the cases right. If the partial path in
> the previous example had a cost of 1 crore then even returning 1 tuple
> is more costly than running the whole non-partial path. When you
> introduce partition-wise join and parallel hash, there are even more
> problems. Now the partial path might have a large startup cost, so
> returning the first tuple may be very expensive, and that work may
> help other workers (if this is a parallel hash) or it may not (if this
> is a non-parallel hash).
>

Yeah, these were the type of cases I am also worried.

> I don't know that we can get all of these
> cases right, or that we should try. However, I still think that
> picking partial paths preferentially is sensible -- we don't know all
> the details, but we do know that they're typically going to at least
> try to divide up the work in a fine-grained fashion, while a
> non-partial path, once started, the leader must run it to completion.
>

Okay, but can't we try to pick the cheapest partial path for master
backend and maybe master backend can try to work on a partial path
which is already picked up by some worker.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Khandekar 2017-10-05 10:41:16 Re: Parallel Append implementation
Previous Message Etsuro Fujita 2017-10-05 10:11:23 Comment typo