Re: Parallel Append implementation

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append implementation
Date: 2017-04-05 09:22:38
Message-ID: CAJ3gD9fM5Fa-+PHPG8d6bwgnKTOcOM3aw95EZZOrT84T8p6qng@mail.gmail.com
Lists: pgsql-hackers

On 5 April 2017 at 01:43, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2017-04-04 08:01:32 -0400, Robert Haas wrote:
>> On Tue, Apr 4, 2017 at 12:47 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> > I don't think the parallel seqscan is comparable in complexity with the
>> > parallel append case. Each worker there does the same kind of work, and
>> > if one of them is behind, it'll just do less. But correct sizing will
>> > be more important with parallel-append, because with non-partial
>> > subplans the work is absolutely *not* uniform.
>>
>> Sure, that's a problem, but I think it's still absolutely necessary to
>> ramp up the maximum "effort" (in terms of number of workers)
>> logarithmically. If you just do it by costing, the winning number of
>> workers will always be the largest number that we think we'll be able
>> to put to use - e.g. with 100 branches of relatively equal cost we'll
>> pick 100 workers. That's not remotely sane.
>
> I'm quite unconvinced that just throwing a log() in there is the best
> way to combat that. Modeling the issue of starting more workers through
> tuple transfer, locking, startup overhead costing seems better to me.
>
> If the goal is to compute the results of the query as fast as possible,
> and to not use more than max_parallel_per_XXX, and it's actually
> beneficial to use more workers, then we should. Because otherwise you
> really can't use the resources available.
>
> - Andres

This is what the earlier versions of my patch did: add up the
per-subplan parallel_workers (1 for each non-partial subplan and
subpath->parallel_workers for each partial subplan) and set this total
as the Append's parallel_workers.
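
To make the summing rule concrete, here is a minimal sketch in Python
(not the patch's C code). The Subpath class, its is_partial flag and the
function name are hypothetical, introduced only for illustration; only
parallel_workers corresponds to a field named in this mail.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subpath:
    """Hypothetical stand-in for a planner subpath (illustration only)."""
    parallel_workers: int  # workers the subpath itself would use
    is_partial: bool       # True if several workers can share this subplan


def summed_append_workers(subpaths: List[Subpath]) -> int:
    """Add up per-subplan workers: 1 per non-partial subplan,
    parallel_workers per partial subplan."""
    total = 0
    for sp in subpaths:
        total += sp.parallel_workers if sp.is_partial else 1
    return total


# Two non-partial subplans and one partial subplan wanting 2 workers.
example = [Subpath(0, False), Subpath(0, False), Subpath(2, True)]
print(summed_append_workers(example))  # 1 + 1 + 2 = 4
```

With 100 non-partial subplans this rule asks for 100 workers, which is
the behaviour the quoted discussion objects to.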

Robert had a valid point that this would be inconsistent with the
worker count we would come up with for a single table whose cost is as
big as the total cost of all the Append subplans. That discussion was
about a partitioned table versus the same table unpartitioned, but I
think the same argument applies to a union query with non-partial
plans: if we clamp down the number of workers for a single table for a
good reason, we should follow the same policy and avoid assigning too
many workers to an Append.

Now, I am not sure why we increase the number of workers
logarithmically for a single-table parallel scan. I think there might
have been an observation that beyond a certain number of workers,
adding more workers does not make a significant difference, but this is
just my guess.

Even if we calculate workers based on each of the subplan costs rather
than just the per-subplan worker counts, I think the total worker count
should still be a *log* of the total cost, to be consistent with what
we did for other scans. But log(total_cost) does not increase
significantly with cost: for a cost of 1000 units, log3(cost) is about
6, and for a cost of 10,000 units it is about 8, i.e. just 2 more
workers. So since it is a logarithmic value, I think we might as well
drop the cost factor and consider only the number of workers.
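
The arithmetic can be checked with a few lines of Python. This is only
a sketch of the base-3 logarithm mentioned above; the helper name log3
and the truncation to an integer are assumptions made for illustration,
not the planner's actual worker computation.

```python
import math


def log3(cost: float) -> float:
    """Base-3 logarithm of a cost value."""
    return math.log(cost, 3)


for cost in (1000, 10000):
    value = log3(cost)
    # Truncate to an integer, as one would for a worker count.
    print(cost, round(value, 2), int(value))
```

A tenfold increase in cost moves the truncated value from 6 to 8, which
is why the cost term adds so little here.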

But again, if we drop the log() method in the future, the above no
longer holds. Until then, I think we should follow the common strategy
we have been following.

BTW, all of the above points apply only to non-partial plans. For
partial plans, what we have done in the patch is: take the highest of
the per-subplan parallel_workers, and make sure that the Append's
worker count is at least as high as this value.
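
The rule for partial plans amounts to a lower bound, roughly as in the
Python sketch below. The function and parameter names are hypothetical,
and computed_workers stands for whatever count was derived for the
Append by other means; how that count is derived is not shown here.

```python
from typing import Iterable


def append_workers_floor(computed_workers: int,
                         partial_subpath_workers: Iterable[int]) -> int:
    """Return the Append worker count, raised if needed so that it is
    at least the highest parallel_workers among the partial subplans."""
    return max([computed_workers, *partial_subpath_workers])


# Partial subplans want 1, 4 and 3 workers; the Append gets at least 4.
print(append_workers_floor(2, [1, 4, 3]))  # 4
```

If the computed count is already higher than every partial subplan's
value, it is returned unchanged.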

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
