Re: Parallel Append implementation

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append implementation
Date: 2017-02-17 09:26:49
Message-ID: CAJ3gD9ddT4AgbGvZkvHxmFK2T4aiJzcicS2qZ9XXC8rYmyct4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 16 February 2017 at 20:37, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Feb 16, 2017 at 1:34 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>>> What I was thinking about is something like this:
>>>
>>> 1. First, take the maximum parallel_workers value from among all the children.
>>>
>>> 2. Second, compute log2(num_children)+1 and round up. So, for 1
>>> child, 1; for 2 children, 2; for 3-4 children, 3; for 5-8 children, 4;
>>> for 9-16 children, 5, and so on.
>>>
>>> 3. Use as the number of parallel workers for the children the maximum
>>> of the value computed in step 1 and the value computed in step 2.
>>
>> Ah, now that I closely look at compute_parallel_worker(), I see what
>> you are getting at.
>>
>> For plain unpartitioned table, parallel_workers is calculated as
>> roughly equal to log(num_pages) (actually it is log3). So if the table
>> size is n, the workers will be log(n). So if it is partitioned into p
>> partitions of size n/p each, still the number of workers should be
>> log(n). Whereas, in the patch, it is calculated as (total of all the
>> child workers) i.e. n * log(n/p) for this case. But log(n) != p *
>> log(x/p). For e.g. log(1000) is much less than log(300) + log(300) +
>> log(300).
>>
>> That means, the way it is calculated in the patch turns out to be much
>> larger than if it were calculated using log(total of sizes of all
>> children). So I think for the step 2 above, log(total_rel_size)
>> formula seems to be appropriate. What do you think ? For
>> compute_parallel_worker(), it is actually log3 by the way.
>>
>> BTW this formula is just an extension of how parallel_workers is
>> calculated for an unpartitioned table.
>
> log(total_rel_size) would be a reasonable way to estimate workers when
> we're scanning an inheritance hierarchy, but I'm hoping Parallel
> Append is also going to apply to UNION ALL queries, where there's no
> concept of the total rel size.
Yes ParallelAppend also gets used in UNION ALL.

> For that we need something else, which
> is why the algorithm that I proposed upthread doesn't rely on it.

The log2(num_children)+1 formula which you proposed does not take into
account the number of workers for each of the subplans, that's why I
am a bit more inclined to look for some other logic. May be, treat the
children as if they belong to partitions, and accordingly calculate
the final number of workers. So for 2 children with 4 and 5 workers
respectively, Append parallel_workers would be : log3(3^4 + 3^5) .

>
>>> The decision to use fewer workers for a smaller scan isn't really
>>> because we think that using more workers will cause a regression.
>>> It's because we think it may not help very much, and because it's not
>>> worth firing up a ton of workers for a relatively small scan given
>>> that workers are a limited resource. I think once we've got a bunch
>>> of workers started, we might as well try to use them.
>>
>> One possible side-effect I see due to this is : Other sessions might
>> not get a fair share of workers due to this. But again, there might be
>> counter argument that, because Append is now focussing all the workers
>> on a last subplan, it may finish faster, and release *all* of its
>> workers earlier.
>
> Right. I think in general it's pretty clear that there are possible
> fairness problems with parallel query. The first process that comes
> along seizes however many workers it thinks it should use, and
> everybody else can use whatever (if anything) is left. In the long
> run, I think it would be cool to have a system where workers can leave
> one parallel query in progress and join a different one (or exit and
> spawn a new worker to join a different one), automatically rebalancing
> as the number of parallel queries in flight fluctuates. But that's
> clearly way beyond anything we can do right now. I think we should
> assume that any parallel workers our process has obtained are ours to
> use for the duration of the query, and use them as best we can.

> Note that even if the Parallel Append tells one of the workers that there
> are no more tuples and it should go away, some higher level of the
> query plan could make a different choice anyway; there might be
> another Append elsewhere in the plan tree.
Yeah, that looks good enough to justify not losing the workers

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2017-02-17 09:44:24 Re: Should we cacheline align PGXACT?
Previous Message Alexander Korotkov 2017-02-17 09:23:28 Re: Should we cacheline align PGXACT?