From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append implementation
Date: 2017-02-16 06:34:04
Message-ID: CAJ3gD9dbYZDFOUmaqZ3ejXhf90ze0Tf6ZUgxiJCCO3x-Hk3Vfw@mail.gmail.com
Lists: pgsql-hackers
On 15 February 2017 at 18:40, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Feb 15, 2017 at 4:43 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>>> On 14 February 2017 at 22:35, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> For example, suppose that I have a scan of two children, one
>>>> of which has parallel_workers of 4, and the other of which has
>>>> parallel_workers of 3. If I pick parallel_workers of 7 for the
>>>> Parallel Append, that's probably too high.
>>
>> In the patch, in such a case, 7 workers are indeed selected for the Parallel
>> Append path, so that both the subplans are able to execute in parallel
>> with their full worker capacity. Are you suggesting that we should not?
>
> Absolutely. I think that's going to be way too many workers. Imagine
> that there are 100 child tables and each one is big enough to qualify
> for 2 or 3 workers. No matter what value the user has selected for
> max_parallel_workers_per_gather, they should not get a scan involving
> 200 workers.
>
> What I was thinking about is something like this:
>
> 1. First, take the maximum parallel_workers value from among all the children.
>
> 2. Second, compute log2(num_children)+1 and round up. So, for 1
> child, 1; for 2 children, 2; for 3-4 children, 3; for 5-8 children, 4;
> for 9-16 children, 5, and so on.
>
> 3. Use as the number of parallel workers for the children the maximum
> of the value computed in step 1 and the value computed in step 2.
Ah, now that I closely look at compute_parallel_worker(), I see what
you are getting at.
For a plain unpartitioned table, parallel_workers is calculated as roughly
log(num_pages) (it is actually log3, in compute_parallel_worker()). So if the
table size is n, the number of workers will be log(n). If that table is
instead partitioned into p partitions of size n/p each, the number of workers
should still be log(n). Whereas, in the patch, it is calculated as the total
of all the child workers, i.e. p * log(n/p) for this case. But log(n) !=
p * log(n/p); for example, log(900) is much less than
log(300) + log(300) + log(300).

That means the value calculated in the patch turns out to be much larger than
if it were calculated as log(total of the sizes of all the children). So I
think, for step 2 above, a log(total_rel_size) formula seems appropriate.
What do you think?

BTW, this formula would just be an extension of how parallel_workers is
already calculated for an unpartitioned table.
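(A tiny standalone illustration of that comparison -- the function below only
models the "one more worker per tripling of pages" growth, ignoring the
thresholds, caps and starting points of the real compute_parallel_worker():)

#include <stdio.h>

/* Rough model only: one additional worker each time the page count triples. */
static int
log3_workers(double pages)
{
    int     workers = 0;

    while (pages >= 3.0)
    {
        pages /= 3.0;
        workers++;
    }
    return workers;
}

int
main(void)
{
    double  total = 729.0;      /* one unpartitioned table of 729 pages */
    int     p = 3;              /* vs. 3 partitions of 243 pages each */

    printf("single table   : %d workers\n", log3_workers(total));          /* 6 */
    printf("sum of children: %d workers\n", p * log3_workers(total / p));  /* 15 */
    return 0;
}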
>>> For example, suppose that I have a scan of two children, one
>>> of which has parallel_workers of 4, and the other of which has
>>> parallel_workers of 3. If I pick parallel_workers of 7 for the
>>> Parallel Append, that's probably too high. Had those two tables been
>>> a single unpartitioned table, I would have picked 4 or 5 workers, not
>>> 7. On the other hand, if I pick parallel_workers of 4 or 5 for the
>>> Parallel Append, and I finish with the larger table first, I think I
>>> might as well throw all 4 of those workers at the smaller table even
>>> though it would normally have only used 3 workers.
>>
>>> Having the extra 1-2 workers exit does not seem better.
>>
>> It is here where I didn't understand exactly why we would want to
>> assign these extra workers to a subplan which tells us that it is
>> already being run by 'parallel_workers' number of workers.
>
> The decision to use fewer workers for a smaller scan isn't really
> because we think that using more workers will cause a regression.
> It's because we think it may not help very much, and because it's not
> worth firing up a ton of workers for a relatively small scan given
> that workers are a limited resource. I think once we've got a bunch
> of workers started, we might as well try to use them.
One possible side-effect I see here is that other sessions might not get a
fair share of workers. But there is a counter-argument: because Append is now
focussing all the workers on the last subplan, it may finish faster and
release *all* of its workers earlier.

BTW, there is going to be some logic change in the choose-next-subplan
algorithm if we consider giving extra workers to subplans.
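(Just to sketch what I mean -- one possible shape of that change, in toy C,
not the patch's actual data structures or code:)

#include <stdbool.h>

/* Toy illustration: per-subplan bookkeeping a worker could consult. */
typedef struct SubplanSlot
{
    int     nworkers;       /* workers currently executing this subplan */
    int     max_workers;    /* the subplan's own parallel_workers */
    bool    finished;       /* subplan has no more tuples to return */
} SubplanSlot;

/*
 * Prefer an unfinished subplan that is still below its own parallel_workers;
 * once every unfinished subplan is at capacity, send the "extra" worker to
 * the least-loaded unfinished subplan.  Returns -1 when all are finished.
 */
static int
choose_next_subplan(SubplanSlot *subplans, int nplans)
{
    int     best = -1;
    int     i;

    for (i = 0; i < nplans; i++)
        if (!subplans[i].finished &&
            subplans[i].nworkers < subplans[i].max_workers &&
            (best < 0 || subplans[i].nworkers < subplans[best].nworkers))
            best = i;

    if (best >= 0)
        return best;

    for (i = 0; i < nplans; i++)
        if (!subplans[i].finished &&
            (best < 0 || subplans[i].nworkers < subplans[best].nworkers))
            best = i;

    return best;
}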