From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>

To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>

Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>

Subject: Re: Parallel Append implementation

Date: 2017-02-17 16:26:04

Message-ID: CAJ3gD9d=bS9F3D_rkRaBGSghki-47STPitbsQgHkXEJ_Npk9Vg@mail.gmail.com

Lists: pgsql-hackers

Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:

> Do we have any performance measurements where we see that Goal B

> performs better than Goal A, in such a situation? Do we have any

> performance measurement comparing these two approaches in other

> situations. If implementation for Goal B beats that of Goal A always,

> we can certainly implement it directly. But it may not.

I will get back with some performance numbers.

> Also, separating patches for Goal A and Goal B might make reviews easier.

Do you want the patch in its current state to be split anyway?

Right now, I am not sure exactly how you need me to split it.

>

>>

>>>

>>> BTW, right now, the patch does not consider non-partial paths for a

>>> child which has partial paths. Do we know, for sure, that a path

>>> containing partial paths for a child which has them is always going to

>>> be cheaper than the one which includes a non-partial path? If not,

>>> should we build another path which contains non-partial paths for all

>>> child relations? This sounds like a 0/1 knapsack problem.

>>

>> I didn't quite get this. We do create a non-partial Append path using

>> non-partial child paths anyway.

>

> Let's say a given child-relation has both partial and non-partial

> paths, your approach would always pick up a partial path. But now that

> parallel append can handle non-partial paths as well, it may happen

> that picking up non-partial path instead of partial one when both are

> available gives an overall better performance. Have we ruled out that

> possibility?

Yes, one Append path can contain a child c1 with a partial path, another

Append path can contain c1 with a non-partial path, and each such

combination doubles again for child c2, and so on, leading to

exponentially many Append paths. I think that's what you referred to

as the 0/1 knapsack problem. Right, this does not seem worth it.

I had earlier considered adding a partial Append path containing only

non-partial paths, but for some reason I had concluded that it's not

worth having this path, as its cost is most likely going to be higher

due to the presence of all single-worker paths *and* a Gather above

them. I should have documented the reason. Let me give this some

thought.

>>>> Let me try keeping the per-subplan max_worker info in Append path

>>>> itself, like I mentioned above. If that works, the bitmap will be

>>>> replaced by max_worker field. In case of non-partial subpath,

>>>> max_worker will be 1. (this is the same info kept in AppendState node

>>>> in the patch, but now we might need to keep it in Append path node as

>>>> well).

>>>

>>> It will be better if we can fetch that information from each subpath

>>> when creating the plan. As I have explained before, a path is a minimal

>>> structure, which should be easily disposable when throwing away the

>>> path.

>>

>> Now in the v2 patch, we store per-subplan worker count. But still, we

>> cannot use the path->parallel_workers to determine whether it's a

>> partial path. This is because even for a non-partial path, it seems

>> the parallel_workers can be non-zero. For example, in

>> create_subqueryscan_path(), it sets path->parallel_workers to

>> subpath->parallel_workers. But this path is added as a non-partial

>> path. So we need a separate info as to which of the subpaths in Append

>> path are partial subpaths. So in the v2 patch, I continued to use

>> Bitmapset in AppendPath. But in Append plan node, number of workers is

>> calculated using this bitmapset. Check the new function

>> get_append_num_workers().

>

> If the subpath is from childrel->partial_pathlist, then we set the

> corresponding bit in the bitmap. But we can infer that for any path

> by checking whether it is found in path->parent->partial_pathlist. Since the code

> always chooses the first partial path, the search in partial_pathlist

> should not affect performance. So, we can avoid maintaining a bitmap

> in the path and keep accumulating it when collapsing append paths.

Thanks. I have made these changes accordingly in the attached v4 patch.

get_append_num_workers() now uses

linitial(path->parent->partial_pathlist) to determine whether the

subpath is a partial or a non-partial path, and the Bitmapset

field has been removed from AppendPath.

>>>>

>>>>> 12. cost_append() essentially adds costs of all the subpaths and then divides

>>>>> by parallel_divisor. This might work if all the subpaths are partial paths. But

>>>>> for the subpaths which are not partial, a single worker will incur the whole

>>>>> cost of that subpath. Hence just dividing all the total cost doesn't seem the

>>>>> right thing to do. We should apply different logic for costing non-partial

>>>>> subpaths and partial subpaths.

>>>>

>>>> With the current partial path costing infrastructure, it is assumed

>>>> that a partial path node should return the average per-worker cost.

>>>> Hence, I thought it would be best to do it in a similar way for

>>>> Append. But let me think if we can do something. With the current

>>>> parallelism costing infrastructure, I am not sure though.

>>>

>>> The current parallel mechanism is in sync with that costing. Each

>>> worker is supposed to take the same burden, hence the same (average)

>>> cost. But it will change when a single worker has to scan an entire

>>> child relation and different child relations have different sizes.

>>

>> I gave more thought to this. Considering that each subplan has a

>> different number of workers, I think it makes sense to calculate the

>> average per-worker cost even in parallel Append. In case of a non-partial

>> subplan, a single worker will execute it, but it will next choose

>> another subplan. So on average each worker is going to process the

>> same number of rows, and also the same amount of CPU. And that CPU

>> cost and rows cost should be calculated by taking the total

>> count and dividing it by the number of workers (parallel_divisor actually).

>>

>

> That's not entirely true. Consider N child relations whose chosen paths

> have costs C1, C2, ..., CN that are very different. If there are

> N workers, the total cost should correspond to the highest of the

> costs of subpaths, since no worker will execute more than one plan.

> The unfortunate worker which executes the costliest path would take

> the longest time.

Yeah, there seems to be no costing method that can compute the total

cost as the maximum of all the subplans' total costs. So the assumption

is that there would be a roughly equal distribution of workers.

In the new patch, there is a test case output modification for

inherit.sql, because that test case started failing on account of

getting a Parallel Append plan instead of a Merge Append plan for an

inheritance table where seqscan was disabled.

Attachment | Content-Type | Size
---|---|---
ParallelAppend_v4.patch | application/octet-stream | 33.7 KB

In response to: Re: Parallel Append implementation at 2017-02-13 13:44:09 from Ashutosh Bapat
