Re: Fix BUG #17335: Duplicate result rows in Gather node

From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix BUG #17335: Duplicate result rows in Gather node
Date: 2022-01-23 11:56:46
Message-ID: 382c3b52071a087fd4357e22959b3ddb05e524b7.camel@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

В Чт, 20/01/2022 в 09:32 +1300, David Rowley пишет:
> On Fri, 31 Dec 2021 at 00:14, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
> > Suggested quick (and valid) fix in the patch attached:
> > - If Append has single child, then copy its parallel awareness.
>
> I've been looking at this and I've gone through changing my mind about
> what's the right fix quite a number of times.
>
> My current thoughts are that I don't really like the fact that we can
> have plans in the following shape:
>
> Finalize Aggregate
> -> Gather
> Workers Planned: 1
> -> Partial Aggregate
> -> Parallel Hash Left Join
> Hash Cond: (gather_append_1.fk = gather_append_2.fk)
> -> Index Scan using gather_append_1_ix on gather_append_1
> Index Cond: (f = true)
> -> Parallel Hash
> -> Parallel Seq Scan on gather_append_2
>
> It's only made safe by the fact that Gather will only use 1 worker.
> To me, it just seems too fragile to assume that's always going to be
> the case. I feel like this fix just relies on the fact that
> create_gather_path() and create_gather_merge_path() do
> "pathnode->num_workers = subpath->parallel_workers;". If someone
> decided that was to work a different way, then we risk this breaking
> again. Additionally, today we have Gather and GatherMerge, but we may
> one day end up with more node types that gather results from parallel
> workers, or even a completely different way of executing plans.

It seems strange parallel_aware and parallel_safe flags neither affect
execution nor are properly checked.

Except parallel_safe is checked in ExecSerializePlan which is called from
ExecInitParallelPlan, which is called from ExecGather and ExecGatherMerge.
But looks like this check doesn't affect execution as well.

>
> I think a safer way to fix this is to just not remove the
> Append/MergeAppend node if the parallel_aware flag of the only-child
> and the Append/MergeAppend don't match. I've done that in the
> attached.
>
> I believe the code at the end of add_paths_to_append_rel() can remain as is.

I found clean_up_removed_plan_level also called from set_subqueryscan_references.
Is there a need to patch there as well?

And there is strange state:
- in the loop by subpaths, pathnode->node.parallel_safe is set to AND of
all its subpath's parallel_safe
(therefore there were need to copy it in my patch version),
- that means, our AppendPath is parallel_aware but not parallel_safe.
It is ridiculous a bit.

And it is strange AppendPath could have more parallel_workers than sum of
its children parallel_workers.

So it looks like whole machinery around parallel_aware/parallel_safe has
no enough consistency.

Either way, I attach you version of fix with my tests as new patch version.

regards,
Yura Sokolov

Attachment Content-Type Size
v2-0001-Fix-duplicate-result-rows-after-Append-path-remov.patch text/x-patch 11.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marcos Pegoraro 2022-01-23 14:00:12 current_schema will not use an text index ?
Previous Message Trevor Gross 2022-01-23 10:56:11 Re: WIP: System Versioned Temporal Table