Re: Parallel Append can break run-time partition pruning

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel Append can break run-time partition pruning
Date: 2020-10-25 01:06:38
Message-ID: CAApHDvqQR7i6feyEvFaE_gr2UgX0=BNHW63tyHHZrsvu5UEYrA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 16 Oct 2020 at 03:01, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>
> Going over the last few emails, it seems that David held off from
> committing the patch, because of the lack of confidence in its
> robustness. With the patch, a sub-partitioned child's partial
> Append's child paths will *always* be pulled up into the parent's set
> of partial child paths thus preventing the nesting of Appends, which
> the run-time pruning code can't currently cope with. The lack of
> confidence seems to be due to the fact that always pulling up a
> sub-Append's child paths into the parent partial Append's child paths
> *may* cause the latter to behave wrongly under parallelism and the new
> code structure will prevent add_partial_path() from being the
> arbitrator of whether such a path is really the best in terms of cost.
>
> If we can't be confident in that approach, maybe we should consider
> making the run-time pruning code cope with nested Appends?

I've been thinking about this and my thoughts are:

There are other cases where we don't pullup sub-Merge Append nodes
anyway, so I think we should just make run-time pruning work for more
than just the top-level Append/Merge Append.

The case I'm thinking about is the code added in 959d00e9dbe for
ordered Append scans. It's possible a sub-partitioned table has
partitions which don't participate in the same ordering. We need to
keep the sub-Merge Append in those cases... well, at least when
there's more than 1 partition remaining after plan-time pruning.

I've attached the patch which, pending a final look, I'm proposing to
commit for master only. I just don't quite think this is a bug fix,
and certainly, due to the invasiveness of the proposed patch, that
means no backpatch.

I fixed all the stuff about the incorrectly set partitioned_rels list.
What I ended up with there is making it accumulate_append_subpath's
job to also pullup the sub-paths partitioned_rels fields when pulling
up a nested Append/MergeAppend. If there's no pullup, there then we
don't care about the sub-path's partitioned_rels. That's for it to
deal with. With that, I think that run-time pruning will only get the
RT indexes for partitions that we actually have sub-paths for. That's
pretty good, so I added an Assert() to verify that in
make_partitionedrel_pruneinfo(). (I hope I don't regret that later)

This does mean we need to maintain a different partitioned_rels list
for each Append path we consider. So there's a number (six) of these
now between add_paths_to_append_rel() and
generate_orderedappend_paths(). To try to minimise the impact of
that, I've changed the field so that instead of being a List of
IntLists, it's just a List of Relids. The top-level list just
contains a single element until you get a UNION ALL that selects from
a partitioned table on each side of the union. Merging sub-path
partitioned rel RT indexes into the existing element is now pretty
cheap as it just uses bms_add_members() rather the list_concat we'd
have had to have used if it was still a List of IntLists.

After fixing up how partitioned_rels is built, I saw there were no
usages of RelOptInfo.partitioned_child_rels, so I got rid of it.

I did another couple of little cleanups and wrote some regression
tests to test all this.

Overall I'm fairly happy with this, especially getting rid of a
partitioned table related field from RelOptInfo.

David

Attachment Content-Type Size
v1-0001-Allow-run-time-pruning-on-nested-Append-MergeAppe.patch text/plain 36.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hou, Zhijie 2020-10-25 01:22:55 Fix typo in src/backend/utils/cache/lsyscache.c
Previous Message Justin Pryzby 2020-10-24 19:59:49 Re: pg_dump, ATTACH, and independently restorable child partitions