Re: [sqlsmith] Failed assertion during partition pruning

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Andreas Seltenreich <seltenreich(at)gmx(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [sqlsmith] Failed assertion during partition pruning
Date: 2021-01-31 23:49:22
Message-ID: CAApHDvrpE6FtW1-0cc5ZX2q9VcP-_utGF1mvWAuKLQuzrB5bTw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 31 Jan 2021 at 11:42, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> This fixes the cases reported by Andreas and Jaime, leaving me
> more confident that there's nothing wrong with David's Assert.

I agree that there is nothing wrong with the Assert.

The commit message of a929e17e5 mentions:

> Here we tighten that up so that partitioned_rels only ever contains the RT
> index for partitioned tables which actually have subpaths in the given
> Append/MergeAppend. We can now Assert that every PartitionedRelPruneInfo
> has a non-empty present_parts. That should allow us to catch any weird
> corner cases that have been missed.

It seems in this case the Assert *did* find a case that I missed.
Unfortunately, I just missed the email that reported the problem,
until yesterday.

> I did not try to add a regression test case, mainly because it's
> not quite clear to me where the original bug is. (I'm a little
> suspicious that the blame might lie with the "match_partition_order"
> cases in generate_orderedappend_paths, which try to bypass
> accumulate_append_subpath without updating the partitioned_rels
> data. But I'm not excited about trying to prove that.)

Yeah, that suspicion is correct. More specifically when
get_singleton_append_subpath() finds a single subpath Append /
MergeAppend, the single subpath is returned.

So what's happening is that we first build the Append paths for the
sub-partitioned table which, in the example case, will have a single
subpath Append path with partitioned_rels containing just the parent
of that partition (i.e trigger_parted_p1). When it comes to doing
generate_orderedappend_paths() for the base relation (trigger_parted),
we find match_partition_order to be true and allow
get_singleton_append_subpath() to pullup the single-subplan Append
path. Unfortunately we just completely ignore that Append path's
partitioned_rels. Back in generate_orderedappend_paths() the
startup_partitioned_rels and total_partitioned_rels variables are
used, both of which only mention the trigger_parted table. The mention
of trigger_parted_p1 is lost.

It could be fixed by modifying get_singleton_append_subpath() to
modify the partitioned_rels when it collapses these single-subpath
Append/MergeAppend path, but I'm very open to looking at just getting
rid of the partitioned_rels field. Prior to a929e17e5 that field was
very loosely set and in serval cases could contain RT indexes of
partitioned tables that didn't even have any subpaths in the given
Append/MergeAppend. a929e17e5 attempted to tighten all that up but
looks like I missed the case above.

> I wonder whether we should consider back-patching this. Another
> thing that seems unclear is whether there is any serious consequence
> to omitting some intermediate partitions from the set considered
> by make_partition_pruneinfo. Presumably it could lead to missing
> some potential run-time-pruning opportunities, but is there any
> worse issue? If there isn't, this is a bigger change than I want
> to put in the back braches.

It shouldn't be backpatched. a929e17e5 only exists in master. Prior to
that AppendPath/MergeAppendPath's partitioned_rels field could only
contain additional partitioned table RT index. There are no known
cases of missing ones prior to a929e17e5, so this bug shouldn't exist
in PG13.

a929e17e5 introduced run-time partition pruning for
sub-Append/MergeAppend paths. The commit message of that explains
that there are plenty of legitimate cases where we can't flatten these
sub-Append/MergeAppend paths down into the top-level
Append/MergeAppend. Originally I had thought we should only bother
doing run-time pruning on the top-level Append/MergeAppend because I
thought these cases were very uncommon. I've now changed my mind.

For a929e17e5, it was not just a matter of removing the lines in [1]
to allow run-time pruning on nested Append/MergeAppends. I also needed
to clean up the sloppy setting of partitioned_rels. The remainder of
the patch attempted that.

FWIW, I hacked together a patch which fixes the bug by passing a
Bitmapset ** pointer to get_singleton_append_subpath(), which set the
bits for the Append / MergeAppend path's partitioned_rels that we get
rid of when it only has a single subpath. This stops the Assert
failure mentioned here. However, I'd much rather explore getting rid
of partitioned_rels completely. I'll now have a look at the patch
you're proposing for that.

Thanks for investigating this and writing the patch. Apologies for
this email missing my attention closer to the time to when it was
initially reported.

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=src/backend/optimizer/plan/createplan.c;h=40abe6f9f623ed2922ccc4e1991e97e90322f47d;hp=94280a730c4d9abb2143416fca8e74e76dada042;hb=a929e17e5;hpb=dfc797730fc7a07c0e6bd636ad1a564aecab3161

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2021-02-01 00:05:15 Re: new heapcheck contrib module
Previous Message Tom Lane 2021-01-31 22:23:25 Re: [HACKERS] [PATCH] Generic type subscripting