Re: [sqlsmith] Failed assertion during partition pruning

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Andreas Seltenreich <seltenreich(at)gmx(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [sqlsmith] Failed assertion during partition pruning
Date: 2021-01-30 22:42:00
Message-ID: 3010978.1612046520@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> As I said, I'm now thinking it's not the Assert that's faulty.
> If I'm right about that, it's likely that the mistaken labeling
> of these paths has other consequences beyond triggering this
> assertion. (If it has none, I think we'd be better off to remove
> these Path fields altogether, and re-identify the parent rels
> here from the RelOptInfo data.)

After more time poking at this, I've concluded that my parenthetical
remark is the right solution: [Merge]AppendPath.partitioned_rels
ought to be nuked from orbit. It requires a whole lot of squirrely,
now-known-to-be-buggy logic in allpaths.c, and practically the only
place where we use the result is in make_partition_pruneinfo().
That means we're expending a lot of work *per AppendPath*, which
is quite foolish unless there's some reason we can't get the
information at create_plan() time instead. Which we can.

For simplicity of review I divided the patch into two parts.
0001 revises make_partition_pruneinfo() and children to identify
the relevant parent partitions for themselves, which is not too
hard to do by chasing up the child-to-parent AppendRelInfo links.
Not formerly documented, AFAICT, is that we want to collect just
the parent partitions that are between the Append path's own rel
(possibly equal to it) and the subpaths' rels. I'd first tried
to code this by using the top_parent_relids and part_rels[] links
in the RelOptInfos, but that turns out not to work. We might
ascend to a top parent that's above the Append's rel (if considering
an appendpath for a sub-partition, which happens in partitionwise
join). We could also descend to a child at or below some subpath
level, since there are also cases where subpaths correspond to
unflattened non-leaf partitions. Either of those things result
in failures. But once you wrap your head around handling those
restrictions, it's quite simple.

Then 0002 just nukes all the no-longer-necessary logic to compute
the partitioned_rels fields. There are a couple of changes that
are not quite trivial. create_[merge_]append_plan were testing
best_path->partitioned_rels != NIL to shortcut calling
make_partition_pruneinfo at all. I just dropped that, reasoning
that it wasn't saving enough to worry about. Also,
create_append_path was similarly testing partitioned_rels != NIL
to decide if it needed to go to the trouble of using
get_baserel_parampathinfo. I changed that to an IS_PARTITIONED_REL()
test, which isn't *quite* the same thing but seems close enough.
(It changes no regression results, anyway.)

This fixes the cases reported by Andreas and Jaime, leaving me
more confident that there's nothing wrong with David's Assert.

I did not try to add a regression test case, mainly because it's
not quite clear to me where the original bug is. (I'm a little
suspicious that the blame might lie with the "match_partition_order"
cases in generate_orderedappend_paths, which try to bypass
accumulate_append_subpath without updating the partitioned_rels
data. But I'm not excited about trying to prove that.)

I wonder whether we should consider back-patching this. Another
thing that seems unclear is whether there is any serious consequence
to omitting some intermediate partitions from the set considered
by make_partition_pruneinfo. Presumably it could lead to missing
some potential run-time-pruning opportunities, but is there any
worse issue? If there isn't, this is a bigger change than I want
to put in the back braches.

regards, tom lane

Attachment Content-Type Size
0001-dont-use-partitioned_rels-in-partprune.patch text/x-diff 11.8 KB
0002-remove-partitioned_rels.patch text/x-diff 25.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2021-01-31 01:26:31 Re: shared tempfile was not removed on statement_timeout
Previous Message Tom Lane 2021-01-30 21:56:09 Re: Add primary keys to system catalogs