Calculation of relids (pull_varnos result) for PlaceHolderVars

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Calculation of relids (pull_varnos result) for PlaceHolderVars
Date: 2021-01-17 02:12:03
Message-ID: 171041.1610849523@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've been looking into the planner failure reported at [1].
The given test case is comparable to this query in the
regression database:

regression=# select i8.*, ss.v, t.unique2
from int8_tbl i8
left join int4_tbl i4 on i4.f1 = 1
left join lateral (select i4.f1 + 1 as v) as ss on true
left join tenk1 t on t.unique2 = ss.v
where q2 = 456;
ERROR: failed to assign all NestLoopParams to plan nodes

The core of the problem turns out to be that pull_varnos() returns
an incorrect result for the PlaceHolderVar that represents the ss.v
output. Because the only Var within the PHV's expression is i4.f1,
pull_varnos() returns just the relid set (2), implying that the
value can be calculated after having scanned only the i4 relation.
But that's wrong: i4.f1 here represents an outer-join output, so it
can only be computed after forming the join (1 2) of i8 and i4.
In this example, the erroneous calculation leads the planner to
construct a plan with an invalid join order, which triggers a
sanity-check failure in createplan.c.

The relid set (2) is the minimum possible join level at which such
a PHV could be evaluated; (1 2) is the maximum level, corresponding
to the PHV's syntactic position above the i8/i4 outer join. After
thinking about this I've realized that what pull_varnos() ideally
ought to use is the PHV's ph_eval_at level, which is the join level
we actually intend to evaluate it at. There are a couple of
problems, one that's not too awful and one that's a pain in the
rear:

1. pull_varnos() can be used before we've calculated ph_eval_at, as
well as during deconstruct_jointree() which can change ph_eval_at.
This doesn't seem fatal. We can fall back to the conservative
assumption of using the syntactic level if the PlaceHolderInfo isn't
there yet. Once it is (i.e., within deconstruct_jointree()) I think
we are okay, because any given PHV's ph_eval_at should have reached
its final value before we consider any qual involving that PHV.

2. pull_varnos() is not passed the planner "root" data structure,
so it can't get at the PlaceHolderInfo list. We can change its
API of course, but that propagates to dozens of places.

The 0001 patch attached goes ahead and makes those API changes.
I think this is perfectly reasonable to do in HEAD, but it most
likely is an unacceptable API/ABI break for the back branches.
There's one change needed in contrib/postgres_fdw, and other
extensions likely call one or more of the affected functions too.

As an alternative back-branch fix, we could consider the 0002
patch attached, which simply changes pull_varnos() to make the
most conservative assumption that ph_eval_at could wind up as
the PHV's syntactic level (phrels). The trouble with this is
that we'll lose some valid optimizations. There is only one
visible plan change in the regression tests, but it's kind of
unpleasant: we fail to remove a join that we did remove before.
So I'm not sure how much of a problem this'd be in the field.

A third way is to preserve the existing pull_varnos() API in
the back branches, changing all the internal calls to use a
new function that has the additional "root" parameter. This
seems feasible but I've not attempted to code it yet.

(We might be able to get rid of a lot of this mess if I ever
finish the changes I have in mind to represent outer-join outputs
more explicitly. That seems unlikely to happen for v14 at this
point, and it'd certainly never be back-patchable.)

One loose end is that I'm not sure how far to back-patch. The
given test case only fails back to v12. I've not bisected, but
I suspect that the difference is the v12-era changes to collapse out
trivial Result RTEs (4be058fe9 and follow-ons), such as the lateral
sub-select in this test case. The pull_varnos() calculation is
surely just as wrong for a long time before that, but perhaps it's
only a latent bug before that? I've not managed to construct a test
case that fails in v11, but I don't have a lot of confidence that
there isn't one.

Thoughts?

regards, tom lane

[1] https://www.postgresql.org/message-id/237d2b72-6dd0-7b24-3a6f-94288cd44b9c@bfw-online.de

Attachment Content-Type Size
0001-full-fix-for-pull_varnos.patch text/x-diff 47.8 KB
0002-minimal-fix-for-pull_varnos.patch text/x-diff 6.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhihong Yu 2021-01-17 02:55:20 Re: PoC/WIP: Extended statistics on expressions
Previous Message Tomas Vondra 2021-01-17 02:01:34 Re: list of extended statistics on psql