Re: Oversight in reparameterize_path_by_child leading to executor crash

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Oversight in reparameterize_path_by_child leading to executor crash
Date: 2023-08-23 05:37:46
Message-ID: CAMbWs4-CSR4VnZCDep3ReSoHGTA7E+3tnjF_LmHcX7yiGrkVfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 22, 2023 at 10:39 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> > I'm wondering if we can instead adjust the 'inner_req_outer' in
> > create_nestloop_path before we perform the check to work around this
> > issue for the back branches, something like
> > + /*
> > + * Adjust the parameterization information, which refers to the
> topmost
> > + * parent.
> > + */
> > + inner_req_outer =
> > + adjust_child_relids_multilevel(root, inner_req_outer,
> > + outer_path->parent->relids,
> > +
> outer_path->parent->top_parent_relids);
>
> Mmm ... at the very least you'd need to not do that when top_parent_relids
> is unset.

Hmm, adjust_child_relids_multilevel would just return the given relids
unchanged when top_parent_relids is unset, so I think it should be safe
to call it even for non-other rel.

> ... But I think the risk/reward ratio for messing with this in the
> back branches is unattractive in any case: to fix a corner case that
> apparently nobody uses in the field, we risk breaking any number of
> mainstream parameterized-path cases. I'm content to commit the v5 patch
> (or a successor) into HEAD, but at this point I'm not sure I even want
> to risk it in v16, let alone perform delicate surgery to get it to work
> in older branches. I think we ought to go with the "tablesample scans
> can't be reparameterized" approach in v16 and before.

If we go with the "tablesample scans can't be reparameterized" approach
in the back branches, I'm a little concerned that what if we find more
cases in the futrue where we need modify RTEs for reparameterization.
So I spent some time seeking and have managed to find one: there might
be lateral references in a scan path's restriction clauses, and
currently reparameterize_path_by_child fails to adjust them.

regression=# explain (verbose, costs off)
select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.a;
QUERY PLAN
----------------------------------------------------------------------
Aggregate
Output: count(*)
-> Append
-> Nested Loop
-> Seq Scan on public.prt1_p1 t1_1
Output: t1_1.a, t1_1.b
-> Index Scan using iprt2_p1_b on public.prt2_p1 t2_1
Output: t2_1.b
Index Cond: (t2_1.b = t1_1.a)
Filter: (t1.b = t2_1.a)
-> Nested Loop
-> Seq Scan on public.prt1_p2 t1_2
Output: t1_2.a, t1_2.b
-> Index Scan using iprt2_p2_b on public.prt2_p2 t2_2
Output: t2_2.b
Index Cond: (t2_2.b = t1_2.a)
Filter: (t1.b = t2_2.a)
-> Nested Loop
-> Seq Scan on public.prt1_p3 t1_3
Output: t1_3.a, t1_3.b
-> Index Scan using iprt2_p3_b on public.prt2_p3 t2_3
Output: t2_3.b
Index Cond: (t2_3.b = t1_3.a)
Filter: (t1.b = t2_3.a)
(24 rows)

The clauses in 'Filter' are not adjusted correctly. Var 't1.b' still
refers to the top parent.

For this query it would just give wrong results.

regression=# set enable_partitionwise_join to off;
SET
regression=# select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.a;
count
-------
100
(1 row)

regression=# set enable_partitionwise_join to on;
SET
regression=# select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.a;
count
-------
5
(1 row)

For another query it would error out during planning.

regression=# explain (verbose, costs off)
select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.b;
ERROR: variable not found in subplan target list

To fix it we may need to modify RelOptInfos for Path, BitmapHeapPath,
ForeignPath and CustomPath, and modify IndexOptInfos for IndexPath. It
seems that that is not easily done without postponing reparameterization
of paths until createplan.c.

Attached is a patch which is v5 + fix for this new issue.

Thanks
Richard

Attachment Content-Type Size
v7-0001-Postpone-reparameterization-of-paths-until-when-creating-plans.patch application/octet-stream 21.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2023-08-23 05:38:02 Re: Synchronizing slots from primary to standby
Previous Message vignesh C 2023-08-23 05:30:11 Re: persist logical slots to disk during shutdown checkpoint