Re: Oversight in reparameterize_path_by_child leading to executor crash

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Oversight in reparameterize_path_by_child leading to executor crash
Date: 2023-08-23 16:15:44
Message-ID: CAExHW5vJpm4t9JU+6FF6TcqM3Z3D2SHcQerabtm_wBhUhQTmoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 23, 2023 at 11:07 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
>
>
> On Tue, Aug 22, 2023 at 10:39 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
>> > I'm wondering if we can instead adjust the 'inner_req_outer' in
>> > create_nestloop_path before we perform the check to work around this
>> > issue for the back branches, something like
>> > + /*
>> > + * Adjust the parameterization information, which refers to the topmost
>> > + * parent.
>> > + */
>> > + inner_req_outer =
>> > + adjust_child_relids_multilevel(root, inner_req_outer,
>> > + outer_path->parent->relids,
>> > + outer_path->parent->top_parent_relids);
>>
>> Mmm ... at the very least you'd need to not do that when top_parent_relids
>> is unset.
>
>
> Hmm, adjust_child_relids_multilevel would just return the given relids
> unchanged when top_parent_relids is unset, so I think it should be safe
> to call it even for non-other rel.
>
>>
>> ... But I think the risk/reward ratio for messing with this in the
>> back branches is unattractive in any case: to fix a corner case that
>> apparently nobody uses in the field, we risk breaking any number of
>> mainstream parameterized-path cases. I'm content to commit the v5 patch
>> (or a successor) into HEAD, but at this point I'm not sure I even want
>> to risk it in v16, let alone perform delicate surgery to get it to work
>> in older branches. I think we ought to go with the "tablesample scans
>> can't be reparameterized" approach in v16 and before.
>
>
> If we go with the "tablesample scans can't be reparameterized" approach
> in the back branches, I'm a little concerned that what if we find more
> cases in the futrue where we need modify RTEs for reparameterization.
> So I spent some time seeking and have managed to find one: there might
> be lateral references in a scan path's restriction clauses, and
> currently reparameterize_path_by_child fails to adjust them.
>
> regression=# explain (verbose, costs off)
> select count(*) from prt1 t1 left join lateral
> (select t1.b as t1b, t2.* from prt2 t2) s
> on t1.a = s.b where s.t1b = s.a;
> QUERY PLAN
> ----------------------------------------------------------------------
> Aggregate
> Output: count(*)
> -> Append
> -> Nested Loop
> -> Seq Scan on public.prt1_p1 t1_1
> Output: t1_1.a, t1_1.b
> -> Index Scan using iprt2_p1_b on public.prt2_p1 t2_1
> Output: t2_1.b
> Index Cond: (t2_1.b = t1_1.a)
> Filter: (t1.b = t2_1.a)
> -> Nested Loop
> -> Seq Scan on public.prt1_p2 t1_2
> Output: t1_2.a, t1_2.b
> -> Index Scan using iprt2_p2_b on public.prt2_p2 t2_2
> Output: t2_2.b
> Index Cond: (t2_2.b = t1_2.a)
> Filter: (t1.b = t2_2.a)
> -> Nested Loop
> -> Seq Scan on public.prt1_p3 t1_3
> Output: t1_3.a, t1_3.b
> -> Index Scan using iprt2_p3_b on public.prt2_p3 t2_3
> Output: t2_3.b
> Index Cond: (t2_3.b = t1_3.a)
> Filter: (t1.b = t2_3.a)
> (24 rows)
>
> The clauses in 'Filter' are not adjusted correctly. Var 't1.b' still
> refers to the top parent.
>
> For this query it would just give wrong results.
>
> regression=# set enable_partitionwise_join to off;
> SET
> regression=# select count(*) from prt1 t1 left join lateral
> (select t1.b as t1b, t2.* from prt2 t2) s
> on t1.a = s.b where s.t1b = s.a;
> count
> -------
> 100
> (1 row)
>
> regression=# set enable_partitionwise_join to on;
> SET
> regression=# select count(*) from prt1 t1 left join lateral
> (select t1.b as t1b, t2.* from prt2 t2) s
> on t1.a = s.b where s.t1b = s.a;
> count
> -------
> 5
> (1 row)
>
>
> For another query it would error out during planning.
>
> regression=# explain (verbose, costs off)
> select count(*) from prt1 t1 left join lateral
> (select t1.b as t1b, t2.* from prt2 t2) s
> on t1.a = s.b where s.t1b = s.b;
> ERROR: variable not found in subplan target list
>
> To fix it we may need to modify RelOptInfos for Path, BitmapHeapPath,
> ForeignPath and CustomPath, and modify IndexOptInfos for IndexPath. It
> seems that that is not easily done without postponing reparameterization
> of paths until createplan.c.

Maybe I am missing something here but why aren't we copying these
restrictinfos in the path somewhere? I have a similar question for
tablesample clause as well.

--
Best Wishes,
Ashutosh Bapat

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-08-23 16:35:20 Re: add timing information to pg_upgrade
Previous Message Yugo NAGATA 2023-08-23 15:16:23 Re: pgbench: allow to exit immediately when any client is aborted