Re: Check lateral references within PHVs for memoize cache keys

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Check lateral references within PHVs for memoize cache keys
Date: 2023-07-13 07:12:29
Message-ID: CAMbWs49+Cjoy0S0xkCRDcHXGHvsYLOdvr9jq9OTONOBnsgzXOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jul 9, 2023 at 1:28 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> > Rebase the patch on HEAD as cfbot reminds.
>
> This fix seems like a mess. The function that is in charge of filling
> RelOptInfo.lateral_vars is extract_lateral_references; or at least
> that was how it was done up to now. Can't we compute these additional
> references there? If not, maybe we ought to just merge
> extract_lateral_references into create_lateral_join_info, rather than
> having the responsibility split. I also wonder whether this change
> isn't creating hidden dependencies on RTE order (which would likely be
> bugs), since create_lateral_join_info itself examines the lateral_vars
> lists as it walks the rtable.

Yeah, you're right. Currently all RelOptInfo.lateral_vars are filled in
extract_lateral_references. And then in create_lateral_join_info these
lateral_vars, together with all PlaceHolderInfos, are examined to
compute the per-base-relation lateral refs relids. However, in the
process of extract_lateral_references, it's possible that we create new
PlaceHolderInfos. So I think it may not be a good idea to extract the
lateral references within PHVs there. But I agree with you that it's
also not a good idea to compute these additional lateral Vars within
PHVs in create_lateral_join_info as the patch does. Actually with the
patch I find that with PHVs that are due to be evaluated at a join we
may get a problematic plan. For instance

explain (costs off)
select * from t t1 left join lateral
(select t1.a as t1a, t2.a as t2a from t t2 join t t3 on true) s on true
where s.t1a = s.t2a;
QUERY PLAN
------------------------------------
Nested Loop
-> Seq Scan on t t1
-> Nested Loop
Join Filter: (t1.a = t2.a)
-> Seq Scan on t t2
-> Memoize
Cache Key: t1.a
Cache Mode: binary
-> Seq Scan on t t3
(9 rows)

There are no lateral refs in the subtree of the Memoize node, so it
should be a Materialize node rather than a Memoize node. This is caused
by that for a PHV that is due to be evaluated at a join, we fill its
lateral refs in each baserel in the join, which is wrong.

So I'm wondering if it'd be better that we move all this logic of
computing additional lateral references within PHVs to get_memoize_path,
where we can examine only PHVs that are evaluated at innerrel. And
considering that these lateral refs are only used by Memoize, it seems
more sensible to compute them there. But I'm a little worried that
doing this would make get_memoize_path too expensive.

Please see v4 patch for this change.

Thanks
Richard

Attachment Content-Type Size
v4-0001-Check-lateral-refs-within-PHVs-for-memoize-cache-keys.patch application/octet-stream 11.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message フブキダイスキ 2023-07-13 07:17:17 \di+ cannot show the same name indexes
Previous Message Peter Eisentraut 2023-07-13 07:07:15 Re: Consistent coding for the naming of LR workers