Re: Oversight in reparameterize_path_by_child leading to executor crash

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Oversight in reparameterize_path_by_child leading to executor crash
Date: 2023-08-08 07:14:10
Message-ID: CAMbWs4-pXDTs_w+LMEmGUCYSAGxMRwdndaHiRSZHRA72qqzRBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 7, 2023 at 9:29 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
wrote:

> Thanks. The patch looks good overall. I like it because of its potential
> to reduce memory consumption in reparameterization. That's cake on top of
> cream :)
>

Thanks for reviewing this patch!

> Some comments here.
>
> + {
> + FLAT_COPY_PATH(new_path, path, Path);
> +
> + if (path->pathtype == T_SampleScan)
> + {
> + Index scan_relid = path->parent->relid;
> + RangeTblEntry *rte;
> +
> + /* it should be a base rel with a tablesample clause... */
> + Assert(scan_relid > 0);
> + rte = planner_rt_fetch(scan_relid, root);
> + Assert(rte->rtekind == RTE_RELATION);
> + Assert(rte->tablesample != NULL);
> +
> + ADJUST_CHILD_EXPRS(rte->tablesample, TableSampleClause *);
> + }
> + }
>
> This change makes this function usable only after the final plan has been
> selected. If some code accidently uses it earlier, it would corrupt
> rte->tablesample without getting detected easily. I think we should mention
> this in the function prologue and move it somewhere else. Other option is
> to a.
> leave rte->tablesample unchanged here with a comment as to why we aren't
> changing it b. reparameterize tablesample expression in
> create_samplescan_plan() if the path is parameterized by the parent. We
> call
> reparameterize_path_by_child() in create_nestloop_plan() as this patch does
> right now. I like that we are delaying reparameterization. It saves a
> bunch of
> memory. I haven't measured it but from my recent experiments I know it
> will be
> a lot.
>

I agree that we should mention in the function's comment that
reparameterize_path_by_child can only be used after the best path has
been selected because the RTE might be modified by this function. I'm
not sure if it's a good idea to move the reparameterization of
tablesample to create_samplescan_plan(). That would make the
reparameterization work separated in different places. And in the
future we may find more cases where we need modify RTEs or RELs for
reparameterization. It seems better to me that we keep all the work in
one place.

> * If the inner path is parameterized, it is parameterized by the
> - * topmost parent of the outer rel, not the outer rel itself. Fix
> - * that.
> + * topmost parent of the outer rel, not the outer rel itself. We will
>
> There's something wrong with the original sentence (which was probably
> written
> by me back then :)). I think it should have read "If the inner path is
> parameterized by the topmost parent of the outer rel instead of the outer
> rel
> itself, fix that.". If you agree, the new comment should change it
> accordingly.
>

Right. Will do that.

> @@ -2430,6 +2430,16 @@ create_nestloop_path(PlannerInfo *root,
> {
> NestPath *pathnode = makeNode(NestPath);
> Relids inner_req_outer = PATH_REQ_OUTER(inner_path);
> + Relids outerrelids;
> +
> + /*
> + * Paths are parameterized by top-level parents, so run parameterization
> + * tests on the parent relids.
> + */
> + if (outer_path->parent->top_parent_relids)
> + outerrelids = outer_path->parent->top_parent_relids;
> + else
> + outerrelids = outer_path->parent->relids;
>
> This looks like an existing bug. Did partitionwise join paths ended up with
> extra restrict clauses without this fix? I am not able to see why this
> change
> is needed by rest of the changes in the patch?
>

This is not an existing bug. Our current code (without this patch)
would always call create_nestloop_path() after the reparameterization
work has been done for the inner path. So we can safely check against
outer rel (not its topmost parent) in create_nestloop_path(). But now
with this patch, the reparameterization is postponed until createplan.c,
so we have to check against the topmost parent of outer rel in
create_nestloop_path(), otherwise we'd have duplicate clauses in the
final plan. For instance, without this change you'd get a plan like

regression=# explain (costs off)
select * from prt1 t1 left join lateral
(select t1.a as t1a, t2.* from prt1 t2) s on t1.a = s.a;
QUERY PLAN
---------------------------------------------------------
Append
-> Nested Loop Left Join
Join Filter: (t1_1.a = t2_1.a)
-> Seq Scan on prt1_p1 t1_1
-> Index Scan using iprt1_p1_a on prt1_p1 t2_1
Index Cond: (a = t1_1.a)
-> Nested Loop Left Join
Join Filter: (t1_2.a = t2_2.a)
-> Seq Scan on prt1_p2 t1_2
-> Index Scan using iprt1_p2_a on prt1_p2 t2_2
Index Cond: (a = t1_2.a)
-> Nested Loop Left Join
Join Filter: (t1_3.a = t2_3.a)
-> Seq Scan on prt1_p3 t1_3
-> Index Scan using iprt1_p3_a on prt1_p3 t2_3
Index Cond: (a = t1_3.a)
(16 rows)

Note that the clauses in Join Filter are duplicate.

> Anyway, let's rename outerrelids to something else e.g.
> outer_param_relids to reflect
> its true meaning.
>

Hmm, I'm not sure this is a good idea. Here the 'outerrelids' is just
the relids of the outer rel or outer rel's topmost parent. I think it's
better to keep it as-is, and this is consistent with 'outerrelids' in
try_nestloop_path().

> +bool
> +path_is_reparameterizable_by_child(Path *path)
> +{
> + switch (nodeTag(path))
> + {
> + case T_Path:
> ... snip ...
> + case T_GatherPath:
> + return true;
> + default:
> +
> + /* We don't know how to reparameterize this path. */
> + return false;
> + }
> +
> + return false;
> +}
>
> How do we make sure that any changes to reparameterize_path_by_child() are
> reflected here? One way is to call this function from
> reparameterize_path_by_child() the first thing and return if the path can
> not
> be reparameterized.
>

Good question. But I don't think calling this function at the beginning
of reparameterize_path_by_child() can solve this problem. Even if we do
that, if we find that the path cannot be reparameterized in
reparameterize_path_by_child(), it would be too late for us to take any
measures. So we need to make sure that such situation cannot happen. I
think we can emphasize this point in the comments of the two functions,
and meanwhile add an Assert in reparameterize_path_by_child() that the
path must be reparameterizable.

> -#define ADJUST_CHILD_ATTRS(node) \
> +#define ADJUST_CHILD_EXPRS(node, fieldtype) \
> ((node) = \
> - (List *) adjust_appendrel_attrs_multilevel(root, (Node *) (node), \
> - child_rel, \
> - child_rel->top_parent))
> + (fieldtype) adjust_appendrel_attrs_multilevel(root, (Node *) (node), \
> + child_rel, \
> + child_rel->top_parent))
> +
> +#define ADJUST_CHILD_ATTRS(node) ADJUST_CHILD_EXPRS(node, List *)
>
> IIRC, ATTRS here is taken from the function being called. I think we should
> just change the macro definition, not its name. If you would like to have a
> separate macro for List nodes, create ADJUST_CHILD_ATTRS_IN_LIST or some
> such.
>

Agreed. I introduced the new macro ADJUST_CHILD_EXPRS just to keep
macro ADJUST_CHILD_ATTRS as-is, so callers to ADJUST_CHILD_ATTRS can
keep unchanged. It seems better to just change ADJUST_CHILD_ATTRS as
well as its callers. Will do that.

Attaching v3 patch to address all the reviews above.

Thanks
Richard

Attachment Content-Type Size
v3-0001-Postpone-reparameterization-of-paths-until-when-creating-plans.patch application/octet-stream 17.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-08-08 07:20:03 Re: cpluspluscheck vs ICU
Previous Message Amit Langote 2023-08-08 06:58:02 Re: initial pruning in parallel append