Re: Missing MaterialPath support in reparameterize_path_by_child

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Missing MaterialPath support in reparameterize_path_by_child
Date: 2022-12-02 12:49:40
Message-ID: CAMbWs4_0Rk3QN9bUWCvKF7S_OMzg0jsE5OsNM7KZZ5NkO5trMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 2, 2022 at 7:21 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
wrote:

> > I'm suspicious now that reparameterize_path() should be
> > extended likewise, but I don't really have any hard
> > evidence for that.
>
> I think we need it there since the scope of paths under appendrel has
> certainly expanded a lot because of partitioned table optimizations.

I tried to see if the similar error can be triggered because of the lack
of MaterialPath support in reparameterize_path but didn't succeed.
Instead I see the optimization opportunity here if we can extend
reparameterize_path. As an example, consider query

create table t (a int, b int);
insert into t select i, i from generate_series(1,10000)i;
create index on t(a);
analyze t;

explain (costs off)
select * from (select * from t t1 union all select * from t t2 TABLESAMPLE
system_time (10)) s join (select * from t t3 limit 1) ss on s.a > ss.a;

Currently parameterized append path is not possible because MaterialPath
is not supported in reparameterize_path. The current plan looks like

QUERY PLAN
--------------------------------------------------------------------
Nested Loop
Join Filter: (t1.a > t3.a)
-> Limit
-> Seq Scan on t t3
-> Append
-> Seq Scan on t t1
-> Materialize
-> Sample Scan on t t2
Sampling: system_time ('10'::double precision)
(9 rows)

If we extend reparameterize_path to support MaterialPath, we would have
the additional parameterized append path and generate a better plan as
below

QUERY PLAN
--------------------------------------------------------------------
Nested Loop
-> Limit
-> Seq Scan on t t3
-> Append
-> Index Scan using t_a_idx on t t1
Index Cond: (a > t3.a)
-> Materialize
-> Sample Scan on t t2
Sampling: system_time ('10'::double precision)
Filter: (a > t3.a)
(10 rows)

So I also agree it's worth doing.

BTW, the code changes I'm using:

--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3979,6 +3979,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
apath->path.parallel_aware,
-1);
}
+ case T_Material:
+ {
+ MaterialPath *matpath = (MaterialPath *) path;
+ Path *spath = matpath->subpath;
+
+ spath = reparameterize_path(root, spath,
+ required_outer,
+ loop_count);
+
+ return (Path *) create_material_path(rel, spath);
+ }

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ronan Dunklau 2022-12-02 12:58:27 Re: Fix gin index cost estimation
Previous Message Andrew Dunstan 2022-12-02 12:37:59 Re: Using AF_UNIX sockets always for tests on Windows