Re: Get rid of runtime handling of AlternativeSubPlan?

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Get rid of runtime handling of AlternativeSubPlan?
Date: 2020-08-31 00:23:53
Message-ID: CAKU4AWox8HiWMMLR7TjX80+KDfm12b1zJWfNcoSioN9+QvORWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Aug 30, 2020 at 7:26 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I wrote:
> > Back in bd3daddaf232d95b0c9ba6f99b0170a0147dd8af, which introduced
> > AlternativeSubPlans, I wrote:
> > There is a lot more that could be done based on this infrastructure: in
> > particular it's interesting to consider switching to the hash plan if
> we start
> > out using the non-hashed plan but find a lot more upper rows going by
> than we
> > expected. I have therefore left some minor inefficiencies in place,
> such as
> > initializing both subplans even though we will currently only use one.
> >
> > That commit will be twelve years old come August, and nobody has either
> > built anything else atop it or shown any interest in making the plan
> choice
> > switchable mid-run. So it seems like kind of a failed experiment.
> >
> > Therefore, I'm considering the idea of ripping out all executor support
> > for AlternativeSubPlan and instead having the planner replace an
> > AlternativeSubPlan with the desired specific SubPlan somewhere late in
> > planning, possibly setrefs.c.
>
> Here's a proposed patchset for that. This runs with the idea I'd had
> that setrefs.c could be smarter than the executor about which plan node
> subexpressions will be executed how many times. I did not take it very
> far, for fear of adding an undue number of planning cycles, but it's still
> better than what we have now.
>
> For ease of review, 0001 adds the new planner logic, while 0002 removes
> the now-dead executor support.
>
> There's one bit of dead code that I left in place for the moment, which is
> ruleutils.c's support for printing AlternativeSubPlans. I'm not sure if
> that's worth keeping or not --- it's dead code for normal use, but if
> someone tried to use ruleutils.c to print partially-planned expression
> trees, maybe there'd be a use for it?
>
> (It's also arguable that readfuncs.c's support is now dead code, but
> I have little interest in stripping that out.)
>
> regards, tom lane
>
>
Thank you for this code! I still have some confusion about when a SubPlan
should be executed when a join is involved. I care about this because this
has an impact on when we can get the num_exec for a subplan.

The subplan in a target list, it is executed after the join in my case.
The subplan
can be execute after the scan of T1(see below example) and it can also be
executed
after the join. Which one is better depends on which methods make the
num_exec
smaller. Is it something we already considered? I drill-down to
populate_joinrel_with_paths and not find this logic.

# explain (costs off) select (select a from t2 where t2.b = t1.b) from t1,
t3;
QUERY PLAN
------------------------------
Nested Loop
-> Seq Scan on t1
-> Materialize
-> Seq Scan on t3
SubPlan 1
-> Seq Scan on t2
Filter: (b = t1.b)
(7 rows)

When the subplan is in a Qual, it is supposed to be executed as soon as
possible,
The current implementation matches the below cases. So can we say we
knows the num_execs of SubPlan just after we plan the dependent rels?
(In Q1 below the dependent rel is t1 vs t3, in Q2 it is t1 only) If we can
choose
a subplan and recost the related path during(not after) creating the best
path, will
we get better results for some cases (due to the current cost method for
AlternativeSubPlan[1])?

-- the subplan depends on the result of t1 join t3
# explain (costs off) select t1.* from t1, t3 where
t1.a > (select max(a) from t2 where t2.b = t1.b and t2.c = t3.c);
QUERY PLAN
-----------------------------------------------------
Nested Loop
Join Filter: (t1.a > (SubPlan 1))
-> Seq Scan on t1
-> Materialize
-> Seq Scan on t3
SubPlan 1
-> Aggregate
-> Seq Scan on t2
Filter: ((b = t1.b) AND (c = t3.c))
(9 rows)

-- the subplan only depends on t1.
# explain (costs off) select t1.* from t1, t3 where
t1.a > (select max(a) from t2 where t2.b = t1.b);
QUERY PLAN
------------------------------------------------
Nested Loop
-> Seq Scan on t3
-> Materialize
-> Seq Scan on t1
Filter: (a > (SubPlan 1))
SubPlan 1
-> Aggregate
-> Seq Scan on t2
Filter: (b = t1.b)
(9 rows)

At last, I want to use the commonly used table
in src/test/regress/sql/create_table.sql
when providing an example, but I always have issues running the
create_table.sql which
makes me uncomfortable to use that. Am I missing something?

CREATE TABLE fail_part PARTITION OF range_parted3 FOR VALUES FROM (1,
minvalue) TO (1, maxvalue);
psql:src/test/regress/sql/create_table.sql:611: ERROR: partition
"fail_part" would overlap partition "part10"

CREATE TABLE fail_part PARTITION OF hash_parted2 FOR VALUES WITH (MODULUS
2, REMAINDER 1);
psql:src/test/regress/sql/create_table.sql:622: ERROR: partition
"fail_part" would overlap partition "h2part_4"

[1]
https://www.postgresql.org/message-id/07b3fa88-aa4e-2e13-423d-8389eb1712cf%40imap.cc

--
Best Regards
Andy Fan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2020-08-31 00:51:20 Re: Terminate the idle sessions
Previous Message Tatsuro Yamada 2020-08-30 23:56:52 Re: list of extended statistics on psql