Re: Get rid of runtime handling of AlternativeSubPlan?

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Get rid of runtime handling of AlternativeSubPlan?
Date: 2020-09-01 00:10:35
Message-ID: CAKU4AWpZKKXG5wC8rfLYpRC9_+NRdiKVhEh4okJedoCEKs03Uw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 1, 2020 at 1:42 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> writes:
> > Thank you for this code! I still have some confusion about when a
> SubPlan
> > should be executed when a join is involved. I care about this because
> this
> > has an impact on when we can get the num_exec for a subplan.
>
> > The subplan in a target list, it is executed after the join in my case.
> > The subplan
> > can be execute after the scan of T1(see below example) and it can also be
> > executed
> > after the join. Which one is better depends on which methods make the
> > num_exec
> > smaller. Is it something we already considered?
>
> Uh, I'm not following your concern. SubPlans appearing in the join
> targetlist *must* be executed "after the join", ie only for valid
> join rows. Otherwise we could have cases where, say, they throw
> errors that should not occur.

I am feeling I'm wrong somewhere however I can't figure it out until now.

Q1: select (select t3.a from t3 where t3.c = t1.c) from t1, t2 where t1.b
= t2.b;

should equals Q2:

1. select (select t3.a from t3 where t3.c = t1.c) as a, b from t1 ==>
t13.
2. select t13.a from t13, t2 where t13.b = t2.b;

With the following data, Q1 will execute the subplan twice (since we get 2
rows
after join t1, t2). while Q2 executes the subplan once (since t1 has only
1 row).
Finally the result is the same.

postgres=# select * from t1;
a | b | c
---+---+---
1 | 1 | 1
(1 row)

postgres=# select * from t2;
a | b | c
---+---+---
1 | 1 | 1
1 | 1 | 2
(2 rows)

postgres=# select * from t3;
a | b | c
---+---+---
1 | 1 | 1
(1 row)

On the other hand, SubPlans appearing
> in the join's qual conditions have to be executed "before the join",
> although exactly what that means is fuzzy because we don't make any
> promises about the relative ordering of different qual conditions.
>
> > When the subplan is in a Qual, it is supposed to be executed as soon as
> > possible,
> > The current implementation matches the below cases. So can we say we
> > knows the num_execs of SubPlan just after we plan the dependent rels?
>
> I wouldn't say so. If the SubPlan's qual actually only depends on one
> of the input rels, distribute_qual_to_rels would have pushed it down
> further than the join. Among the quals that do have to be evaluated at the
> join,

a qual involving a SubPlan is best executed last on cost
> grounds, or so I'd guess anyway. So the number of executions is probably

less than the product of the input rel sizes. That's what motivates
> the choice of NUM_EXEC_QUAL in my patch.

Understand now. Thank you!

--
Best Regards
Andy Fan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-09-01 00:44:56 Re: Append with naive multiplexing of FDWs
Previous Message Peter Geoghegan 2020-08-31 23:01:10 Re: Boundary value check in lazy_tid_reaped()