Re: Extend more usecase for planning time partition pruning and init partition pruning.

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Extend more usecase for planning time partition pruning and init partition pruning.
Date: 2021-03-05 23:02:20
Message-ID: CAKU4AWqR2hXzQRx1h7ETaAUv2SiEj6nLC6N4fP=SOsvoWszXdg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Amit:
Thanks for your review!

On Thu, Mar 4, 2021 at 5:07 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:

> Hi Andy,
>
> On Sun, Jan 24, 2021 at 7:34 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> > I recently found a use case like this. SELECT * FROM p, q WHERE
> p.partkey =
> > q.colx AND (q.colx = $1 OR q.colx = $2); Then we can't do either
> planning time
> > partition prune or init partition prune. Even though we have run-time
> > partition pruning work at last, it is too late in some cases since we
> have
> > to init all the plan nodes in advance. In my case, there are 10+
> > partitioned relation in one query and the execution time is short, so
> the
> > init plan a lot of plan nodes cares a lot.
> >
> > The attached patches fix this issue. It just get the "p.partkey = q.colx"
> > case in root->eq_classes or rel->joinlist (outer join), and then check
> if there
> > is some baserestrictinfo in another relation which can be used for
> partition
> > pruning. To make the things easier, both partkey and colx must be Var
> > expression in implementation.
> >
> > - v1-0001-Make-some-static-functions-as-extern-and-extend-C.patch
> >
> > Just some existing refactoring and extending ChangeVarNodes to be able
> > to change var->attno.
> >
> > - v1-0002-Build-some-implied-pruning-quals-to-extend-the-us.patch
>
> IIUC, your proposal is to transpose the "q.b in (1, 2)" in the
> following query as "p.a in (1, 2)" and pass it down as a pruning qual
> for p:
>
> select * from p, q where p.a = q.b and q.b in (1, 2);
>
> or "(q.b = 1 or q.b = 2)" in the following query as "(p.a = 1 or p.a = 2)":
>
> select * from p, q where p.a = q.b and (q.b = 1 or q.b = 2);
>
>
Yes, you understand me correctly.

> While that transposition sounds *roughly* valid, I have some questions
> about the approach:
>
> * If the transposed quals are assumed valid to use for partition
> pruning, could they also not be used by, say, the surviving
> partitions' index scan paths? So, perhaps, it doesn't seem right that
> partprune.c builds the clauses on-the-fly for pruning and dump them
> once done.
>
> * On that last part, I wonder if partprune.c isn't the wrong place to
> determine that "q.b in (1, 2)" and "p.a in (1, 2)" are in fact
> equivalent. That sort of thing is normally done in the phase of
> planning when distribute_qual_to_rels() runs and any equivalences
> found stored in PlannerInfo.eq_classes. Have you investigated why the
> process_ machinery doesn't support working with ScalarArrayOpExpr and
> BoolExpr to begin with?
>
> * Or maybe have you considered generalizing what
> build_implied_pruning_quals() does so that other places like
> indxpath.c can use the facility?
>
>
Actually at the beginning of this work, I do think I should put the implied
quals to baserestictinfo in the distribute_qual_for_rels stage. That
probably
can fix all the issues you reported. However that probably more complex
than what I did with more risks and I have a very limited timeline to handle
the real custom issue, so I choose this strategy. But it is the time to
re-think
the baserestrictinfo way now. I will spend some time in this direction,
Thank you
for this kind of push-up:) I just checked this stuff on Oracle, Oracle
does use
this strategy.

SQL> explain plan for select * from t1, t2 where t1.a = t2.a and t1.a > 2;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1838229974

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | 4 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 52 | 4 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| T1 | 1 | 26 | 2 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| T2 | 1 | 26 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("T1"."A"="T2"."A")

* 2 - filter("T1"."A">2) 3 - filter("T2"."A">2)*

17 rows selected.

postgres=# explain (costs off) select * from t1, t2 where t1.a = t2.a and
t1.a > 2;
QUERY PLAN
-------------------------------
Merge Join
Merge Cond: (t1.a = t2.a)
-> Sort
Sort Key: t1.a
-> Seq Scan on t1
Filter: (a > 2)
-> Sort
Sort Key: t2.a
-> Seq Scan on t2
(9 rows)

--
Best Regards
Andy Fan (https://www.aliyun.com/)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2021-03-05 23:10:52 Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Previous Message Andy Fan 2021-03-05 22:45:11 Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)