From: | Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> |
---|---|
To: | Amit Langote <amitlangote09(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Extend more usecase for planning time partition pruning and init partition pruning. |
Date: | 2021-03-05 23:02:20 |
Message-ID: | CAKU4AWqR2hXzQRx1h7ETaAUv2SiEj6nLC6N4fP=SOsvoWszXdg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Amit:
Thanks for your review!
On Thu, Mar 4, 2021 at 5:07 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> Hi Andy,
>
> On Sun, Jan 24, 2021 at 7:34 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> > I recently found a use case like this. SELECT * FROM p, q WHERE
> p.partkey =
> > q.colx AND (q.colx = $1 OR q.colx = $2); Then we can't do either
> planning time
> > partition prune or init partition prune. Even though we have run-time
> > partition pruning work at last, it is too late in some cases since we
> have
> > to init all the plan nodes in advance. In my case, there are 10+
> > partitioned relation in one query and the execution time is short, so
> the
> > init plan a lot of plan nodes cares a lot.
> >
> > The attached patches fix this issue. It just get the "p.partkey = q.colx"
> > case in root->eq_classes or rel->joinlist (outer join), and then check
> if there
> > is some baserestrictinfo in another relation which can be used for
> partition
> > pruning. To make the things easier, both partkey and colx must be Var
> > expression in implementation.
> >
> > - v1-0001-Make-some-static-functions-as-extern-and-extend-C.patch
> >
> > Just some existing refactoring and extending ChangeVarNodes to be able
> > to change var->attno.
> >
> > - v1-0002-Build-some-implied-pruning-quals-to-extend-the-us.patch
>
> IIUC, your proposal is to transpose the "q.b in (1, 2)" in the
> following query as "p.a in (1, 2)" and pass it down as a pruning qual
> for p:
>
> select * from p, q where p.a = q.b and q.b in (1, 2);
>
> or "(q.b = 1 or q.b = 2)" in the following query as "(p.a = 1 or p.a = 2)":
>
> select * from p, q where p.a = q.b and (q.b = 1 or q.b = 2);
>
>
Yes, you understand me correctly.
> While that transposition sounds *roughly* valid, I have some questions
> about the approach:
>
> * If the transposed quals are assumed valid to use for partition
> pruning, could they also not be used by, say, the surviving
> partitions' index scan paths? So, perhaps, it doesn't seem right that
> partprune.c builds the clauses on-the-fly for pruning and dump them
> once done.
>
> * On that last part, I wonder if partprune.c isn't the wrong place to
> determine that "q.b in (1, 2)" and "p.a in (1, 2)" are in fact
> equivalent. That sort of thing is normally done in the phase of
> planning when distribute_qual_to_rels() runs and any equivalences
> found stored in PlannerInfo.eq_classes. Have you investigated why the
> process_ machinery doesn't support working with ScalarArrayOpExpr and
> BoolExpr to begin with?
>
> * Or maybe have you considered generalizing what
> build_implied_pruning_quals() does so that other places like
> indxpath.c can use the facility?
>
>
Actually at the beginning of this work, I do think I should put the implied
quals to baserestictinfo in the distribute_qual_for_rels stage. That
probably
can fix all the issues you reported. However that probably more complex
than what I did with more risks and I have a very limited timeline to handle
the real custom issue, so I choose this strategy. But it is the time to
re-think
the baserestrictinfo way now. I will spend some time in this direction,
Thank you
for this kind of push-up:) I just checked this stuff on Oracle, Oracle
does use
this strategy.
SQL> explain plan for select * from t1, t2 where t1.a = t2.a and t1.a > 2;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1838229974
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | 4 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 52 | 4 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| T1 | 1 | 26 | 2 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| T2 | 1 | 26 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."A"="T2"."A")
* 2 - filter("T1"."A">2) 3 - filter("T2"."A">2)*
17 rows selected.
postgres=# explain (costs off) select * from t1, t2 where t1.a = t2.a and
t1.a > 2;
QUERY PLAN
-------------------------------
Merge Join
Merge Cond: (t1.a = t2.a)
-> Sort
Sort Key: t1.a
-> Seq Scan on t1
Filter: (a > 2)
-> Sort
Sort Key: t2.a
-> Seq Scan on t2
(9 rows)
--
Best Regards
Andy Fan (https://www.aliyun.com/)
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2021-03-05 23:10:52 | Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier? |
Previous Message | Andy Fan | 2021-03-05 22:45:11 | Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series) |