Re: Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, wjzeng <wjzeng2012(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, shawn wang <shawn(dot)wang(dot)pg(at)gmail(dot)com>, "ggysxcq(at)gmail(dot)com" <ggysxcq(at)gmail(dot)com>
Subject: Re: Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?
Date: 2021-12-11 17:09:11
Message-ID: CALNJ-vQYBoVesOwDCGSxOxD=mXM2rB0C-WjJ9pV0AVZh=KYKgA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Dec 11, 2021 at 7:31 AM 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com> wrote:

>
>
> ------------------原始邮件 ------------------
> *发件人:*Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
> *发送时间:*Wed Dec 8 11:26:35 2021
> *收件人:*曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, shawn wang <
> shawn(dot)wang(dot)pg(at)gmail(dot)com>, ggysxcq(at)gmail(dot)com <ggysxcq(at)gmail(dot)com>,
> PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
> *抄送:*wjzeng <wjzeng2012(at)gmail(dot)com>
> *主题:*Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?
>
>> Hi,
>>
>> On 12/7/21 10:44, 曾文旌(义从) wrote:
>> > Hi Hackers
>> >
>> > For my previous proposal, I developed a prototype and passed
>> > regression testing. It works similarly to subquery's qual pushdown.
>> > We know that sublink expands at the beginning of each level of
>> > query. At this stage, The query's conditions and equivalence classes
>> > are not processed. But after generate_base_implied_equalities the
>> > conditions are processed, which is why qual can push down to
>> > subquery but sublink not.
>> >
>> > My POC implementation chose to delay the sublink expansion in the
>> > SELECT clause (targetList) and where clause. Specifically, it is
>> > delayed after generate_base_implied_equalities. Thus, the equivalent
>> > conditions already established in the Up level query can be easily
>> > obtained in the sublink expansion process (make_subplan).
>> >
>> > For example, if the up level query has a.id = 10 and the sublink
>> > query has a.id = b.id, then we get b.id = 10 and push it down to the
>> > sublink quey. If b is a partitioned table and is partitioned by id,
>> > then a large number of unrelated subpartitions are pruned out, This
>> > optimizes a significant amount of Planner and SQL execution time,
>> > especially if the partitioned table has a large number of
>> > subpartitions and is what I want.
>> >
>> > Currently, There were two SQL failures in the regression test,
>> > because the expansion order of sublink was changed, which did not
>> > affect the execution result of SQL.
>> >
>> > Look forward to your suggestions on this proposal.
>> >
>>
>> I took a quick look, and while I don't see / can't think of any problems
>> with delaying it until after generating implied equalities, there seems
>> to be a number of gaps.
>>
>> *Thank you for your attention.*
>>
>> 1) Are there any regression tests exercising this modified behavior?
>> Maybe there are, but if the only changes are due to change in order of
>> targetlist entries, that doesn't seem like a clear proof.
>>
>> It'd be good to add a couple tests exercising both the positive and
>> negative case (i.e. when we can and can't pushdown a qual).
>>
>> *I added several samples to the regress(qual_pushdown_to_sublink.sql). *
>> *and I
>> used the partition table to show the plan status of qual being pushed down into sublink.*
>>
>> *Hopefully this will help you understand the details of this patch. Later, I will add more cases.*
>>
>> 2) apparently, contrib/postgres_fdw does crash like this:
>>
>> #3 0x000000000077b412 in adjust_appendrel_attrs_mutator
>> (node=0x13f7ea0, context=0x7fffc3351b30) at appendinfo.c:470
>> 470 Assert(!IsA(node, SubLink));
>> (gdb) p node
>> $1 = (Node *) 0x13f7ea0
>> (gdb) p *node
>> $2 = {type = T_SubLink}
>>
>> Backtrace attached.
>>
>> *For the patch attached in the last email, I passed all the tests under
>> src/test/regress.*
>> *As you pointed out, there was a problem with regression under contrib(in
>> contrib/postgres_fdw). *
>> *This time I fixed it and the current patch (V2) can pass the
>> check-world.*
>>
>> 3) various parts of the patch really need at least some comments, like:
>>
>> - try_push_outer_qual_to_sublink_query really needs some docs
>>
>> - new stuff at the end of initsplan.c
>>
>> *Ok, I added some comments and will
>> add more. If you have questions about any details,*
>> *please point them out directly.*
>>
>> 4) generate_base_implied_equalities
>>
>> shouldn't this
>>
>> if (ec->ec_processed)
>> ;
>>
>> really be?
>>
>> if (ec->ec_processed)
>> continue;
>>
>> *You are right. I fixed it.*
>>
>> 5) I'm not sure why we need the new ec_processed flag.
>>
>>
>> *I did this to eliminate duplicate equalities from the two generate_base_implied_equalities calls*
>>
>> *1) I need the base equivalent expression generated after generate_base_implied_equalities,*
>> *which is used to pushdown qual to sublink(lazy_process_sublinks)*
>>
>> *2) The expansion of sublink may result in an equivalent expression with parameters, such as a = $1,*
>> *which needs to deal with the equivalence classes again.*
>>
>> *3) So, I added ec_processed and asked to process it again (generate_base_implied_equalities)*
>> *after the equivalence class changed (add_eq_member/process_equivalence).*
>>
>> *Maybe you have a better suggestion, please let me know.*
>>
>> 6) So we now have lazy_process_sublink callback? Does that mean we
>> expand sublinks in two places - sometimes lazily, sometimes not?
>>
>> *Yes, not all sublink is delayed. Let me explain this:*
>> *1) I added a GUC switch enable_lazy_process_sublink. If it is turned off, all
>> lazy process sublink will not happen,*
>>
>> *qual pushdown to sublink depend on lazy procee sublink, which means no quals will be pushed down.*
>> *2) Even if enable_lazy_process_sublink = true
>> If Query in this level contains some complex features,*
>> *sublink in this level query will not try do qual pushdown. (see function
>> query_has_sublink_try_pushdown_qual).*
>>
>> *I want to support a minimum subset first. Then consider complex features such as CTE/DML.*
>> *3) Finally, under conditions 1 and 2,
>> all kinds of sublink contained in the SELECT clause or*
>> *WHERE clause will delays expansion and try pushdown qual. The
>> sublink elsewhere in the SQL statement*
>> *does not delay process.*
>>
>> *The current status meets my requirements for
>> now. Of course, after this scheme is proved to be feasible, maybe*
>> *we can discuss that all sublinks are processed by overall delay, just like
>> qual pushdown to subquery.*
>>
>> *thanks*
>>
>> *Wenjing*
>>
>>
>>
>> regards
>>
>> --
>> Tomas Vondra
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
> Hi,

+ /* The outer var could exist in any of the upper-level queries to
find these roots */

to find these roots -> so find these roots

+ if (has_unexpand_sublink(root) && checkExprHasSubLink(node))

has_unexpand_sublink -> has_unexpanded_sublink

+ if (enable_lazy_process_sublink)
+ return true;

The above can be simplified to:

return enable_lazy_process_sublink;

+ if (checkExprHasSubLink(qual))
+ {
+ qual = lazy_process_sublink_qual(root, qual);
+ newquals = lappend(newquals, qual);
+ }
+ else
+ newquals = lappend(newquals, qual);

Since the lappend() is common to both branches, you can remove the else
clause. In the if block, only call lazy_process_sublink_qual().

+ /* under lazy process sublink, parent root may have some data that
child not need, so set it to NULL */
+ subroot->join_info_list = NIL;

minor correction to the comment above:
under lazy process sublink, parent root may have some data that child
does not need, so set it to NIL

+void
+preprocess_qual_conditions(PlannerInfo *root, Node *jtnode, bool istop)

Please add a comment explaining the meaning of istop.

+ if (istop)
+ f->quals = preprocess_expression_ext(root, f->quals,
EXPRKIND_QUAL, false);
+ else
+ f->quals = preprocess_expression(root, f->quals, EXPRKIND_QUAL);

I think the code would be more readable if you replace
the preprocess_expression() call in else branch with call
to preprocess_expression_ext().

+ context->root->unexpand_sublink_counter++;

unexpand_sublink_counter -> unexpanded_sublink_counter++

For sublink_query_push_qual(), the return at the end is not needed.

For condition_is_safe_pushdown_to_sublink, you can initialize context this
way :

+ equal_expr_info_context context = {0};

+ if (cvar && cvar->varattno > 0 && equal(cvar, var))
+ return true;

The last few lines of condition_is_safe_pushdown_to_sublink() can be
written as:

return cvar && cvar->varattno > 0 && equal(cvar, var);

+ if (equal_expr_safety_check(node, &context))
+ {
+ /* It needs to be something like outer var = inner var */
+ if (context.inner_var &&

The nested if blocks can be merged into one if block.

Cheers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2021-12-11 17:52:29 Re: Unifying VACUUM VERBOSE and log_autovacuum_min_duration output
Previous Message 曾文旌 (义从) 2021-12-11 15:30:40 回复:Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?