Re: Aggregate Push Down - Performing aggregation on foreign server

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Aggregate Push Down - Performing aggregation on foreign server
Date: 2016-10-19 09:36:44
Message-ID: CAM2+6=Vx+a9ub8W0QX-Neziwa86De25qZwrgf_jjTkJFR0qqng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 12, 2016 at 3:38 PM, Ashutosh Bapat <
ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:

>
> >> I don't think add_foreign_grouping_path() is the right place to change
> >> a structure managed by the core and esp when we are half-way adding
> >> paths. An FDW should not meddle with an upper RelOptInfo. Since
> >> create_foreignscan_plan() picks those up from RelOptInfo and both of
> >> those are part of core, we need a place in core to set the
> >> RelOptInfo::relids for an upper relation OR we have stop using
> >> fs_relids for upper relation foreign scans.
> >
> >
> > Yes, agree that relids must be set by the core rather than a fdw.
> > However I don't want to touch the core for this patch and also not
> > exactly sure where we can do that. I think, for this patch, we can
> > copy relids into grouped_rel in create_grouping_paths() at place
> > where we assign fdwroutine, userid etc. from the input relation.
>
> I don't think this is acceptable since it changes the search key for
> an upper without going through fetch_upper_rel(). Rest of the code,
> which tries to find this upper rel with relids = NULL, will not be
> able to find it anymore. I have started a discussion on this topic
> [1]. One possible solution discussed there is to use all_baserels for
> an upper rel. But that solution looks temporary. The discussion has
> not yet concluded.
>
> In case we decide not to have any relids in the upper relation, we
> will have to obtain those from the scanrel in the context, which is
> not being used right now.
>
> Also there are places in code where we check reloptkind and assume it
> to be either JOINREL or BASEREL e.g. deparseLockingClause. Those need
> to be examined again since now we handle an upper relation. As
> discussed before, we might user bms_num_members() on relids to decide
> whether the underlying scan is join or base relation.
>
>
Done.
Using scan rel's relids in the context and checking for any foreign var
reference. Added Relids member in foreign_glob_cxt and scanrel in
deparse_expr_cxt to accomplish this. Removed logic of copying relids
from scanrel to grouped_rel. In create_foreignscan_plan(), assigned
all_baserels to fs_relids for upper relations per one of the suggestion
mentioned in the said email discussion.

>
> Ok.
> In this function
> + /* Must force parens for other expressions */
> Please explain why we need parenthesis.
>
>
I have observed that in deparse.c we do add parenthesis around
expressions, unlike to PRETTY_PAREN handling in ruleutils.c.
Thus we need parenthesis here too. However changed this comments
per other comment in the file.

>
> Since this patch adds showtype argument which is present in
> get_const_expr(), it's clear that we haven't really kept those two
> functions in sync, even though the comment says so. It's kind of ugly
> to see those hardcoded values. But you have a point. Let's see what
> committer says.
>

Sure, no issues.

>
> >
> > ExecInitForeignScan() while determining scan tuple type from passed
> > fdw_scan_tlist, has target list containing Vars with varno set to
> > INDEX_VAR. Due to which while deparsing we go through
> > resolve_special_varno() and get_special_variable() function which
> > forces parenthesis around the expression.
> > I can't think of any easy and quick solution here. So keeping this
> > as is. Input will be welcome or this can be done separately later.
> >
>
> I guess, you can decide whether or not to add paranthesis by looking
> at the corresponding namespace in the deparse_context. If the
> planstate there is ForeignScanState, you may skip the paranthesis if
> istoplevel is true. Can you please check if this works?
>

I have tried it. Attached separate patch for it.
However I have noticed that istoplevel is always false (at-least for the
testcase we have, I get it false always). And I also think that taking
this decision only on PlanState is enough. Does that in attached patch.
To fix this, I have passed deparse_namespace to the private argument and
accessed it in get_special_variable(). Changes looks very simple. Please
have a look and let me know your views.
This issue is not introduced by the changes done for the aggregate push
down, it only got exposed as we now ship expressions in the target list.
Thus I think it will be good if we make these changes separately as new
patch, if required.

>
> > To me it is better to have expression in the GROUP BY clause as it is
> > more readable and easy to understand. Also it is in sync with deparsing
> > logic in ruleutils.c.
>
> Not exactly. get_rule_sortgroupclause() prints only the target list
> positions instead of actual expression in GROUP BY or ORDER BY clause
> if requested so. We could opt to do the same in all cases. But I think
> it's better to add the whole expression in the remote query for
> readability.
>

Right.

>
> >> case
> >> you do that, adjust capitalization accordingly.
> >
> >
> > Moved testcases after Join testcases.
> > However I have not made any capitalization changes. I see many tests
> > like those elsewhere in the file too. Let me know if that's the policy
> > to have appropriate capitalization in test-case. I don't want violate
> > the policy if any and will prefer to do the changes as per the policy.
>
> This file has used different styles for different sets. But we usually
> adopt the style from surrounding testcases. The testcases surrounding
> aggregate testcase block are capitalized. It will look odd to have
> just the aggregate tests using different style.
>

I see mixed use of capitalization in the file and use of it is adhoc too.
So not convinced with your argument yet. I don't think there is any policy
for that; otherwise use of capitalization in this file would have been
consistent already. Can we defer this to the committer's opinion.

>
> Please use "foreign server" instead of "remote".
> + /*
> + * If aggregate's input collation is not derived from a
> foreign
> + * Var, it can't be sent to remote.
> + */
>

This is consistent with other nodetyps usage. see FuncExpr for example.

>
> I guess, aggregate's input collation should consider aggfilter collation,
> since
> that doesn't affect aggregates collation. But we should make sure that
> aggfilter is safe to pushdown from collation's perspective.
> + /* Check aggregate filter */
> + if (!foreign_expr_walker((Node *) agg->aggfilter,
> + glob_cxt, &inner_cxt))
> + return false;
>

inner_cxt is a merger of all the args/expressions context which is then
checked with aggregate's input collation. So I believe aggfilter collation
is also verified as expected.
Am I missing anything here?

Fixed all other comments including test-cases changes as suggested.

>
> [1] http://www.mail-archive.com/pgsql-hackers(at)postgresql(dot)org/
> msg295435.html
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>

Thanks

--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachment Content-Type Size
pg_agg_push_down_v4.patch binary/octet-stream 126.9 KB
avoid_unnecessary_parens_for_ForeignScan_in_ruleutils.patch binary/octet-stream 39.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2016-10-19 09:53:19 Re: Indirect indexes
Previous Message Heikki Linnakangas 2016-10-19 09:07:13 Re: FSM corruption leading to errors