Re: Aggregate Push Down - Performing aggregation on foreign server

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Aggregate Push Down - Performing aggregation on foreign server
Date: 2016-10-12 10:08:21
Message-ID: CAFjFpRc9Y6aHfuH4ep6u=ON+s=byf1wXd0i-nE2o2q-7ro7ApQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Jeevan for taking care of the comments.

On Fri, Sep 30, 2016 at 5:23 PM, Jeevan Chalke
<jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
> Hi Ashutosh,
>
> On Mon, Sep 26, 2016 at 2:28 PM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>
>> Thanks Jeevan for working on the comments.
>>
>> Ok. Yes, we should also handle bare conditions in
>> classifyConditions(). I am fine doing it as a separate patch.
>
>
> Doing that with separate patch would be good.

Ok.

>
>>
>>
>> I don't think add_foreign_grouping_path() is the right place to change
>> a structure managed by the core and esp when we are half-way adding
>> paths. An FDW should not meddle with an upper RelOptInfo. Since
>> create_foreignscan_plan() picks those up from RelOptInfo and both of
>> those are part of core, we need a place in core to set the
>> RelOptInfo::relids for an upper relation OR we have stop using
>> fs_relids for upper relation foreign scans.
>
>
> Yes, agree that relids must be set by the core rather than a fdw.
> However I don't want to touch the core for this patch and also not
> exactly sure where we can do that. I think, for this patch, we can
> copy relids into grouped_rel in create_grouping_paths() at place
> where we assign fdwroutine, userid etc. from the input relation.

I don't think this is acceptable since it changes the search key for
an upper without going through fetch_upper_rel(). Rest of the code,
which tries to find this upper rel with relids = NULL, will not be
able to find it anymore. I have started a discussion on this topic
[1]. One possible solution discussed there is to use all_baserels for
an upper rel. But that solution looks temporary. The discussion has
not yet concluded.

In case we decide not to have any relids in the upper relation, we
will have to obtain those from the scanrel in the context, which is
not being used right now.

Also there are places in code where we check reloptkind and assume it
to be either JOINREL or BASEREL e.g. deparseLockingClause. Those need
to be examined again since now we handle an upper relation. As
discussed before, we might user bms_num_members() on relids to decide
whether the underlying scan is join or base relation.

>
>>
>>
>> 2. SortGroupClause is a parser node, so we can name
>> appendSortGroupClause() as
>> deparseSortGroupClause(), to be consistent with the naming convention. If
>> we do
>> that change, may be it's better to pass SortGroupClause as is and handle
>> other
>> members of that structure as well.
>
>
> Renamed appendSortGroupClause() to deparseSortGroupClause().
> I have kept this function in sync with get_rule_sortgroupclause() which
> takes the Index ref from SortGroupClause(). This function require just
> an index and thus passing SortGroupClause as whole is unnecessary. However
> we cannot pass entire order by list or group by list, because in case of
> order by list we need some extra processing on list elements. So passing
> just Index is sufficient and in sync with get_rule_sortgroupclause() too.

Ok.
In this function
+ /* Must force parens for other expressions */
Please explain why we need parenthesis.

>
>>
>> 5. In deparseConst(), showtype is compared with hardcoded values. The
>> callers of this function pass hardcoded values. Instead, we should
>> define an enum and use it. I understand that this logic has been borrowed
>> from
>> get_const_expr(), which also uses hardcoded values. Any reason why not to
>> adopt
>> a better style here? In fact, it only uses two states for showtype, 0 and
>> ">
>> 0". Why don't we use a boolean then OR why isn't the third state in
>> get_const_expr() applicable here?
>
>
> We certainly can add an enum here, but for consistency with other related
> code I think using hard-coded value is good for now. Also I see this
> comment in prologue of deparseConst()
> * This function has to be kept in sync with ruleutils.c's get_const_expr.
>
> So better to have handling like it.
> Also, currently we use only two values for showtype. But for consistency
> let use int instead of bool. In future if we add support for coercion
> expr, then we need this third value. At that time we will not need changes
> here.
> However if required, we can submit a separate patch for adding enum
> instead of int for showtype in ruleutils.c.
>

Since this patch adds showtype argument which is present in
get_const_expr(), it's clear that we haven't really kept those two
functions in sync, even though the comment says so. It's kind of ugly
to see those hardcoded values. But you have a point. Let's see what
committer says.

>>
>>
>> 7. The changes in block ...
>> should be in sync. The later block adds both aggregates and Vars but the
>> first
>> one doesn't. Why is this difference?
>
>
> add_to_flat_tlist() is taking care of duplicate entries and thus in second
> block, I was calling it after the loop to avoid calling it for every list
> element. Well, moved that inside loop like first block.
>

+ /*
+ * We may need to modify the sortgrouprefs from path target, thus copy it
+ * so that we will not have any undesired effect. We need to modify the
+ * sortgrouprefs when it points to one of the ORDER BY expression but not
+ * to any GROUP BY expression and that expression is not pushed as is. If
+ * we do not clear such entries, then we will end up into an error.
+ */
What kind of error? Explain it better.

>>
>>
>> 9. EXPLAIN of foreign scan is adding paranthesis around output
>> expressions,
>> which is not needed. EXPLAIN output of other plan nodes like Aggregate
>> doesn't
>> do that. If it's easy to avoid paranthesis, we should do it in this patch.
>> With
>> this patch, we have started pushing down expressions in the target list,
>> so
>> makes sense to fix it in this patch.
>
>
> ExecInitForeignScan() while determining scan tuple type from passed
> fdw_scan_tlist, has target list containing Vars with varno set to
> INDEX_VAR. Due to which while deparsing we go through
> resolve_special_varno() and get_special_variable() function which
> forces parenthesis around the expression.
> I can't think of any easy and quick solution here. So keeping this
> as is. Input will be welcome or this can be done separately later.
>

I guess, you can decide whether or not to add paranthesis by looking
at the corresponding namespace in the deparse_context. If the
planstate there is ForeignScanState, you may skip the paranthesis if
istoplevel is true. Can you please check if this works?

>>
>>
>> 10. While deparsing the GROUP BY clauses, the code fetches the expressions
>> and
>> deparses the expressions. We might save some CPU cycles if we deparse the
>> clauses by their positional indexes in SELECT clause instead of the actual
>> expressions. This might make the query slightly unreadable. What do you
>> think?
>
>
> To me it is better to have expression in the GROUP BY clause as it is
> more readable and easy to understand. Also it is in sync with deparsing
> logic in ruleutils.c.

Not exactly. get_rule_sortgroupclause() prints only the target list
positions instead of actual expression in GROUP BY or ORDER BY clause
if requested so. We could opt to do the same in all cases. But I think
it's better to add the whole expression in the remote query for
readability.

>
>>
>>
>> Comments about tests:
>>
>> 4. May be bulk of these testcases need to be moved next to Join testcases.
>> That way their outputs do not change in case the DMLs change in future. In
>> case
>> you do that, adjust capitalization accordingly.
>
>
> Moved testcases after Join testcases.
> However I have not made any capitalization changes. I see many tests
> like those elsewhere in the file too. Let me know if that's the policy
> to have appropriate capitalization in test-case. I don't want violate
> the policy if any and will prefer to do the changes as per the policy.

This file has used different styles for different sets. But we usually
adopt the style from surrounding testcases. The testcases surrounding
aggregate testcase block are capitalized. It will look odd to have
just the aggregate tests using different style.

>
>>
>>
>> 13. Wouldn't having random() in the testcase, make them produce different
>> output in different runs? Also, please see if we can reduce the output
>> rows to
>> a handful few. Following query produces 21 rows. Is it really important to
>> have
>> all those rows in the output?
>> +select c2, sum(c1) from ft2 group by c2 having sum(c1) * random() <
>> 500000 order by c2;
>
>
> The condition is written in such a way that we will get all rows,
> nullifying the random() effect, so I don't think there will be any
> issue with the multiple runs.
>

While I agree with that, we can't deny a future change where the test
returns random results. Usually the other tests have used t1.c8 to
induce an unshippable condition. Or use a condition like (random() <=
1) which is always true for any value returned by random(). The
expression will be deemed as volatile. See if you can do something
like that.

>>
>>
>> 20. Why do we need the last statement ALTERing server extensions? Is it
>> not
>> already done when the function was added to the extension?
>> +alter extension postgres_fdw drop function least_accum(anyelement,
>> variadic anyarray);
>> +alter extension postgres_fdw drop aggregate least_agg(variadic items
>> anyarray);
>> +alter server loopback options (set extensions 'postgres_fdw');
>
>
> Whenever we alter extension, we must refresh the server and thus required
> last statement ALTERing server. Without that elements added into extension
> does not reflect in the server.
>

Ok.

>>
>> 23. This comment talks about deeper code level internal details. Should
>> have a
>> better user-visible explanations.
>> +-- ORDER BY expression is not present as is in target list of remote
>> query. This needed clearing out sortgrouprefs flag.
>
>
> Not sure how can I put those comments without referring some code level
> internal details. I have tried altering those comments but it will be
> good if you too try rephrasing it.

I think that testcase can be simplified if we use a non-shippable
expression where an aggregate if shippable e.g. random() * sum(c2) and
order the result by this expression. Because of the unshippable
expression, ORDER BY won't be pushed down and we test the case when
ORDER BY expression is part of the targetlist but can not be pushed
down. If we do that, this need not be a separate testcase and can be
merged with one of the tests earlier.

I guess, it will suffice if the comment just says that it's testing a
case where ORDER BY expression is part of the targetlist but not
pushed down. You don't need to explain what kind of problem that
showed and how it was fixed. If the code changes later, the comment in
the test will be out of sync.

>
>>
>> postgres_fdw.out has grown by 2000 lines because of testcases in this
>> patch. We need to compact the testcases so that easier to maintain in
>> the future.
>
>
> I have removed many duplicate tests and also combined many of them.
> Also removed tests involving local tables. Testcase changes now
> become 1/3 of earlier one.

I guess, there are still avenues for compressing the testcases e.g.
following three testcases and some test cases with HAVING clauses, can
be merged together, since they are testing orthogonal functionality.

+-- Simple aggregates
+explain (verbose, costs off)
+select count(c6), sum(c1), avg(c1), min(c2), max(c1), stddev(c2) from
ft1 group by c2 order by 1, 2;
+select count(c6), sum(c1), avg(c1), min(c2), max(c1), stddev(c2) from
ft1 group by c2 order by 1, 2;
+
+-- Only sum() is pushed. Expression will be executed locally.
+explain (verbose, costs off)
+select sum(c1) * random() from ft1;
+
+-- Simple aggregate with WHERE clause
+explain (verbose, costs off)
+select sum(c1) from ft2 where c2 < 5;
+select sum(c1) from ft2 where c2 < 5;

Here are some more (mostly minor comments)
Why do we need this deparse.c? Even without this include, the code
compiles fine for me.
+#include "utils/ruleutils.h"
+#include "nodes/print.h"

May be you want to explain when foreignrel and scanrel will be different and
their purpose in the comment just above this structure definition.

Please use "foreign server" instead of "remote".
+ /*
+ * If aggregate's input collation is not derived from a foreign
+ * Var, it can't be sent to remote.
+ */

I guess, aggregate's input collation should consider aggfilter collation, since
that doesn't affect aggregates collation. But we should make sure that
aggfilter is safe to pushdown from collation's perspective.
+ /* Check aggregate filter */
+ if (!foreign_expr_walker((Node *) agg->aggfilter,
+ glob_cxt, &inner_cxt))
+ return false;

We need an "a" before upper, I guess.
+ * For upper relation, we have already built the target list while checking

If we add a new kind of RELOPT, this assertion will unintentionally accept it.
May be it's good idea to explicitly list all the RELOPTs even if we support all
of them except one. It was probably I who suggested this change, but thinking
about it again, it doesn't seems to be a good idea. What do you think?
- /* We handle relations for foreign tables and joins between those */
- Assert(rel->reloptkind == RELOPT_JOINREL ||
- rel->reloptkind == RELOPT_BASEREL ||
- rel->reloptkind == RELOPT_OTHER_MEMBER_REL);
+ /* We handle all relations other than dead one. */
+ Assert(rel->reloptkind != RELOPT_DEADREL);

missing "expect" after ... so we don't
+ /*
+ * Queries with grouping sets are not pushed down, so we don't grouping
+ * sets here.
+ */

Do we need following #include in postgres_fdw.c? I am able to compile the code
fine with those.
+#include "nodes/print.h"

[1] http://www.mail-archive.com/pgsql-hackers(at)postgresql(dot)org/msg295435.html
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2016-10-12 10:21:37 Re: Speed up Clog Access by increasing CLOG buffers
Previous Message Amit Langote 2016-10-12 09:14:13 Re: Comment typo