Re: WIP: Aggregation push-down

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: legrand legrand <legrand_legrand(at)hotmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Aggregation push-down
Date: 2020-04-26 08:12:04
Message-ID: CAKU4AWpzp1ZaDdD+hX8q4zpXhgNg9S93zNwmAc4Q3h5Xqj0NSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 24, 2020 at 8:10 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:

> Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
>
> > The more tests on your patch, the more powerful I feel it is!
>
> Thanks for the appreciation. Given the poor progress it's increasingly hard
> for me to find motivation to work on it. I'll try to be more professional
> :-)
>
>
Let's not give up:) I see your patch can push down the aggregation in 3
cases
at least:

1). The group by clause exists in the join eq clause.
2). The group by clause doesn't exist in join eq clause, but we have
pk on on side of join eq clause.
3). The group by clause doesn't exist in join eq clause, and we don't have
the pk as well.

Tom well explained the correctness of the first 2 cases [1] and probably
the case
3) is correct as well, but it is a bit of hard to understand. If this is a
case for others
as well, that probably make the review harder.

So my little suggestion is can we split the patch into some smaller commit
to handle each case? like: commit 1 & 2 handles case 1 & 2,commit 3 handles
join relation as well. commit 4 handles the case 3. Commit 5 can avoid
the
two-phase aggregation for case 2. Commit 6 introduces the aggmultifn. and
commit 7 to handle some outer join case Ashutosh raised at [2]. However not
all the cases need to be handled at one time.

Actually when I read the code, I want such separation by my own to verify my
understanding is correct, but I don't have such time at least now. It
maybe much
easier for you since you are the author. I will be pretty pleasure to help
in any case
after I close my current working item (Hopefully in 2 months).

> At the code level, I did some slight changes on init_grouping_targets
> which may
> > make the code easier to read. You are free to to use/not use it.
>
> I'm going to accept your change of create_rel_agg_info(), but I hesitate
> about
> the changes to init_grouping_targets().
>
> First, is it worth to spend CPU cycles on construction of an extra list
> grouping_columns? Is there a corner case in which we cannot simply pass
> grouping_columns=target->exprs to check_functional_grouping()?
>
> Second, it's obvious that you prefer the style
>
> foreach ()
> {
> if ()
> ...
> else if ()
> ...
> else
> ...
> }
>
> over this
>
> foreach ()
> {
> if ()
> {
> ...
> continue;
> }
>
> if ()
> {
> ...
> continue;
> }
>
> ...
> }
>
> I often prefer the latter and I see that the existing planner code uses
> this
> style quite often too. I think the reason is that it allows for more
> complex
> tests, while the "else-if style" requires all tests to take place inside
> the
> "if ()" expression. However, if several (not necessarily tightly related)
> tests become "compressed" this way, it's less obvious how where to put
> comments. Indeed it seems that some comments got lost due to your changes.

Your explanation looks reasonable, thanks for that. the changes also used
some builtin function to avoid the one more level for-loop.
like tlist_member.

As for the high level design, based on my current knowledge, probably no
need
to change.

[1] https://www.postgresql.org/message-id/9726.1542577439%40sss.pgh.pa.us
[2]
https://www.postgresql.org/message-id/flat/CAFjFpRdpeMTd8kYbM_x0769V-aEKst5Nkg3%2BcoG%3D8ki7s8Zqjw%40mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2020-04-26 09:48:51 Re: Subplan result caching
Previous Message 曾文旌 2020-04-26 08:09:02 Re: [Proposal] Global temporary tables