Re: Partial aggregates pushdown

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Ilya Gladyshev <i(dot)gladyshev(at)postgrespro(dot)ru>
Subject: Re: Partial aggregates pushdown
Date: 2023-04-07 01:59:39
Message-ID: ZC95C0+PVhVP3iax@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 31, 2023 at 05:49:21AM +0000, Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp wrote:
> Hi Mr.Momjian
>
> > First, am I correct?
> Yes, you are correct. This patch uses new special aggregate functions for partial aggregate
> (then we call this partialaggfunc).

First, my apologies for not addressing this sooner. I was so focused on
my own tasks that I didn't realize this very important patch was not
getting attention. I will try my best to get it into PG 17.

What amazes me is that you didn't need to create _any_ actual aggregate
functions. Rather, you just needed to hook existing functions into the
aggregate tables for partial FDW execution.

> > Second, how far away is this from being committable
> > and/or what work needs to be done to get it committable, either for PG 16 or 17?
> I believe there are three: 1. and 2. are not clear if they are necessary or not; 3. are clearly necessary.
> I would like to hear the opinions of the development community on whether or not 1. and 2. need to be addressed.
>
> 1. Making partialaggfunc user-defined function
> In v17, I make partialaggfuncs as built-in functions.
> Because of this approach, v17 changes specification of BKI file and pg_aggregate.
> For now, partialaggfuncs are needed by only postgres_fdw which is just an extension of PostgreSQL.
> In the future, when revising the specifications for BKI files and pg_aggregate when modifying existing PostgreSQL functions,
> It is necessary to align them with this patch's changes.
> I am concerned that this may be undesirable.
> So I am thinking that v17 should be modified to making partialaggfunc as user defined function.

I think we have three possible cases for aggregates pushdown to FDWs:

1) Postgres built-in aggregate functions
2) Postgres user-defined & extension aggregate functions
3) aggregate functions calls to non-PG FDWs

Your patch handles #1 by checking that the FDW Postgres version is the
same as the calling Postgres version. However, it doesn't check for
extension versions, and frankly, I don't see how we could implement that
cleanly without significant complexity.

I suggest we remove the version check requirement --- instead just
document that the FDW Postgres version should be the same or newer than
the calling Postgres server --- that way, we can assume that whatever is
in the system catalogs of the caller is in the receiving side. We
should add a GUC to turn off this optimization for cases where the FDW
Postgres version is older than the caller. This handles case 1-2.

For case 3, I don't even know how much pushdown those do of _any_
aggregates to non-PG servers, let along parallel FDW ones. Does anyone
know the details?

> 2. Automation of creating definition of partialaggfuncs
> In development of v17, I manually create definition of partialaggfuncs for avg, min, max, sum, count.
> I am concerned that this may be undesirable.
> So I am thinking that v17 should be modified to automate creating definition of partialaggfuncs
> for all built-in aggregate functions.

Are there any other builtin functions that need this? I think we can
just provide documention for extensions on how to do this.

> 3. Documentation
> I need add explanation of partialaggfunc to documents on postgres_fdw and other places.

I can help with that once we decide on the above.

I think 'partialaggfn' should be named 'aggpartialfn' to match other
columns in pg_aggregate.

I am confused by these changes to pg_aggegate:

+{ aggfnoid => 'sum_p_int8', aggtransfn => 'int8_avg_accum',
+ aggfinalfn => 'int8_avg_serialize', aggcombinefn => 'int8_avg_combine',
+ aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
+ aggtranstype => 'internal', aggtransspace => '48' },

...

+{ aggfnoid => 'sum_p_numeric', aggtransfn => 'numeric_avg_accum',
+ aggfinalfn => 'numeric_avg_serialize', aggcombinefn => 'numeric_avg_combine',
+ aggserialfn => 'numeric_avg_serialize',
+ aggdeserialfn => 'numeric_avg_deserialize',
+ aggtranstype => 'internal', aggtransspace => '128' },

Why are these marked as 'sum' but use 'avg' functions?

It would be good to explain exactly how this is diffent from background
worker parallelism.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Embrace your flaws. They make you human, rather than perfect,
which you will never be.

Attachment Content-Type Size
aggpush.diff text/x-diff 60.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-04-07 01:59:41 Re: Minimal logical decoding on standbys
Previous Message Andres Freund 2023-04-07 01:25:22 Re: Minimal logical decoding on standbys