From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Ilya Gladyshev <i(dot)gladyshev(at)postgrespro(dot)ru> |
Subject: | Re: Partial aggregates pushdown |
Date: | 2023-04-07 01:59:39 |
Message-ID: | ZC95C0+PVhVP3iax@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Mar 31, 2023 at 05:49:21AM +0000, Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp wrote:
> Hi Mr.Momjian
>
> > First, am I correct?
> Yes, you are correct. This patch uses new special aggregate functions for partial aggregate
> (then we call this partialaggfunc).
First, my apologies for not addressing this sooner. I was so focused on
my own tasks that I didn't realize this very important patch was not
getting attention. I will try my best to get it into PG 17.
What amazes me is that you didn't need to create _any_ actual aggregate
functions. Rather, you just needed to hook existing functions into the
aggregate tables for partial FDW execution.
> > Second, how far away is this from being committable
> > and/or what work needs to be done to get it committable, either for PG 16 or 17?
> I believe there are three: 1. and 2. are not clear if they are necessary or not; 3. are clearly necessary.
> I would like to hear the opinions of the development community on whether or not 1. and 2. need to be addressed.
>
> 1. Making partialaggfunc user-defined function
> In v17, I make partialaggfuncs as built-in functions.
> Because of this approach, v17 changes specification of BKI file and pg_aggregate.
> For now, partialaggfuncs are needed by only postgres_fdw which is just an extension of PostgreSQL.
> In the future, when revising the specifications for BKI files and pg_aggregate when modifying existing PostgreSQL functions,
> It is necessary to align them with this patch's changes.
> I am concerned that this may be undesirable.
> So I am thinking that v17 should be modified to making partialaggfunc as user defined function.
I think we have three possible cases for aggregates pushdown to FDWs:
1) Postgres built-in aggregate functions
2) Postgres user-defined & extension aggregate functions
3) aggregate functions calls to non-PG FDWs
Your patch handles #1 by checking that the FDW Postgres version is the
same as the calling Postgres version. However, it doesn't check for
extension versions, and frankly, I don't see how we could implement that
cleanly without significant complexity.
I suggest we remove the version check requirement --- instead just
document that the FDW Postgres version should be the same or newer than
the calling Postgres server --- that way, we can assume that whatever is
in the system catalogs of the caller is in the receiving side. We
should add a GUC to turn off this optimization for cases where the FDW
Postgres version is older than the caller. This handles case 1-2.
For case 3, I don't even know how much pushdown those do of _any_
aggregates to non-PG servers, let along parallel FDW ones. Does anyone
know the details?
> 2. Automation of creating definition of partialaggfuncs
> In development of v17, I manually create definition of partialaggfuncs for avg, min, max, sum, count.
> I am concerned that this may be undesirable.
> So I am thinking that v17 should be modified to automate creating definition of partialaggfuncs
> for all built-in aggregate functions.
Are there any other builtin functions that need this? I think we can
just provide documention for extensions on how to do this.
> 3. Documentation
> I need add explanation of partialaggfunc to documents on postgres_fdw and other places.
I can help with that once we decide on the above.
I think 'partialaggfn' should be named 'aggpartialfn' to match other
columns in pg_aggregate.
I am confused by these changes to pg_aggegate:
+{ aggfnoid => 'sum_p_int8', aggtransfn => 'int8_avg_accum',
+ aggfinalfn => 'int8_avg_serialize', aggcombinefn => 'int8_avg_combine',
+ aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
+ aggtranstype => 'internal', aggtransspace => '48' },
...
+{ aggfnoid => 'sum_p_numeric', aggtransfn => 'numeric_avg_accum',
+ aggfinalfn => 'numeric_avg_serialize', aggcombinefn => 'numeric_avg_combine',
+ aggserialfn => 'numeric_avg_serialize',
+ aggdeserialfn => 'numeric_avg_deserialize',
+ aggtranstype => 'internal', aggtransspace => '128' },
Why are these marked as 'sum' but use 'avg' functions?
It would be good to explain exactly how this is diffent from background
worker parallelism.
--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com
Embrace your flaws. They make you human, rather than perfect,
which you will never be.
Attachment | Content-Type | Size |
---|---|---|
aggpush.diff | text/x-diff | 60.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2023-04-07 01:59:41 | Re: Minimal logical decoding on standbys |
Previous Message | Andres Freund | 2023-04-07 01:25:22 | Re: Minimal logical decoding on standbys |