RE: Partial aggregates pushdown

From: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Ilya Gladyshev <i(dot)gladyshev(at)postgrespro(dot)ru>, "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
Subject: RE: Partial aggregates pushdown
Date: 2023-04-07 09:25:52
Message-ID: OSZPR01MB6662875933E87F37C711B4FC95969@OSZPR01MB6662.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Mr.Momjian

> First, my apologies for not addressing this sooner. I was so focused on my
> own tasks that I didn't realize this very important patch was not getting
> attention. I will try my best to get it into PG 17.
Thank you very much for your comments.
I will improve this patch for PG17.
I believe that this patch will help us use PostgreSQL's built-in sharding for OLAP.

> What amazes me is that you didn't need to create _any_ actual aggregate
> functions. Rather, you just needed to hook existing functions into the
> aggregate tables for partial FDW execution.
Yes. This patch enables partial aggregate pushdown using
only existing functions which belong to existing aggregate function
and are needed by parallel query(such as state transition function and serialization function).
This patch does not need new types of function belonging to aggregate functions
and does not need new functions belonging to existing aggregate functions.

> I suggest we remove the version check requirement --- instead just document
> that the FDW Postgres version should be the same or newer than the calling
> Postgres server --- that way, we can assume that whatever is in the system
> catalogs of the caller is in the receiving side.
Thanks for the comment. I will modify this patch according to your comment.

> We should add a GUC to turn off
> this optimization for cases where the FDW Postgres version is older than the
> caller. This handles case 1-2.
Thanks for the advice here too.
I thought it would be more appropriate to add a foregin server option of
postgres_fdw rather than adding GUC.
Would you mind if I ask you what you think about it?

> > 2. Automation of creating definition of partialaggfuncs In development
> > of v17, I manually create definition of partialaggfuncs for avg, min, max, sum,
> count.
> > I am concerned that this may be undesirable.
> > So I am thinking that v17 should be modified to automate creating
> > definition of partialaggfuncs for all built-in aggregate functions.
>
> Are there any other builtin functions that need this? I think we can just
> provide documention for extensions on how to do this.
For practical purposes, it is sufficient
if partial aggregate for the above functions can be pushed down.
I think you are right, it would be sufficient to document how to achieve
partial aggregate pushdown for other built-in functions.

> > 3. Documentation
> > I need add explanation of partialaggfunc to documents on postgres_fdw and
> other places.
>
> I can help with that once we decide on the above.
Thank you. In the next verion of this patch, I will add documents on postgres_fdw
and other places.

> I think 'partialaggfn' should be named 'aggpartialfn' to match other columns in
> pg_aggregate.
Thanks for the comment. I will modify this patch according to your comment.

> For case 3, I don't even know how much pushdown those do of _any_
> aggregates to non-PG servers, let along parallel FDW ones. Does anyone
> know the details?
To allow partial aggregate pushdown for non-PG FDWs,
I think we need to add pushdown logic to their FDWs for each function.
For example, we need to add logic avg() -> sum()/count() to their FDWs for avg.
To allow parallel partial aggregate by non-PG FDWs,
I think we need to add FDW Routines for Asynchronous Execution to their FDWs[1].

> I am confused by these changes to pg_aggegate:
>
> +{ aggfnoid => 'sum_p_int8', aggtransfn => 'int8_avg_accum',
> + aggfinalfn => 'int8_avg_serialize', aggcombinefn =>
> +'int8_avg_combine',
> + aggserialfn => 'int8_avg_serialize', aggdeserialfn =>
> +'int8_avg_deserialize',
> + aggtranstype => 'internal', aggtransspace => '48' },
>
> ...
>
> +{ aggfnoid => 'sum_p_numeric', aggtransfn => 'numeric_avg_accum',
> + aggfinalfn => 'numeric_avg_serialize', aggcombinefn =>
> +'numeric_avg_combine',
> + aggserialfn => 'numeric_avg_serialize',
> + aggdeserialfn => 'numeric_avg_deserialize',
> + aggtranstype => 'internal', aggtransspace => '128' },
>
> Why are these marked as 'sum' but use 'avg' functions?
This reason is that sum(int8)/sum(numeric) shares some functions with avg(int8)/avg(numeric),
and sum_p_int8 is aggpartialfn of sum(int8) and sum_p_numeric is aggpartialfn of sum(numeric).

--Part of avg(int8) in BKI file in PostgreSQL15.0[2].
{ aggfnoid => 'avg(int8)', aggtransfn => 'int8_avg_accum',
aggfinalfn => 'numeric_poly_avg', aggcombinefn => 'int8_avg_combine',
aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv',
aggmfinalfn => 'numeric_poly_avg', aggtranstype => 'internal',
aggtransspace => '48', aggmtranstype => 'internal', aggmtransspace => '48' },
--

--Part of sum(int8) in BKI file in PostgreSQL15.0[2].
{ aggfnoid => 'sum(int8)', aggtransfn => 'int8_avg_accum',
aggfinalfn => 'numeric_poly_sum', aggcombinefn => 'int8_avg_combine',
aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv',
aggmfinalfn => 'numeric_poly_sum', aggtranstype => 'internal',
aggtransspace => '48', aggmtranstype => 'internal', aggmtransspace => '48' },
--

[1] https://www.postgresql.org/docs/15/fdw-callbacks.html#FDW-CALLBACKS-ASYNC
[2] https://github.com/postgres/postgres/blob/REL_15_0/src/include/catalog/pg_aggregate.dat

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-04-07 09:37:38 Re: Initial Schema Sync for Logical Replication
Previous Message Amit Kapila 2023-04-07 09:10:36 Re: CREATE SUBSCRIPTION -- add missing tab-completes