RE: Partial aggregates pushdown

From: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
To: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Ilya Gladyshev <i(dot)gladyshev(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
Subject: RE: Partial aggregates pushdown
Date: 2022-08-01 05:55:31
Message-ID: OS3PR01MB66604C11E390FECBAA1B6518959A9@OS3PR01MB6660.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Mr.Vondra, Mr.Pyhalov.

I'm interesied in Mr.Pyhalov's patch due to the following background.
--Background
I develop postgresql's extension such as fdw in my work.
I'm interested in using postgresql for OLAP.
I think the function of a previous patch "Push aggregation down to base relations and joins"[1] is desiable. I rebased the previous patch and register the rebased patch on the next commitfest[2].
And I think it would be more useful if the previous patch works on a foreign table of postgres_fdw.
I realized the function of partial aggregation pushdown is necessary to make the previous patch work on a foreign table of postgres_fdw.
--

So I reviewed Mr.Pyhalov's patch and discussions on this thread.
I made a draft of approach to respond to Mr.Vondra's comments.
Would you check whether my draft is right or not?

--My draft
> 1) It's not clear to me how could this get extended to aggregates with
> more complex aggregate states, to support e.g. avg() and similar
> fairly common aggregates.
We add a special aggregate function every aggregate function (hereafter we call this src) which supports partial aggregation.
The followings are differences between the src and the special aggregate function.
difference1) result type
The result type is same with the src's transtype if the src's transtype is not internal.
Otherwise the result type is bytea.

difference2) final func
The final func does not exist if the src's transtype is not internal.
Otherwize the final func returns serialized value.

For example, let me call the special aggregate function of avg(float8) avg_p(float8).
The result value of avg_p is a float8 array which consists of count and summation.
avg_p does not have finalfunc.

We pushdown the special aggregate function instead of a src.
For example, we issue "select avg_p(c) from t" instead of "select avg(c) from t"
in the above example.

We add a new column partialaggfn to pg_aggregate to get the oid of the special aggregate function from the the src's oid.
This column is the oid of the special aggregate function which corresponds to the src.

If an aggregate function does not have any special aggregate function, then we does not pushdown any partial aggregation of the aggregate function.

> 2) I'm not sure relying on aggpartialpushdownsafe without any version
> checks etc. is sufficient. I mean, how would we know the remote node
> has the same idea of representing the aggregate state. I wonder how
> this aligns with assumptions we do e.g. for functions etc.
We add compatible server versions infomation to pg_aggregate and the set of options of postgres_fdw's foreign server.
We check compatibility of an aggregate function using this infomation.

An additional column of pg_aggregate is compatibleversonrange.
This column is a range of postgresql server versions which has compatible aggregate function.
An additional options of postgres_fdw's foreign server are serverversion and bwcompatibleverson.
serverversion is remote postgresql server version.
bwcompatibleverson is the maximum version in which any aggregate function is compatible with local noed's one.
Our version check passes if and only if at least one of the following conditions is true.
condition1) the option value of serverversion is in compatibleversonrange.
condition2) the local postgresql server version is between bwcompatibleverson and the option value of serverversion.

We can get the local postgresql server version from PG_VERSION_NUM macro.
We use condition1 if the local postgresql server version is not more than the remote one.
and use condition2 if the local postgresql server version is greater than the remote one.
--

Sincerely yours,
Yuuki Fujii

[1] https://commitfest.postgresql.org/32/1247/
[2] https://commitfest.postgresql.org/39/3764/

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2022-08-01 06:24:14 Re: Schema variables - new implementation for Postgres 15
Previous Message Dilip Kumar 2022-08-01 05:34:09 Re: making relfilenodes 56 bits