RE: [CAUTION!! freemail] Re: Partial aggregates pushdown

From: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
To: Ted Yu <yuzhihong(at)gmail(dot)com>
Cc: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Ilya Gladyshev <i(dot)gladyshev(at)postgrespro(dot)ru>, "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
Subject: RE: [CAUTION!! freemail] Re: Partial aggregates pushdown
Date: 2022-11-22 09:11:13
Message-ID: OS3PR01MB6660AE8BB4D08125EB4D3279950D9@OS3PR01MB6660.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Mr.Yu.

Thank you for comments.

> + * Check that partial aggregate agg has compatibility
>
> If the `agg` refers to func parameter, the parameter name is aggform
I fixed the above typo and made the above comment easy to understand
New comment is "Check that partial aggregate function of aggform exsits in remote"

> + int32 partialagg_minversion = PG_VERSION_NUM;
> + if (aggform->partialagg_minversion ==
> PARTIALAGG_MINVERSION_DEFAULT) {
> + partialagg_minversion = PG_VERSION_NUM;
>
>
> I am curious why the same variable is assigned the same value twice. It seems
> the if block is redundant.
>
> + if ((fpinfo->server_version >= partialagg_minversion)) {
> + compatible = true;
>
>
> The above can be simplified as: return fpinfo->server_version >=
> partialagg_minversion;
I fixed according to your comment.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

> -----Original Message-----
> From: Ted Yu <yuzhihong(at)gmail(dot)com>
> Sent: Tuesday, November 22, 2022 2:00 PM
> To: Fujii Yuki/藤井 雄規(MELCO/情報総研 DM最適G)
> <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
> Cc: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>; Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com>; PostgreSQL-development
> <pgsql-hackers(at)postgresql(dot)org>; Andres Freund <andres(at)anarazel(dot)de>;
> Zhihong Yu <zyu(at)yugabyte(dot)com>; Julien Rouhaud <rjuju123(at)gmail(dot)com>;
> Daniel Gustafsson <daniel(at)yesql(dot)se>; Ilya Gladyshev
> <i(dot)gladyshev(at)postgrespro(dot)ru>
> Subject: [CAUTION!! freemail] Re: Partial aggregates pushdown
>
>
>
> On Mon, Nov 21, 2022 at 5:02 PM Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp
> <mailto:Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
> <Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp
> <mailto:Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp> > wrote:
>
>
> Hi Mr.Vondra, Mr.Pyhalov, Everyone.
>
> I discussed with Mr.Pyhalov about the above draft by directly sending
> mail to
> him(outside of pgsql-hackers). Mr.Pyhalov allowed me to update his
> patch
> along with the above draft. So I update Mr.Pyhalov's patch v10.
>
> I wrote my patch for discussion.
> My patch passes regression tests which contains additional basic
> postgres_fdw tests
> for my patch's feature. But my patch doesn't contain sufficient
> documents and tests.
> If reviewers accept my approach, I will add documents and tests to my
> patch.
>
> The following is a my patch's readme.
> # I simplified the above draft.
>
> --readme of my patch
> 1. interface
> 1) pg_aggregate
> There are the following additional columns.
> a) partialaggfn
> data type : regproc.
> default value: zero(means invalid).
> description : This field refers to the special aggregate
> function(then we call
> this partialaggfunc)
> corresponding to aggregation function(then we call src) which has
> aggfnoid.
> partialaggfunc is used for partial aggregation pushdown by
> postgres_fdw.
> The followings are differences between the src and the special
> aggregate function.
> difference1) result type
> The result type is same as the src's transtype if the src's
> transtype
> is not internal.
> Otherwise the result type is bytea.
> difference2) final func
> The final func does not exist if the src's transtype is not
> internal.
> Otherwize the final func returns serialized value.
> For example, there is a partialaggfunc avg_p_int4 which
> corresponds to avg(int4)
> whose aggtranstype is _int4.
> The result value of avg_p_int4 is a float8 array which consists of
> count and
> summation. avg_p_int4 does not have finalfunc.
> For another example, there is a partialaggfunc avg_p_int8 which
> corresponds to
> avg(int8) whose aggtranstype is internal.
> The result value of avg_p_int8 is a bytea serialized array which
> consists of count
> and summation. avg_p_int8 has finalfunc int8_avg_serialize
> which is serialize function
> of avg(int8). This field is zero if there is no partialaggfunc.
>
> b) partialagg_minversion
> data type : int4.
> default value: zero(means current version).
> description : This field is the minimum PostgreSQL server version
> which has
> partialaggfunc. This field is used for checking compatibility of
> partialaggfunc.
>
> The above fields are valid in tuples for builtin avg, sum, min, max,
> count.
> There are additional records which correspond to partialaggfunc for
> avg, sum, min, max,
> count.
>
> 2) pg_proc
> There are additional records which correspond to partialaggfunc for
> avg, sum, min, max,
> count.
>
> 3) postgres_fdw
> postgres_fdw has an additional foreign server option server_version.
> server_version is
> integer value which means remote server version number. Default
> value of server_version
> is zero. server_version is used for checking compatibility of
> partialaggfunc.
>
> 2. feature
> postgres_fdw can pushdown partial aggregation of avg, sum, min, max,
> count.
> Partial aggregation pushdown is fine when the following two
> conditions are both true.
> condition1) partialaggfn is valid.
> condition2) server_version is not less than partialagg_minversion
> postgres_fdw executes pushdown the patialaggfunc instead of a src.
> For example, we issue "select avg_p_int4(c) from t" instead of "select
> avg(c) from t"
> in the above example.
>
> postgres_fdw can pushdown every aggregate function which supports
> partial aggregation
> if you add a partialaggfunc corresponding to the aggregate function by
> create aggregate
> command.
>
> 3. difference between my patch and Mr.Pyhalov's v10 patch.
> 1) In my patch postgres_fdw can pushdown partial aggregation of avg
> 2) In my patch postgres_fdw can pushdown every aggregate function
> which supports partial
> aggregation if you add a partialaggfunc corresponding to the
> aggregate function.
>
> 4. sample commands in psql
> \c postgres
> drop database tmp;
> create database tmp;
> \c tmp
> create extension postgres_fdw;
> create server server_01 foreign data wrapper postgres_fdw
> options(host 'localhost', dbname 'tmp', server_version '160000', async_capable
> 'true');
> create user mapping for postgres server server_01 options(user
> 'postgres', password 'postgres');
> create server server_02 foreign data wrapper postgres_fdw
> options(host 'localhost', dbname 'tmp', server_version '160000', async_capable
> 'true');
> create user mapping for postgres server server_02 options(user
> 'postgres', password 'postgres');
>
> create table t(dt timestamp, id int4, name text, total int4, val float4, type
> int4, span interval) partition by list (type);
>
> create table t1(dt timestamp, id int4, name text, total int4, val float4,
> type int4, span interval);
> create table t2(dt timestamp, id int4, name text, total int4, val float4,
> type int4, span interval);
>
> truncate table t1;
> truncate table t2;
> insert into t1 select timestamp'2020-01-01' + cast(t || ' seconds' as
> interval), t % 100, 'hoge' || t, 1, 1.1, 1, cast('1 seconds' as interval) from
> generate_series(1, 100000, 1) t;
> insert into t1 select timestamp'2020-01-01' + cast(t || ' seconds' as
> interval), t % 100, 'hoge' || t, 2, 2.1, 1, cast('2 seconds' as interval) from
> generate_series(1, 100000, 1) t;
> insert into t2 select timestamp'2020-01-01' + cast(t || ' seconds' as
> interval), t % 100, 'hoge' || t, 1, 1.1, 2, cast('1 seconds' as interval) from
> generate_series(1, 100000, 1) t;
> insert into t2 select timestamp'2020-01-01' + cast(t || ' seconds' as
> interval), t % 100, 'hoge' || t, 2, 2.1, 2, cast('2 seconds' as interval) from
> generate_series(1, 100000, 1) t;
>
> create foreign table f_t1 partition of t for values in (1) server server_01
> options(table_name 't1');
> create foreign table f_t2 partition of t for values in (2) server server_02
> options(table_name 't2');
>
> set enable_partitionwise_aggregate = on;
> explain (verbose, costs off) select avg(total::int4), avg(total::int8) from
> t;
> select avg(total::int4), avg(total::int8) from t;
>
> Sincerely yours,
> Yuuki Fujii
> --
> Yuuki Fujii
> Information Technology R&D Center Mitsubishi Electric Corporation
>
>
>
> Hi,
> For partial_agg_compatible :
>
> + * Check that partial aggregate agg has compatibility
>
> If the `agg` refers to func parameter, the parameter name is aggform
>
> + int32 partialagg_minversion = PG_VERSION_NUM;
> + if (aggform->partialagg_minversion ==
> PARTIALAGG_MINVERSION_DEFAULT) {
> + partialagg_minversion = PG_VERSION_NUM;
>
>
> I am curious why the same variable is assigned the same value twice. It seems
> the if block is redundant.
>
> + if ((fpinfo->server_version >= partialagg_minversion)) {
> + compatible = true;
>
>
> The above can be simplified as: return fpinfo->server_version >=
> partialagg_minversion;
>
> Cheers

Attachment Content-Type Size
0001-Partial-aggregates-push-down-v12.patch application/octet-stream 106.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2022-11-22 09:15:26 Re: Time delayed LR (WAS Re: logical replication restrictions)
Previous Message Ronan Dunklau 2022-11-22 09:07:27 Re: Asynchronous execution support for Custom Scan