Re: Partial aggregates pushdown

From: Ted Yu <yuzhihong(at)gmail(dot)com>
To: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp>
Cc: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Ilya Gladyshev <i(dot)gladyshev(at)postgrespro(dot)ru>
Subject: Re: Partial aggregates pushdown
Date: 2022-11-22 05:00:01
Message-ID: CALte62x3WSPFptrw+wHQaQ0HR-LGT0XMSzf2Jfpd+inDDkxuyA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 21, 2022 at 5:02 PM Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp <
Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp> wrote:

> Hi Mr.Vondra, Mr.Pyhalov, Everyone.
>
> I discussed with Mr.Pyhalov about the above draft by directly sending mail
> to
> him(outside of pgsql-hackers). Mr.Pyhalov allowed me to update his patch
> along with the above draft. So I update Mr.Pyhalov's patch v10.
>
> I wrote my patch for discussion.
> My patch passes regression tests which contains additional basic
> postgres_fdw tests
> for my patch's feature. But my patch doesn't contain sufficient documents
> and tests.
> If reviewers accept my approach, I will add documents and tests to my
> patch.
>
> The following is a my patch's readme.
> # I simplified the above draft.
>
> --readme of my patch
> 1. interface
> 1) pg_aggregate
> There are the following additional columns.
> a) partialaggfn
> data type : regproc.
> default value: zero(means invalid).
> description : This field refers to the special aggregate function(then
> we call
> this partialaggfunc)
> corresponding to aggregation function(then we call src) which has
> aggfnoid.
> partialaggfunc is used for partial aggregation pushdown by
> postgres_fdw.
> The followings are differences between the src and the special
> aggregate function.
> difference1) result type
> The result type is same as the src's transtype if the src's
> transtype
> is not internal.
> Otherwise the result type is bytea.
> difference2) final func
> The final func does not exist if the src's transtype is not
> internal.
> Otherwize the final func returns serialized value.
> For example, there is a partialaggfunc avg_p_int4 which corresponds to
> avg(int4)
> whose aggtranstype is _int4.
> The result value of avg_p_int4 is a float8 array which consists of
> count and
> summation. avg_p_int4 does not have finalfunc.
> For another example, there is a partialaggfunc avg_p_int8 which
> corresponds to
> avg(int8) whose aggtranstype is internal.
> The result value of avg_p_int8 is a bytea serialized array which
> consists of count
> and summation. avg_p_int8 has finalfunc int8_avg_serialize which is
> serialize function
> of avg(int8). This field is zero if there is no partialaggfunc.
>
> b) partialagg_minversion
> data type : int4.
> default value: zero(means current version).
> description : This field is the minimum PostgreSQL server version which
> has
> partialaggfunc. This field is used for checking compatibility of
> partialaggfunc.
>
> The above fields are valid in tuples for builtin avg, sum, min, max, count.
> There are additional records which correspond to partialaggfunc for avg,
> sum, min, max,
> count.
>
> 2) pg_proc
> There are additional records which correspond to partialaggfunc for avg,
> sum, min, max,
> count.
>
> 3) postgres_fdw
> postgres_fdw has an additional foreign server option server_version.
> server_version is
> integer value which means remote server version number. Default value of
> server_version
> is zero. server_version is used for checking compatibility of
> partialaggfunc.
>
> 2. feature
> postgres_fdw can pushdown partial aggregation of avg, sum, min, max, count.
> Partial aggregation pushdown is fine when the following two conditions are
> both true.
> condition1) partialaggfn is valid.
> condition2) server_version is not less than partialagg_minversion
> postgres_fdw executes pushdown the patialaggfunc instead of a src.
> For example, we issue "select avg_p_int4(c) from t" instead of "select
> avg(c) from t"
> in the above example.
>
> postgres_fdw can pushdown every aggregate function which supports partial
> aggregation
> if you add a partialaggfunc corresponding to the aggregate function by
> create aggregate
> command.
>
> 3. difference between my patch and Mr.Pyhalov's v10 patch.
> 1) In my patch postgres_fdw can pushdown partial aggregation of avg
> 2) In my patch postgres_fdw can pushdown every aggregate function which
> supports partial
> aggregation if you add a partialaggfunc corresponding to the aggregate
> function.
>
> 4. sample commands in psql
> \c postgres
> drop database tmp;
> create database tmp;
> \c tmp
> create extension postgres_fdw;
> create server server_01 foreign data wrapper postgres_fdw options(host
> 'localhost', dbname 'tmp', server_version '160000', async_capable 'true');
> create user mapping for postgres server server_01 options(user 'postgres',
> password 'postgres');
> create server server_02 foreign data wrapper postgres_fdw options(host
> 'localhost', dbname 'tmp', server_version '160000', async_capable 'true');
> create user mapping for postgres server server_02 options(user 'postgres',
> password 'postgres');
>
> create table t(dt timestamp, id int4, name text, total int4, val float4,
> type int4, span interval) partition by list (type);
>
> create table t1(dt timestamp, id int4, name text, total int4, val float4,
> type int4, span interval);
> create table t2(dt timestamp, id int4, name text, total int4, val float4,
> type int4, span interval);
>
> truncate table t1;
> truncate table t2;
> insert into t1 select timestamp'2020-01-01' + cast(t || ' seconds' as
> interval), t % 100, 'hoge' || t, 1, 1.1, 1, cast('1 seconds' as interval)
> from generate_series(1, 100000, 1) t;
> insert into t1 select timestamp'2020-01-01' + cast(t || ' seconds' as
> interval), t % 100, 'hoge' || t, 2, 2.1, 1, cast('2 seconds' as interval)
> from generate_series(1, 100000, 1) t;
> insert into t2 select timestamp'2020-01-01' + cast(t || ' seconds' as
> interval), t % 100, 'hoge' || t, 1, 1.1, 2, cast('1 seconds' as interval)
> from generate_series(1, 100000, 1) t;
> insert into t2 select timestamp'2020-01-01' + cast(t || ' seconds' as
> interval), t % 100, 'hoge' || t, 2, 2.1, 2, cast('2 seconds' as interval)
> from generate_series(1, 100000, 1) t;
>
> create foreign table f_t1 partition of t for values in (1) server
> server_01 options(table_name 't1');
> create foreign table f_t2 partition of t for values in (2) server
> server_02 options(table_name 't2');
>
> set enable_partitionwise_aggregate = on;
> explain (verbose, costs off) select avg(total::int4), avg(total::int8)
> from t;
> select avg(total::int4), avg(total::int8) from t;
>
> Sincerely yours,
> Yuuki Fujii
> --
> Yuuki Fujii
> Information Technology R&D Center Mitsubishi Electric Corporation
>

Hi,
For partial_agg_compatible :

+ * Check that partial aggregate agg has compatibility

If the `agg` refers to func parameter, the parameter name is aggform

+ int32 partialagg_minversion = PG_VERSION_NUM;
+ if (aggform->partialagg_minversion ==
PARTIALAGG_MINVERSION_DEFAULT) {
+ partialagg_minversion = PG_VERSION_NUM;

I am curious why the same variable is assigned the same value twice. It
seems the if block is redundant.

+ if ((fpinfo->server_version >= partialagg_minversion)) {
+ compatible = true;

The above can be simplified as: return fpinfo->server_version >=
partialagg_minversion;

Cheers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2022-11-22 05:19:45 Re: Perform streaming logical transactions by background workers and parallel apply
Previous Message Nathan Bossart 2022-11-22 04:39:16 Re: wake up logical workers after ALTER SUBSCRIPTION