RE: Partial aggregates pushdown

From: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
To: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, vignesh C <vignesh21(at)gmail(dot)com>, "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
Subject: RE: Partial aggregates pushdown
Date: 2024-03-16 02:28:50
Message-ID: TYAPR01MB55141D18188AC86ADCE35FCB952F2@TYAPR01MB5514.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi. Mr.Pyhalov.

> From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
> Sent: Wednesday, February 28, 2024 10:43 PM
> > 1. Transmitting state value safely between machines
> >> From: Robert Haas <robertmhaas(at)gmail(dot)com>
> >> Sent: Wednesday, December 6, 2023 10:25 PM the problems around
> >> transmitting serialized bytea blobs between machines that can't be
> >> assumed to fully trust each other will need to be addressed in some
> >> way, which seems like it will require a good deal of design work,
> >> forming some kind of consensus, and then implementation work to
> >> follow.
> > I have considered methods for safely transmitting state values between
> > different machines.
> > I have taken into account the version policy of PostgreSQL (5 years of
> > support) and the major version release cycle over the past 10 years (1
> > year), and as a result, I have made the assumption that transmission
> > is allowed only when the difference between the local version and the
> > remote version is 5 or less.
> > I believe that by adding new components, "export function" and "import
> > function", to the aggregate functions, and further introducing a new
> > SQL keyword to the query syntax of aggregate expressions, we can
> > address this issue.
> >
...
>
> I honestly think that achieving cross-version compatibility in this way puts a significant burden on developers. Can we
> instead always use the more or less universal export and import function to fix possible issues with binary representations
> on different architectures and just refuse to push down partial aggregates on server version mismatch? At least at the first
> step?
Thank you for your comment. I agree with your point that the proposed method would impose a significant burden on developers. In order to ensure cross-version compatibility, it is necessary to impose constraints on the format of the state values exchanged between servers, which would indeed burden developers. As you mentioned, I think that it is realistic to allow partial aggregation pushdown only when coordinating between the same versions in the first step.

> From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
> Sent: Wednesday, February 28, 2024 10:43 PM
> > 3. Fixing the behavior when the HAVING clause is present
> >> From: Robert Haas <robertmhaas(at)gmail(dot)com>
> >> Sent: Tuesday, November 28, 2023 4:08 AM
> >>
> >> On Wed, Nov 22, 2023 at 1:32 AM Alexander Pyhalov
> >> <a(dot)pyhalov(at)postgrespro(dot)ru> wrote:
> >> > Hi. HAVING is also a problem. Consider the following query
> >> >
> >> > SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down
> >> > to foreign server as HAVING needs full aggregate result, but
> >> > foreign server don't know it.
> >>
> >> I don't see it that way. What we would push to the foreign server
> >> would be something like SELECT count(a) FROM t. Then, after we get
> >> the results back and combine the various partial counts locally, we
> >> would locally evaluate the HAVING clause afterward. That is, partial
> >> aggregation is a barrier to pushing down HAVING clause itself, but it
> >> doesn't preclude pushing down the aggregation.
> > I have made modifications in the attached patch to ensure that when
> > the HAVING clause is present, the HAVING clause is executed locally
> > while the partial aggregations are pushed down.
> >
> >
>
> Sorry, I don't see how it works. When we have partial aggregates and having clause, foreign_grouping_ok() returns false and
> add_foreign_grouping_paths() adds no paths.
> I'm not saying it's necessary to fix this in the first patch version.
Our sincere apologies. I had attached an older version before this modification.

> From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
> Sent: Wednesday, February 28, 2024 10:43 PM
> contrib/postgres_fdw/deparse.c: comment before appendFunctionName() has gone, this seems to be wrong.
Fixed.

> From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
> Sent: Wednesday, February 28, 2024 10:43 PM
> In finalize_aggregate()
>
> 1079 /*
> 1080 * Apply the agg's finalfn if one is provided, else return
> transValue.
> 1081 */
>
> Comment should be updated to note behavior for agg_partial aggregates.
Fixed.

> From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
> Sent: Wednesday, February 28, 2024 10:43 PM
> In this if branch, should we check just for peragg->aggref->agg_partial and peragg->aggref->aggtranstype ==
> INTERNALOID? It seems that if
> peragg->aggref->aggtranstype == INTERNALOID and there's no
> serialfn_oid, it's likely an error (and one should be generated).
As you pointed out, I have made modifications to the source code so that it terminates with an error if serialfn is invalid.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

Attachment Content-Type Size
0001-Partial-aggregates-push-down-SQL-keyword-v2.patch application/octet-stream 143.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2024-03-16 03:59:01 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Peter Geoghegan 2024-03-16 00:12:02 Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan