Re: Partial aggregates pushdown

From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: Partial aggregates pushdown
Date: 2024-02-28 13:43:07
Message-ID: bf816db00fb5eda86358e10a18a330e5@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi.

Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp писал(а) 2024-02-22 10:20:
> Hi. Mr.Haas, hackers.
>
> I apologize for the significant delay since my last post.
> I have conducted investigations and considerations regarding the
> remaining tasks as follows.
> Would it be possible for you to review them?
> In particular, could you please confirm if the approach mentioned in 1.
> is acceptable?
> If there are no issues with the direction outlined in 1., I plan to
> make a simple prototype based on this approach.
>
> 1. Transmitting state value safely between machines
>> From: Robert Haas <robertmhaas(at)gmail(dot)com>
>> Sent: Wednesday, December 6, 2023 10:25 PM
>> the problems around transmitting
>> serialized bytea blobs between machines that can't be assumed to fully
>> trust each other will need to be addressed in some
>> way, which seems like it will require a good deal of design work,
>> forming some kind of consensus, and then implementation
>> work to follow.
> I have considered methods for safely transmitting state values between
> different machines.
> I have taken into account the version policy of PostgreSQL (5 years of
> support) and the major version release cycle over the past 10 years (1
> year), and as a result, I have made the assumption that transmission is
> allowed only when the difference between the local version and the
> remote version is 5 or less.
> I believe that by adding new components, "export function" and "import
> function", to the aggregate functions, and further introducing a new
> SQL keyword to the query syntax of aggregate expressions, we can
> address this issue.
> If the version of the local server is higher than or equal to the
> version of the remote server, the proposed method can be simplified.
> The export version mentioned later in (1) would not be necessary.
> Furthermore, if the version of the local server matches the version of
> the remote server, the proposed method can be further simplified.
> I would appreciate your input on reasonable assumptions regarding the
> differences in versions between the local server and the remote server.
> I will explain the specifications of the export function, import
> function, the new SQL keyword for aggregate expressions, and the
> behavior of query processing for partial aggregation separately.
> (1) Export Function Specification
> This function is another final function for partial aggregate.
> This function converts the state value that represents the result of
> partial aggregation into a format that can be read by the local server.
> This function is called instead of the existing finalfunc during the
> final stage of aggregation when performing partial aggregation.
> The conversion process described above will be referred to as "export".
> The argument of an export function is the version of the server that
> will receive the return value.
> Hereafter, this version will be referred to as the export version.
> The concept of an export version is necessary to handle cases where the
> version of the local server is smaller than the version of the remote
> server.
> The return value of the export function is the transformed state value,
> and its data type is bytea.
> For backward compatibility, the developer of the export function must
> ensure that the export can be performed for major versions up to five
> versions prior to the major version of PostgreSQL that the export
> function is being developed for.
> For built-in functions, I believe it is necessary to allow for the
> possibility of not developing the export functionality for specific
> versions in the future (due to reasons such as development burden)
> after the export function is developed for a certain version.
> To achieve this, for built-in functions, we will add a column to the
> pg_aggregate catalog that indicates the presence or absence of export
> functionality for each major version, including the major version being
> developed and the previous five major versions. This column will be
> named safety_export_versions and will have a data type of boolean[6].
> For user-defined functions, we will refer to the extensions option and
> add an external server option called safety_export_extensions, which
> will maintain a list of extensions that include only the aggregate
> functions that can be exported to the local server version.
> ...

I honestly think that achieving cross-version compatibility in this way
puts a significant burden on developers. Can we instead always use the
more or less universal export and import function to fix possible issues
with binary representations on different architectures and just refuse
to push down partial aggregates on server version mismatch? At least at
the first step?

>
> 3. Fixing the behavior when the HAVING clause is present
>> From: Robert Haas <robertmhaas(at)gmail(dot)com>
>> Sent: Tuesday, November 28, 2023 4:08 AM
>>
>> On Wed, Nov 22, 2023 at 1:32 AM Alexander Pyhalov
>> <a(dot)pyhalov(at)postgrespro(dot)ru> wrote:
>> > Hi. HAVING is also a problem. Consider the following query
>> >
>> > SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down to
>> > foreign server as HAVING needs full aggregate result, but foreign
>> > server don't know it.
>>
>> I don't see it that way. What we would push to the foreign server
>> would be something like SELECT count(a) FROM t. Then,
>> after we get the results back and combine the various partial counts
>> locally, we would locally evaluate the HAVING clause
>> afterward. That is, partial aggregation is a barrier to pushing down
>> HAVING clause itself, but it doesn't preclude pushing
>> down the aggregation.
> I have made modifications in the attached patch to ensure that when the
> HAVING clause is present, the HAVING clause is executed locally while
> the partial aggregations are pushed down.
>
>

Sorry, I don't see how it works. When we have partial aggregates and
having clause, foreign_grouping_ok() returns false and
add_foreign_grouping_paths() adds no paths.
I'm not saying it's necessary to fix this in the first patch version.

explain verbose select sum(a) from pagg_tab having sum(a)>10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=2282.49..2282.50 rows=1 width=8)
Output: sum(pagg_tab.a)
Filter: (sum(pagg_tab.a) > 10)
-> Append (cost=760.81..2282.48 rows=3 width=8)
-> Partial Aggregate (cost=760.81..760.82 rows=1 width=8)
Output: PARTIAL sum(pagg_tab.a)
-> Foreign Scan on public.fpagg_tab_p1 pagg_tab
(cost=100.00..753.50 rows=2925 width=4)
Output: pagg_tab.a
Remote SQL: SELECT a FROM public.pagg_tab_p1
-> Partial Aggregate (cost=760.81..760.82 rows=1 width=8)
Output: PARTIAL sum(pagg_tab_1.a)
-> Foreign Scan on public.fpagg_tab_p2 pagg_tab_1
(cost=100.00..753.50 rows=2925 width=4)
Output: pagg_tab_1.a
Remote SQL: SELECT a FROM public.pagg_tab_p2
-> Partial Aggregate (cost=760.81..760.82 rows=1 width=8)
Output: PARTIAL sum(pagg_tab_2.a)
-> Foreign Scan on public.fpagg_tab_p3 pagg_tab_2
(cost=100.00..753.50 rows=2925 width=4)
Output: pagg_tab_2.a
Remote SQL: SELECT a FROM public.pagg_tab_p3

Also I have some minor notices on the code.

contrib/postgres_fdw/deparse.c: comment before appendFunctionName() has
gone, this seems to be wrong.

In finalize_aggregate()

1079 /*
1080 * Apply the agg's finalfn if one is provided, else return
transValue.
1081 */

Comment should be updated to note behavior for agg_partial aggregates.

1129 else if (peragg->aggref->agg_partial
1130 && (peragg->aggref->aggtranstype ==
INTERNALOID)
1131 && OidIsValid(peragg->serialfn_oid))

In this if branch, should we check just for peragg->aggref->agg_partial
and peragg->aggref->aggtranstype == INTERNALOID? It seems that if
peragg->aggref->aggtranstype == INTERNALOID and there's no
serialfn_oid, it's likely an error (and one should be generated).

Overall patch seems nicer. Will look at it more this week.
--
Best regards,
Alexander Pyhalov,
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2024-02-28 13:46:11 Re: An improved README experience for PostgreSQL
Previous Message Daniel Gustafsson 2024-02-28 13:29:25 Re: Wrong description in server_ca.config and client_ca.config