Re: Partial aggregates pushdown

From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: Partial aggregates pushdown
Date: 2024-03-25 07:00:51
Message-ID: 31ab8f7b91cef6e837b3e31ea1e35a9a@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp писал(а) 2024-03-16 05:28:
> Hi. Mr.Pyhalov.
>> >>
>> >> I don't see it that way. What we would push to the foreign server
>> >> would be something like SELECT count(a) FROM t. Then, after we get
>> >> the results back and combine the various partial counts locally, we
>> >> would locally evaluate the HAVING clause afterward. That is, partial
>> >> aggregation is a barrier to pushing down HAVING clause itself, but it
>> >> doesn't preclude pushing down the aggregation.
>> > I have made modifications in the attached patch to ensure that when
>> > the HAVING clause is present, the HAVING clause is executed locally
>> > while the partial aggregations are pushed down.
>> >
>> >
>>
>> Sorry, I don't see how it works. When we have partial aggregates and
>> having clause, foreign_grouping_ok() returns false and
>> add_foreign_grouping_paths() adds no paths.
>> I'm not saying it's necessary to fix this in the first patch version.
> Our sincere apologies. I had attached an older version before this
> modification.
>

Hi.

In foreign_grouping_ok() having qual is added to local conds here:

6635 if (is_foreign_expr(root, grouped_rel,
expr) && !partial)
6636 fpinfo->remote_conds =
lappend(fpinfo->remote_conds, rinfo);
6637 else
6638 fpinfo->local_conds =
lappend(fpinfo->local_conds, rinfo);
6639 }
6640 }

This is incorrect. If you look at plan for query in postgres_fdw.sql

-- Partial aggregates are safe to push down when there is a HAVING
clause
EXPLAIN (VERBOSE, COSTS OFF)
SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING
sum(a) < 700 ORDER BY 1;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate
Output: pagg_tab.b, avg(pagg_tab.a), max(pagg_tab.a), count(*)
Group Key: pagg_tab.b
Filter: (sum(pagg_tab.a) < 700)
-> Sort
Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL
max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))
Sort Key: pagg_tab.b
-> Append
-> Foreign Scan
Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)),
(PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))
Filter: ((sum(pagg_tab.a)) < 700)
Relations: Aggregate on (public.fpagg_tab_p1
pagg_tab)
Remote SQL: SELECT b, avg(PARTIAL_AGGREGATE a),
max(a), count(*), sum(a), sum(a) FROM public.pagg_tab_p1 GROUP BY 1
-> Foreign Scan
Output: pagg_tab_1.b, (PARTIAL avg(pagg_tab_1.a)),
(PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*)), (PARTIAL
sum(pagg_tab_1.a))
Filter: ((sum(pagg_tab_1.a)) < 700)
Relations: Aggregate on (public.fpagg_tab_p2
pagg_tab_1)
Remote SQL: SELECT b, avg(PARTIAL_AGGREGATE a),
max(a), count(*), sum(a), sum(a) FROM public.pagg_tab_p2 GROUP BY 1
-> Foreign Scan
Output: pagg_tab_2.b, (PARTIAL avg(pagg_tab_2.a)),
(PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*)), (PARTIAL
sum(pagg_tab_2.a))
Filter: ((sum(pagg_tab_2.a)) < 700)
Relations: Aggregate on (public.fpagg_tab_p3
pagg_tab_2)
Remote SQL: SELECT b, avg(PARTIAL_AGGREGATE a),
max(a), count(*), sum(a), sum(a) FROM public.pagg_tab_p3 GROUP BY 1

You can see that filter is applied before append. The result is correct
only by chance, as sum in every partition is actually < 700. If you
lower this bound, let's say, to 200, you'll start getting wrong results
as data is filtered prior to aggregation.

It seems, however, that in partial case you should just avoid pulling
conditions from having qual at all, all filters will be applied on upper
level. Something like

diff --git a/contrib/postgres_fdw/postgres_fdw.c
b/contrib/postgres_fdw/postgres_fdw.c
index 42eb17ae7c0..54918b9f1a4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6610,7 +6610,7 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo
*grouped_rel,
* Classify the pushable and non-pushable HAVING clauses and
save them in
* remote_conds and local_conds of the grouped rel's fpinfo.
*/
- if (extra->havingQual)
+ if (extra->havingQual && !partial)
{
foreach(lc, (List *) extra->havingQual)
{
@@ -6632,7 +6632,7 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo
*grouped_rel,

grouped_rel->relids,

NULL,

NULL);
- if (is_foreign_expr(root, grouped_rel, expr) &&
!partial)
+ if (is_foreign_expr(root, grouped_rel, expr))
fpinfo->remote_conds =
lappend(fpinfo->remote_conds, rinfo);
else
fpinfo->local_conds =
lappend(fpinfo->local_conds, rinfo);

>> From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
>> Sent: Wednesday, February 28, 2024 10:43 PM
>> contrib/postgres_fdw/deparse.c: comment before appendFunctionName()
>> has gone, this seems to be wrong.
> Fixed.
>
>> From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
>> Sent: Wednesday, February 28, 2024 10:43 PM
>> In finalize_aggregate()
>>
>> 1079 /*
>> 1080 * Apply the agg's finalfn if one is provided, else
>> return
>> transValue.
>> 1081 */
>>
>> Comment should be updated to note behavior for agg_partial aggregates.
> Fixed.

Comment in nodeAgg.c seems to be strange:

1079 /*
1080 * If the agg's finalfn is provided and PARTIAL_AGGREGATE
keyword is
1081 * not specified, apply the agg's finalfn.
1082 * If PARTIAL_AGGREGATE keyword is specified and the
transValue type
1083 * is internal, apply the agg's serialfn. In this case, if
the agg's
1084 * serialfn must not be invalid. Otherwise return
transValue.
1085 */

Likely, you mean:

... In this case the agg'ss serialfn must not be invalid...

Lower, in the same file, please, correct error message:

1136 if(!OidIsValid(peragg->serialfn_oid))
1137 elog(ERROR, "serialfunc is note provided
for partial aggregate");

it should be "serialfunc is not provided for partial aggregate"

Also something is wrong with the following test :

SELECT /* aggregate <> partial aggregate */
array_agg(c_int4array), array_agg(b),
avg(b::int2), avg(b::int4), avg(b::int8), avg(c_interval),
avg(b::float4), avg(b::float8),
corr(b::float8, (b * b)::float8),
covar_pop(b::float8, (b * b)::float8),
covar_samp(b::float8, (b * b)::float8),
regr_avgx((2 * b)::float8, b::float8),
.....

Its results have changed since last patch. Do they depend on daylight
saving time?

--
Best regards,
Alexander Pyhalov,
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2024-03-25 07:06:29 Re: Add Index-level REINDEX with multiple jobs
Previous Message Bharath Rupireddy 2024-03-25 06:55:21 Re: Introduce XID age and inactive timeout based replication slot invalidation