Re: Partial aggregates pushdown

From: "Finnerty, Jim" <jfinnert(at)amazon(dot)com>
To: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "Julien Rouhaud" <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, "Ilya Gladyshev" <i(dot)gladyshev(at)postgrespro(dot)ru>
Subject: Re: Partial aggregates pushdown
Date: 2023-08-01 20:25:55
Message-ID: 69F8631B-446A-42FB-ACF4-E70C7EF73205@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

When it is valid to filter based on a HAVING clause predicate, it should already have been converted into a WHERE clause predicate, except in the special case of an LIMIT TO .k .. ORDER BY case where the HAVING clause predicate can be determined approximately after having found k fully qualified tuples and then that predicate is successively tightened as more qualified records are found.

*that*, by the way, is a very powerful optimization.

/Jim F

On 7/20/23, 6:24 AM, "Alexander Pyhalov" <a(dot)pyhalov(at)postgrespro(dot)ru <mailto:a(dot)pyhalov(at)postgrespro(dot)ru>> wrote:

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp <mailto:Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp> писал 2023-07-19 03:43:
> Hi Mr.Pyhalov, hackers.

> 3)
> I modified the patch to safely do a partial aggregate pushdown for
> queries which contain having clauses.
>

Hi.
Sorry, but I don't see how it could work.
For example, the attached test returns wrong result:

CREATE FUNCTION f() RETURNS INT AS $$
begin
return 10;
end
$$ LANGUAGE PLPGSQL;

SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) < f() ORDER BY
1;
b | sum
----+-----
0 | 0
10 | 0
20 | 0
30 | 0
40 | 0
+(5 rows)

In fact the above query should have returned 0 rows, as

SELECT b, sum(a) FROM pagg_tab GROUP BY b ORDER BY 1;
b | sum
----+------
0 | 600
1 | 660
2 | 720
3 | 780
4 | 840
5 | 900
6 | 960
7 | 1020
8 | 1080
9 | 1140
10 | 600
11 | 660
12 | 720
....
shows no such rows.

Or, on the same data

SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) > 660 ORDER BY
1;

You'll get 0 rows.

But
SELECT b, sum(a) FROM pagg_tab GROUP BY b;
b | sum
----+------
42 | 720
29 | 1140
4 | 840
34 | 840
41 | 660
0 | 600
40 | 600
gives.

The issue is that you can't calculate "partial" having. You should
compare full aggregate in filter, but it's not possible on the level of
one partition.
And you have this in plans

Finalize GroupAggregate
Output: pagg_tab.b, avg(pagg_tab.a), max(pagg_tab.a), count(*)
Group Key: pagg_tab.b
Filter: (sum(pagg_tab.a) < 700)
-> Sort
Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL
max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))
Sort Key: pagg_tab.b
-> Append
-> Foreign Scan
Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)),
(PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))
Filter: ((PARTIAL sum(pagg_tab.a)) < 700) !!!!
<--- here we can't compare anything yet, sum is incomplete.
Relations: Aggregate on (public.fpagg_tab_p1
pagg_tab)
Remote SQL: SELECT b, avg_p_int4(a), max(a),
count(*), sum(a) FROM public.pagg_tab_p1 GROUP BY 1
-> Foreign Scan
Output: pagg_tab_1.b, (PARTIAL avg(pagg_tab_1.a)),
(PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*)), (PARTIAL
sum(pagg_tab_1.a))
Filter: ((PARTIAL sum(pagg_tab_1.a)) < 700)
Relations: Aggregate on (public.fpagg_tab_p2
pagg_tab_1)
Remote SQL: SELECT b, avg_p_int4(a), max(a),
count(*), sum(a) FROM public.pagg_tab_p2 GROUP BY 1
-> Foreign Scan
Output: pagg_tab_2.b, (PARTIAL avg(pagg_tab_2.a)),
(PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*)), (PARTIAL
sum(pagg_tab_2.a))
Filter: ((PARTIAL sum(pagg_tab_2.a)) < 700)
Relations: Aggregate on (public.fpagg_tab_p3
pagg_tab_2)
Remote SQL: SELECT b, avg_p_int4(a), max(a),
count(*), sum(a) FROM public.pagg_tab_p3 GROUP BY 1

In foreign_grouping_ok()
6586 if (IsA(expr, Aggref))
6587 {
6588 if (partial)
6589 {
6590 mark_partial_aggref((Aggref
*) expr, AGGSPLIT_INITIAL_SERIAL);
6591 continue;
6592 }
6593 else if (!is_foreign_expr(root,
grouped_rel, expr))
6594 return false;
6595
6596 tlist = add_to_flat_tlist(tlist,
list_make1(expr));
6597 }

at least you shouldn't do anything with expr, if is_foreign_expr()
returned false. If we restrict pushing down queries with havingQuals,
I'm not quite sure how Aggref can appear in local_conds.

As for changes in planner.c (setGroupClausePartial()) I have several
questions.

1) Why don't we add non_group_exprs to pathtarget->exprs when
partial_target->exprs is not set?

2) We replace extra->partial_target->exprs with partial_target->exprs
after processing. Why are we sure that after this tleSortGroupRef is
correct?

--
Best regards,
Alexander Pyhalov,
Postgres Professional

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2023-08-01 20:29:05 Re: explain analyze rows=%.0f
Previous Message Daniel Gustafsson 2023-08-01 20:24:54 Re: document the need to analyze partitioned tables