Re: Eager aggregation, take 3

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Subject: Re: Eager aggregation, take 3
Date: 2025-10-06 13:59:44
Message-ID: CAApHDvrxyGSLv3Sbg9fpmz6yYri_7M6SaKYnqQQv59uZfQdduA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 6 Oct 2025 at 13:59, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> Barring any objections, I'll plan to push v23 in a couple of days.

Not a complete review, but a customary look:

1. setup_base_grouped_rels() by name and the header comment claim to
operate on base relations, but the code seems to be coded to handle
OTHER_MEMBER rels too.

Note that set_base_rel_pathlists() explicitly skips anything that's
not RELOPT_BASEREL, so if you're not doing that, then you shouldn't
use "base" in the function name. It's confusing.

2. All the calls to generate_grouped_paths() pass the grouped_rel
RelOptInfo and also grouped_rel->agg_info. Is there a reason to keep
it that way rather than access the agg_info from the given grouped_rel
from within the function?

3. " * The information needed are provided by the RelAggInfo
structure." This should use "is" rather than "are"

4. standard_join_search(). I think it's worth getting rid of the
duplicate "if (!bms_equal(rel->relids, root->all_query_rels))" check.
How about setting that in a local variable rather than recalling
bms_equal(). I don't believe the compiler will optimise the extra one
away as it can't know set_cheapest() doesn't change the relids. Also,
wouldn't it be better to check rel->grouped_rel != NULL first? Won't
that be NULL in most cases, where as !bms_equal(rel->relids,
root->all_query_rels) will be true in most cases? Likewise in
generate_partitionwise_join_paths().

5. Wouldn't it be better to do 0002 first and get that into core so
you don't have to do the hacky stuff in is_partial_agg_memory_risky()?

6. Shouldn't this be using lappend()?

agg_clause_list = list_append_unique(agg_clause_list, ac_info);

I don't understand why ac_info could already be in the list. You've
just done: ac_info = makeNode(AggClauseInfo);

7. The following comment talks about "base" relations. I don't think
it should be as the RelOptInfo can be an OTHER_MEMBER rel.

* build_simple_grouped_rel
* Construct a new RelOptInfo representing a grouped version of the input
* base relation.
*/

8. Normally we check the List is NIL instead of:

if (list_length(group_clauses) == 0)

9. In get_expression_sortgroupref(), a comment claims "We ignore child
members here.". I think that's outdated since ec_members no longer has
child members.

10. I don't think this comment quite makes sense:

* "apply_at" tracks the lowest join level at which partial aggregation is
* applied.

maybe "minimum set of rels to join before partial aggregation can be applied"?

or at least swap "is" for "can be".

My confusion comes from the fact you're stating "lowest join level",
which seems to indicate that it could be applied after further
relations have been joined, but then you're saying "is applied" to
indicate that it can only be applied at that level.

11. The way you've written the header comments for typedef struct
RelAggInfo seems weird. I've only ever seen extra details in the
header comment when the inline comments have been kept to a single
line. You're spanning multiple lines, so why have the out of line
comments in the header at all?

12. This just doesn't feel like the right name for this field:

/* lowest level partial aggregation is applied at */
Relids apply_at;

I can't help think that it should be something like "agg_relids" or
"required_relids". I understand you're currently only applying the
partial grouping when you get exactly the minimum set of relids in the
join search, but if this can be made fast enough, I expect that could
be changed in the future. If you do change it, then "apply_at" is a
pretty confusing name. Perhaps I've misunderstood here and if you did
that, you'd need to create another RelAggInfo to represent that?

13. Parameter names mismatch between definition and declaration in:

extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
RelOptInfo *rel_plain);
extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
RelOptInfo *rel_plain);

extern void generate_grouped_paths(PlannerInfo *root,
RelOptInfo *rel_grouped,
RelOptInfo *rel_plain,
RelAggInfo *agg_info);

14. Do all the regression tests need VERBOSE in EXPLAIN? It's making
the output kinda huge. It might also be nice to wrap the long queries
onto multiple lines to make them easier to read.

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2025-10-06 14:18:19 Re: Lock tag of relation extend lock
Previous Message Amit Langote 2025-10-06 13:58:44 Re: sql/json query function JsonBehavior default expression's collation may differ from returning type's collation