From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Matheus Alcantara <matheusssilv97(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Eager aggregation, take 3 |
Date: | 2025-08-09 01:32:21 |
Message-ID: | CAMbWs4_VtGu18P-jWMXAp3Q+mzGDCaT8AhxQDyWv2_rUxsjv8A@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Aug 6, 2025 at 10:44 PM Matheus Alcantara
<matheusssilv97(at)gmail(dot)com> wrote:
> On Wed Aug 6, 2025 at 4:52 AM -03, Richard Guo wrote:
> > * The has_internal_aggtranstype() check.
> >
> > To avoid potential memory blowout risks from large partial aggregation
> > values, v18 avoids applying eager aggregation if any aggregate uses an
> > INTERNAL transition type, as this typically indicates a large internal
> > data structure (as in string_agg or array_agg). However, this also
> > excludes aggregates like avg(numeric) and sum(numeric), which are
> > actually safe to use with eager aggregation.
> >
> > What we really want to exclude are aggregate functions that can
> > produce large transition values by accumulating or concatenating input
> > rows. So I'm wondering if we could instead check the transfn_oid
> > directly and explicitly exclude only F_ARRAY_AGG_TRANSFN and
> > F_STRING_AGG_TRANSFN. We don't need to worry about json_agg,
> > jsonb_agg, or xmlagg, since they don't support partial aggregation
> > anyway.
> I think it makes sense to me. I just wondering if we should follow an
> "allow" or "don't-allow" strategy. I mean, instead of a list aggregate
> functions that are not allowed we could list functions that are actually
> allowed to use eager aggregation, so in this case we ensure that for the
> functions that are enabled the eager aggregation can work properly.
I ended up still checking for INTERNAL transition types, but
explicitly excluded aggregates that use F_NUMERIC_AVG_ACCUM transition
function, assuming that avg(numeric) and sum(numeric) are safe in this
context. This might still be overly strict, but I prefer to be on the
safe side for now.
> > * The EAGER_AGG_MIN_GROUP_SIZE threshold
> >
> > This threshold defines the minimum average group size required to
> > consider applying eager aggregation. It was previously set to 2, but
> > in v18 it was increased to 20 to be cautious about planning overhead.
> > This change was a snap decision though, without any profiling or data
> > to back it.
> >
> > Looking at TPC-DS queries 4 and 11, a threshold of 10 is the minimum
> > needed to consider eager aggregation for them. The resulting plans
> > show nice performance improvements without any measurable increase in
> > planning time. So, I'm inclined to lower the threshold to 10 for now.
> > (Wondering whether we should make this threshold a GUC, so users can
> > adjust it based on their needs.)
> Having a GUC may sound like a good idea to me TBH. This threshold may
> vary from workload to workload (?).
I've made this threshold a GUC, with a default value of 8 (further
benchmark testing showed that a value of 10 is still too strict for
TPC-DS query 4).
> > Any comments on these two changes?
> It sounds like a good way to go for me, looking forward to the next
> patch version to perform some other tests.
OK. Here it is.
Thanks
Richard
Attachment | Content-Type | Size |
---|---|---|
v19-0001-Implement-Eager-Aggregation.patch | application/octet-stream | 174.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-08-09 07:02:24 | Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem) |
Previous Message | Tom Lane | 2025-08-09 01:14:08 | Re: Making type Datum be 8 bytes everywhere |