Re: Eager aggregation, take 3

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Subject: Re: Eager aggregation, take 3
Date: 2025-09-13 08:27:41
Message-ID: CAMbWs484dnecwXT2WzWFvzEmWPzC3U9F8SRDXg-SEegTYUFyXQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 13, 2025 at 3:48 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Sep 12, 2025 at 5:34 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> > I really like this idea. Currently, aggtransspace represents an
> > estimate of the transition state size provided by the aggregate
> > definition. If it's set to zero, a default estimate based on the
> > state data type is used. Negative values currently have no defined
> > meaning. I think it makes perfect sense to reuse this field so that
> > a negative value indicates that the transition state data can grow
> > unboundedly in size.
> >
> > Attached 0002 implements this idea. It requires fewer code changes
> > than I expected. This is mainly because that our current code uses
> > aggtransspace in such a way that if it's a positive value, that value
> > is used as it's provided by the aggregate definition; otherwise, some
> > heuristics are applied to estimate the size. For the aggregates that
> > accumulate input rows (e.g., array_agg, string_agg), I don't currently
> > have a better heuristic for estimating their size, so I've chosen to
> > keep the current logic. This won't regress anything in estimating
> > transition state data size.

> This might be OK, but it's not what I was suggesting: I was suggesting
> trying to do a calculation like space_used = -aggtransspace *
> rowcount, not just using a <0 value as a sentinel.

I've considered your suggestion, but I'm not sure I'll adopt it in the
end. Here's why:

1) At the point where we check whether any aggregates might pose a
risk of excessive memory usage during partial aggregation, row count
information is not yet available. You could argue that we could
reorganize the logic to perform this check after we've had the row
count, but that seems quite tricky. If I understand correctly, the
"rowcount" in this context actually means the number of rows within
one partial group. That would require us to first decide on the
grouping expressions for the partial aggregation, then compute the
group row counts, then estimate space usage, and only then decide
whether memory usage is excessive and fall back. This would come
quite late in planning and adds nontrivial overhead, compared to the
current approach which checks at the very beginning.

2) Even if we were able to estimate space usage based on the number of
rows per partial group and determined that memory usage seems
acceptable, we still couldn't guarantee that the transition state data
won't grow excessively after further joins. Joins can multiply
partial aggregates, potentially causing a blowup in memory usage even
if the initial estimate seemed safe.

3) I don't think "-aggtransspace * rowcount" reflects the true memory
footprint for aggregates that accumulate input rows. For example,
what if we have an aggregate like string_agg(somecolumn, 'a very long
delimiter')?

4) AFAICS, the main downside of the current approach compared to yours
is that it avoids pushing down aggregates like string_agg() that
accumulate input rows, whereas your suggestion might allow pushing
them down in some cases where we *think* it wouldn't blow up memory.
You might argue that the current implementation is over-conservative.
But I prefer to start safe.

That said, I appreciate you proposing the idea of reusing
aggtransspace, although I ended up using it in a different way than
you suggested.

- Richard

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2025-09-13 08:46:09 Re: plan shape work
Previous Message Shlok Kyal 2025-09-13 07:15:10 Re: How can end users know the cause of LR slot sync delays?