Re: Memory-Bounded Hash Aggregation

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Adam Lee <ali(at)pivotal(dot)io>
Cc: Melanie Plageman <mplageman(at)pivotal(dot)io>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Memory-Bounded Hash Aggregation
Date: 2019-08-02 15:11:19
Message-ID: 617fea8258ba3df7bc8b21cd7f3afc219f2383d9.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2019-08-02 at 14:44 +0800, Adam Lee wrote:
> I'm late to the party.

You are welcome to join any time!

> These two approaches both spill the input tuples, what if the skewed
> groups are not encountered before the hash table fills up? The spill
> files' size and disk I/O could be downsides.

Let's say the worst case is that we encounter 10 million groups of size
one first; just enough to fill up memory. Then, we encounter a single
additional group of size 20 million, and need to write out all of those
20 million raw tuples. That's still not worse than Sort+GroupAgg which
would need to write out all 30 million raw tuples (in practice Sort is
pretty fast so may still win in some cases, but not by any huge
amount).

> Greenplum spills all the groups by writing the partial aggregate
> states,
> reset the memory context, process incoming tuples and build in-memory
> hash table, then reload and combine the spilled partial states at
> last,
> how does this sound?

That can be done as an add-on to approach #1 by evicting the entire
hash table (writing out the partial states), then resetting the memory
context.

It does add to the complexity though, and would only work for the
aggregates that support serializing and combining partial states. It
also might be a net loss to do the extra work of initializing and
evicting a partial state if we don't have large enough groups to
benefit.

Given that the worst case isn't worse than Sort+GroupAgg, I think it
should be left as a future optimization. That would give us time to
tune the process to work well in a variety of cases.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-08-02 15:21:23 Re: Memory-Bounded Hash Aggregation
Previous Message Karl O. Pinc 2019-08-02 15:09:36 Re: Patch to document base64 encoding