Re: Memory-Bounded Hash Aggregation

From: Adam Lee <ali(at)pivotal(dot)io>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Melanie Plageman <mplageman(at)pivotal(dot)io>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Memory-Bounded Hash Aggregation
Date: 2019-08-02 06:44:05
Message-ID: 20190802064405.GA94036@mars.local
Lists: pgsql-hackers

> High-level approaches:
>
> 1. When the in-memory hash table fills, keep existing entries in the
> hash table, and spill the raw tuples for all new groups in a
> partitioned fashion. When all input tuples are read, finalize groups
> in memory and emit. Now that the in-memory hash table is cleared (and
> memory context reset), process a spill file the same as the original
> input, but this time with a fraction of the group cardinality.
>
> 2. When the in-memory hash table fills, partition the hash space, and
> evict the groups from all partitions except one by writing out their
> partial aggregate states to disk. Any input tuples belonging to an
> evicted partition get spilled to disk. When the input is read
> entirely, finalize the groups remaining in memory and emit. Now that
> the in-memory hash table is cleared, process the next partition by
> loading its partial states into the hash table, and then processing
> its spilled tuples.
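For readers following along, here is a minimal Python sketch of the first approach. It assumes a SUM aggregate, a memory limit expressed as a hypothetical maximum number of groups (`max_groups`), and Python lists standing in for spill files. The partition count and the per-level hash salt (`depth`) are illustrative choices and do not come from any patch.

```python
def hashagg_spill_tuples(tuples, max_groups, num_partitions=4, depth=0):
    """Sketch of approach 1 for SUM: keep existing groups in memory and spill
    the raw tuples of new groups, partitioned by hash, once the table is full.

    max_groups is a hypothetical stand-in for the memory limit (must be >= 1).
    """
    table = {}                                    # group key -> running sum
    spills = [[] for _ in range(num_partitions)]  # lists stand in for spill files
    for key, value in tuples:
        if key in table:
            table[key] += value                   # existing group: advance its state
        elif len(table) < max_groups:
            table[key] = value                    # still room: create the group
        else:
            # Table is full: spill the raw tuple. Salting the hash with depth
            # makes each recursion level split the keys differently.
            spills[hash((depth, key)) % num_partitions].append((key, value))
    # All input read: finalize the in-memory groups (for SUM, state == result).
    results = dict(table)
    # The table is now discarded; process each spill file like the original
    # input. Spilling only happens once max_groups groups are in memory, so
    # each recursive call sees fewer distinct groups and recursion terminates.
    for part in spills:
        if part:
            results.update(
                hashagg_spill_tuples(part, max_groups, num_partitions, depth + 1))
    return results
```

Because a key is either in the table or spilled (never both), and spill partitions are disjoint by key, merging the per-level results with `dict.update` cannot overwrite a group.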

I'm late to the party.

Both of these approaches spill the input tuples. What if the skewed
groups are not encountered before the hash table fills up? Then the raw
tuples of those large groups would end up in the spill files, so the
spill files' size and the resulting disk I/O could be real downsides.
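To make the concern concrete, the following sketch counts how many raw tuples the first pass of approach 1 would spill for the same skewed input in two arrival orders. The group names, `max_groups`, and input sizes are made up for illustration.

```python
def first_pass_spilled(tuples, max_groups):
    """Count raw tuples spilled by the first pass of approach 1: tuples of
    groups that first appear after the table holds max_groups groups are
    spilled one by one."""
    in_memory = set()   # keys of groups that got a slot in the hash table
    spilled = 0
    for key, _value in tuples:
        if key in in_memory:
            continue                  # aggregated in memory
        if len(in_memory) < max_groups:
            in_memory.add(key)        # new group still fits
        else:
            spilled += 1              # group arrived after the table filled
    return spilled


small = [("a", 1), ("b", 1), ("c", 1)]
hot = [("hot", 1)] * 1000             # one heavily skewed group

print(first_pass_spilled(hot + small, max_groups=2))  # 2
print(first_pass_spilled(small + hot, max_groups=2))  # 1001
```

The only difference between the two runs is arrival order: when the skewed group shows up after the table is full, every one of its tuples goes to disk.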

Greenplum instead spills all the groups by writing out their partial
aggregate states, resets the memory context, continues processing
incoming tuples into a fresh in-memory hash table, and at the end
reloads the spilled partial states and combines them. How does this
sound?
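A minimal sketch of the Greenplum-style scheme as described above, again assuming SUM (so the combine function is addition), `max_groups` as a hypothetical stand-in for the memory limit, and a Python list standing in for the spill file. This is not Greenplum's code; in particular, the final reload-and-combine step here simply merges everything and is not itself memory-bounded.

```python
def hashagg_spill_states(tuples, max_groups):
    """Sketch of spilling partial aggregate states (SUM) instead of raw tuples.
    Returns (final groups, number of spilled partial-state records)."""
    table = {}   # group key -> partial state (a running sum)
    spill = []   # spilled (key, partial_state) records
    for key, value in tuples:
        if key not in table and len(table) >= max_groups:
            # A new group arrived and the table is full: write out every
            # partial state, then reset the table (the memory context reset).
            spill.extend(table.items())
            table.clear()
        table[key] = table.get(key, 0) + value   # transition step
    # End of input: reload the spilled partial states and combine them with
    # the in-memory ones (the combine function for SUM is addition).
    # Note: this final merge is not memory-bounded in this sketch.
    for key, state in spill:
        table[key] = table.get(key, 0) + state
    return table, len(spill)


skewed = [("a", 1), ("b", 1), ("c", 1)] + [("hot", 1)] * 1000
groups, spilled = hashagg_spill_states(skewed, max_groups=2)
print(spilled)  # 2: only the partial states of "a" and "b" are written out
```

The trade-off is that a group evicted from the table and seen again starts a fresh partial state, so the same group can be spilled several times and the combine step must merge those copies; the spill volume grows with the number of evictions (at most `max_groups` records each) rather than with the number of input tuples of late-arriving groups.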

--
Adam Lee
