Re: Memory-Bounded Hash Aggregation

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Adam Lee <ali(at)pivotal(dot)io>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Taylor Vesely <tvesely(at)pivotal(dot)io>, Melanie Plageman <mplageman(at)pivotal(dot)io>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Memory-Bounded Hash Aggregation
Date: 2019-12-22 23:05:46
Message-ID: 2370ca86d4c3b46befc031d8421de02a58df20f1.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2019-12-10 at 13:34 -0800, Adam Lee wrote:
> Melanie and I tried this, had a installcheck passed patch. The way
> how
> we verify it is composing a wide table with long unnecessary text
> columns, then check the size it writes on every iteration.
>
> Please check out the attachment, it's based on your 1204 version.

Thank you. Attached a new patch that incorporates your projection work.

A few comments:

* You are only nulling out up to tts_nvalid, which means that you can
still end up storing more on disk if the wide column comes at the end
of the table and hasn't been deserialized yet. I fixed this by copying
needed attributes to the hash_spill_slot and making it virtual.

* aggregated_columns does not need to be a member of AggState; nor does
it need to be computed inside of the perhash loop. Aside: if adding a
field to AggState is necessary, you need to bump the field numbers of
later fields that are labeled for JIT use, otherwise it will break JIT.

* I used an array rather than a bitmapset. It makes it easier to find
the highest column (to do a slot_getsomeattrs), and it might be a
little more efficient for wide tables with mostly useless columns.

* Style nitpick: don't mix code and declarations

The updated patch also saves the transitionSpace calculation in the Agg
node for better hash table size estimating. This is a good way to
choose an initial number of buckets for the hash table, and also to cap
the number of groups we permit in the hash table when we expect the
groups to grow.

Regards,
Jeff Davis

Attachment Content-Type Size
hashagg-20191220.patch text/x-patch 97.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2019-12-22 23:33:41 Re: Memory-Bounded Hash Aggregation
Previous Message Tom Lane 2019-12-22 21:51:15 Re: unsupportable composite type partition keys