Re: Memory-Bounded Hash Aggregation

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Taylor Vesely <tvesely(at)pivotal(dot)io>, Adam Lee <ali(at)pivotal(dot)io>, Melanie Plageman <mplageman(at)pivotal(dot)io>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Memory-Bounded Hash Aggregation
Date: 2019-12-13 02:10:50
Message-ID: c7ccc942315cc5a6af532e9afbb6a0924cf5e3e3.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 2019-11-28 at 18:46 +0100, Tomas Vondra wrote:
> 13) As for this:
>
> /* make sure that we don't exhaust the hash bits */
> if (partition_bits + input_bits >= 32)
> partition_bits = 32 - input_bits;
>
> We already ran into this issue (exhausting bits in a hash value) in
> hashjoin batching, we should be careful to use the same approach in
> both
> places (not the same code, just general approach).

I assume you're talking about ExecHashIncreaseNumBatches(), and in
particular, commit 8442317b. But that's a 10-year-old commit, so
perhaps you're talking about something else?

It looks like that code in HJ is protecting against having a very large
number of batches, such that we can't allocate an array of pointers for
each batch. And it seems like the concern is more related to a planner
error causing such a large nbatch.

I don't quite see the analogous case in HashAgg. npartitions is already
constrained to a maximum of 256. And the batches are individually
allocated, held in a list, not an array.

It could perhaps use some defensive programming to make sure that we
don't run into problems if the max is set very high.

Can you clarify what you're looking for here?

Perhaps I can also add a comment saying that we can have less than
HASH_MIN_PARTITIONS when running out of bits.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2019-12-13 02:13:22 Re: shared tempfile was not removed on statement_timeout (unreproducible)
Previous Message Thomas Munro 2019-12-13 02:03:47 Re: shared tempfile was not removed on statement_timeout (unreproducible)