Re: Memory-Bounded Hash Aggregation

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Taylor Vesely <tvesely(at)pivotal(dot)io>, Adam Lee <ali(at)pivotal(dot)io>, Melanie Plageman <mplageman(at)pivotal(dot)io>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Memory-Bounded Hash Aggregation
Date: 2019-12-13 16:17:43
Message-ID: 20191213161743.kougnbxkjqmgiti6@development
Lists: pgsql-hackers

On Thu, Dec 12, 2019 at 06:10:50PM -0800, Jeff Davis wrote:
>On Thu, 2019-11-28 at 18:46 +0100, Tomas Vondra wrote:
>> 13) As for this:
>>
>>     /* make sure that we don't exhaust the hash bits */
>>     if (partition_bits + input_bits >= 32)
>>         partition_bits = 32 - input_bits;
>>
>> We already ran into this issue (exhausting bits in a hash value) in
>> hashjoin batching, we should be careful to use the same approach in
>> both places (not the same code, just general approach).
>
>I assume you're talking about ExecHashIncreaseNumBatches(), and in
>particular, commit 8442317b. But that's a 10-year-old commit, so
>perhaps you're talking about something else?
>
>It looks like that code in HJ is protecting against having a very large
>number of batches, such that we can't allocate an array of pointers for
>each batch. And it seems like the concern is more related to a planner
>error causing such a large nbatch.
>
>I don't quite see the analogous case in HashAgg. npartitions is already
>constrained to a maximum of 256. And the batches are individually
>allocated, held in a list, not an array.
>
>It could perhaps use some defensive programming to make sure that we
>don't run into problems if the max is set very high.
>
>Can you clarify what you're looking for here?
>

I'm talking about this recent discussion on pgsql-bugs:

https://www.postgresql.org/message-id/CA%2BhUKGLyafKXBMFqZCSeYikPbdYURbwr%2BjP6TAy8sY-8LO0V%2BQ%40mail.gmail.com

I.e. when the number of batches/partitions and buckets is high enough,
we may end up with very few hash bits left for one of the two parts.
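
To illustrate the effect, here is a minimal sketch, assuming a 32-bit
hash value whose low input_bits are already consumed and whose next
partition_bits select the partition. The helper names
clamp_partition_bits() and partition_for_hash() are hypothetical and
are not taken from the patch or from the hashjoin code; only the clamp
itself mirrors the snippet quoted above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the clamp from the quoted snippet. Returns the
 * number of partition bits that still fit into a 32-bit hash value
 * once input_bits of it are already in use.
 */
static int
clamp_partition_bits(int input_bits, int partition_bits)
{
    assert(input_bits >= 0 && input_bits <= 32);
    assert(partition_bits >= 0 && partition_bits <= 32);

    /* make sure that we don't exhaust the hash bits */
    if (partition_bits + input_bits >= 32)
        partition_bits = 32 - input_bits;

    return partition_bits;
}

/*
 * Hypothetical helper: choose a partition from the bits directly above
 * the ones already consumed. This bit layout is an assumption made for
 * the example. With zero bits left, every tuple lands in partition 0.
 */
static uint32_t
partition_for_hash(uint32_t hash, int input_bits, int partition_bits)
{
    uint32_t    mask;

    assert(input_bits >= 0 && partition_bits >= 0);
    assert(input_bits + partition_bits <= 32);

    if (partition_bits == 0)
        return 0;

    /* partition_bits > 0 implies input_bits < 32, so the shift is valid */
    mask = (uint32_t) ((UINT64_C(1) << partition_bits) - 1);
    return (hash >> input_bits) & mask;
}
```

For example, with 28 bits already used, a request for 8 partition bits
is clamped to 4, so at most 16 partitions are possible no matter what
minimum was asked for. With all 32 bits used, the clamp yields 0 bits
and partitioning no longer splits the input at all.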

>Perhaps I can also add a comment saying that we can have less than
>HASH_MIN_PARTITIONS when running out of bits.
>

Maybe.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
