Re: significant slowdown of HashAggregate between 9.6 and 10

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Jeff Davis <jdavis(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: significant slowdown of HashAggregate between 9.6 and 10
Date: 2020-06-05 13:25:26
Message-ID: 20200605132526.rsvtsp24jymuaowx@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 04, 2020 at 06:57:58PM -0700, Andres Freund wrote:
>Hi,
>
>On 2020-06-04 18:22:03 -0700, Jeff Davis wrote:
>> On Thu, 2020-06-04 at 11:41 -0700, Andres Freund wrote:
>> > +/* minimum number of initial hash table buckets */
>> > +#define HASHAGG_MIN_BUCKETS 256
>> >
>> >
>> > I don't really see much explanation for that part in the commit,
>> > perhaps
>> > Jeff can chime in?
>>
>> I did this in response to a review comment (point #5):
>>
>>
>> https://www.postgresql.org/message-id/20200219191636.gvdywx32kwbix6kd@development
>>
>> Tomas suggested a min of 1024, and I thought I was being more
>> conservative choosing 256. Still too high, I guess?
>
>> I can lower it. What do you think is a reasonable minimum?
>
>I don't really see why there needs to be a minimum bigger than 1?
>

I think you're right. I think I was worried about having to resize the
hash table in case of an under-estimate, and it seemed fine to waste a
tiny bit more memory to prevent that. But this example shows we may need
to scan the hash table sequentially, which means it's not just about
memory consumption. So in hindsight we either don't need the limit at
all, or maybe it could be much lower (IIRC it reduces probability of
collision, but maybe dynahash does that anyway internally).

I wonder if hashjoin has the same issue, but probably not - I don't
think we'll ever scan that internal hash table sequentially.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2020-06-05 13:34:54 Re: Internal key management system
Previous Message Inoue, Hiroshi 2020-06-05 13:25:00 Re: Removal of currtid()/currtid2() and some table AM cleanup