Re: Default setting for enable_hashagg_disk

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-07-25 23:56:51
Message-ID: a856635f9284bc36f7a77d02f47bbb6aaf7b59b3.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Sat, 2020-07-25 at 13:27 -0700, Peter Geoghegan wrote:
> It's not clear to me that overpartitioning is a real problem in
> > this
> > case -- but I think the fact that it's causing confusion is enough
> > reason to see if we can fix it.
>
> I'm not sure about that either.
>
> FWIW I notice that when I reduce work_mem a little further (to 3MB)
> with the same query, the number of partitions is still 128, while the
> number of run time batches is 16,512 (an increase from 11,456 from
> 6MB
> work_mem). I notice that 16512/128 is 129, which hints at the nature
> of what's going on with the recursion. I guess it would be ideal if
> the growth in batches was more gradual as I subtract memory.

I wrote a quick patch to use HyperLogLog to estimate the number of
groups contained in a spill file. It seems to reduce the
overpartitioning effect, and is a more principled approach than what I
was doing before.

It does seem to hurt the runtime slightly when spilling to disk in some
cases. I haven't narrowed down whether this is because we end up
recursing multiple times, or if it's just more efficient to
overpartition, or if the cost of doing the HLL itself is significant.

Regards,
Jeff Davis

Attachment Content-Type Size
0001-HLL.patch text/x-patch 9.0 KB

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Tomas Vondra 2020-07-26 00:05:09 Re: Default setting for enable_hashagg_disk
Previous Message Peter Geoghegan 2020-07-25 20:27:19 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-07-26 00:05:09 Re: Default setting for enable_hashagg_disk
Previous Message Peter Geoghegan 2020-07-25 22:08:28 Re: hashagg slowdown due to spill changes