Re: Default setting for enable_hashagg_disk

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-07-25 20:27:19
Message-ID: CAH2-Wz=suE4PcyC4eeqNkubfa9w8CfU92b3_YtSq0U+muvxmRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Sat, Jul 25, 2020 at 1:10 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Sat, 2020-07-25 at 11:05 -0700, Peter Geoghegan wrote:
> > What worries me a bit is the sharp discontinuities when spilling with
> > significantly less work_mem than the "optimal" amount. For example,
> > with Tomas' TPC-H query (against my smaller TPC-H dataset), I find
> > that setting work_mem to 6MB looks like this:
>
> ...
>
> > Planned Partitions: 128 Peak Memory Usage: 6161kB Disk
> > Usage: 2478080kB HashAgg Batches: 128
>
> ...
>
> > Planned Partitions: 128 Peak Memory Usage: 5393kB Disk
> > Usage: 2482152kB HashAgg Batches: 11456

> It's not clear to me that overpartitioning is a real problem in this
> case -- but I think the fact that it's causing confusion is enough
> reason to see if we can fix it.

I'm not sure about that either.

FWIW I notice that when I reduce work_mem a little further (to 3MB)
with the same query, the number of partitions is still 128, while the
number of run time batches is 16,512 (an increase from 11,456 from 6MB
work_mem). I notice that 16512/128 is 129, which hints at the nature
of what's going on with the recursion. I guess it would be ideal if
the growth in batches was more gradual as I subtract memory.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Jeff Davis 2020-07-25 23:56:51 Re: Default setting for enable_hashagg_disk
Previous Message Jeff Davis 2020-07-25 20:10:36 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-07-25 21:06:03 Re: [PATCH] Performance Improvement For Copy From Binary Files
Previous Message Jeff Davis 2020-07-25 20:10:36 Re: Default setting for enable_hashagg_disk