Re: Default setting for enable_hashagg_disk

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-06-25 16:14:56
Message-ID: 1e75737369e1b6182ebaef7d6bbe098ad620bbb6.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Wed, 2020-06-24 at 15:28 -0400, Robert Haas wrote:
> On Wed, Jun 24, 2020 at 3:14 PM Andres Freund <andres(at)anarazel(dot)de>
> wrote:
> > FWIW, my gut feeling is that we'll end up have to separate the
> > "execution time" spilling from using plain work mem, because it'll
> > trigger spilling too often. E.g. if the plan isn't expected to
> > spill,
> > only spill at 10 x work_mem or something like that. Or we'll need
> > better management of temp file data when there's plenty memory
> > available.

...

> I think that's actually pretty appealing. Separating the memory we
> plan to use from the memory we're willing to use before spilling
> seems
> like a good idea in general, and I think we should probably also do
> it
> in other places - like sorts.

I'm trying to make sense of this. Let's say there are two GUCs:
planner_work_mem=16MB and executor_work_mem=32MB.

And let's say a query comes along and generates a HashAgg path, and the
planner (correctly) thinks if you put all the groups in memory at once,
it would be 24MB. Then the planner, using planner_work_mem, would think
spilling was necessary, and generate a cost that involves spilling.

Then it's going to generate a Sort+Group path, as well. And perhaps it
estimates that sorting all of the tuples in memory would also take
24MB, so it generates a cost that involves spilling to disk.

But it has to choose one of them. We've penalized plans at risk of
spilling to disk, but what's the point? The planner needs to choose one
of them, and both are at risk of spilling to disk.

Regards,
Jeff Davis

In response to

Browse pgsql-docs by date

  From Date Subject
Next Message Jeff Davis 2020-06-25 16:24:52 Re: Default setting for enable_hashagg_disk
Previous Message Robert Haas 2020-06-25 15:46:54 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2020-06-25 16:24:52 Re: Default setting for enable_hashagg_disk
Previous Message Tom Lane 2020-06-25 16:07:05 Re: Why forbid "INSERT INTO t () VALUES ();"