Re: Default setting for enable_hashagg_disk (hash_mem)

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Rowley <dgrowleyml(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Default setting for enable_hashagg_disk (hash_mem)
Date: 2020-07-03 02:46:49
Message-ID: 20200703024649.GJ4107@telsasoft.com
Lists: pgsql-docs pgsql-hackers

On Thu, Jul 02, 2020 at 07:05:48PM -0700, Peter Geoghegan wrote:
> On Mon, Jun 29, 2020 at 2:46 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > On Mon, Jun 29, 2020 at 2:22 PM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > > I agree with this, and I'm mostly OK with having hash_mem. In fact, from
> > > the proposals in this thread I like it the most - as long as it's used
> > > both during planning and execution. It's a pretty clear solution.
> >
> > Great.
> >
> > It's not trivial to write the patch, since there are a few tricky
> > cases. For example, maybe there is some subtlety in nodeAgg.c with
> > AGG_MIXED cases.
>
> Attached is an attempt at this. I have not been particularly thorough,

Thanks for putting it together. I agree that hash_mem seems to be an obvious
"escape hatch" that generalizes the existing GUCs and is independently useful.

> anything else). I still think that the new GUC should work as a
> multiplier of work_mem, or something else along those lines, though
> for now it's just an independent work_mem used for hashing. I bring it
> up again because I'm concerned about users that upgrade to Postgres 13
> incautiously, and find that hashing uses *less* memory than before.
> Many users probably get away with setting work_mem quite high across
> the board. At the very least, hash_mem should be ignored when it's set
> to below work_mem (which isn't what the patch does).

I feel it should work the same way as work_mem, as it's written, and not as a
multiplier.

And actually I don't think a lower value should be ignored: "mechanism not
policy". Do we refuse atypical values of maintenance_work_mem < work_mem?

I assumed that hash_mem would default to -1, which would mean "fall back to
work_mem". We'd then advise users to increase it if they have an issue in v13
with performance of hashes spilled to disk. (And maybe in other cases, too.)
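For example, the fallback could look something like this (just a sketch with a
made-up helper name, not what the attached patch does):

    extern int work_mem;   /* existing GUC */
    extern int hash_mem;   /* proposed GUC, defaulting to -1 */

    /* hypothetical helper: -1 means "fall back to work_mem" */
    static int
    get_hash_mem(void)
    {
        if (hash_mem < 0)
            return work_mem;
        return hash_mem;
    }

That keeps the upgrade path neutral by default: nothing changes unless the user
explicitly sets hash_mem.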

I read the argument that hash tables are a better use of RAM than sorts.
However, it seems like setting the default to something greater than work_mem
is a separate change from providing the mechanism that allows users to do so.
I guess the change in default is intended to mitigate the worst possible
behavior change someone might experience with hashing in v13, and might be
expected to improve "out of the box" performance. I'm not opposed to it, but
it's not an essential part of the patch.

In nodeHash.c, you missed an underscore:
+ * Target in-memory hashtable size is hashmem kilobytes.

--
Justin
