Re: Default setting for enable_hashagg_disk

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-07-19 16:23:41
Message-ID: CAKFQuwafWCNrUKQXZUOf4TpeLC99XU6YNoEyfFqTMTYxnbBp-A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Sun, Jul 19, 2020 at 4:38 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> > (The only reason I'm in favor of heap_mem[_multiplier] is that it
> > seems like it might be possible to use it to get *better* plans
> > than before. I do not see it as a backwards-compatibility knob.)
>
> I still don't think a hash_mem-type thing is really the right direction
> to go in, even if making a distinction between memory used for sorting
> and memory used for hashing is, and I'm of the general opinion that we'd
> be thinking about doing something better and more appropriate- except
> for the fact that we're talking about adding this in during beta.
>
> In other words, if we'd stop trying to shoehorn something in, which
> we're doing because we're in beta, we'd very likely be talking about all
> of this in a very different way and probably be contemplating something
> like a query_mem that provides for an overall memory limit and which
> favors memory for hashing over memory for sorting, etc.
>

At minimum we'd need a patch we would be happy with dropping in should
there be user complaints. And once this conversation ends with that in
hand I have my doubts whether there will be interest, or even project
desirability, in working toward a "better" solution should this one prove
itself "good enough". And as it seems unlikely that this patch would
foreclose on other promising solutions, combined with there being a
non-trivial behavioral change that we've made, suggests to me that we might
as well just deploy whatever short-term solution we come up with now.

As for hashagg_avoid_disk_plan...

The physical processes we are modelling here:
1. Processing D amount of records takes M amount of memory
2. Processing D amount of records in-memory takes T time per record while
doing the same on-disk takes V time per record
3. Processing D amount of records via some other plan has an effective cost
U
3. V >> T (is strictly greater than)
4. Having chosen a value for M that ensures T it is still possible for V to
end up used

Thus:

If we get D wrong the user can still tweak the system by changing the
hash_mem_multiplier (this is strictly better than v12 which used work_mem)

Setting hashagg_avoid_disk_plan = off provides a means to move V infinitely
far away from T (set to on by default, off reverts to v12 behavior).

There is no way for the user to move V's relative position toward T (n/a in
v12)

The only way to move T is to make it infinitely large by setting
enable_hashagg = off (same as in v12)

Is hashagg_disk_cost_multiplier = [0.0, 1,000,000,000.0] i.e., (T *
hashagg_disk_cost_multiplier == V) doable?

It has a nice symmetry with hash_mem_multiplier and can move V both toward
and away from T. To the extent T is tunable or not in v12 it can remain
the same in v13.

David J.

In response to

Browse pgsql-docs by date

  From Date Subject
Next Message Jeff Davis 2020-07-19 21:17:15 Re: Default setting for enable_hashagg_disk
Previous Message Tom Lane 2020-07-19 14:43:49 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-07-19 18:03:20 Fix initdb's unsafe not-null-marking rule
Previous Message Tomas Vondra 2020-07-19 15:19:45 Re: WIP: BRIN multi-range indexes