From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Serge Rielau <serge(at)rielau(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)
Date: 2017-11-17 18:22:22
Message-ID: CAH2-Wz=bXm-mc_vJ9q42Yns2PcG2dSBPG4H+30eELuDLV93Yfw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 17, 2017 at 7:31 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Nov 16, 2017 at 11:50 AM, Serge Rielau <serge(at)rielau(dot)com> wrote:
>>
>> Just as you have, we have also considered holistic provisioning of work_mem across all consumers, but we find that to be too complex.
>> Having an “emergency fund” in shared memory is also an option, but I find it too limiting.
>
>
> I agree.

Yeah. I suspect that that idea is not ambitious enough to do a lot of
what we want, and yet is too ambitious to justify working on given its
limited shelf life.

> I think this is basically a planning problem. For example, say we
> wanted to have work_mem_per_query instead of work_mem_per_node. There
> is an obvious design: consider memory use as an independent dimension
> of merit during path generation and comparison (less is better).
> Discard candidate paths whose memory use exceeds the
> work_mem_per_query budget unless there are no other alternatives. At
> the end of planning, pick the cheapest path that survived the
> memory-budget filter. Now, this has the problem that it would make
> planning more expensive (because we'd hang on to more paths for
> longer) but it solves a lot of other problems. If there's no memory
> pressure, we can use memory like mad even when it doesn't save much,
> but when we have to pick between using more memory for one part of
> the plan and using more memory for another part of the plan, the
> choice that does the best job reducing overall execution time will
> win. Awesome.

I'd like to hear some opinions on the feasibility of this approach.
Does David have anything to say about it, for example?
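
To make that concrete, here is a minimal sketch (purely illustrative,
not taken from any actual patch) of what treating memory as a second
dimension of merit during path comparison could look like; the struct
and function names are placeholders, not the real pathnode.c
interface:

/*
 * Sketch only.  A candidate path survives if no other path is at
 * least as good on both cost and memory and strictly better on one.
 * At the end of planning, the cheapest survivor whose mem_kb fits
 * under work_mem_per_query wins; if none fits, fall back to the
 * cheapest path overall.
 */
typedef struct CandidatePath
{
    double      total_cost; /* estimated execution cost */
    double      mem_kb;     /* estimated peak memory use, in KB */
} CandidatePath;

/* Does 'a' dominate 'b'?  (No worse on either dimension, better on one.) */
static bool
path_dominates(const CandidatePath *a, const CandidatePath *b)
{
    return a->total_cost <= b->total_cost &&
        a->mem_kb <= b->mem_kb &&
        (a->total_cost < b->total_cost || a->mem_kb < b->mem_kb);
}

The extra planning expense Robert mentions falls out of dominance now
being a partial order: paths that lose on cost but win on memory must
be retained until the end.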

> We could also do more localized variants of this that don't provide
> hard guarantees but do tend to avoid squandering resources.

That sounds like independent work, though it could be very useful.

> Yet another thing we could do is to try to get nodes to voluntarily
> use less than work_mem when possible. This is particularly an issue
> for sorts. A 2-batch hash join is so much more expensive than a
> single-batch hash join that it's almost never going to make sense
> unless we have no realistic alternative, although I suppose a
> 64-batch hash join might be not that different from a 32-batch hash
> join. But for sorts, given all Peter's work in this area, I bet there
> are a lot of sorts that could budget a quarter or less of work_mem
> and really not be hurt very much. It depends somewhat on how fast and
> how contended your I/O is, though, which we don't have an especially
> good way to model. I'm starting to wonder if that sort_mem GUC might
> be a good idea... use that for sorts, and keep work_mem for
> everything else.

Right. The ability of sorts to do well with less memory is really
striking these days. And though I didn't mean to seriously suggest it,
a hash_mem GUC does seem like it solves some significant problems
without much risk. I think it should be hash_mem, not sort_mem:
hashing seems more like the special case among operations that consume
work_mem; sort_mem is the old name for work_mem, and is still accepted
as an alias; and hash_mem avoids any confusion about whether or not
CREATE INDEX uses the new GUC (it clearly does not).
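
Robert's batching arithmetic is worth spelling out, because it's the
heart of the asymmetry (round numbers, assuming textbook hybrid hash
join behavior):

    1 batch:    no temp I/O at all
    2 batches:  roughly half of each input written out and read back
    32 -> 64:   the same spilled tuples, just spread across smaller
                files; total temp I/O is nearly unchanged

Sorts degrade smoothly by comparison.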

Since I am primarily concerned with the difference in memory
sensitivity between sorting and hashing, and since a new GUC seems
like it would noticeably improve matters, I am beginning to take the
idea of writing a hash_mem patch for v11 seriously.
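
Such a patch could be quite small: a new GUC with a work_mem fallback,
plus teaching the hash join and hash aggregate code to consult it
instead of work_mem. Something like this (a sketch only -- the names
and defaults are placeholders, not actual code):

/* guc.c table entry -- a default of -1 means "just use work_mem" */
{
    {"hash_mem", PGC_USERSET, RESOURCES_MEM,
        gettext_noop("Sets the maximum memory to be used by hash-based operations."),
        NULL,
        GUC_UNIT_KB
    },
    &hash_mem,
    -1, -1, MAX_KILOBYTES,
    NULL, NULL, NULL
},

/* called by hash-based executor nodes in place of reading work_mem */
int
get_hash_mem(void)
{
    return (hash_mem != -1) ? hash_mem : work_mem;
}

CREATE INDEX and the sort-based nodes would keep reading work_mem
directly, which is how the naming question answers itself.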

--
Peter Geoghegan
