Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Serge Rielau <serge(at)rielau(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)
Date: 2017-11-17 20:48:05
Message-ID: 19181.1510951685@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Geoghegan <pg(at)bowt(dot)ie> writes:
> On Fri, Nov 17, 2017 at 7:31 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I think this is basically a planning problem. For example, say we wanted to have work_mem_per_query instead of work_mem_per_node. There is an obvious design: consider memory use as an independent dimension of merit during path generation and comparison (less is better). Discard candidate paths whose memory use exceeds the work_mem_per_query budget unless there are no other alternatives. At the end of planning, pick the cheapest path that survived the memory-budget filter. Now, this has the problem that it would make planning more expensive (because we'd hang on to more paths for longer) but it solves a lot of other problems. If there's no memory pressure, we can use memory like mad even when it doesn't save much, but when we have to pick between using more memory for one part of the plan and using more memory for another part of the plan, the choice that does the best job reducing overall execution time will win. Awesome.

> I'd like to hear some opinions on the feasibility of this approach.

There is indeed a big planning problem here, but Robert's sketch is
missing an important component of it: work_mem is not an output of cost
estimates, it is an *input*. For example, we can sort or hash-join in
however much memory we're given, but it's going to cost different
amounts depending on the budget.
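
To make that input-vs-output point concrete, here is a toy model
(standalone C, not anything out of costsize.c; the constants are
made-up stand-ins for the real cost GUCs) of a sort cost that takes
the memory budget as a parameter:

    #include <math.h>

    /* Toy stand-ins for the cpu_operator_cost / seq_page_cost GUCs. */
    static const double cpu_operator_cost = 0.0025;
    static const double seq_page_cost = 1.0;
    #define BLCKSZ 8192

    /*
     * The memory budget is an *input*: the same sort gets a different
     * cost estimate depending on how much memory the caller says it
     * may use.
     */
    static double
    sort_cost_for_budget(double tuples, int width, double mem_budget_kb)
    {
        double  input_bytes = tuples * width;
        /* comparison cost: O(N log N) */
        double  cost = 2.0 * cpu_operator_cost * tuples *
                       (log(tuples) / log(2.0));

        if (input_bytes > mem_budget_kb * 1024.0)
        {
            /* doesn't fit: crudely charge one write plus one read pass */
            double  npages = ceil(input_bytes / BLCKSZ);

            cost += 2.0 * seq_page_cost * npages;
        }
        return cost;
    }

Call that twice with the same tuples/width and two different budgets and
you get two different costs; there is no single number the function
could hand back as "the" work_mem.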

I think what we're actually looking for is to find the breakpoints in
the cost curve where the cost model thinks it can switch to a different
sorting or hashing strategy, and then to emit paths that assume a
work_mem just above each of those breakpoints. But the code isn't set
up like that now, on either the planning or the execution side.
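
In other words, something shaped like this, building on the toy cost
function above (entirely hypothetical: neither the struct nor the
function exists anywhere, and the sqrt rule is just the textbook
one-merge-pass approximation):

    typedef struct MemBreakpoint
    {
        double  mem_kb;     /* smallest budget enabling this strategy */
        double  cost;       /* estimated cost at that budget */
    } MemBreakpoint;

    /*
     * Enumerate the budgets at which the sort strategy changes, so the
     * caller can emit one candidate path per breakpoint instead of
     * assuming one globally fixed work_mem.
     */
    static int
    enumerate_sort_breakpoints(double tuples, int width, MemBreakpoint *out)
    {
        double  input_kb = tuples * width / 1024.0;
        int     n = 0;

        /* Everything fits: pure in-memory sort. */
        out[n].mem_kb = input_kb;
        out[n].cost = sort_cost_for_budget(tuples, width, input_kb);
        n++;

        /* External sort finishing in one merge pass needs ~sqrt(input). */
        out[n].mem_kb = sqrt(input_kb);
        out[n].cost = sort_cost_for_budget(tuples, width, out[n].mem_kb);
        n++;

        return n;
    }

The point of the exercise is that each emitted path would carry its own
assumed budget, which is exactly the restructuring the current code
doesn't support.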

Another problem with formulating it that way is that it suddenly puts
a much higher premium on the planner's space estimates being right,
which is something I don't have much faith in. For instance, if the
planner thinks that 1000kB is just enough to hold a hash table, and
then when we run it we find out that we need a bit more space than that,
do we really want the executor to switch to a batched join? Probably not,
especially not if having set the node's work_mem to 1010kB instead
would've let it run to completion without batching. Addressing that
discrepancy might be where we need the dynamic "emergency memory request"
mechanism that Peter was postulating. But I'm not sure exactly how that
would work, because at the point where the executor realizes it's about to
exceed the original space budget, it generally has little idea how much
more it would need in order to avoid spilling the sort to disk or
adding another round of batching.
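
For concreteness, the shape I imagine Peter has in mind is roughly this
(entirely made up: MemArbiter and request_extra_work_mem don't exist
anywhere), and the comment marks exactly the hole I'm complaining about:

    #include <stddef.h>
    #include <stdbool.h>

    typedef struct MemArbiter MemArbiter;   /* hypothetical memory broker */

    /* Would ask the arbiter for more space; returns false if denied. */
    extern bool request_extra_work_mem(MemArbiter *arbiter,
                                       size_t extra_bytes);

    static void
    hash_table_nearly_full(MemArbiter *arbiter, size_t budget)
    {
        /*
         * Here's the rub: the node knows it's at the edge of its
         * budget, but not whether it needs 10kB or 10MB more to finish
         * without batching, so whatever it asks for is a guess.
         */
        size_t  guess = budget / 10;    /* arbitrary: ask for 10% more */

        if (!request_extra_work_mem(arbiter, guess))
        {
            /* denied: fall back to spilling / another batching round */
        }
    }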

So it's really unclear to me what either the planner or executor API
contracts for memory consumption ought to be if we're going to try to do
this differently. I agree there's a lot of potential for improvement if
we can find a better solution, but we're going to need to put some serious
thought into it.

regards, tom lane
