Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Serge Rielau <serge(at)rielau(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)
Date: 2018-02-24 00:27:28
Message-ID: CA+TgmoaeCWQa2CEkF7nL5W85yA-Frp6Dm4ErPPYhczqqbYA04w@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 23, 2018 at 6:06 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> If we switched to policy 3 and (say) work_mem were somehow
> automagically adjusted to be divided by number of participants at
> planning and execution time, then Parallel Hash wouldn't have to
> change at all to conform to the policy. It would use at most work_mem
> per Parallel Hash node, no matter how many workers and no matter which
> of its strategies it picked (either it receives a budget of work_mem /
> participants, and then multiplies it by participants to create a
> no-partition hash table combining the participants' budgets, or it
> lets each participant chew on smaller hash tables adding up to at most
> work_mem). Just the same total per-node budget as any other executor
> node gets.

That's true, but what you'd have instead is a whole lot of additional
planning overhead. Right now, if we choose to do a merge-join or a
parallel-oblivious hash join or a nested loop with a materialize node
on the inner side, we can join the parallel-aware path on the outer
side to the same parallel-oblivious path on the inner side that we
would use if we decided against parallel query altogether. If you
wanted all of the copies of a node across all parallel participants
to stick to work_mem as a budget, then you'd need one set of paths for
each rel planned with the default work_mem setting and a second set
planned with less work_mem. And if you imagine a future where we
create paths for the same relation with several different
numbers of workers, then you'd need even more sets
of paths for each relation.
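
Just to put numbers on it, here's a rough standalone illustration; the
worker counts and the 4MB work_mem value are just examples, and each
distinct budget below would imply its own set of paths per rel:

#include <stdio.h>

/*
 * Back-of-the-envelope illustration (not planner code): each candidate
 * worker count implies a different per-participant budget, hence a
 * different set of paths to keep around.
 */
int
main(void)
{
    int worker_counts[] = {0, 2, 4, 8};   /* hypothetical candidates */
    int ncounts = sizeof(worker_counts) / sizeof(worker_counts[0]);
    int work_mem_kb = 4096;               /* e.g. the default 4MB */

    for (int i = 0; i < ncounts; i++)
    {
        int participants = worker_counts[i] + 1;   /* workers + leader */

        printf("workers=%d -> budget per participant = %d kB\n",
               worker_counts[i], work_mem_kb / participants);
    }
    return 0;
}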

If we're OK with making planning more expensive to solve this problem,
then I think we should forget about #3 and go straight to #2. What we
would do is just teach add_path() that "amount of memory used" is
another independent dimension of merit, so that a more expensive plan
might be kept if it uses less memory. Then if at the end of planning
you want to pick the fastest plan that uses less than X amount of
memory, or if you want to pick the plan for which cost_weight * cost +
memory_weight * memory usage is minimal, or whatever it is you want, you can.
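
A hedged sketch of what that might look like, using a stand-in struct
rather than the real Path node (the real add_path() in
src/backend/optimizer/util/pathnode.c knows nothing about memory today):

#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the planner's Path, with a hypothetical memory field. */
typedef struct Path
{
    double cost;        /* estimated total cost */
    size_t mem_bytes;   /* hypothetical: estimated peak memory use */
} Path;

/*
 * Keep new_path unless some existing path is at least as good on *both*
 * dimensions; a more expensive path survives if it uses less memory.
 */
bool
path_is_dominated(const Path *new_path, const Path *old_path)
{
    return old_path->cost <= new_path->cost &&
           old_path->mem_bytes <= new_path->mem_bytes;
}

/* At the end of planning, pick by a weighted blend of cost and memory. */
double
path_score(const Path *p, double cost_weight, double memory_weight)
{
    return cost_weight * p->cost + memory_weight * (double) p->mem_bytes;
}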

I think the only one from your list that's really boil-the-ocean is
#1. For that one, you presumably need to create multiple plans and
switch between them based on how much memory is available right now
and maybe how much you think will be available in the near future, and
I guess impose some kind of admission control when system memory gets
too low...
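
If someone did want to go there, the runtime piece might look vaguely
like this sketch; it is entirely hypothetical, nothing like it exists:

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pair of plans prepared ahead of time for one query. */
typedef struct PlanChoice
{
    void   *fast_plan;         /* plan chosen assuming plenty of memory */
    size_t  fast_plan_mem;     /* its estimated memory footprint */
    void   *low_mem_plan;      /* fallback plan with a smaller footprint */
    size_t  low_mem_plan_mem;
} PlanChoice;

/*
 * Pick a plan based on memory available right now; if even the fallback
 * doesn't fit, the query would have to wait (admission control).
 */
void *
choose_plan(const PlanChoice *pc, size_t mem_available, bool *must_wait)
{
    *must_wait = false;
    if (pc->fast_plan_mem <= mem_available)
        return pc->fast_plan;
    if (pc->low_mem_plan_mem <= mem_available)
        return pc->low_mem_plan;
    *must_wait = true;          /* queue until memory frees up */
    return NULL;
}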

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
