Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)

From: Serge Rielau <serge(at)rielau(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Date: 2017-11-16 16:50:55
Message-ID: dd279d57-3946-4756-aca0-8a42dd213ee9@rielau.com
Lists: pgsql-hackers

I have been pondering how to deal with work_mem for a couple of months myself and had very similar thoughts.
As far as I can tell, though, the problem goes beyond work_mem:
1. Several hash operations, such as set-ops, hashed subplans, and hash aggregates, do not spill at all today. We have partially solved these so far and, once the work is complete, think the fixes could be pushed into community PG if there is desire for it.
2. We also worry about large objects, which can bloat a backend.
3. There are other random allocations that I fear I just don’t know about.
4. Operating systems are chronically poor at trading memory between processes, even after the memory is freed, unless it is returned to the OS in big contiguous chunks.
Just as you have, we have also considered holistic provisioning of work_mem across all consumers, but we find that to be too complex. Having an “emergency fund” in shared memory is also an option, but I find it too limiting. This approach is also what was done at DB2 when I was there, and it proved cumbersome.
So I’m currently pressing forward with a much more fundamental approach: pushing TopTransactionContext and its children into shared memory. To avoid fragmentation and serialization on latches, I have defined the concept of a “context cluster”. The root of the cluster is the sole true allocator of memory. Child contexts allocate their blocks as pallocs from the cluster root. Basically, memory management becomes recursive, and the children live within the root. The root (TopTransactionContext) allocates big blocks, e.g. 8MB at a time. Within a transaction, PG operates as usual, with freelists and all, turning over these same 8MB or allocating more if needed. But at the end of every transaction, big chunks of memory become available to share with other transactions again. The places where we reparent contexts need to detect that this can’t be done between clusters or into/out of a cluster, and do deep copies if needed, but there are few of those. Come to think of it, all the cases I have encountered so far were SFDC-specific…
I’m also moving the e-state from the Portal Heap to TopTransactionContext. At the end of the day, the assumption is that most transactions only need one block from shared memory, and I can probably pin it to the backend, further reducing contention. If there is an out-of-memory situation (which should be very rare), there are multiple ways to deal with it. If there is no deadlock, we can simply wait. If there is one, rolling back the transaction that encountered the OOM is the obvious, if not optimal, solution. Finding the biggest consumer and sending it a signal to back off would be another way to do it.
My goal is to run a backend in 50-100MB, with all local caches controlled for size. Transaction memory, e-states included, should be sized at 8MB per backend plus a fixed “spill” of a few GB.
Yes, this is invasive, and I’m sure to be debugging it for a while given my limited knowledge of the engine. I may yet fail spectacularly. On the other hand, it’s conceptually pretty straightforward.
Cheers,
Serge Rielau
SFDC
