Re: Proposal for fixing intra-query memory leaks

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal for fixing intra-query memory leaks
Date: 2000-06-13 00:15:01
Message-ID: 200006130015.UAA12057@candle.pha.pa.us
Lists: pgsql-hackers

FYI, Tom, is this still relevant?

> This issue seems to have been on the back burner for a while,
> but I think we need to put it on the front burner again for 7.1.
> Here is a think-piece I just did. I'd appreciate comments,
> particularly about possible interactions with TOAST --- Jan,
> did you have any particular plan in mind for freeing datums created
> by de-TOASTing?
>
> regards, tom lane
>
>
> Proposal for memory allocation fixes                  29-Apr-2000
> ------------------------------------
>
> We know that Postgres has serious problems with memory leakage during
> large queries that process a lot of pass-by-reference data. There is
> no provision for recycling memory until end of query. This needs to be
> fixed, even more so with the advent of TOAST which will allow very
> large chunks of data to be passed around in the system. Furthermore,
> 7.1 is an ideal time for fixing it since TOAST and the function-manager
> interface changes will require visiting a lot of the same code that needs
> to be cleaned up. So, here is a proposal.
>
>
> Background
> ----------
>
> We already do most of our memory allocation in "memory contexts", which
> are usually AllocSets as implemented by backend/utils/mmgr/aset.c.
> (Is there any value in allowing for other memory context types? We could
> save some cycles by getting rid of a level of indirection here.) What
> we need to do is create more contexts and define proper rules about when
> they can be freed.
>
> The basic operations on a memory context are:
>
> * create a context
>
> * delete a context (including freeing all the memory allocated therein)
>
> * reset a context (free all memory allocated in the context, but not the
> context object itself)
>
> Given a context, one can allocate a chunk of memory within it, free a
> previously allocated chunk, or realloc a previously allocated chunk larger
> or smaller. (These operations correspond directly to standard C's
> malloc(), free(), and realloc() routines.) At all times there is a
> "current" context denoted by the CurrentMemoryContext global variable.
> The backend macros palloc(), pfree(), repalloc() implicitly allocate space
> in that context. The MemoryContextSwitchTo() operation selects a new
> current context (and returns the previous context, so that the caller can
> restore the previous context before exiting).
>
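For anyone who wants to poke at these operations outside the backend, here
is a rough sketch. It is not the aset.c code: ToyContext, toy_current and
all of the toy_* functions are made-up names, and each chunk is simply
malloc'd and chained onto its context so that a reset can find it again.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy context: a linked list of malloc'd chunks (hypothetical, not aset.c). */
typedef union ToyChunk {
    union ToyChunk *next;
    long double     align;      /* keeps the payload after the header aligned */
} ToyChunk;

typedef struct ToyContext {
    ToyChunk *chunks;           /* everything allocated since the last reset */
    size_t    nchunks;          /* live chunk count, kept only for inspection */
} ToyContext;

/* Plays the role of CurrentMemoryContext. */
static ToyContext *toy_current = NULL;

/* Create a context. */
static ToyContext *toy_create(void)
{
    ToyContext *cxt = malloc(sizeof *cxt);

    if (cxt == NULL)
        abort();
    cxt->chunks = NULL;
    cxt->nchunks = 0;
    return cxt;
}

/* Allocate in whatever context is current, the way palloc() does. */
static void *toy_alloc(size_t size)
{
    ToyChunk *c;

    assert(toy_current != NULL);
    c = malloc(sizeof *c + size);
    if (c == NULL)
        abort();
    c->next = toy_current->chunks;
    toy_current->chunks = c;
    toy_current->nchunks++;
    return c + 1;               /* payload starts right after the header */
}

/* Reset: free every chunk, keep the context object. */
static void toy_reset(ToyContext *cxt)
{
    while (cxt->chunks != NULL)
    {
        ToyChunk *next = cxt->chunks->next;

        free(cxt->chunks);
        cxt->chunks = next;
    }
    cxt->nchunks = 0;
}

/* Delete: free the chunks and the context object itself. */
static void toy_delete(ToyContext *cxt)
{
    toy_reset(cxt);
    if (toy_current == cxt)
        toy_current = NULL;
    free(cxt);
}

/* Select a new current context and hand back the previous one. */
static ToyContext *toy_switch_to(ToyContext *cxt)
{
    ToyContext *old = toy_current;

    toy_current = cxt;
    return old;
}
```

The usual calling pattern would be old = toy_switch_to(tmp), do the work,
toy_switch_to(old), and then toy_reset(tmp) once the results are no longer
needed.
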
> Note: there is no really good reason for pfree() to be tied to the current
> memory context; it ought to be possible to pfree() a chunk of memory no
> matter which context it was allocated from. Currently we cannot do that
> because of the possibility that there is more than one kind of memory
> context. If they were all AllocSets then the problem would go away, which is
> one reason I'd like to eliminate the provision for other kinds of
> contexts.
>
> The main advantage of memory contexts over plain use of malloc/free is
> that the entire contents of a memory context can be freed easily, without
> having to request freeing of each individual chunk within it. This is
> both faster and more reliable than per-chunk bookkeeping. We already use
> this fact to clean up at transaction end: by resetting all the active
> contexts, we reclaim all memory. What we need are additional contexts
> that can be reset or deleted at strategic times within a query, such as
> after each tuple.
>
>
> Additions to the memory-context mechanism
> -----------------------------------------
>
> If we are going to have more contexts, we need more mechanism for keeping
> track of them; else we risk leaking whole contexts under error conditions.
> We can do this as follows:
>
> 1. There will be two kinds of contexts, "permanent" and "temporary".
> Permanent contexts are never reset or deleted except by explicit caller
> command (in practice, they probably won't ever be, period). There will
> not be very many of these --- perhaps only the existing TopMemoryContext
> and CacheMemoryContext. We should avoid having very much code run with
> CurrentMemoryContext pointing at a permanent context, since any forgotten
> palloc() represents a permanent memory leak.
>
> 2. Temporary contexts are remembered by the context manager and are
> guaranteed to be deleted at transaction end. (If we ever have nested
> transactions, we'd probably want to tie each temporary context to a
> particular transaction, but for now that's not necessary.) Most activity
> will happen in temporary contexts.
>
> 3. When a context is created, an existing context can be specified as its
> parent; thus a tree of contexts is created. Resetting or deleting any
> particular context resets or deletes all its direct and indirect children
> as well. This feature allows us to manage a lot of contexts without fear
> that some will be leaked; we just have to make sure everything descends
> from one context that we remember to zap at transaction end.
>
> In practice, point #2 doesn't require any special support in the context
> manager as long as it supports point #3. We simply start a new context
> for each transaction and delete it at transaction end. All temporary
> contexts created within the transaction must be direct or indirect
> children of this "transaction top context".
>
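A minimal sketch of point #3, assuming nothing about how the real context
manager would store the links: TreeContext, live_contexts and the tree_*
functions are hypothetical names. Deleting a context deletes its whole
subtree first, and tree_recycle() shows the "delete at transaction end,
start a new one" step.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical context node; only the tree links are modelled here. */
typedef struct TreeContext {
    struct TreeContext *parent;
    struct TreeContext *first_child;
    struct TreeContext *next_sibling;
} TreeContext;

/* Number of contexts currently alive, so that leaks are easy to spot. */
static int live_contexts = 0;

/* Create a context as a child of parent (NULL makes a top-level context). */
static TreeContext *tree_create(TreeContext *parent)
{
    TreeContext *cxt = calloc(1, sizeof *cxt);

    if (cxt == NULL)
        abort();
    cxt->parent = parent;
    if (parent != NULL)
    {
        cxt->next_sibling = parent->first_child;
        parent->first_child = cxt;
    }
    live_contexts++;
    return cxt;
}

/* Delete a context together with all its direct and indirect children. */
static void tree_delete(TreeContext *cxt)
{
    /* Children go first; each one unlinks itself from cxt as it dies. */
    while (cxt->first_child != NULL)
        tree_delete(cxt->first_child);

    /* Unlink this context from its parent's child list. */
    if (cxt->parent != NULL)
    {
        TreeContext **link = &cxt->parent->first_child;

        while (*link != cxt)
            link = &(*link)->next_sibling;
        *link = cxt->next_sibling;
    }
    free(cxt);
    live_contexts--;
}

/* End-of-transaction style cleanup: drop the subtree, start a fresh one. */
static TreeContext *tree_recycle(TreeContext *cxt)
{
    TreeContext *parent = cxt->parent;

    tree_delete(cxt);
    return tree_create(parent);
}
```

The point of the counter is only to show that no context can outlive the
ancestor it descends from, however many were created under it.
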
> Note: it would probably be possible to adapt the existing "portal" memory
> management mechanism to do what we need. I am instead proposing setting
> up a totally new mechanism, because the portal code strikes me as
> extremely crufty and unwieldy. It may be that we can eventually remove
> portals entirely, or perhaps reimplement them with this mechanism
> underneath.
>
>
> Top-level (permanent) memory contexts
> -------------------------------------
>
> We currently have TopMemoryContext and CacheMemoryContext as permanent
> memory contexts. The existing usages of these are probably OK, although
> it might be a good idea to examine usages of TopMemoryContext to see if
> they should go somewhere else.
>
> It might also be a good idea to set up a permanent ErrorMemoryContext that
> elog() can switch into for processing an error; this would ensure that
> there is at least ~8K of memory available for error processing, even if
> we've run out otherwise. (ErrorMemoryContext could be reset, but not
> deleted, after each successful error recovery.)
>
> We will also create a global variable TransactionTopMemoryContext, which
> is valid at all times. Memory recovery at end of transaction is done by
> deleting and immediately recreating this context. All transaction-local
> contexts are created as children of TransactionTopMemoryContext, so that
> they go away at transaction end too. (If we implement nested
> transactions, it could be that TransactionTopMemoryContext will itself be
> a child of some outer transaction's top context, but that's beyond the
> scope of this proposal.)
>
>
> Transaction-local memory contexts
> ---------------------------------
>
> Relatively little stuff should get allocated directly in
> TransactionTopMemoryContext; the bulk of the action should happen in
> sub-contexts. I propose the following:
>
> QueryTopMemoryContext: this child of TransactionTopMemoryContext is
> created at the start of each query cycle and deleted upon successful
> completion. (On error, of course, it goes away because it is a child of
> TransactionTopMemoryContext.) The query input buffer is allocated in this
> context, as well as anything else that should live just till end of query.
>
> ParsePlanMemoryContext: this child of QueryTopMemoryContext is working
> space for the parse/rewrite/plan/optimize pipeline. After completion
> of planning, the final query plan is copied via copyObject() back into
> QueryTopMemoryContext, and then the ParsePlanMemoryContext can be deleted.
> This allows us to recycle the (perhaps large) amount of memory used by
> planning before actual query execution starts.
>
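The copy-then-delete step can be sketched as below. This is an
illustration only: Context, PlanNode, cxt_alloc(), cxt_free_all(),
make_node() and copy_plan() are invented for the example, and copy_plan()
merely stands in for what copyObject() would do on a real plan tree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy context: a list of malloc'd chunks (hypothetical, not aset.c). */
typedef union Chunk {
    union Chunk *next;
    long double  align;         /* keeps the payload after the header aligned */
} Chunk;

typedef struct Context {
    Chunk *chunks;
    size_t nchunks;
} Context;

static void *cxt_alloc(Context *cxt, size_t size)
{
    Chunk *c = malloc(sizeof *c + size);

    if (c == NULL)
        abort();
    c->next = cxt->chunks;
    cxt->chunks = c;
    cxt->nchunks++;
    return c + 1;
}

/* Free everything the context holds in one sweep. */
static void cxt_free_all(Context *cxt)
{
    while (cxt->chunks != NULL)
    {
        Chunk *next = cxt->chunks->next;

        free(cxt->chunks);
        cxt->chunks = next;
    }
    cxt->nchunks = 0;
}

/* Stand-in for a plan tree: a chain of labelled nodes. */
typedef struct PlanNode {
    struct PlanNode *child;
    char            *label;
} PlanNode;

/* Build a node whose storage lives entirely in cxt. */
static PlanNode *make_node(Context *cxt, const char *label, PlanNode *child)
{
    PlanNode *n = cxt_alloc(cxt, sizeof *n);

    n->label = cxt_alloc(cxt, strlen(label) + 1);
    strcpy(n->label, label);
    n->child = child;
    return n;
}

/* Deep copy: every byte of the result is allocated in dst. */
static PlanNode *copy_plan(Context *dst, const PlanNode *src)
{
    if (src == NULL)
        return NULL;
    return make_node(dst, src->label, copy_plan(dst, src->child));
}
```

After copy_plan() into the longer-lived context, cxt_free_all() on the
planning context reclaims the original tree and any planner scratch space
at once, and the copy stays usable.
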
> Execution per-run memory contexts: at startup, the executor will create a
> child of QueryTopMemoryContext to hold data that should live until
> ExecEndPlan; an example is the plan-node-local execution state. Some plan
> node types may want to create shorter-lived contexts that are children of
> their parent's per-run context. For example, a subplan node would create
> its own "per run" context so that memory could be freed at completion of
> each invocation of the subplan.
>
> Execution per-tuple memory contexts: each per-run context will have a
> child context that the executor will reset (not delete) each time through
> the node's per-tuple loop. This per-tuple context will be the active
> CurrentMemoryContext most of the time during execution.
>
> By resetting the per-tuple context, we will be able to free memory after
> each tuple is processed, rather than only after the whole plan is
> processed. This should solve our memory leakage problems pretty well;
> yet we do not need to add very much new bookkeeping logic to do it.
> In particular, we do *not* need to try to keep track of individual values
> palloc'd during expression evaluation.
>
> Note we assume that resetting a context is a cheap operation. This is
> true already, and we can make it even more true with a little bit of
> tuning in aset.c.
>
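To make the per-tuple idea concrete, here is a small sketch under loud
assumptions: Arena is a fixed-size bump allocator invented for the example
(a real context would grow), and process_tuples() is a hypothetical
per-tuple loop. Reset just rewinds a counter, and the peak usage stays at
one tuple's worth no matter how many rows go through.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE  1024
#define ARENA_ALIGN 16

/* Toy per-tuple context: a fixed buffer with a bump pointer (hypothetical). */
typedef struct Arena {
    union {
        unsigned char bytes[ARENA_SIZE];
        long double   align;    /* forces a reasonably aligned buffer */
    } buf;
    size_t used;                /* bytes handed out since the last reset */
    size_t peak;                /* largest value "used" has ever reached */
} Arena;

static void arena_init(Arena *a)
{
    a->used = 0;
    a->peak = 0;
}

static void *arena_alloc(Arena *a, size_t size)
{
    void *p;

    /* Round the request up to a multiple of ARENA_ALIGN. */
    size = (size + ARENA_ALIGN - 1) & ~(size_t) (ARENA_ALIGN - 1);
    if (size > ARENA_SIZE - a->used)
        return NULL;            /* toy limit; a real context would grow */
    p = a->buf.bytes + a->used;
    a->used += size;
    if (a->used > a->peak)
        a->peak = a->used;
    return p;
}

/* Reset is O(1): nothing is walked, the bump pointer just rewinds. */
static void arena_reset(Arena *a)
{
    a->used = 0;
}

/* Sum the lengths of all rows, using scratch space for each one. */
static size_t process_tuples(Arena *per_tuple,
                             const char *const *rows, size_t nrows)
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < nrows; i++)
    {
        size_t len = strlen(rows[i]);
        char  *scratch;

        arena_reset(per_tuple);         /* drop the previous tuple's garbage */
        scratch = arena_alloc(per_tuple, len + 1);
        assert(scratch != NULL);
        memcpy(scratch, rows[i], len + 1);
        total += strlen(scratch);       /* only a by-value result survives */
    }
    return total;
}
```

Nothing in the loop frees scratch explicitly, which is the point: the code
doing the work needs no per-value bookkeeping.
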
>
> Coding rules required
> ---------------------
>
> Functions that return pass-by-reference values will be required always
> to palloc the returned space in the caller's memory context (i.e., the
> context that was CurrentMemoryContext at the time of call). It is not
> OK to pass back an input pointer, even if we are returning an input value
> verbatim, because we do not know the lifespan of the context the input
> pointer points to. An example showing why this is necessary is provided
> by aggregate-function execution. The aggregate function executor must
> retain state values returned by state-transition functions from one tuple
> to the next. Yet it does not want to keep them till end of run; that
> would be a memory leak. The solution nodeAgg.c will use is to have two
> per-tuple memory contexts that are used alternately. At each tuple,
> an old state value existing in one context is passed to the state
> transition function, which will return its result in the other context
> (since that'll be where CurrentMemoryContext points). Then the first
> context is reset and used as the target for the next cycle. This solution
> works as long as the transition function always returns a newly palloc'd
> datum, and never simply returns a pointer to its input data.
>
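The two-context trick might look roughly like this. It is only a sketch of
the idea, not what nodeAgg.c does or will do: Arena, current_arena,
concat_trans() and agg_concat() are hypothetical, and the "aggregate" is
plain string concatenation so that the state is pass-by-reference. The
reset deliberately poisons the old bytes, so a transition function that
handed back its input pointer would return garbage one tuple later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ARENA_SIZE 256

/* Toy context: fixed buffer plus bump pointer (hypothetical, not aset.c). */
typedef struct Arena {
    char   bytes[ARENA_SIZE];
    size_t used;
} Arena;

static char *arena_alloc(Arena *a, size_t size)
{
    char *p;

    if (size > ARENA_SIZE - a->used)
        abort();                /* toy limit; a real context would grow */
    p = a->bytes + a->used;
    a->used += size;
    return p;
}

/* Reset, poisoning the old contents so stale pointers show up quickly. */
static void arena_reset(Arena *a)
{
    memset(a->bytes, 0x7F, a->used);
    a->used = 0;
}

/* Plays the role of CurrentMemoryContext. */
static Arena *current_arena;

/*
 * Transition function: always builds its result in the current context
 * and never returns "state" itself, even when "value" is empty.
 */
static const char *concat_trans(const char *state, const char *value)
{
    size_t slen = strlen(state);
    size_t vlen = strlen(value);
    char  *result = arena_alloc(current_arena, slen + vlen + 1);

    memcpy(result, state, slen);
    memcpy(result + slen, value, vlen + 1);
    return result;
}

/*
 * Aggregate driver.  The returned pointer lives in one of the two
 * contexts and stays valid until the caller resets them.
 */
static const char *agg_concat(Arena cxt[2],
                              const char *const *values, size_t n)
{
    const char *state = "";     /* initial state, in neither context */
    int         cur = 0;
    size_t      i;

    for (i = 0; i < n; i++)
    {
        const char *next;

        current_arena = &cxt[cur];
        next = concat_trans(state, values[i]);  /* lands in cxt[cur] */
        arena_reset(&cxt[1 - cur]);             /* old state is now garbage */
        state = next;
        cur = 1 - cur;                          /* swap roles */
    }
    return state;
}
```

At any moment only the current state value and the one being built are
alive, so memory use does not grow with the number of tuples.
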
> Thus, a function must use the passed-in CurrentMemoryContext for
> allocating its result data, and can use it for any temporary storage it
> needs as well. pfree'ing such temporary data before return is possible
> but not essential.
>
> Executor routines that switch the active CurrentMemoryContext may need
> to copy data into their caller's current memory context before returning.
> I think there will be relatively little need for that, if we use a
> convention of resetting the per-tuple context at the *start* of an
> execution cycle rather than at its end. With that rule, an execution
> node can return a tuple that is palloc'd in its per-tuple context, and
> the tuple will remain good until the node is called for another tuple
> or told to end execution. This is pretty much the same state of affairs
> that exists now, since a scan node can return a direct pointer to a tuple
> in a disk buffer that is only guaranteed to remain good that long.
>
> A more common reason for copying data will be to transfer a result from
> per-tuple context to per-run context; for example, a Unique node will
> save the last distinct tuple value in its per-run context, requiring a
> copy step. (Actually, Unique could use the same trick with two per-tuple
> contexts as described above for Agg, but there will probably be other
> cases where doing an extra copy step is the right thing.)
>
>
> Other notes
> -----------
>
> It might be that the executor per-run contexts described above should
> be tied directly to executor "EState" nodes, that is, one context per
> EState. I'm not real clear on the lifespan of EStates or the situations
> where we have just one or more than one, so I'm not sure. Comments?
>
> With so many contexts running around, I think it will be almost essential
> to allow pfree() to work on chunks belonging to contexts other than the
> current one. If we don't get rid of the notion of multiple allocation
> context types then some other work will have to be expended to make this
> possible. Also, should we allow repalloc() to work on a chunk not
> belonging to the current context? I'm less excited about allowing that,
> but it may prove useful.
>
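One way a context-independent pfree() could work, sketched under the
assumption that every chunk carries a small header pointing back at its
owning context. OwnedContext, ChunkHeader and the owned_* functions are
made up for illustration and say nothing about how aset.c lays out its
chunks.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct OwnedContext OwnedContext;

/* Header placed in front of every chunk (hypothetical layout). */
typedef union ChunkHeader {
    struct {
        OwnedContext      *owner;   /* context the chunk was allocated in */
        union ChunkHeader *prev;
        union ChunkHeader *next;
    } h;
    long double align;              /* keeps the payload aligned */
} ChunkHeader;

struct OwnedContext {
    ChunkHeader *head;              /* doubly linked list of live chunks */
    size_t       live;              /* number of live chunks */
};

/* Allocate in an explicitly named context. */
static void *owned_alloc(OwnedContext *cxt, size_t size)
{
    ChunkHeader *c = malloc(sizeof *c + size);

    if (c == NULL)
        abort();
    c->h.owner = cxt;
    c->h.prev = NULL;
    c->h.next = cxt->head;
    if (cxt->head != NULL)
        cxt->head->h.prev = c;
    cxt->head = c;
    cxt->live++;
    return c + 1;
}

/* Free a chunk without being told, or caring, which context is current. */
static void owned_free(void *ptr)
{
    ChunkHeader  *c = (ChunkHeader *) ptr - 1;
    OwnedContext *cxt = c->h.owner;

    if (c->h.prev != NULL)
        c->h.prev->h.next = c->h.next;
    else
        cxt->head = c->h.next;
    if (c->h.next != NULL)
        c->h.next->h.prev = c->h.prev;
    cxt->live--;
    free(c);
}

/* Reset still works: walk the list and release whatever is left. */
static void owned_reset(OwnedContext *cxt)
{
    while (cxt->head != NULL)
        owned_free(cxt->head + 1);
}
```

The cost is one extra pointer per chunk. A repalloc() on a foreign chunk
could find the owning context the same way, though this sketch does not
attempt it.
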

--
Bruce Momjian | http://www.op.net/~candle
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
