Re: [HACKERS] aggregation memory leak and fix

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane)
Cc: riedel+(at)CMU(dot)EDU, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] aggregation memory leak and fix
Date: 1999-03-21 19:20:39
Message-ID: 199903211920.OAA28744@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> writes:
> > My only quick solution would seem to be to add a new "expression" memory
> > context, that can be cleared after every tuple is processed, clearing
> > out temporary values allocated inside an expression.
>
> Right, this whole problem of growing backend memory use during a large
> SELECT (or COPY, or probably a few other things) is one of the things
> that we were talking about addressing by revising the memory management
> structure.
>
> I think what we want inside the executor is a distinction between
> storage that must live to the end of the statement and storage that is
> only needed while processing the current tuple. The second kind of
> storage would go into a separate context that gets flushed every so
> often. (It could be every tuple, or every dozen or hundred tuples
> depending on what seems the best tradeoff of cycles against memory
> usage.)
>
> I'm not sure that just two contexts is enough, either. For example in
> SELECT field1, SUM(field2) GROUP BY field1;
> the working memory for the SUM aggregate could not be released after
> each tuple, but perhaps we don't want it to live for the whole statement
> either --- in that case we'd need a per-group context. (This particular
> example isn't very convincing, because the same storage for the SUM
> *could* be recycled from group to group. But I don't know whether it
> actually *is* reused or not. If fresh storage is palloc'd for each
> instantiation of SUM then we have a per-group leak in this scenario.
> In any case, I'm not sure all aggregate functions have constant memory
> requirements that would let them recycle storage across groups.)
>
> What we need to do is work out what the best set of memory context
> definitions is, and then decide on a strategy for making sure that
> lower-level routines allocate their return values in the right context.
> It'd be nice if the lower-level routines could still call palloc() and
> not have to worry about this explicitly --- otherwise we'll break not
> only a lot of our own code but perhaps a lot of user code. (User-
> specific data types and SPI code all use palloc, no?)

Let me make an argument here.

Let's suppose that we want to free all the memory used as expression
intermediate values after each row is processed.

It is my understanding that all these are created in utils/adt/*.c
files, and that the entry point to all those functions via
fmgr()/fmgr_c().

So, if we go into an expression memory context before calling
fmgr/fmgr_c in the executor, and return to the normal context after the
function call, all our intermediates are trapped in the expression
memory context.

At the end of each row, we just free the expression memory context. In
almost all cases, the data is stored in tuples, and we can free it. In
a few cases like aggregates, we have to save off the value we need to
keep before freeing the expression context. In fact, you could even
optimize the cleanup to only do free'ing if some expression memory was
allocated. In most cases, it is not.

In fact the nodeAgg.c patch that I backed out attempted to do that,
though because there wasn't code that checked if the Datum was
pg_type.typbyval, it didn't work 100%.

In fact, a quick look at the executor shows:

#$ grep "fmgr(" *.c |detab -t 4
execUtils.c: predString = fmgr(F_TEXTOUT, &indexStruct->indpred);
nodeGroup.c: val1 = fmgr(typoutput, attr1, typelem,
nodeGroup.c: val2 = fmgr(typoutput, attr2, typelem,
nodeUnique.c: val1 = fmgr(typoutput, attr1, typelem,
nodeUnique.c: val2 = fmgr(typoutput, attr2, typelem,
spi.c: return (fmgr(foutoid, val, typelem,

#$ grep "fmgr_c(" *.c |detab -t 4
execQual.c: return (Datum) fmgr_c(&fcache->func, (FmgrValues *)argV, isNull);
nodeAgg.c: value1[aggno] = (Datum) fmgr_c(&aggfns->xfn1,
nodeAgg.c: value2[aggno] = (Datum) fmgr_c(&aggfns->xfn2,
nodeAgg.c: value1[aggno] = (Datum) fmgr_c(&aggfns->finalfn,

The fmgr(out*) calls are probably not an issue, because they are already
cleaned up. The only issue are the fmgr_c calls. execQual is the MAJOR
one for all expressions, and the nodeAgg calls would have to have some
saving of the last entry done.

Seems pretty straight-forward to me. The fact is we have a pretty clean
memory allocation system. Not sure if a redesign is necessary if we can
clean up any per-tuple allocations, and I think this would do the trick.

Comments?

--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Silvia_Brown1 1999-03-21 20:17:30 Find Out What The Future Holds For You?
Previous Message Tom Lane 1999-03-21 19:15:02 Re: [HACKERS] min() and max() causing aborts