Re: [HACKERS] aggregation memory leak and fix

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: riedel+(at)CMU(dot)EDU (Erik Riedel)
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] aggregation memory leak and fix
Date: 1999-03-19 22:22:42
Message-ID: 199903192223.RAA06691@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> when I run this against 6.4.2, the postgres process grows to upwards of
> 1 GB of memory (at which point something overflows and it dumps core) -
> I watch it grow through 200 MB, 400 MB, 800 MB, dies somewhere near 1 GB
> of allocated memory).
>
> If I take out a few of the "sum" expressions it gets better, removing
> sum_disk_price and sum_charge causes it to be only 600 MB and the query
> actually (eventually) completes. Takes about 10 minutes on my 500 MHz
> machine with 256 MB core and 4 GB of swap.

Wow, that is large.

> The problem seems to be the memory allocation mechanism. Looking at a
> call trace, it is doing some kind of "sub query" plan for each row in
> the database. That means it does ExecEval and postquel_function and
> postquel_execute and all their friends for each row in the database.
> Allocating a couple hundred bytes for each one.

I will admit we really haven't looked at executor issues recently. We
have had few bug reports in that area, so we normally have just left it
alone. I was aware of some memory allocation issues with aggregates,
but I thought I fixed them in 6.5. Obviously not.

> The problem is that none of these allocations are freed - they seem to
> depend on the AllocSet to free them at the end of the transaction. This
> means it isn't a "true" leak, because the bytes are all freed at the
> (very) end of the transaction, but it does mean that the process grows
> to unreasonable size in the meantime. There is no need for this,
> because the individual expression results are aggregated as it goes
> along, so the intermediate nodes can be freed.

Yes, but a terrible over-allocation.

> I spent half a day last week chasing down the offending palloc() calls
> and execution stacks sufficiently that I think I found the right places
> to put pfree() calls.
>
> As a result, I have changes in the files:
>
> src/backend/executor/execUtils.c
> src/backend/executor/nodeResult.c
> src/backend/executor/nodeAgg.c
> src/backend/executor/execMain.c
>
> patches to these files are attached at the end of this message. These
> files are based on the 6.5.0 snapshot downloaded from ftp.postgreql.org
> on 11 March 1999.
>
> Apologies for sending patches to a non-released version. If anyone has
> problems applying the patches, I can send the full files (I wanted to
> avoid sending a 100K shell archive to the list). If anyone cares about
> reproducing my exact problem with the above table, I can provide the 100
> MB pg_dump file for download as well.

No apologies necessary. Glad to have someone digging into that area of
the code. We will gladly apply your patches to 6.5. However, I request
that you send context diffs(diff -c). Normal diffs are just too
error-prone in application. Send them, and I will apply them right
away.

> Secondary Issue: the reason I did not use the 6.4.2 code to make my
> changes is because the AllocSet calls in that one were particularly
> egregious - they only had the skeleton of the allocsets code that exists
> in the 6.5 snapshots, so they were calling malloc() for all of the 8 and
> 16 byte allocations that the above query causes.

Glad you used 6.5. Makes it easier to merge them into our next release.

> Using the fixed code reduces the maximum memory requirement on the above
> query to about 210 MB, and reduces the runtime to (an acceptable) 1.5
> minutes - a factor of more than 6x improvement on my 256 MB machine.
>
> Now the biggest part of the execution time is in the sort before the
> aggregation (which isn't strictly needed, but that is an optimization
> for another day).

Not sure why that is there? Perhaps for GROUP BY processing?

>
> Open Issue: there is still a small "leak" that I couldn't eliminate, I
> think I chased it down to the constvalue allocated in
> execQual::ExecTargetList(), but I couldn't figure out where to properly
> free it. 8 bytes leaked was much better than 750 bytes, so I stopped
> banging my head on that particular item.

Can you give me the exact line? Is it the palloc(1)?

> Secondary Open Issue: what I did have to do to get down to 210 MB of
> core was reduce the minimum allocation size in AllocSet to 8 bytes from
> 16 bytes. That reduces the 8 byte leak above to a true 8 byte, rather
> than a 16 byte leak. Otherwise, I think the size was 280 MB (still a
> big improvement on 1000+ MB). I only changed this in my code and I am
> not including a changed mcxt.c for that.

Maybe Jan, our memory optimizer, can discuss the 8 vs. 16 byte issue.

--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Davis 1999-03-19 22:32:48 Additional requests for version 6.5
Previous Message Michael Davis 1999-03-19 22:16:06 RE: [HACKERS] Associative Operators? (Was: Re: [NOVICE] Out of f rying pan, into fire)