From: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | riedel+(at)CMU(dot)EDU (Erik Riedel) |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] aggregation memory leak and fix |
Date: | 1999-03-19 22:22:42 |
Message-ID: | 199903192223.RAA06691@candle.pha.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> when I run this against 6.4.2, the postgres process grows to upwards of
> 1 GB of memory (at which point something overflows and it dumps core) -
> I watch it grow through 200 MB, 400 MB, 800 MB, dies somewhere near 1 GB
> of allocated memory).
>
> If I take out a few of the "sum" expressions it gets better, removing
> sum_disk_price and sum_charge causes it to be only 600 MB and the query
> actually (eventually) completes. Takes about 10 minutes on my 500 MHz
> machine with 256 MB core and 4 GB of swap.
Wow, that is large.
> The problem seems to be the memory allocation mechanism. Looking at a
> call trace, it is doing some kind of "sub query" plan for each row in
> the database. That means it does ExecEval and postquel_function and
> postquel_execute and all their friends for each row in the database.
> Allocating a couple hundred bytes for each one.
I will admit we really haven't looked at executor issues recently. We
have had few bug reports in that area, so we normally have just left it
alone. I was aware of some memory allocation issues with aggregates,
but I thought I fixed them in 6.5. Obviously not.
> The problem is that none of these allocations are freed - they seem to
> depend on the AllocSet to free them at the end of the transaction. This
> means it isn't a "true" leak, because the bytes are all freed at the
> (very) end of the transaction, but it does mean that the process grows
> to unreasonable size in the meantime. There is no need for this,
> because the individual expression results are aggregated as it goes
> along, so the intermediate nodes can be freed.
Yes, but a terrible over-allocation.
> I spent half a day last week chasing down the offending palloc() calls
> and execution stacks sufficiently that I think I found the right places
> to put pfree() calls.
>
> As a result, I have changes in the files:
>
> src/backend/executor/execUtils.c
> src/backend/executor/nodeResult.c
> src/backend/executor/nodeAgg.c
> src/backend/executor/execMain.c
>
> patches to these files are attached at the end of this message. These
> files are based on the 6.5.0 snapshot downloaded from ftp.postgreql.org
> on 11 March 1999.
>
> Apologies for sending patches to a non-released version. If anyone has
> problems applying the patches, I can send the full files (I wanted to
> avoid sending a 100K shell archive to the list). If anyone cares about
> reproducing my exact problem with the above table, I can provide the 100
> MB pg_dump file for download as well.
No apologies necessary. Glad to have someone digging into that area of
the code. We will gladly apply your patches to 6.5. However, I request
that you send context diffs(diff -c). Normal diffs are just too
error-prone in application. Send them, and I will apply them right
away.
> Secondary Issue: the reason I did not use the 6.4.2 code to make my
> changes is because the AllocSet calls in that one were particularly
> egregious - they only had the skeleton of the allocsets code that exists
> in the 6.5 snapshots, so they were calling malloc() for all of the 8 and
> 16 byte allocations that the above query causes.
Glad you used 6.5. Makes it easier to merge them into our next release.
> Using the fixed code reduces the maximum memory requirement on the above
> query to about 210 MB, and reduces the runtime to (an acceptable) 1.5
> minutes - a factor of more than 6x improvement on my 256 MB machine.
>
> Now the biggest part of the execution time is in the sort before the
> aggregation (which isn't strictly needed, but that is an optimization
> for another day).
Not sure why that is there? Perhaps for GROUP BY processing?
>
> Open Issue: there is still a small "leak" that I couldn't eliminate, I
> think I chased it down to the constvalue allocated in
> execQual::ExecTargetList(), but I couldn't figure out where to properly
> free it. 8 bytes leaked was much better than 750 bytes, so I stopped
> banging my head on that particular item.
Can you give me the exact line? Is it the palloc(1)?
> Secondary Open Issue: what I did have to do to get down to 210 MB of
> core was reduce the minimum allocation size in AllocSet to 8 bytes from
> 16 bytes. That reduces the 8 byte leak above to a true 8 byte, rather
> than a 16 byte leak. Otherwise, I think the size was 280 MB (still a
> big improvement on 1000+ MB). I only changed this in my code and I am
> not including a changed mcxt.c for that.
Maybe Jan, our memory optimizer, can discuss the 8 vs. 16 byte issue.
--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Davis | 1999-03-19 22:32:48 | Additional requests for version 6.5 |
Previous Message | Michael Davis | 1999-03-19 22:16:06 | RE: [HACKERS] Associative Operators? (Was: Re: [NOVICE] Out of f rying pan, into fire) |