memory context debugging

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: memory context debugging
Date: 2010-01-07 10:56:12
Message-ID: 603c8f071001070256s57434b3fk8026c93be7ec7ae2@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 6, 2010 at 11:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Jan 6, 2010 at 10:13 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>> What tools do we have for identifying memory leaks?
>>>
>>> User complaints :-(
>
>> YGTBFKM.
>
> Not really.  Given the memory context architecture, leaks are simply not
> a big deal in 99% of the system.  We just need a few coding rules like
> "don't run random code in CacheMemoryContext" ;-)
>
>> It seems like we should have a tool that dumps out every memory
>> context in the system, with the number of allocations and frees and
>> number of bytes allocated and freed since the last reset.  Maybe the
>> time of the last reset.  You could run that before and after doing
>> whatever it is that might leak and compare.
>
> Once you've identified a place that "might leak" and a test case that
> would exercise it, you've already done most of the work.  What you're
> describing sounds to me like a lot of work for not much return.
>
> Furthermore, if you do have a leaking test case and you don't know
> exactly where the leak is coming from, numbers about how big the leak is
> aren't any help in finding the cause.  What you really want is numbers
> that are per palloc call site, which would not be simple to get.  I have
> occasionally wondered about hooking up something similar to valgrind for
> this; but the problem is that it would drown you in false positives
> because of the number of places where we intentionally leave stuff to be
> cleaned up at context reset.

About 10 years ago I worked on a C++ project and they had a tool,
whose details I no longer remember, that dumped out memory allocation
data and it was an invaluable debugging tool not only for detecting
leaks but also for figuring out which parts of the code were
memory-intensive. With what we have today, it sounds like you can't
even do something like "run the regression tests and then check
whether anything leaked into a context that doesn't get reset", which
IMO ought to be routine testing. It's true that detecting leaks into
statement or tuple level contexts is probably a little more
challenging because of the reliance on context resets, but without a
tool it's REALLY hard.
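The per-context counters proposed in the quoted text (allocations, frees, and bytes allocated and freed since the last reset, plus the reset time) and the before/after comparison described above might look roughly like the toy sketch below. It is standalone C, not PostgreSQL code. `ContextStats`, `ToyContext`, `toy_alloc`, `toy_free`, `toy_reset_stats`, and `leak_check_demo` are hypothetical names, and `malloc` stands in for a real context's block allocator. The sketch also assumes that each chunk carries a small header recording its owning context and size, so that a free can be charged back to the right context.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical per-context counters; stock PostgreSQL does not keep these. */
typedef struct ContextStats
{
    unsigned long nallocs;      /* allocations since the last reset */
    unsigned long nfrees;       /* frees since the last reset */
    size_t      bytes_allocated;
    size_t      bytes_freed;
    time_t      last_reset;     /* when the counters were last zeroed */
} ContextStats;

/* Toy stand-in for a memory context: a name plus its counters. */
typedef struct ToyContext
{
    const char *name;
    ContextStats stats;
} ToyContext;

/* Each chunk records its owner and size so a free can be charged back. */
typedef struct ChunkHeader { ToyContext *context; size_t size; } ChunkHeader;

static void *
toy_alloc(ToyContext *cxt, size_t size)
{
    ChunkHeader *hdr = malloc(sizeof(ChunkHeader) + size);

    if (hdr == NULL)
        return NULL;
    hdr->context = cxt;
    hdr->size = size;
    cxt->stats.nallocs++;
    cxt->stats.bytes_allocated += size;
    return hdr + 1;             /* caller gets the bytes after the header */
}

static void
toy_free(void *ptr)
{
    ChunkHeader *hdr;

    assert(ptr != NULL);
    hdr = (ChunkHeader *) ptr - 1;
    hdr->context->stats.nfrees++;
    hdr->context->stats.bytes_freed += hdr->size;
    free(hdr);
}

/* Zero the counters and stamp the time (a real reset would also free chunks). */
static void
toy_reset_stats(ToyContext *cxt)
{
    ContextStats fresh = {0};

    fresh.last_reset = time(NULL);
    cxt->stats = fresh;
}

static size_t
outstanding_bytes(const ContextStats *s)
{
    return s->bytes_allocated - s->bytes_freed;
}

static void
dump_stats(const ToyContext *cxt)
{
    const ContextStats *s = &cxt->stats;

    printf("%s: %lu allocs, %lu frees, %zu bytes allocated, %zu freed, %zu outstanding\n",
           cxt->name, s->nallocs, s->nfrees, s->bytes_allocated,
           s->bytes_freed, outstanding_bytes(s));
}

/* Snapshot, run a workload that forgets one chunk, snapshot again, compare. */
size_t
leak_check_demo(void)
{
    ToyContext  cxt = {"long-lived context (toy)", {0}};
    ContextStats before;
    void       *tmp;
    void       *kept;
    size_t      growth;

    toy_reset_stats(&cxt);
    before = cxt.stats;
    tmp = toy_alloc(&cxt, 100);
    kept = toy_alloc(&cxt, 28);     /* the workload never frees this one */
    toy_free(tmp);
    dump_stats(&cxt);
    /* assumes outstanding bytes only grew during the workload */
    growth = outstanding_bytes(&cxt.stats) - outstanding_bytes(&before);
    toy_free(kept);                 /* tidy up so the demo itself does not leak */
    return growth;
}
```

`leak_check_demo()` prints one summary line and returns how many more bytes are outstanding after the workload than before it. For a context that never gets reset, a nonzero difference after a regression-test run is the kind of thing worth flagging. As Tom points out, though, this says how big a leak is, not where it comes from.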

I don't agree that once you've identified a place that might leak and a
test case that exercises it, you've already done most of the work.
What is the next step, at that point? Visual inspection of the code?
Even for someone who knows the code inside out, that's only feasible
if you're pretty sure that there is a leak there. If you just want to
test for leaks, it's a poor way to do it. And if you aren't familiar
with every detail of the code (such as, ahem, the points where cache
invalidations can happen) then it's even harder.

Getting information by palloc call site doesn't seem all that
difficult, actually, though it would require some rejiggering of our
macros. Basically you just need to set things up so that when
memory-context debugging is enabled, the actual allocator
(MemoryContextAlloc in our case, I think) gets __FILE__ and __LINE__
as arguments. Then you just make some sort of very simple data
structure (like an array of structs) where you record data for each
new combination that comes in. This would be associated with the
context, not global, so that you can clean them up easily when the
context is reset. You also need the same thing for pfree. Then you
write a function that prints out the contents of the array; it's
useful to sort it by bytes allocated. So then if you want to see if
you have any statement-lifetime memory leaks, for example, you dump
the per-statement context just before it gets reset and look at how
many bytes/allocations you have from each call site. And then you
say... "wait a minute, that call site shouldn't have been allocating
in this context"... or "wow, I didn't realize that got so big"... or
whatever the case may be. It doesn't remove the need for manual
analysis, but it gives you a big jump on where to start analyzing.
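Here is a toy sketch of the per-call-site scheme just described. It is standalone C rather than a patch to PostgreSQL, and `DebugContext`, `CallSite`, `debug_palloc`, `debug_pfree`, `debug_alloc_at`, `debug_free_at`, `debug_dump_sites`, and `call_site_demo` are all hypothetical names. The macros forward `__FILE__` and `__LINE__` to the allocator. Each context owns a small fixed-size array of per-site rows, so a reset can simply discard it. Frees are recorded under their own call site, and the dump sorts rows by bytes allocated. `call_site_demo()` shows the usage: dump just before the reset, then read the top rows.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_SITES 64            /* hypothetical fixed per-context table size */

/* One row per distinct __FILE__/__LINE__ pair seen in a context. */
typedef struct CallSite
{
    const char *file;
    int         line;
    unsigned long nallocs, nfrees;
    size_t      bytes_allocated, bytes_freed;
} CallSite;

/* The table lives in the context, not in a global, so a reset can drop it. */
typedef struct DebugContext
{
    const char *name;
    int         nsites;
    CallSite    sites[MAX_SITES];
} DebugContext;

/* Each chunk records its owner and size so a free can be charged back. */
typedef struct ChunkHeader { DebugContext *context; size_t size; } ChunkHeader;

/* With debugging enabled, every call site passes its own location. */
#define debug_palloc(cxt, sz) debug_alloc_at((cxt), (sz), __FILE__, __LINE__)
#define debug_pfree(ptr)      debug_free_at((ptr), __FILE__, __LINE__)

static CallSite *
lookup_site(DebugContext *cxt, const char *file, int line)
{
    for (int i = 0; i < cxt->nsites; i++)
        if (cxt->sites[i].line == line && strcmp(cxt->sites[i].file, file) == 0)
            return &cxt->sites[i];
    assert(cxt->nsites < MAX_SITES);    /* new site: append a zeroed row */
    cxt->sites[cxt->nsites] = (CallSite) {file, line, 0, 0, 0, 0};
    return &cxt->sites[cxt->nsites++];
}

void *
debug_alloc_at(DebugContext *cxt, size_t size, const char *file, int line)
{
    ChunkHeader *hdr = malloc(sizeof(ChunkHeader) + size);
    CallSite   *site;

    if (hdr == NULL)
        return NULL;
    hdr->context = cxt;
    hdr->size = size;
    site = lookup_site(cxt, file, line);
    site->nallocs++;
    site->bytes_allocated += size;
    return hdr + 1;             /* caller gets the bytes after the header */
}

void
debug_free_at(void *ptr, const char *file, int line)
{
    ChunkHeader *hdr = (ChunkHeader *) ptr - 1;
    CallSite   *site = lookup_site(hdr->context, file, line);

    site->nfrees++;
    site->bytes_freed += hdr->size;
    free(hdr);
}

static int                      /* qsort comparator: descending by bytes allocated */
cmp_bytes_desc(const void *a, const void *b)
{
    const CallSite *x = a, *y = b;

    return (x->bytes_allocated < y->bytes_allocated) - (x->bytes_allocated > y->bytes_allocated);
}

void
debug_dump_sites(DebugContext *cxt)
{
    qsort(cxt->sites, (size_t) cxt->nsites, sizeof(CallSite), cmp_bytes_desc);
    printf("call sites in %s, largest first:\n", cxt->name);
    for (int i = 0; i < cxt->nsites; i++)
        printf("  %s:%d  %lu allocs, %zu bytes; %lu frees, %zu bytes\n", cxt->sites[i].file,
               cxt->sites[i].line, cxt->sites[i].nallocs, cxt->sites[i].bytes_allocated,
               cxt->sites[i].nfrees, cxt->sites[i].bytes_freed);
}

size_t
call_site_demo(void)
{
    DebugContext cxt = {"per-statement context (toy)", 0};

    for (int i = 0; i < 3; i++)
    {
        void       *scratch = debug_palloc(&cxt, 64);  /* freed every time */
        debug_pfree(scratch);
    }
    void       *big = debug_palloc(&cxt, 500);  /* not freed before the dump */
    debug_dump_sites(&cxt);
    size_t      top = cxt.sites[0].bytes_allocated;  /* largest row after sorting */
    debug_pfree(big);           /* tidy up the toy's malloc'd chunk */
    cxt.nsites = 0;             /* what a context reset would do to the table */
    return top;
}
```

In the demo, the 500-byte allocation that is still live at dump time lands at the top of the list, ahead of the loop's 64-byte allocations, which are matched by frees from a separate call site. A linear scan over a fixed array is the simplest possible table. A real version would presumably want a hash table and some thought about the extra per-chunk header cost.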

...Robert
