Re: Why our Valgrind reports suck

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Why our Valgrind reports suck
Date: 2025-05-09 17:44:52
Message-ID: 454130.1746812692@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2025-05-08 22:04:06 -0400, Tom Lane wrote:
>> A nearby thread [1] reminded me to wonder why we seem to have
>> so many false-positive leaks reported by Valgrind these days.

> Huh. We use the memory pool client requests to inform valgrind about memory
> contexts. I seem to recall that that "hid" many leak warnings from valgrind. I
> wonder if we somehow broke (or weakened) that.

The problem with dynahash has been there since day one, so I think
we've just gotten used to ignoring "leak" reports associated with
hash tables --- I have, anyway. But the other triggers I listed
have appeared within the last five-ish years, if memory serves,
and we just didn't notice because of the existing dynahash noise.

> We currently don't reset TopMemoryContext at exit, which, obviously, does
> massively increase the number of leaks. But OTOH, without that there's not a
> whole lot of value in the leak check...

Hmm. Yeah, we could just reset or delete TopMemoryContext, but
surely that would be counterproductive. It would mask any real
leak of palloc'd blocks. I'm a little suspicious of your other
idea of shutting down the caches, for the same reason: I wonder
if it wouldn't hide leaks rather than help find them.

One thing I noticed while reading the Valgrind manual is that
they describe a facility for "two level" tracking of custom
allocators such as ours. Apparently, what you're really supposed
to do is use VALGRIND_MEMPOOL_ALLOC to mark the malloc blocks
that the allocator works in, and VALGRIND_MALLOCLIKE_BLOCK to
mark the sub-blocks handed out by the allocator. I wonder if this
feature postdates our implementation of Valgrind support, and I
wonder even more if using it would improve our results.

I did experiment with marking context headers as accessible with
VALGRIND_MEMPOOL_ALLOC, and that made the complaints about
MemoryContextCopyAndSetIdentifier strings go away, confirming
that Valgrind is simply not considering the context->ident
pointers. Unfortunately it also added a bunch of other failures,
so that evidently is Not The Right Thing. I suspect what is
going on is related to this bit in valgrind.h:

For Memcheck users: if you use VALGRIND_MALLOCLIKE_BLOCK to carve out
custom blocks from within a heap block, B, that has been allocated with
malloc/calloc/new/etc, then block B will be *ignored* during leak-checking
-- the custom blocks will take precedence.

We're not using VALGRIND_MALLOCLIKE_BLOCK (yet, anyway), but I'm
suspecting that Valgrind probably also ignores heap blocks that
match VALGRIND_CREATE_MEMPOOL requests, except for the portions
thereof that are covered by VALGRIND_MEMPOOL_ALLOC requests.

Anyway, I'm now feeling motivated to go try some experiments.
Watch this space ...

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sami Imseih 2025-05-09 17:47:19 Re: queryId constant squashing does not support prepared statements
Previous Message Peter Geoghegan 2025-05-09 17:30:01 Re: Adding skip scan (including MDAM style range skip scan) to nbtree