Re: Getting better results from valgrind leak tracking

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Getting better results from valgrind leak tracking
Date: 2021-03-17 02:31:01
Message-ID: 20210317023101.anvejcfotwka6gaa@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

David, there's a question about a commit of yours below, hence adding
you.

On 2021-03-16 19:36:10 -0400, Tom Lane wrote:
> Out of curiosity I poked at this for a little while.

Cool.

> It doesn't appear to me that we leak much at all, at least not if you
> are willing to take "still reachable" blocks as not-leaked.

Well, for any sort of automated testing - which I think would be
useful - we'd really need *no* leaks. I know that I get a few bleats
whenever I forget to set --leak-check=no. It's also not just postgres
itself, but some of the helper tools...

And it's not just valgrind, also gcc/clang sanitizers...

> What I found out is that we have a lot of issues that seem to devolve
> to valgrind not being sure that a block is referenced. I identified
> two main causes of that:
>
> (1) We have a pointer, but it doesn't actually point right at the start
> of the block. A primary culprit here is lists of thingies that use the
> slist and dlist infrastructure. As an experiment, I moved the dlist_node
> fields of some popular structs to the beginning, and verified that that
> silences associated complaints. I'm not sure that we want to insist on
> put-the-link-first as policy (although if we did, it could provide some
> notational savings perhaps). However, unless someone knows of a way to
> teach valgrind about this situation, there may be no other way to silence
> those leakage complaints. A secondary culprit is people randomly applying
> CACHELINEALIGN or the like to a palloc'd address, so that the address we
> have isn't pointing right at the block start.

Hm, do you still have a backtrace / suppression for one of those? I
didn't see any in a quick (*) serial installcheck I just ran - or at
least I wasn't able to pinpoint them to this issue.
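
Just to make sure I understand the dlist point, a layout change along
these lines is what you mean, right (hypothetical struct, not one from
the tree)?

    /* before: a dlist pointer to 'node' is an interior pointer into the
     * palloc'd chunk, so valgrind can only call it "possibly lost" */
    typedef struct Thingie
    {
        int         id;
        dlist_node  node;       /* link in some dlist (lib/ilist.h) */
        char       *name;
    } Thingie;

    /* after: with the dlist_node first, the list pointer coincides with
     * the chunk's start address and the block counts as reachable */
    typedef struct Thingie
    {
        dlist_node  node;       /* must stay first */
        int         id;
        char       *name;
    } Thingie;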

I think the run might have shown a genuine leak:

==2048803== 16 bytes in 1 blocks are definitely lost in loss record 139 of 906
==2048803== at 0x89D2EA: palloc (mcxt.c:975)
==2048803== by 0x2392D3: heap_beginscan (heapam.c:1198)
==2048803== by 0x264E8F: table_beginscan_strat (tableam.h:918)
==2048803== by 0x265994: systable_beginscan (genam.c:453)
==2048803== by 0x83C2D1: SearchCatCacheMiss (catcache.c:1359)
==2048803== by 0x83C197: SearchCatCacheInternal (catcache.c:1299)
==2048803== by 0x83BE9A: SearchCatCache1 (catcache.c:1167)
==2048803== by 0x85876A: SearchSysCache1 (syscache.c:1134)
==2048803== by 0x84CDB3: RelationInitTableAccessMethod (relcache.c:1795)
==2048803== by 0x84F807: RelationBuildLocalRelation (relcache.c:3554)
==2048803== by 0x303C9D: heap_create (heap.c:395)
==2048803== by 0x305790: heap_create_with_catalog (heap.c:1291)
==2048803== by 0x41A327: DefineRelation (tablecmds.c:885)
==2048803== by 0x6C96B6: ProcessUtilitySlow (utility.c:1131)
==2048803== by 0x6C948A: standard_ProcessUtility (utility.c:1034)
==2048803== by 0x6C865F: ProcessUtility (utility.c:525)
==2048803== by 0x6C7409: PortalRunUtility (pquery.c:1159)
==2048803== by 0x6C7636: PortalRunMulti (pquery.c:1305)
==2048803== by 0x6C6B11: PortalRun (pquery.c:779)
==2048803== by 0x6C05AB: exec_simple_query (postgres.c:1173)
==2048803==
{
<insert_a_suppression_name_here>
Memcheck:Leak
match-leak-kinds: definite
fun:palloc
fun:heap_beginscan
fun:table_beginscan_strat
fun:systable_beginscan
fun:SearchCatCacheMiss
fun:SearchCatCacheInternal
fun:SearchCatCache1
fun:SearchSysCache1
fun:RelationInitTableAccessMethod
fun:RelationBuildLocalRelation
fun:heap_create
fun:heap_create_with_catalog
fun:DefineRelation
fun:ProcessUtilitySlow
fun:standard_ProcessUtility
fun:ProcessUtility
fun:PortalRunUtility
fun:PortalRunMulti
fun:PortalRun
fun:exec_simple_query
}

Since 56788d2156fc, heap_beginscan() allocates

    scan->rs_base.rs_private =
        palloc(sizeof(ParallelBlockTableScanWorkerData));

but heap_endscan() never frees it.

In most places heap scans are begun inside transient memory contexts,
but not always. In the trace above, for example,
RelationBuildLocalRelation switched to CacheMemoryContext and nothing
switched back to another context afterwards.
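
Roughly the pattern, paraphrased rather than the exact relcache.c code:

    /* in RelationBuildLocalRelation, approximately */
    oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
    ...
    RelationInitTableAccessMethod(rel);     /* the scan's rs_private ends
                                             * up in CacheMemoryContext */
    ...
    MemoryContextSwitchTo(oldcxt);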

I'm a bit confused about the precise design of rs_private /
ParallelBlockTableScanWorkerData: why was it added to TableScanDesc
instead of to HeapScanDesc? And why is it allocated unconditionally,
rather than only for parallel scans?
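
Independent of those questions, the minimal fix would presumably be
something like this in heap_endscan(), next to the existing pfrees
(untested sketch):

    if (scan->rs_base.rs_private)
        pfree(scan->rs_base.rs_private);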

I don't think this is a false positive, even though the memory could
theoretically be freed by resetting CacheMemoryContext (see below).

I saw a lot of false positives from autovacuum workers, because
AutovacMemCxt is never deleted and because table_toast_map is created
in TopMemoryContext. Adding an explicit
MemoryContextDelete(AutovacMemCxt) and parenting table_toast_map under
AutovacMemCxt silenced those.
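
Roughly what I did locally - a sketch, not meant for commit, and the
hash name/flags are from memory:

    /* in autovacuum.c, where table_toast_map is built: put the hash
     * into AutovacMemCxt instead of TopMemoryContext (ctl is the
     * existing HASHCTL there) */
    ctl.hcxt = AutovacMemCxt;
    table_toast_map = hash_create("TOAST to main relid map",
                                  100, &ctl,
                                  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);

    /* ... and before the worker exits */
    MemoryContextDelete(AutovacMemCxt);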

Which brings me to the question of why allocations in
CacheMemoryContext and AutovacMemCxt are considered "lost", even though
they're still "reachable" via a context reset. I think valgrind ends up
treating memory allocated via memory contexts that still exist at
process exit as lost, regardless of it being reachable through the
memory pool (from valgrind's point of view). Which I guess actually
makes sense for things like TopMemoryContext and CacheMemoryContext -
anything not reachable by means other than a context reset is
effectively lost for those.

For the autovacuum launcher / worker it seems sensible to just delete
AutovacMemCxt.

> (2) The only pointer to the start of a block is actually somewhere within
> the block. This is common in dynahash tables, where we allocate a slab
> of entries in a single palloc and then thread them together. Each entry
> should have exactly one referencing pointer, but that pointer is more
> likely to be elsewhere within the same palloc block than in the external
> hash bucket array. AFAICT, all cases except where the slab's first entry
> is pointed to by a hash bucket pointer confuse valgrind to some extent.
> I was able to hack around this by preventing dynahash from allocating
> more than one hash entry per palloc, but I wonder if there's a better way.

Hm. For me the number of leaks seems to stay the same with or without
your changes related to this. Is this a USE_VALGRIND build?

I'm using valgrind-3.16.1
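
FWIW, here's a tiny standalone demo (not PG code) of the
interior-pointer behaviour you describe: with only &slab[1] live at
exit, memcheck reports the block as "possibly lost" rather than
reachable.

    #include <stdlib.h>

    typedef struct Entry { struct Entry *next; int payload; } Entry;

    Entry *head;                /* only surviving reference, interior */

    int
    main(void)
    {
        Entry *slab = malloc(4 * sizeof(Entry));
        int     i;

        /* thread the entries together, dynahash-style */
        for (i = 0; i < 3; i++)
            slab[i].next = &slab[i + 1];
        slab[3].next = NULL;

        /* keep only an interior pointer, drop the start pointer */
        head = &slab[1];
        slab = NULL;

        return 0;       /* run under valgrind --leak-check=full */
    }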

> Attached is a very crude hack, not meant for commit, that hacks things
> up enough to greatly reduce the number of complaints with
> "--leak-check=full".
>
> One thing I've failed to silence so far is a bunch of entries like
>
> ==00:00:03:56.088 3467702== 1,861 bytes in 67 blocks are definitely lost in loss record 1,290 of 1,418
> ==00:00:03:56.088 3467702== at 0x950650: MemoryContextAlloc (mcxt.c:827)
> ==00:00:03:56.088 3467702== by 0x951710: MemoryContextStrdup (mcxt.c:1179)
> ==00:00:03:56.088 3467702== by 0x91C86E: RelationInitIndexAccessInfo (relcache.c:1444)
> ==00:00:03:56.088 3467702== by 0x91DA9C: RelationBuildDesc (relcache.c:1200)
>
> which is complaining about the memory context identifiers for system
> indexes' rd_indexcxt contexts. Those are surely not being leaked in
> any real sense. I suspect that this has something to do with valgrind
> not counting the context->ident fields as live pointers, but I don't
> have enough valgrind-fu to fix that.

Yea, I suspect it's related to the fact that we mark the memory as a
valgrind mempool; I'll try to investigate.
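
Until that's understood, a suppression in the same format as the one
above should hide it (untested, frames abbreviated to match your trace):

{
   indexcxt_ident_not_a_real_leak
   Memcheck:Leak
   match-leak-kinds: definite
   fun:MemoryContextAlloc
   fun:MemoryContextStrdup
   fun:RelationInitIndexAccessInfo
   fun:RelationBuildDesc
}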

I do see a bunch of leak complaints below fun:plpgsql_compile that I
don't yet understand. E.g.

==2054558== 32 bytes in 1 blocks are definitely lost in loss record 284 of 913
==2054558== at 0x89D389: palloc (mcxt.c:975)
==2054558== by 0x518732: new_list (list.c:134)
==2054558== by 0x518C0C: lappend (list.c:341)
==2054558== by 0x83CAE7: SearchCatCacheList (catcache.c:1691)
==2054558== by 0x859A9C: SearchSysCacheList (syscache.c:1447)
==2054558== by 0x313192: FuncnameGetCandidates (namespace.c:975)
==2054558== by 0x313D91: FunctionIsVisible (namespace.c:1450)
==2054558== by 0x7C2891: format_procedure_extended (regproc.c:375)
==2054558== by 0x7C27C3: format_procedure (regproc.c:324)
==2054558== by 0xA7693E1: do_compile (pl_comp.c:348)
==2054558== by 0xA769130: plpgsql_compile (pl_comp.c:224)

and

==2054558== 30 bytes in 4 blocks are definitely lost in loss record 225 of 913
==2054558== at 0x89D389: palloc (mcxt.c:975)
==2054558== by 0x3ADDAE: downcase_identifier (scansup.c:52)
==2054558== by 0x3ADD85: downcase_truncate_identifier (scansup.c:39)
==2054558== by 0x3AB5E4: core_yylex (scan.l:1032)
==2054558== by 0xA789B2D: internal_yylex (pl_scanner.c:321)
==2054558== by 0xA7896E3: plpgsql_yylex (pl_scanner.c:152)
==2054558== by 0xA780015: plpgsql_yyparse (pl_gram.c:1945)
==2054558== by 0xA76A652: do_compile (pl_comp.c:788)
==2054558== by 0xA769130: plpgsql_compile (pl_comp.c:224)
==2054558== by 0xA78948F: plpgsql_validator (pl_handler.c:539)

Based on the quick look I had (dinner is calling), I don't yet
understand how the error handling around plpgsql_compile_tmp_cxt works.

Greetings,

Andres Freund

(*) Or not so quick - I had to figure out why valgrind was so slow. It
turned out that I had typed shared_buffers=32GB where I meant
shared_buffers=32MB...
