Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
Date: 2014-05-18 18:49:10
Message-ID: 24689.1400438950@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-05-17 22:55:14 +0200, Tomas Vondra wrote:
>> And another memory context stats for a session executing CREATE INDEX,
>> while having allocated [...]. The interesting thing is there are ~11k
>> lines that look exactly like this:
>>
>> pg_namespace_oid_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used

> Heh. That certainly doesn't look right. I bet it's the contexts from
> RelationInitIndexAccessInfo().

> Looks like there generally are recursion 'troubles' with system
> indexes. RelationClearRelation() will mark them as invalid causing any
> further lookup to do RelationBuildDesc() calls. Which will again rebuild
> the same relation if it's used somewhere inside RelationBuildDesc() or
> RelationInitIndexAccessInfo(). From a quick look it looks like it should
> resolve itself after some time. Even freeing the superfluous memory
> contexts. But I am not sure if there are scenarios where that won't
> happen...

I've identified a couple of memory leakage scenarios in relcache.c:

1. The RelationCacheInsert macro just blithely ignores the possibility
that hash_search(HASH_ENTER) returns found = true. There's a comment
/* used to give notice if found -- now just keep quiet */
which AFAICT was there when we got the code from Berkeley. Well, it
turns out that that's a can-happen scenario, at least for a few system
catalogs and indexes that are referenced during relcache build. The
scenario is that we come in through RelationIdGetRelation, don't find
a cache entry for, say, pg_index, and set about building one in
RelationBuildDesc. Somewhere in the guts of that we have need to read
pg_index, whereupon there's a recursive call of RelationIdGetRelation,
which still doesn't find the entry, so we again call RelationBuildDesc
which builds an entirely separate Relation structure. The recursion
does terminate thanks to the recursion-stopping provisions in relcache.c,
so the inner call finishes and enters its completed Relation structure
into the RelationIdCache hashtable. Control returns out to the outer
invocation of RelationBuildDesc, which finishes, and enters its completed
Relation structure into the RelationIdCache hashtable --- overwriting the
link to the Relation made by the inner invocation. That Relation and
subsidiary data are now unreferenced and permanently leaked in
CacheMemoryContext. If it's an index you can see its leaked subsidiary
rd_indexcxt in a memory dump, which is what we're looking at above.
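Spelled out as a call trace, the sequence is roughly:

	RelationIdGetRelation(pg_index)       <- outer call, cache miss
	  RelationBuildDesc(pg_index)
	    ... needs to read pg_index itself ...
	    RelationIdGetRelation(pg_index)   <- recursive call, still a miss
	      RelationBuildDesc(pg_index)     <- builds a second Relation
	      RelationCacheInsert(inner)      <- enters it in RelationIdCache
	  RelationCacheInsert(outer)          <- overwrites the inner link; leak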

AFAICT, the inner invocation's Relation should always have zero reference
count by the time we get back to the outer invocation. Therefore it
should be possible for RelationCacheInsert to just delete the
about-to-be-unreachable Relation struct. I'm experimenting with a patch
that adds logic like this to RelationCacheInsert:

	if (found)
	{
		Relation	oldrel = idhentry->reldesc;

		idhentry->reldesc = RELATION;
		if (RelationHasReferenceCountZero(oldrel))
			RelationDestroyRelation(oldrel, false);
		else
			elog(WARNING, "leaking still-referenced duplicate relation");
	}

and so far it looks good.
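For reference, the whole macro would then look about like this (sketch
only --- I'm typing the surrounding parts from memory, so the exact
spelling of the hash_search call may be off):

	#define RelationCacheInsert(RELATION) \
	do { \
		RelIdCacheEnt *idhentry; bool found; \
		idhentry = (RelIdCacheEnt *) hash_search(RelationIdCache, \
											(void *) &(RELATION->rd_id), \
											HASH_ENTER, &found); \
		/* used to give notice if found -- now handle it */ \
		if (found) \
		{ \
			Relation	oldrel = idhentry->reldesc; \
			idhentry->reldesc = RELATION; \
			if (RelationHasReferenceCountZero(oldrel)) \
				RelationDestroyRelation(oldrel, false); \
			else \
				elog(WARNING, "leaking still-referenced duplicate relation"); \
		} \
		else \
			idhentry->reldesc = RELATION; \
	} while(0)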

2. There's a much smaller leak in AttrDefaultFetch: it doesn't
bother to pfree the result of TextDatumGetCString(). This leakage
occurs in the caller's context, not CacheMemoryContext, so it's only
query lifespan rather than session lifespan, and it would not ordinarily be
significant --- but with the CLOBBER_CACHE logic enabled, we rebuild
some relcache entries a damn lot of times within some queries, so the
leak adds up.
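The fix there is just to split the expression apart and free the
intermediate string, along these lines (sketch; "textdatum" stands in
for the fastgetattr() result):

	char	   *s = TextDatumGetCString(textdatum);

	attrdef[i].adbin = MemoryContextStrdup(CacheMemoryContext, s);
	pfree(s);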

With both of these things fixed, I'm not seeing any significant memory
bloat during the first parallel group of the regression tests. I don't
think I'll have the patience to let it run much further than that
(the uuid and enum tests are still running after an hour :-().

BTW, it strikes me that we could probably improve the runtime of the
CLOBBER tests noticeably if we were to nail AttrDefaultIndexId,
IndexIndrelidIndexId, and ConstraintRelidIndexId into cache. I see
no very good reason not to do that; it should help performance a bit
in normal cases too.
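(Concretely, that'd just mean adding something like the following to the
load_critical_index() calls in RelationCacheInitializePhase3, and bumping
NUM_CRITICAL_LOCAL_INDEXES to match --- untested:)

	load_critical_index(AttrDefaultIndexId, AttrDefaultRelationId);
	load_critical_index(IndexIndrelidIndexId, IndexRelationId);
	load_critical_index(ConstraintRelidIndexId, ConstraintRelationId);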

While I'm at it: I could not help noticing RememberToFreeTupleDescAtEOX,
which was not there last time I looked at this code. Isn't that broken
by design? It's basically a deliberately induced transaction-lifespan
memory leak, and AFAICS if it does anything at all, it's supporting
incorrect calling code. There should *never* be any situation where it's
not okay to free a tupledesc with zero refcount. And the comment
justifying it is insane on its face:

* If we Rebuilt a relcache entry during a transaction then its
* possible we did that because the TupDesc changed as the result of
* an ALTER TABLE that ran at less than AccessExclusiveLock. It's
* possible someone copied that TupDesc, in which case the copy would
* point to free'd memory. So if we rebuild an entry we keep the
* TupDesc around until end of transaction, to be sure.

If someone copied the tupdesc, how would freeing the original result
in the copy being damaged? And even more to the point, if anyone is
doing anything that involves inspecting a table's rowtype, how could
it possibly be safe for someone else to be altering the rowtype with
less than exclusive lock? Unless I hear an adequate defense of that
code PDQ, it's going away.
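To spell out the first point: a copied tupdesc is a fully independent
deep copy, so freeing the original can't possibly touch it. That is,
given something like

	TupleDesc	copy = CreateTupleDescCopy(relation->rd_att);

	/*
	 * 'copy' shares no storage with the relcache entry's rd_att, so
	 * rebuilding the entry and freeing the old rd_att leaves 'copy'
	 * untouched.
	 */

the only way to get burned is to hang onto a bare pointer to rd_att
without holding a refcount --- and that's a bug in the caller, not
something relcache.c should paper over.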

regards, tom lane
