Re: Memory leak from ExecutorState context?

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Memory leak from ExecutorState context?
Date: 2023-03-01 19:29:11
Message-ID: 3013398b-316c-638f-2a73-3783e8e2ef02@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/1/23 18:48, Jehan-Guillaume de Rorthais wrote:
> Hi Tomas,
>
> On Tue, 28 Feb 2023 20:51:02 +0100
> Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>> On 2/28/23 19:06, Jehan-Guillaume de Rorthais wrote:
>>> * HashBatchContext goes up to 1441MB after 240s then stay flat until the end
>>> (400s as the last record)
>>
>> That's interesting. We're using HashBatchContext for very few things, so
>> what could it consume so much memory? But e.g. the number of buckets
>> should be limited by work_mem, so how could it get to 1.4GB?
>>
>> Can you break at ExecHashIncreaseNumBatches/ExecHashIncreaseNumBuckets
>> and print how many batches/butches are there?
>
> I did this test this morning.
>
> Batches and buckets increased really quickly to 1048576/1048576.
>

OK. I think 1M buckets is mostly expected for work_mem=64MB. It means
buckets will use 8MB, which leaves ~56B per tuple (we're aiming for
fillfactor 1.0).

But 1M batches? I guess that could be problematic. It doesn't seem like
much, but we need 1M files on each side - 1M for the hash table, 1M for
the outer relation. That's 16MB of pointers, but the files are BufFile
and we keep 8kB buffer for each of them. That's ~16GB right there :-(

In practice it probably won't be that bad, because not all files will be
allocated/opened concurrently (especially if this is due to many tuples
having the same value). Assuming that's what's happening here, ofc.

> ExecHashIncreaseNumBatches was really chatty, having hundreds of thousands of
> calls, always short-cut'ed to 1048576, I guess because of the conditional block
> «/* safety check to avoid overflow */» appearing early in this function.
>

Hmmm, that's a bit weird, no? I mean, the check is

/* safety check to avoid overflow */
if (oldnbatch > Min(INT_MAX / 2, MaxAllocSize / (sizeof(void *) * 2)))
return;

Why would it stop at 1048576? It certainly is not higher than INT_MAX/2
and with MaxAllocSize = ~1GB the second value should be ~33M. So what's
happening here?

> I disabled the breakpoint on ExecHashIncreaseNumBatches a few time to make the
> query run faster. Enabling it at 19.1GB of memory consumption, it stayed
> silent till the memory exhaustion (around 21 or 22GB, I don't remember exactly).
>
> The breakpoint on ExecHashIncreaseNumBuckets triggered some times at beginning,
> and a last time close to the end of the query execution.
>
>>> Any idea? How could I help to have a better idea if a leak is actually
>>> occurring and where exactly?
>>
>> Investigating memory leaks is tough, especially for generic memory
>> contexts like ExecutorState :-( Even more so when you can't reproduce it
>> on a machine with custom builds.
>>
>> What I'd try is this:
>>
>> 1) attach breakpoints to all returns in AllocSetAlloc(), printing the
>> pointer and size for ExecutorState context, so something like
>>
>> break aset.c:783 if strcmp("ExecutorState",context->header.name) == 0
>> commands
>> print MemoryChunkGetPointer(chunk) size
>> cont
>> end
>>
>> 2) do the same for AllocSetFree()
>>
>> 3) Match the palloc/pfree calls (using the pointer address), to
>> determine which ones are not freed and do some stats on the size.
>> Usually there's only a couple distinct sizes that account for most of
>> the leaked memory.
>
> So here is what I end up with this afternoon, using file, lines and macro from
> REL_11_18:
>
> set logging on
> set pagination off
>
> break aset.c:781 if strcmp("ExecutorState",context.name) == 0
> commands 1
> print (((char *)(chunk)) + sizeof(struct AllocChunkData))
> print chunk->size
> cont
> end
>
> break aset.c:820 if strcmp("ExecutorState",context.name) == 0
> commands 2
> print (((char *)(chunk)) + sizeof(struct AllocChunkData))
> print chunk->size
> cont
> end
>
> break aset.c:979 if strcmp("ExecutorState",context.name) == 0
> commands 3
> print (((char *)(chunk)) + sizeof(struct AllocChunkData))
> print chunk->size
> cont
> end
>
> break AllocSetFree if strcmp("ExecutorState",context.name) == 0
> commands 4
> print pointer
> cont
> end
>
> So far, gdb had more than 3h of CPU time and is eating 2.4GB of memory. The
> backend had only 3'32" of CPU time:
>
> VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 2727284 2.4g 17840 R 99.0 7.7 181:25.07 gdb
> 9054688 220648 103056 t 1.3 0.7 3:32.05 postmaster
>
> Interestingly, the RES memory of the backend did not explode yet, but VIRT is
> already high.
>
> I suppose the query will run for some more hours, hopefully, gdb will not
> exhaust the memory in the meantime...
>
> You'll find some intermediate stats I already collected in attachment:
>
> * break 1, 2 and 3 are from AllocSetAlloc, break 4 is from AllocSetFree.
> * most of the non-free'd chunk are allocated since the very beginning, before
> the 5000's allocation call for almost 1M call so far.
> * 3754 of them have a chunk->size of 0
> * it seems there's some buggy stats or data:
> # this one actually really comes from the gdb log
> 0x38a77b8: break=3 num=191 sz=4711441762604810240 (weird sz)
> # this one might be a bug in my script
> 0x2: break=2 num=945346 sz=2 (weird address)
> * ignoring the weird size requested during the 191st call, the total amount
> of non free'd memory is currently 5488MB
>
> I couldn't print "size" as it is optimzed away, that's why I tracked
> chunk->size... Is there anything wrong with my current run and gdb log?
>

There's definitely something wrong. The size should not be 0, and
neither it should be > 1GB. I suspect it's because some of the variables
get optimized out, and gdb just uses some nonsense :-(

I guess you'll need to debug the individual breakpoints, and see what's
available. It probably depends on the compiler version, etc. For example
I don't see the "chunk" for breakpoint 3, but "chunk_size" works and I
can print the chunk pointer with a bit of arithmetics:

p (block->freeptr - chunk_size)

I suppose similar gympastics could work for the other breakpoints.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2023-03-01 19:34:08 Re: Memory leak from ExecutorState context?
Previous Message Gregory Stark (as CFM) 2023-03-01 19:26:21 Re: Allow +group in pg_ident.conf