Re: Memory leak from ExecutorState context?

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Memory leak from ExecutorState context?
Date: 2023-03-02 00:30:27
Message-ID: 41c5766d-ed71-b70c-bbbc-d3396c462d62@enterprisedb.com
Lists: pgsql-hackers

On 3/2/23 00:18, Jehan-Guillaume de Rorthais wrote:
> Hi,
>
> On Wed, 1 Mar 2023 20:29:11 +0100
> Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>> On 3/1/23 18:48, Jehan-Guillaume de Rorthais wrote:
>>> On Tue, 28 Feb 2023 20:51:02 +0100
>>> Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>> On 2/28/23 19:06, Jehan-Guillaume de Rorthais wrote:
>>>>> * HashBatchContext goes up to 1441MB after 240s, then stays flat until the
>>>>> end (400s being the last record)
>>>>
>>>> That's interesting. We're using HashBatchContext for very few things, so
>>>> what could be consuming so much memory? But e.g. the number of buckets
>>>> should be limited by work_mem, so how could it get to 1.4GB?
>>>>
>>>> Can you break at ExecHashIncreaseNumBatches/ExecHashIncreaseNumBuckets
>>>> and print how many batches/buckets there are?
>>>
>>> I did this test this morning.
>>>
>>> Batches and buckets increased really quickly to 1048576/1048576.
>>
>> OK. I think 1M buckets is mostly expected for work_mem=64MB. It means the
>> buckets will use 8MB, which leaves ~56MB for tuples, i.e. ~56B per tuple
>> (we're aiming for fillfactor 1.0, so roughly one tuple per bucket).
>>
>> But 1M batches? I guess that could be problematic. It doesn't seem like
>> much, but we need 1M files on each side - 1M for the hash table, 1M for
>> the outer relation. That's 16MB of pointers, but the files are BufFiles
>> and we keep an 8kB buffer for each of them. That's ~16GB right there :-(
>>
>> In practice it probably won't be that bad, because not all files will be
>> allocated/opened concurrently (especially if this is due to many tuples
>> having the same value). Assuming that's what's happening here, ofc.
>
> And I suppose they are closed/freed concurrently as well?
>

Yeah. Different subsets of the files can be in use at different times,
depending on when the number of batches starts to explode, etc.
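
Just to spell out the arithmetic behind the ~16GB estimate above (assuming
8-byte pointers and the 8kB buffer each BufFile keeps):

    2 (inner + outer) x 1,048,576 files x 8 B  pointers  =  16 MB
    2 (inner + outer) x 1,048,576 files x 8 kB buffers   = ~16 GB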

>>> ExecHashIncreaseNumBatches was really chatty, with hundreds of thousands of
>>> calls, but it always bailed out with nbatch stuck at 1048576, I guess because
>>> of the «/* safety check to avoid overflow */» conditional block appearing
>>> early in this function.
>>
>> Hmmm, that's a bit weird, no? I mean, the check is
>>
>> /* safety check to avoid overflow */
>> if (oldnbatch > Min(INT_MAX / 2, MaxAllocSize / (sizeof(void *) * 2)))
>>     return;
>>
>> Why would it stop at 1048576? It certainly is not higher than INT_MAX/2,
>> and with MaxAllocSize = ~1GB the second limit works out to ~64M. So what's
>> happening here?
>
> Indeed, not the right suspect. But what about this other short-cut then?
>
> /* do nothing if we've decided to shut off growth */
> if (!hashtable->growEnabled)
>     return;
>
> [...]
>
> /*
> * If we dumped out either all or none of the tuples in the table, disable
> * further expansion of nbatch. This situation implies that we have
> * enough tuples of identical hashvalues to overflow spaceAllowed.
> * Increasing nbatch will not fix it since there's no way to subdivide the
> * group any more finely. We have to just gut it out and hope the server
> * has enough RAM.
> */
> if (nfreed == 0 || nfreed == ninmemory)
> {
>     hashtable->growEnabled = false;
> #ifdef HJDEBUG
>     printf("Hashjoin %p: disabling further increase of nbatch\n",
>            hashtable);
> #endif
> }
>
> If I guess correctly, the function is unable to split the current batch any
> further, so it just sits there and hopes. This is a much better suspect, and
> I can certainly track it from gdb.
>

Yes, this would make much more sense - it'd be consistent with the
hypothesis that this is due to the number of batches exploding (that
check is a protection against exactly that).

You specifically mentioned the other check earlier, but now I realize
you were just speculating that it might be the culprit.
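
If you want to watch that from gdb, something like this should show nbatch
and the growEnabled flag on every call (just a sketch, untested - it assumes
a build with debug symbols, "hashtable" is the function's argument, and with
hundreds of thousands of calls it will be slow):

  (gdb) break ExecHashIncreaseNumBatches
  (gdb) commands
  > silent
  > printf "nbatch=%d growEnabled=%d\n", hashtable->nbatch, hashtable->growEnabled
  > continue
  > end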

> Being able to find which fields are involved in the join could help as well,
> to check or gather some stats about them, but I haven't had time to dig into
> this yet...
>

It's going to be tricky, because all parts of the plan may be doing
something, and there may be multiple hash joins. So you won't know if
you're executing the part of the plan that's causing issues :-(

But I have another idea - put a breakpoint on makeBufFile(), which is the
bit that allocates the temp files including the 8kB buffer, and print which
memory context we allocate them in. I have a hunch we may be allocating them
in the ExecutorState context. That'd explain all the symptoms.
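
Roughly like this (again just a sketch, not tested; if makeBufFile happens
to be inlined, breaking on BufFileCreateTemp should work just as well):

  (gdb) break makeBufFile
  (gdb) commands
  > silent
  > printf "BufFile allocated in context: %s\n", CurrentMemoryContext->name
  > backtrace 5
  > continue
  > end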

BTW, with how many batches does the hash join start?
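
(A "print hashtable->nbatch_original" from the ExecHashIncreaseNumBatches
breakpoint above should show that, if I'm reading HashJoinTableData
correctly.)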

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
