Re: Memory leak from ExecutorState context?

From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Memory leak from ExecutorState context?
Date: 2023-04-20 16:42:40
Message-ID: 7ea0562c-25d1-32e0-40b5-48127a6f49ce@garret.ru
Lists: pgsql-hackers

On 11.04.2023 8:14 PM, Jehan-Guillaume de Rorthais wrote:
> On Sat, 8 Apr 2023 02:01:19 +0200
> Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote:
>
>> On Fri, 31 Mar 2023 14:06:11 +0200
>> Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote:
>>
>> [...]
>>
>> After rebasing Tomas' memory balancing patch, I did some memory measurements
>> to answer some of my questions. Please find attached the resulting charts
>> "HJ-HEAD.png" and "balancing-v3.png", comparing memory consumption between
>> HEAD and Tomas' patch. They show an alternation of numbers before/after
>> calling ExecHashIncreaseNumBatches (see the debug patch). I didn't try to
>> find the exact final peak of total memory consumption during the join phase,
>> before all the BufFiles are destroyed, so the last number might be
>> underestimated.
> I did some more analysis of the total memory consumption in filecxt for HEAD
> and the v3 and v4 patches. My previous debug numbers only print memory metrics
> during batch increments or hash table destruction. That means:
>
> * for HEAD: we miss the batches consumed during the outer scan
> * for v3: twice nbatch is added to spaceUsed, which is a rough estimate
> * for v4: batches are tracked in spaceUsed, so they are reflected in spacePeak
>
> Using a breakpoint in ExecHashJoinSaveTuple to print "filecxt->mem_allocated"
> from there, here is the maximum memory allocated in the bufFile context for
> each branch:
>
>            batches   max bufFiles total   spaceAllowed rise
>   HEAD      16384    199966960            ~194MB
>   v3         4096     65419456             ~78MB
>   v4(*3)     2048     34273280               48MB    nbatch*sizeof(PGAlignedBlock)*3
>   v4(*4)     1024     17170160             60.6MB    nbatch*sizeof(PGAlignedBlock)*4
>   v4(*5)     2048     34273280             42.5MB    nbatch*sizeof(PGAlignedBlock)*5
>
> It seems accounting for bufFiles in spaceUsed allows better memory balancing
> and management. The precise factor by which to raise spaceAllowed is yet to
> be defined; *3 or *4 look good, but this is based on a single artificial test
> case.
>
> Also, note that HEAD is currently reporting ~4MB of memory usage, which is
> far from reality. So even if we don't commit the memory balancing patch in
> v16, maybe we could account for filecxt in spaceUsed as a bugfix?
>
> Regards,

Thank you for the patch.
I faced the same problem (OOM caused by a hash join).
I tried to create the simplest test reproducing the problem:

create table t(pk int, val int);
insert into t values (generate_series(1,100000000), 0);
set work_mem = '64kB';
explain (analyze, buffers)
  select count(*) from t t1 join t t2 on (t1.pk = t2.pk);

There are three workers, and the size of each exceeds 1.3GB.
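
One way to see which memory context is actually holding this memory is to dump
the memory context statistics of the leader and its parallel workers to the
server log while the join is running (PostgreSQL 14 or later). A minimal
sketch, to be run from another session; 12345 is only a placeholder for the
pid of the backend executing the query:

-- Log memory context statistics of the leader and its parallel workers;
-- the per-context totals appear in each process's server log.
select pid, pg_log_backend_memory_contexts(pid)
from pg_stat_activity
where pid = 12345 or leader_pid = 12345;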

The plan is the following:

 Finalize Aggregate  (cost=355905977972.87..355905977972.88 rows=1 width=8) (actual time=212961.033..226097.513 rows=1 loops=1)
   Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
   ->  Gather  (cost=355905977972.65..355905977972.86 rows=2 width=8) (actual time=212943.505..226097.497 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
         ->  Partial Aggregate  (cost=355905976972.65..355905976972.66 rows=1 width=8) (actual time=212938.410..212940.035 rows=1 loops=3)
               Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
               ->  Parallel Hash Join  (cost=1542739.26..303822614472.65 rows=20833345000002 width=0) (actual time=163268.274..207829.524 rows=33333333 loops=3)
                     Hash Cond: (t1.pk = t2.pk)
                     Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
                     ->  Parallel Seq Scan on t t1  (cost=0.00..859144.78 rows=41666678 width=4) (actual time=0.045..30828.051 rows=33333333 loops=3)
                           Buffers: shared hit=16389 read=426089 written=87
                     ->  Parallel Hash  (cost=859144.78..859144.78 rows=41666678 width=4) (actual time=82202.445..82202.447 rows=33333333 loops=3)
                           Buckets: 4096 (originally 4096)  Batches: 32768 (originally 8192)  Memory Usage: 192kB
                           Buffers: shared hit=16095 read=426383 dirtied=437947 written=426287, temp read=267898 written=737164
                           ->  Parallel Seq Scan on t t2  (cost=0.00..859144.78 rows=41666678 width=4) (actual time=0.054..12647.534 rows=33333333 loops=3)
                                 Buffers: shared hit=16095 read=426383 dirtied=437947 written=426287
 Planning:
   Buffers: shared hit=69 read=38
 Planning Time: 2.819 ms
 Execution Time: 226113.292 ms
(22 rows)

-----------------------------

So the number of batches has increased to 32k.
I applied your patches 0001-0004, but unfortunately they have not reduced
memory consumption: the size of each backend is still more than 1.3GB.

I wonder what the preferred solution to this problem could be?
We have to limit the size of the hash table we can hold in memory, and the
number of batches is calculated as the inner relation size divided by the
hash table size. So for an arbitrarily large inner relation we can get an
arbitrarily large number of batches, which may consume an arbitrarily large
amount of memory.
We should either prohibit further increases of the number of batches - this
will not solve the problem completely, but at least in the test above it
would prevent the increase from 8k to 32k batches - or prohibit the use of
hash join in such cases at all (assign a very high cost to this path).
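
As a rough sketch for this test (assuming one inner and one outer batch file
per batch in each process, an 8kB BLCKSZ-sized buffer per BufFile, and
ignoring per-file struct overhead), the batch-file buffers alone already
account for about half a gigabyte per process at 32768 batches:

-- Back-of-the-envelope estimate of BufFile buffer memory per process:
select 32768 as nbatch,
       pg_size_pretty(32768::bigint * 2 * 8192) as buffile_buffers_per_process;
-- 32768 * 2 * 8192 bytes = 512 MB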

Also, I wonder why we create such a large number of files for each batch?
Can it be reduced?
