Re: accounting for memory used for BufFile during hash joins

From: Hubert Zhang <hzhang(at)pivotal(dot)io>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: accounting for memory used for BufFile during hash joins
Date: 2019-05-28 09:39:58
Message-ID: CAB0yremREmLkEb9A=3CpRDpXr4mPepGz2kQtonteBvp80mSwog@mail.gmail.com
Lists: pgsql-hackers

Hi Tomas,

Here is the patch. It should be compatible with your patch, and it focuses on
deciding when to continue splitting batches.

On Tue, May 28, 2019 at 3:40 PM Hubert Zhang <hzhang(at)pivotal(dot)io> wrote:

> On Sat, May 4, 2019 at 8:34 AM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
> wrote:
>
>> The root cause is that hash join treats batches as pretty much free, but
>> that's not really true - we do allocate two BufFile structs per batch,
>> and each BufFile is ~8kB as it includes PGAlignedBuffer.
>>
>> The OOM is not very surprising, because with 524288 batches it'd need
>> about 8GB of memory, and the system only has 8GB RAM installed.
>>
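(Spelling out that arithmetic: 524288 batches x 2 BufFiles x ~8 kB per
BufFile = 2^19 x 2 x 2^13 bytes = 8 GB, so the per-batch buffers alone
consume all of the installed RAM.)
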
>> The second patch tries to enforce work_mem more strictly. That would be
>> impossible if we were to keep all the BufFile structs in memory, so
>> instead it slices the batches into chunks that fit into work_mem, and
>> then uses a single "overflow" file for slices currently not in memory.
>> These extra slices can't be counted into work_mem, but we should need
>> just very few of them. For example with work_mem=4MB the slice is 128
>> batches, so we need 128x less overflow files (compared to per-batch).
>>
>>
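If I read the slicing idea correctly, the batch-to-BufFile mapping could
look roughly like the sketch below. This is only my guess at it;
current_slice and overflow_file_for() are invented names, not
identifiers from your patch:

    /* nbatch_inmemory batches per slice, e.g. 128 with work_mem=4MB */
    int     slice = batchno / nbatch_inmemory;

    if (slice != hashtable->current_slice)
        file = overflow_file_for(slice);    /* one overflow file per
                                             * non-resident slice */
    else
        file = hashtable->innerBatchFile[batchno % nbatch_inmemory];
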
> Hi Tomas
>
> I read your second patch, which uses overflow BufFiles to limit how many
> batch files are kept in memory at once.
> It would address the hash join OOM problem you described above: the ~8 kB
> per batch adds up once the number of batches explodes.
>
> I mentioned this in another thread:
>
> https://www.postgresql.org/message-id/flat/CAB0yrekv%3D6_T_eUe2kOEvWUMwufcvfd15SFmCABtYFOkxCFdfA%40mail.gmail.com
> There is another hash join OOM problem, caused by batch splitting being
> disabled too early. PG uses the flag hashtable->growEnabled to decide
> whether to split batches. Once a split fails (all tuples land in one of
> the two resulting batches), the growEnabled flag is turned off forever.
>
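For reference, the relevant logic in ExecHashIncreaseNumBatches() in
nodeHash.c is roughly the following (paraphrased from memory, so please
check the actual source):

    /*
     * After repartitioning the current batch: nfreed is the number of
     * tuples that moved to another batch, ninmemory the number examined.
     * If the split achieved nothing (or moved everything), assume the
     * data is skewed and give up on growing nbatch for good.
     */
    if (nfreed == 0 || nfreed == ninmemory)
        hashtable->growEnabled = false;
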
> This is the opposite of the batch bloating problem: the join ends up with
> too few batches, and the in-memory hash table grows too large to fit into
> memory.
>
> Here is the tradeoff: each batch costs more than 8 kB (the 8 kB buffer
> makes sense for performance), the in-memory hash table takes memory as
> well, and splitting batches may (but is not guaranteed to) shrink the
> in-memory hash table while adding more batches (and thus more memory,
> 8 kB * #batches).
> Can we conclude that a split is worthwhile only if:
> (reduction in in-memory hash table size) - (8 kB * number of new
> batches) > 0
>
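In code, that check might look something like the sketch below. This is
hypothetical, not what the attached patch does verbatim; it assumes an
even split, so the expected reduction is half of spaceUsed:

    /*
     * Split only if the expected hash table shrinkage outweighs the
     * BufFile overhead of the batches added by doubling nbatch
     * (doubling adds nbatch new batches, two BufFiles of ~8 kB each).
     */
    size_t  expected_reduction = hashtable->spaceUsed / 2;
    size_t  buffile_overhead = (size_t) hashtable->nbatch * 2 * 8192;

    if (expected_reduction > buffile_overhead)
        ExecHashIncreaseNumBatches(hashtable);
    else
        /* not worth it: keep nbatch and tolerate exceeding work_mem */
        hashtable->growEnabled = false;
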
> So I'm considering combining our patch with your patch to fix the hash
> join OOM problem, whether the OOM comes from the in-memory hash table or
> from the 8 kB * number of batches.
>
> The nbatch_inmemory in your patch could also be derived using the rule
> above.
>
> What's your opinion?
>
> Thanks
>
> Hubert Zhang
>

--
Thanks

Hubert Zhang

Attachment: 0001-Allow-to-continue-to-split-batch-when-tuples-become-.patch (application/octet-stream, 4.8 KB)
