Re: accounting for memory used for BufFile during hash joins

From: Hubert Zhang <hzhang(at)pivotal(dot)io>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: accounting for memory used for BufFile during hash joins
Date: 2019-05-28 07:40:01
Message-ID: CAB0yre=e8ysPyoUvZqjKYAxc6-VB=JKHL-7XKZSxy0FT5vY7BQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, May 4, 2019 at 8:34 AM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> The root cause is that hash join treats batches as pretty much free, but
> that's not really true - we do allocate two BufFile structs per batch,
> and each BufFile is ~8kB as it includes PGAlignedBuffer.
>
> The OOM is not very surprising, because with 524288 batches it'd need
> about 8GB of memory, and the system only has 8GB RAM installed.
>
> The second patch tries to enforce work_mem more strictly. That would be
> impossible if we were to keep all the BufFile structs in memory, so
> instead it slices the batches into chunks that fit into work_mem, and
> then uses a single "overflow" file for slices currently not in memory.
> These extra slices can't be counted into work_mem, but we should need
> just very few of them. For example with work_mem=4MB the slice is 128
> batches, so we need 128x less overflow files (compared to per-batch).
>
>
Hi Tomas

I read your second patch which uses overflow buf files to reduce the total
number of batches.
It would solve the hash join OOM problem what you discussed above: 8K per
batch leads to batch bloating problem.

I mentioned in another thread:

https://www.postgresql.org/message-id/flat/CAB0yrekv%3D6_T_eUe2kOEvWUMwufcvfd15SFmCABtYFOkxCFdfA%40mail.gmail.com
There is another hashjoin OOM problem which disables splitting batches too
early. PG uses a flag hashtable->growEnable to determine whether to split
batches. Once one splitting failed(all the tuples are assigned to only one
batch of two split ones) The growEnable flag would be turned off forever.

The is an opposite side of batch bloating problem. It only contains too few
batches and makes the in-memory hash table too large to fit into memory.

Here is the tradeoff: one batch takes more than 8KB(8KB makes sense, due to
performance), in-memory hash table takes memory as well and splitting
batched may(not must) reduce the in-memory hash table size but introduce
more batches(and thus more memory usage 8KB*#batch).
Can we conclude that it would be worth to splitting if satisfy:
(The reduced memory of in-memory hash table) - (8KB * number of new
batches) > 0

So I'm considering to combine our patch with your patch to fix join OOM
problem. No matter the OOM is introduced by (the memory usage of in-memory
hash table) or (8KB * number of batches).

nbatch_inmemory in your patch could also use the upper rule to redefine.

What's your opinion?

Thanks

Hubert Zhang

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Antonin Houska 2019-05-28 07:55:04 Re: Remove page-read callback from XLogReaderState.
Previous Message Amit Langote 2019-05-28 06:59:28 Re: Fix inconsistencies for v12