Re: accounting for memory used for BufFile during hash joins

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: accounting for memory used for BufFile during hash joins
Date: 2019-05-06 21:58:09
Message-ID: CAAKRu_ZTy=a2jqj0++jDwke1_ssJTBB0ezU2t8OwogmXDbYdYw@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 3, 2019 at 5:34 PM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

>
> The second patch tries to enforce work_mem more strictly. That would be
> impossible if we were to keep all the BufFile structs in memory, so
> instead it slices the batches into chunks that fit into work_mem, and
> then uses a single "overflow" file for slices currently not in memory.
> These extra slices can't be counted into work_mem, but we should need
> just very few of them. For example with work_mem=4MB the slice is 128
> batches, so we need 128x less overflow files (compared to per-batch).
>
I want to see if I understand the implications of the per-slice-overflow
patch for execution of hashjoin: for each bucket in the hashtable, when
attempting to double the number of batches, if the memory that the BufFile
structs will occupy once this is done would exceed work_mem, split each
batch into slices that fit into memory. This means that, for each
probe-side tuple hashing to that bucket, you have to load every slice of
each batch separately into memory to ensure correct results.
Is this right?
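
To make sure we are talking about the same thing, here is a rough sketch
of how I am picturing the slice-at-a-time processing. All of the names
and the loop structure below are mine, purely for illustration -- this is
not code from the patch:

    /*
     * Hypothetical sketch: only the BufFiles for one slice of batches
     * are kept open at a time; each batch in the current slice is then
     * joined in the usual way before moving on to the next slice.
     */
    for (slice = 0; slice < nslices; slice++)
    {
        /*
         * Open this slice's per-batch BufFiles, pulling back any tuples
         * that were parked in the shared overflow file.
         */
        load_slice_buffiles(slice);

        for (batch = slice * batches_per_slice;
             batch < (slice + 1) * batches_per_slice && batch < nbatch;
             batch++)
        {
            build_hashtable_from_inner_batch(batch);    /* build side */
            probe_with_outer_batch(batch);              /* probe side */
        }

        /* release this slice's BufFiles before starting the next slice */
        close_slice_buffiles(slice);
    }

If that reading is roughly right, then tuples routed to a batch outside
the current slice take an extra trip through the overflow file before
they can be joined.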

>
> I'm not entirely sure which of those approaches is the right one. The
> first one is clearly just a "damage control" for cases where the hash
> side turned out to be much larger than we expected. With good estimates
> we probably would not have picked a hash join for those (that is, we
> should have realized we can't keep work_mem and prohibit hash join).
>
> The second patch however makes hash join viable for some of those cases,
> and it seems to work pretty well (there are some numbers in the message
> posted to pgsql-performance thread). So I kinda like this second one.
>
So, my initial reaction after taking a look at the patches is that I
prefer the first approach -- increasing the resize threshold. The second
patch, the per-slice-overflow patch, adds a major new mechanic to hashjoin
in order to address what is, based on my understanding, an edge case.
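
By "increasing the resize threshold" I mean a check along these lines
(again a hypothetical sketch with made-up names and numbers, not the
actual patch): before doubling nbatch, also charge the memory the
per-batch BufFiles themselves would need, since each one carries its own
buffer.

    /* assumed per-BufFile overhead, roughly its BLCKSZ-sized buffer */
    #define BUFFILE_SIZE_ESTIMATE 8192

    /*
     * Hypothetical check: allow doubling the batch count only while the
     * BufFiles needed for the new batches would still fit inside the
     * memory budget (Size is the usual PostgreSQL typedef).
     */
    static bool
    buffile_overhead_allows_doubling(int nbatch, Size space_allowed)
    {
        Size    buffile_mem = (Size) nbatch * 2 * BUFFILE_SIZE_ESTIMATE;

        return buffile_mem <= space_allowed;
    }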

--
Melanie Plageman
