From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tweaking NTUP_PER_BUCKET
Date: 2014-07-11 10:38:03
Message-ID: CA+U5nM+Tz_upfK7_s=hWOaKTC42Cb725sB_1A7KyMCQpGEZkCg@mail.gmail.com
Lists: pgsql-hackers
On 11 July 2014 10:23, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> On 11 July 2014, 9:27, Simon Riggs wrote:
>> On 9 July 2014 18:54, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>>
>>> (1) size the buckets for NTUP_PER_BUCKET=1 (and use whatever number
>>> of batches this requires)
>>
>> If we start off by assuming NTUP_PER_BUCKET = 1, how much memory does
>> it save to recalculate the hash bucket at 10 instead?
>> Resizing sounds like it will only be useful if we only just overflow our
>> limit.
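(To put rough numbers on this question: each hash bucket is just a pointer,
so on x86_64 the bucket array costs about 8B per tuple at NTUP_PER_BUCKET = 1
versus roughly 0.8B per tuple at NTUP_PER_BUCKET = 10 - a difference of about
7B per tuple.)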
>>
>> If we release next version with this as a hardcoded change, my
>> understanding is that memory usage for hash joins will leap upwards,
>> even if the run time of queries reduces. It sounds like we need some
>> kind of parameter to control this. "We made it faster" might not be
>> true if we run this on servers that are already experiencing high
>> memory pressure.
>
> Sure. We certainly don't want to make things worse for environments with
> memory pressure.
>
> The current implementation has two issues regarding memory:
>
> (1) It does not count the bucket array towards used memory, i.e. the
> buckets are not included in work_mem (so we may overflow work_mem). I plan
> to fix this, to make the work_mem accounting a bit more accurate, as it's
> important for cases with NTUP_PER_BUCKET=1.
>
> (2) There's significant palloc overhead, because each tuple is allocated
> separately - see my message from yesterday, where I observed the batch
> memory context growing to 1.4GB for 700MB of tuple data. By densely packing
> the tuples, I got it down to ~700MB (i.e. pretty much no overhead).
>
> The palloc overhead seems to be 20B per tuple (on x86_64), and eliminating
> it more than compensates for the ~8B per tuple required for
> NTUP_PER_BUCKET=1. And fixing (1) makes it more correct / predictable.
>
> It also addresses the issue that the palloc overhead is not counted towards
> work_mem at all (that's why I got a ~1.4GB batch context with work_mem=1GB).
>
> So in the end this should give us much lower memory usage for hash joins,
> even if we switch to NTUP_PER_BUCKET=1 (although that's pretty much an
> independent change). Does that seem reasonable?
Yes, that alleviates my concern. Thanks.
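
For anyone following along, my reading of the dense packing idea is roughly
the following (a minimal sketch with made-up names, not the actual patch):
rather than palloc'ing each copied tuple separately and paying the
allocator's per-chunk overhead every time, tuples are appended back to back
into large chunks, so the overhead shrinks to one header per ~32kB of tuple
data, and each chunk's size can simply be added to the hash table's space
accounting so work_mem stays honest.

#include <stdlib.h>

#define DENSE_CHUNK_SIZE  (32 * 1024)
#define ALIGN8(x)         (((x) + 7) & ~(size_t) 7)   /* stand-in for MAXALIGN */

typedef struct DenseChunk
{
    struct DenseChunk *next;    /* older chunks, kept around for freeing */
    size_t      used;           /* bytes already handed out from data[] */
    char        data[DENSE_CHUNK_SIZE];
} DenseChunk;

/* Hand out 'size' bytes, packing requests back to back in big chunks. */
static void *
dense_alloc(DenseChunk **head, size_t size)
{
    DenseChunk *chunk = *head;
    void       *ptr;

    size = ALIGN8(size);

    /* requests larger than a chunk would need their own chunk; omitted here */
    if (chunk == NULL || chunk->used + size > DENSE_CHUNK_SIZE)
    {
        chunk = malloc(sizeof(DenseChunk));     /* palloc() in PostgreSQL */
        if (chunk == NULL)
            return NULL;
        chunk->next = *head;
        chunk->used = 0;
        *head = chunk;
    }

    ptr = chunk->data + chunk->used;
    chunk->used += size;
    return ptr;
}

The obvious trade-off is that individual tuples can no longer be pfree'd, but
for a hash table that is torn down all at once that shouldn't matter.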
> Regarding the tunable to control this - I certainly don't want another GUC
> that no one really knows how to set right. And I think it's unnecessary
> thanks to the palloc overhead / work_mem accounting fix described above.
Agreed, nor does anyone.
> The one thing I'm not sure about is what to do when we reach the work_mem
> limit (which should only happen with an underestimated row count / row
> width) - how to decide whether to shrink the hash table or increment the
> number of batches. But this is not exclusive to NTUP_PER_BUCKET=1; it may
> happen with whatever NTUP_PER_BUCKET value you choose.
>
> The current code does not support resizing at all, so it always increments
> the number of batches, but I feel an "interleaved" approach might be more
> appropriate (nbuckets/2, nbatches*2, nbuckets/2, nbatches*2, ...). It'd be
> nice to have some cost estimates ('how expensive is a rescan' vs. 'how
> expensive is a resize'), but I'm not sure how to get that.
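Not sure how to cost it either, but just to make the alternation concrete,
the interleaved policy could look something like this (hypothetical struct,
field names and thresholds - nothing like actual executor code):

#include <stdbool.h>
#include <stddef.h>

typedef struct HashBuildState
{
    size_t  space_used;         /* tuples + bucket array, in bytes */
    size_t  space_allowed;      /* work_mem, in bytes */
    long    nbuckets;           /* current number of hash buckets */
    int     nbatch;             /* current number of batches */
    bool    shrink_next;        /* alternate between the two responses */
} HashBuildState;

/*
 * Called when the hash table outgrows work_mem: alternate between halving
 * the bucket array (cheap - only the array is rebuilt and tuples re-linked)
 * and doubling the number of batches (expensive - roughly half the
 * in-memory tuples get rescanned and spilled to batch files).
 */
static void
hash_table_overflow(HashBuildState *st)
{
    if (st->space_used <= st->space_allowed)
        return;                             /* still within work_mem */

    if (st->shrink_next && st->nbuckets > 1024)
    {
        st->nbuckets /= 2;
        /* the array shrank by the new nbuckets worth of pointers */
        st->space_used -= st->nbuckets * sizeof(void *);
        /* ... rebuild the bucket array and re-link the tuples ... */
    }
    else
    {
        st->nbatch *= 2;
        /* ... move ~half the in-memory tuples out to batch files ... */
    }

    st->shrink_next = !st->shrink_next;
}

Whether the halving step is ever worth taking is exactly where the cost
estimates you mention would come in.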
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services