Re: bad estimation together with large work_mem generates terrible slow hash joins

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bad estimation together with large work_mem generates terrible slow hash joins
Date: 2014-10-09 20:28:53
Message-ID: 1412886533.77040.YahooMailNeo@web122306.mail.ne1.yahoo.com
Lists: pgsql-hackers

Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> On 9.10.2014 16:55, Kevin Grittner wrote:

>> I've tried various other tests using \timing rather than EXPLAIN, and
>> the patched version looks even better in those cases. I have seen up
>> to 4x the performance for a query using the patched version, higher
>> variability in run time without the patch, and have yet to devise a
>> benchmark where the patched version came out slower (although I admit
>> to not being as good at creating such cases as some here).
>
> Nice. Thanks for the testing!
>
> The only case I've been able to come up with is when the hash table fits
> into work_mem only because the bucket array is not counted. The new code
> will start batching in this case.

Hmm. If you look at the timings in my posting from 2014-10-01, I
had cases where the patched version started with one batch and went
to two, while the unpatched version used just one batch, and the
patched version was still more than twice as fast. I'm sure the "on
disk" batch was fully cached, however; it might work out differently
if disk speed actually came into the picture.
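
For anyone following along, the sizing decision at issue works
roughly like this (a simplified sketch with illustrative names, not
the actual nodeHash.c code):

#include <stdbool.h>
#include <stddef.h>

#define TUPLES_PER_BUCKET 1     /* illustrative load factor */

/* Decide whether the hash table must be split into batches. */
static bool
needs_batching(double ntuples, size_t tuple_width, size_t work_mem_bytes)
{
    size_t tuple_space  = (size_t) (ntuples * tuple_width);
    size_t nbuckets     = (size_t) (ntuples / TUPLES_PER_BUCKET);
    size_t bucket_space = nbuckets * sizeof(void *);  /* bucket headers */

    /*
     * Charging bucket_space against work_mem is the change at issue:
     * a table that fit when only tuple_space was counted can now be
     * pushed over the limit, triggering batching.
     */
    return tuple_space + bucket_space > work_mem_bytes;
}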

> That is mostly luck, however, because it depends on the cardinality
> estimates, and the best way to fix it is increasing work_mem (which is
> safer now, thanks to the reduced overhead).
>
> Also, Robert proposed a way to mitigate this: if we realize we'd have to
> do batching during the initial sizing, we can check whether reducing the
> number of buckets (to 1/2 or maybe 1/4) would help. I believe this is a
> good approach, and will look into that after pgconf.eu (i.e. early
> November), unless someone else is interested.

Sure, but it would be good to confirm that it's actually needed first.
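
To make the proposal concrete, as I read it the check would amount
to something like this (purely hypothetical code; the names and the
1/2-then-1/4 policy come from the description above, not from any
actual patch):

/*
 * If the initial sizing says we must batch, see whether shrinking
 * the bucket array (to 1/2, then 1/4 of its size) lets everything
 * fit in a single batch after all.
 */
static size_t
shrink_buckets_to_fit(size_t nbuckets, size_t tuple_space,
                      size_t work_mem_bytes)
{
    for (int i = 0; i < 2; i++)
    {
        if (tuple_space + nbuckets * sizeof(void *) <= work_mem_bytes)
            break;          /* fits in one batch; stop shrinking */
        nbuckets /= 2;      /* trade some lookup speed for memory */
    }
    return nbuckets;
}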

>> When given a generous work_mem setting the patched version often uses
>> more of what it is allowed than the unpatched version (which is
>> probably one of the reasons it tends to do better). If someone has
>> set a high work_mem and has gotten by only because the configured
>> amount is not all being used when it would benefit performance, they
>> may need to adjust work_mem down to avoid memory problems. That
>> doesn't seem to me to be a reason to reject the patch.
>
> I'm not entirely sure I understand this paragraph. What do you mean by
> "configured amount is not all being used when it would benefit
> performance"? Can you give an example?

Again, the earlier post showed that while the unpatched version used
3516kB whether work_mem was set to 4MB or 1GB, the patched version
used 3073kB with work_mem at 4MB and 4540kB with work_mem at 1GB.
The extra memory allowed the patched version to stay at one batch,
improving performance over the 4MB setting.
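
The mechanism is simple enough to sketch: the batch count is,
roughly speaking, the smallest power of two that brings each batch's
share of the data under work_mem, so extra work_mem keeps borderline
joins at a single batch. (The real logic in ExecChooseHashTableSize
is more involved; this is only an illustration.)

/* Simplified sketch of batch-count selection, not the real code. */
static int
choose_nbatch(size_t total_space, size_t work_mem_bytes)
{
    int nbatch = 1;

    while (total_space / nbatch > work_mem_bytes)
        nbatch *= 2;        /* nbatch stays a power of two */

    return nbatch;
}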

> The only thing I can think of is the batching behavior described above.
>
>> This is in "Waiting on Author" status only because I never got an
>> answer about why the debug code used printf() rather than elog() at
>> a DEBUG level. Other than that, I would say this patch is Ready
>> for Committer. Tomas? You there?
>
> I think I responded to that on October 2, quoting:
>
> ===================================================================
> On 2.10.2014 09:50, Tomas Vondra wrote:
>> On 2.10.2014, 2:20, Kevin Grittner wrote:
>>>
>>> The patch applied and built without problem, and passed `make
>>> check-world`. The only thing that caught my eye was the addition of
>>> debug code using printf() instead of logging at a DEBUG level. Is
>>> there any reason for that?
>>
>> Not really. IIRC the main reason is that the other code in
>> nodeHash.c uses the same approach.
> ===================================================================

Ah, I never received that email. That tends to happen every now
and then. :-(

> I believe it's safe to switch the logging to elog(). IMHO the printf
> logging is there from some very early version of the code, before elog
> was introduced. Or something like that.

I guess it might be best to keep this patch internally consistent
and change the logging style in a separate patch. I just wanted to
make sure you didn't see any reason not to do so.
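
For illustration, the deferred change would look something like
this; the message text and variable names below are made up rather
than taken from the actual HJDEBUG output:

Before:

    printf("Hashjoin %p: increasing nbatch to %d\n",
           (void *) hashtable, nbatch);

After:

    elog(DEBUG1, "Hashjoin %p: increasing nbatch to %d",
         (void *) hashtable, nbatch);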

With that addressed, I will move this to Ready for Committer. Since
both Heikki and Robert spent time on this patch earlier, I'll give
either of them a shot at committing it if they want; otherwise I'll
do it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
