Re: Performance degradation in TPC-H Q18

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in TPC-H Q18
Date: 2017-03-01 04:02:22
Message-ID: 20170301040222.3hp5s7rvbuow2mtr@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017-03-01 09:21:47 +0530, Robert Haas wrote:
> On Wed, Mar 1, 2017 at 8:48 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >> BTW, another option to consider is lowering the target fillfactor.
> >> IIRC, Kuntal mentioned to me that cranking it down seemed to fix the
> >> issue. Obviously, it's better to detect when we need a lower
> >> fillfactor than to always use a lower one, but obviously the tighter
> >> you pack the hash table, the more likely it is that you're going to
> >> have these kinds of problems.
> >
> > Yea, that'd be one approach, but I feel better dynamically increasing
> > the fillfactor like in the patch. Even with a lower fillfactor you could
> > see issues, and as you say a higher fillfactor is nice [TM]; after the
> > patch I played with *increasing* the fillfactor, without being able to
> > measure a downside.
>
> I am going to bet that increasing the fillfactor would be a mistake.
> The expected length of a clump in the table is going to be about
> 1/(1-ff), which increases very quickly beyond the current value of
> 0.8. For example, if the actual fillfactor is 0.9 and the keys are
> perfectly distributed -- nine in a row and then one empty slot,
> lather, rinse, repeat -- probing for a key that's not there has a 10%
> chance of hitting an empty slot, a 10% chance of hitting the slot just
> before an empty slot, and so on. So the expected number of probes to
> find that there's no match is 4.5. However, it could easily happen
> that you have 3 or 4 empty slots in a row in one place and 27 or 36
> occupied slots in a row in another place, and that could cause all
> kinds of grief. Moreover, real-world hash functions on real-world
> data aren't always perfect, which increases the chances of things
> going off track. I'm not exactly sure how high a fillfactor is too
> high, but note that
> https://en.wikipedia.org/wiki/Hash_table#Open_addressing claims that
> "performance dramatically degrades when the load factor grows beyond
> 0.7 or so", which isn't cited but the same text appears in
> https://www.unf.edu/~wkloster/3540/wiki_book2.pdf for whatever that's
> worth.

That's without the "trick" of "robin hood" hashing though:
https://en.wikipedia.org/wiki/Hash_table#Robin_Hood_hashing
http://codecapsule.com/2013/11/17/robin-hood-hashing-backward-shift-deletion/
https://www.pvk.ca/Blog/2013/11/26/the-other-robin-hood-hashing/

it probably depends a bit on the scenario how high you want to have the
fillfactor - for hashaggs a high fill factor would probably be good
(they're often "read" heavy), for bitmapscans less so.

But I guess there's not much point in immediately increasing the
fillfactor, can do that in a second step.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2017-03-01 04:02:55 Re: \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
Previous Message Peter Eisentraut 2017-03-01 03:58:06 Re: perlcritic