Re: Performance degradation in TPC-H Q18

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in TPC-H Q18
Date: 2017-03-01 03:51:47
Message-ID: CA+TgmoZrytS7ZTYXWqU9yznCsoMc3h-UV7jWAoWZcUwcGO5+DQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 1, 2017 at 8:48 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> BTW, another option to consider is lowering the target fillfactor.
>> IIRC, Kuntal mentioned to me that cranking it down seemed to fix the
>> issue. Obviously, it's better to detect when we need a lower
>> fillfactor than to always use a lower one, but obviously the tighter
>> you pack the hash table, the more likely it is that you're going to
>> have these kinds of problems.
>
> Yea, that'd be one approach, but I feel better dynamically increasing
> the fillfactor like in the patch. Even with a lower fillfactor you could
> see issues, and as you say a higher fillfactor is nice [TM]; after the
> patch I played with *increasing* the fillfactor, without being able to
> measure a downside.

I am going to bet that increasing the fillfactor would be a mistake.
The expected length of a clump in the table is going to be about
1/(1-ff), which increases very quickly beyond the current value of
0.8. For example, if the actual fillfactor is 0.9 and the keys are
perfectly distributed -- nine in a row and then one empty slot,
lather, rinse, repeat -- probing for a key that's not there has a 10%
chance of hitting an empty slot, a 10% chance of hitting the slot just
before an empty slot, and so on. So the expected number of probes to
find that there's no match is 4.5. However, it could easily happen
that you have 3 or 4 empty slots in a row in one place and 27 or 36
occupied slots in a row in another place, and that could cause all
kinds of grief. Moreover, real-world hash functions on real-world
data aren't always perfect, which increases the chances of things
going off track. I'm not exactly sure how high a fillfactor is too
high, but note that
https://en.wikipedia.org/wiki/Hash_table#Open_addressing claims that
"performance dramatically degrades when the load factor grows beyond
0.7 or so", which isn't cited but the same text appears in
https://www.unf.edu/~wkloster/3540/wiki_book2.pdf for whatever that's
worth.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-03-01 03:58:06 Re: perlcritic
Previous Message Andres Freund 2017-03-01 03:49:23 Re: Performance degradation in TPC-H Q18