Re: Comment fix and question about dshash.c

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Comment fix and question about dshash.c
Date: 2018-10-28 02:27:11
Message-ID: CAEepm=3f5Uvu=6GB2PXOPEOtQ-EgqSKOxZ_emonhLsLYK5uAZg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Oct 28, 2018 at 1:22 AM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On 10/27/2018 01:51 PM, Antonin Houska wrote:
> > Antonin Houska <ah(at)cybertec(dot)at> wrote:
> >> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> >>> On Sat, Oct 27, 2018 at 12:30 AM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> >>> Are you saying there is a bug in this logic (which is nbuckets * 0.75
> >>> expressed as integer maths), or saying that 0.75 is not a good maximum
> >>> load factor? I looked around at a couple of general purpose hash
> >>> tables and saw that some used 0.75 and some used 1.0, as a wasted
> >>> space-vs-collision trade-off. If I have my maths right[1], with 0.75
> >>> you expect to have 75 entries in ~53 buckets, but with 1.0 you expect
> >>> to have 100 entries in ~64 buckets.
> >>
> >> I don't know how exactly you apply the [1] formula (what is "n" and what is
> >> "N" in your case?), but my consideration was much simpler: For example, if
> >> BUCKETS_PER_PARTITION returns 8 (power of 2 is expected here and also more
> >> convenient), then MAX_COUNT_PER_PARTITION returns 8 / 2 + 8 / 4 = 6. Thus the
> >> hashtable gets resized if we're going to add the 7th entry to the partition,
> >> i.e. when the number of entries in the partition is lower than the number of
> >> buckets. Is that o.k.?

n balls = the keys being hashed and inserted
N bins = the hash table buckets

Unless we know of some special properties of the hash function and
input data, I believe we have to treat the hashes like uniformly
distributed random numbers when making predictions, hence the
balls-into-bins probability stuff that can tell you about the expected
distribution.

The expected number of occupied bins, N * (1 - (1 - 1/N)^n), for 1 million balls into 1 million bins:

select 1000000.0 * (1.0 - pow(1.0 - (1.0 / 1000000), 1000000.0));
-> 632120.7427683549057142050000000
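
For intuition, the large-N approximation N * (1 - e^(-n/N)) gives nearly the
same number (just a cross-check, under the same uniform-hashing assumption):

select 1000000.0 * (1.0 - exp(-1.0));
-> 632120.558828...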

The same thing by counting distinct positive random numbers modulo 1 million:

select count(distinct (random() * 200000000)::int % 1000000) from
generate_series(1, 1000000) ss;
-> 632373, 632246, 631954, ...

Distinct hashes of the first 1 million integers (arbitrary keys)
modulo 1 million (buckets):

select count(distinct abs(hashoid(n)) % 1000000) from
generate_series(1, 1000000) ss(n);
-> 632115

(For a moment I was confused about getting a higher number until I
realised I need abs() to avoid doubling the effective number of
buckets...)

> > Well, it may be o.k. I've just checked what the fill factor means in a hash
> > index and it's also the number of entries divided by the number of buckets.
> >
>
> Using load factor ~0.75 is definitely the right thing to do. One way to
> interpret it is "average chain length" (or whatever the approach is in
> that particular hash table implementation), and one of the main points
> of hash tables is to eliminate linear scans. We could pick a value
> closer to 1.0, but experience says it's not worth it - it's easy to get
> annoyingly long chains due to hash collisions, in exchange for fairly
> minimal space savings.

Using the hashes of the first 1 million integers, let's try putting
them into different numbers of buckets, and see how many buckets we
get with each chain length:

WITH entries AS (SELECT abs(hashoid(generate_series(1, 1000000))) %
NBUCKETS AS bucket),
lengths AS (SELECT bucket, COUNT(*) AS chain_length FROM entries
GROUP BY bucket)
SELECT chain_length, COUNT(*) AS count
FROM lengths GROUP BY chain_length ORDER BY 2 DESC;

NBUCKETS = 1000000 (load factor 1.0)

chain_length | count
--------------+--------
1 | 367537
2 | 184580
3 | 61109
4 | 15197
5 | 3079
6 | 509
7 | 95
8 | 7
9 | 2

NBUCKETS = 1333333 (load factor 0.75)

chain_length | count
--------------+--------
1 | 472012
2 | 177174
3 | 44348
4 | 8361
5 | 1243
6 | 143
7 | 9
8 | 2

NBUCKETS = 2000000 (load factor 0.5)

chain_length | count
--------------+--------
1 | 606451
2 | 151741
3 | 25226
4 | 3148
5 | 323
6 | 28
7 | 2

> That being said, I wonder if we should tweak NTUP_PER_BUCKET=1 in hash
> joins to a lower value. IIRC we ended up with 1.0 because originally it
> was set to 10.0, and we reduced it to 1.0 in 9.5, which gave us a nice
> speedup. But I don't recall if we tried an even lower value (probably
> not). Maybe we don't need to do that because we only build the hash
> table at the very end, when we know exactly how many entries it will
> contain, so we don't need to do lookups and inserts at the same time,
> and we don't need to grow the hash table (at least in the non-parallel
> case). And we end up with 0.75 load factor on average, due to the
> doubling (the sizes are essentially uniformly distributed between
> 0.5+epsilon and 1.0-epsilon).

Yeah, it would indeed be interesting to experiment with load factors
lower than 1.0. Every link in the chain is a cache miss. In future
we should work on mitigating that by prefetching during both building
and probing (see nearby thread), but I suspect you can only do that
effectively for the bucket header and perhaps the first tuple in the
chain; if I'm right then longer chains will become even more expensive
relative to short ones.

By the way, aside from wasting memory, when you add extra buckets you
make full/right outer hash joins slower because they scan all the
buckets.

--
Thomas Munro
http://www.enterprisedb.com
