Re: Division in dynahash.c due to HASH_FFACTOR

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Division in dynahash.c due to HASH_FFACTOR
Date: 2020-09-04 12:04:36
Message-ID: 20200904120436.dh5fvtzvcpaw54hm@development
Lists: pgsql-hackers

On Fri, Sep 04, 2020 at 07:01:41AM +0000, Jakub Wartak wrote:
>
>Greetings hackers,
>
>I have mixed feelings about whether this is a welcome contribution, as the
>potential gain is relatively small in my tests, but I would still like to
>point out that the HASH_FFACTOR functionality in dynahash.c could be removed
>or optimized (the default fill factor is always 1, and not a single place in
>the PostgreSQL repository uses a custom fill factor other than DEF_FFACTOR=1).
>Because the functionality is present, there is a division on every buffer
>access [BufTableLookup()] and on every smgropen() call (every call to
>hash_search() is affected, provided it's not ShmemInitHash/HASH_PARTITION).
>This division is especially visible via perf on the single-process
>StartupXLOG WAL recovery process on a standby under heavy-duty 100% CPU
>conditions, as the top entry is inside hash_search:
>
>   0x0000000000888751 <+449>:   idiv   r8
>   0x0000000000888754 <+452>:   cmp    rax,QWORD PTR [r15+0x338] <<-- in perf annotate shows as 30-40%, even on default -O2, probably CPU pipelining for idiv above
>
>I've made a PoC test to skip that division assuming ffactor would be gone:
> if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
>-                       hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
>+                       hctl->freeList[0].nentries >= (long) (hctl->max_bucket + 1) &&
>
>For a 3.7GB stream of WAL I'm getting a consistent improvement of ~4% (yes,
>I know it's small, that's why I have mixed feelings):
>gcc -O3: 104->100s
>gcc -O2: 108->104s
>pgbench -S -c 16 -j 4 -T 30 -M prepared: stays more or less the same (-s 100), so no positive impact there
>

Hmm, so it's the division that makes the difference? In that case,
wouldn't it make sense to just compute a threshold every time the hash
table is resized, and then do just the comparison? That is, compute

nentries_threshold = (long) (hctl->max_bucket + 1) * hctl->ffactor;

and then do just

hctl->freeList[0].nentries >= nentries_threshold

Of course, this assumes the threshold is calculated only rarely, maybe
that's not the case.
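
Something like this, perhaps - just a rough standalone sketch of the idea,
not actual dynahash.c code (the struct and the nentries_threshold field are
made up for illustration; in reality the field would live in HASHHDR and the
recompute would happen when the table expands):

    #include <stdio.h>

    typedef struct
    {
        long        nentries;           /* current number of entries */
        long        max_bucket;         /* highest bucket number in use */
        long        ffactor;            /* target fill factor (default 1) */
        long        nentries_threshold; /* precomputed (max_bucket+1)*ffactor */
    } toy_hctl;

    static void
    recompute_threshold(toy_hctl *hctl)
    {
        /* runs only when the table grows, so the multiply is amortized */
        hctl->nentries_threshold = (hctl->max_bucket + 1) * hctl->ffactor;
    }

    static void
    toy_expand(toy_hctl *hctl)
    {
        /* stand-in for expand_table(): add one bucket per split */
        hctl->max_bucket++;
        recompute_threshold(hctl);
    }

    static void
    toy_insert(toy_hctl *hctl)
    {
        hctl->nentries++;

        /* hot path: a plain comparison, no division */
        if (hctl->nentries >= hctl->nentries_threshold)
            toy_expand(hctl);
    }

    int
    main(void)
    {
        toy_hctl    hctl = {0, 15, 1, 0};   /* 16 buckets, ffactor 1 */

        recompute_threshold(&hctl);
        for (int i = 0; i < 1000; i++)
            toy_insert(&hctl);

        printf("entries=%ld buckets=%ld threshold=%ld\n",
               hctl.nentries, hctl.max_bucket + 1, hctl.nentries_threshold);
        return 0;
    }

That keeps the ffactor knob around while removing the division from the hot
path.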

Also, what if you lower the fill factor, e.g. to 0.8? Does that improve
performance or not? I can't find any discussion about this in the
archives, but the dynamic hashing paper [1] seems to suggest it does
make a difference (see the tables at the end, but I haven't re-read the
paper recently so maybe it's irrelevant). Anyway, it'd be foolish to
remove the ffactor parameter to save on division only to lose the
ability to lower the factor and save more than that ...

As for the 4% - it's not much, but it's also not negligible. I'm sure
we've done micro-optimizations for smaller gains. The big question is
whether this is statistically significant, or whether it's just due to
random effects. It could be caused simply by changes in the layout of the
binary, for example - that can happen quite easily. See [2] and [3].

The talk actually mentions a tool meant to eliminate this bias by
randomization, but I've never tried using it on PostgreSQL so I'm not
sure how compatible it is :-(

[1] https://www.csd.uoc.gr/~hy460/pdf/Dynamic%20Hash%20Tables.pdf
[2] https://www.cis.upenn.edu/~cis501/previous/fall2016/papers/producing-wrong-data.pdf
[3] https://www.youtube.com/watch?v=r-TLSBdHe1A

>After removing HASH_FFACTOR, PostgreSQL still compiles...  Would removing
>it break some external API/extensions?  I saw several optimizations for
>the "idiv", e.g. see https://github.com/ridiculousfish/libdivide
>Or maybe there is some other idea to expose the bottlenecks of
>BufTableLookup()?  I also saw the codepath
>PinBuffer() -> GetPrivateRefCountEntry() -> dynahash, which could be called
>pretty often, but I have no idea what kind of pgbench stress test could be
>used to demonstrate the gain (or lack of it).
>
>-Jakub Wartak.
>

I don't think it would break a lot of stuff, but I'm kinda dubious it's a
measurable improvement for common workloads ...
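
For completeness, if someone wants to experiment with the libdivide route
mentioned above, I'd expect the sketch to look roughly like this - untested,
and assuming the C API (libdivide_u64_gen / libdivide_u64_do) described in
libdivide's README:

    #include <stdbool.h>
    #include <stdint.h>
    #include "libdivide.h"

    typedef struct
    {
        uint64_t    nentries;
        uint64_t    nbuckets;                /* max_bucket + 1 */
        uint64_t    ffactor;
        struct libdivide_u64_t nbuckets_div; /* regenerated on each resize */
    } toy_hctl;

    static void
    toy_resized(toy_hctl *hctl)
    {
        /* once per table expansion, not once per hash_search() call */
        hctl->nbuckets_div = libdivide_u64_gen(hctl->nbuckets);
    }

    static bool
    toy_needs_split(const toy_hctl *hctl)
    {
        /* nentries / nbuckets >= ffactor, without a hardware division */
        return libdivide_u64_do(hctl->nentries, &hctl->nbuckets_div)
               >= hctl->ffactor;
    }

But the precomputed-threshold approach above seems simpler, and avoids the
extra dependency.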

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
