Division in dynahash.c due to HASH_FFACTOR

From: Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Division in dynahash.c due to HASH_FFACTOR
Date: 2020-09-04 07:01:41
Message-ID: VI1PR0701MB696044FC35013A96FECC7AC8F62D0@VI1PR0701MB6960.eurprd07.prod.outlook.com
Lists: pgsql-hackers


Greetings hackers,

I have mixed feelings about whether this is a welcome contribution, as the potential gain is relatively small in my tests, but I would still like to point out that the HASH_FFACTOR functionality in dynahash.c could be removed or optimized (the default fill factor is always 1; there is not a single place inside the PostgreSQL repository that uses a custom fill factor other than DEF_FFACTOR=1). Because the functionality is present, there seems to be a division on every buffer access [BufTableLookup()] and on every smgropen() call (every call to hash_search() is affected, provided it's not ShmemInitHash/HASH_PARTITION). This division is especially visible via perf on the single-process StartupXLOG WAL recovery on a standby under heavy-duty 100% CPU conditions, where the top entry is inside hash_search:
   0x0000000000888751 <+449>:   idiv   r8
   0x0000000000888754 <+452>:   cmp    rax,QWORD PTR [r15+0x338] <<-- perf annotate shows 30-40% here, even with the default -O2, probably due to CPU pipelining of the idiv above

I've made a PoC test that skips the division, assuming ffactor would be gone:
if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
-                       hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+                       hctl->freeList[0].nentries >= (long) (hctl->max_bucket + 1) &&
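The equivalence that the PoC relies on can be checked in isolation. The following is only a minimal sketch, not the dynahash.c code: the function names are hypothetical, plain `long` arguments stand in for the `hctl` fields, and it assumes a non-negative entry count and a fill factor fixed at 1.

```c
#include <stdbool.h>

/* Original form of the check: one integer division per call. */
static bool
needs_split_div(long nentries, long max_bucket, long ffactor)
{
    return nentries / (max_bucket + 1) >= ffactor;
}

/* PoC form: a plain comparison, valid only when ffactor == 1. */
static bool
needs_split_cmp(long nentries, long max_bucket)
{
    return nentries >= max_bucket + 1;
}

/* Count inputs in [0, limit) x [0, limit) where the two forms disagree. */
static long
count_mismatches(long limit)
{
    long        mismatches = 0;

    for (long nentries = 0; nentries < limit; nentries++)
        for (long max_bucket = 0; max_bucket < limit; max_bucket++)
            if (needs_split_div(nentries, max_bucket, 1) !=
                needs_split_cmp(nentries, max_bucket))
                mismatches++;
    return mismatches;
}
```

For non-negative `nentries` and a positive bucket count, `nentries / buckets >= 1` holds exactly when `nentries >= buckets`, so `count_mismatches()` should report no disagreement for any range.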

For a 3.7GB stream of WAL I'm getting a consistent improvement of ~4% (yes, I know it's small, that's why I have mixed feelings):
gcc -O3: 104->100s
gcc -O2: 108->104s
pgbench -S -c 16 -j 4 -T 30 -M prepared (-s 100): stays more or less the same, so no positive impact there

After removing HASH_FFACTOR, PostgreSQL still compiles... Would removing it break some external API or extensions? I saw several techniques for optimizing "idiv", e.g. see https://github.com/ridiculousfish/libdivide. Or maybe there is some other idea for exposing the bottlenecks of BufTableLookup()? I also saw the codepath PinBuffer() -> GetPrivateRefCountEntry() -> dynahash, which could be called pretty often. I have no idea what kind of pgbench stress test could be used to demonstrate the gain (or lack of it).
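If a configurable fill factor had to stay, one possible alternative to libdivide-style tricks is to move the fill factor to the other side of the comparison, which trades the division for a multiplication. This is a sketch under assumptions, not a proposed patch: the function names are hypothetical, plain `long` values stand in for the `hctl` fields, all inputs are assumed non-negative with a positive fill factor, and the product is assumed not to overflow.

```c
#include <stdbool.h>

/* Division form, as in the current check. */
static bool
needs_split_div(long nentries, long max_bucket, long ffactor)
{
    return nentries / (max_bucket + 1) >= ffactor;
}

/*
 * Multiplication form. For integers n >= 0, d > 0 and f, floor(n / d) >= f
 * holds exactly when n >= f * d. Assumes f * d does not overflow a long.
 */
static bool
needs_split_mul(long nentries, long max_bucket, long ffactor)
{
    return nentries >= ffactor * (max_bucket + 1);
}

/* Count small inputs where the two forms disagree, for ffactor in 1..8. */
static long
count_mul_mismatches(long limit)
{
    long        mismatches = 0;

    for (long ffactor = 1; ffactor <= 8; ffactor++)
        for (long nentries = 0; nentries < limit; nentries++)
            for (long max_bucket = 0; max_bucket < limit; max_bucket++)
                if (needs_split_div(nentries, max_bucket, ffactor) !=
                    needs_split_mul(nentries, max_bucket, ffactor))
                    mismatches++;
    return mismatches;
}
```

Whether a multiplication is measurably cheaper than the idiv in the recovery workload above would still need to be benchmarked.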

-Jakub Wartak.
