Re: Use simplehash.h instead of dynahash in SMgr

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Rowley <dgrowley(at)gmail(dot)com>
Subject: Re: Use simplehash.h instead of dynahash in SMgr
Date: 2021-06-21 14:15:26
Message-ID: CAApHDvowgRaQupC=L37iZPUzx1z7-N8deD7TxQSm8LR+f4L3-A@mail.gmail.com
Lists: pgsql-hackers

I've been thinking about this patch again. When testing with
simplehash, I found that the width of the hash bucket type was fairly
critical for getting good performance from simplehash.h. With
simplehash.h I didn't manage to narrow the bucket to anything less than
16 bytes. I needed to store the 32-bit hash value and a pointer to the
data. On a 64-bit machine, with padding, that's 16 bytes. I've been
thinking about a way to narrow this down further to just 8 bytes and
also solve the stable pointer problem at the same time...
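To make the size arithmetic concrete, here is a minimal sketch of the two bucket layouts. The struct and function names are hypothetical, and the 8-byte layout (hash value plus a 32-bit index) is an assumption about how the narrower bucket might look, not a copy of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bucket as described above: 32-bit hash value plus a pointer to the data. */
typedef struct WideBucket
{
    uint32_t    hash;       /* 32-bit hash value */
    void       *entry;      /* pointer to the separately allocated data */
} WideBucket;

/* Assumed narrower bucket: the pointer is replaced by a 32-bit index. */
typedef struct NarrowBucket
{
    uint32_t    hash;
    uint32_t    index;
} NarrowBucket;

/* Bytes of padding the compiler adds to WideBucket to align the pointer. */
static size_t
wide_bucket_padding(void)
{
    return sizeof(WideBucket) - sizeof(uint32_t) - sizeof(void *);
}
```

On a typical 64-bit platform, sizeof(WideBucket) is 16 with 4 bytes of padding, while sizeof(NarrowBucket) is 8, so twice as many buckets fit in each cache line.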

I've come up with a new hash table implementation that I've called
generichash. It works similarly to simplehash with regard to the
linear probing, only instead of storing the data in the hash bucket,
we just store a uint32 index that indexes off into an array. To keep
the pointers in that array stable, we cannot resize the array as the
table grows. Instead, when more space is needed, I just allocate
another array of the same size. Since these arrays are always sized as
powers of 2, it's very fast to index into them using the uint32 index
that's stored in the bucket. Unused buckets just store the special
index of 0xFFFFFFFF.
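The scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: all names, the segment size, and the hash function are hypothetical, the bucket array is fixed in size, buckets hold only the index, and deletion is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SEGMENT_SHIFT 4                      /* 16 items per segment (power of 2) */
#define SEGMENT_SIZE  (1u << SEGMENT_SHIFT)
#define SEGMENT_MASK  (SEGMENT_SIZE - 1)
#define MAX_SEGMENTS  8                      /* capacity: 128 items */
#define NBUCKETS      256u                   /* power of 2, larger than capacity */
#define UNUSED_INDEX  0xFFFFFFFFu            /* marks an empty bucket */

typedef struct Item
{
    uint32_t    key;
    int         value;
} Item;

typedef struct Table
{
    uint32_t    buckets[NBUCKETS];          /* each bucket holds only an item index */
    Item       *segments[MAX_SEGMENTS];     /* fixed-size arrays, never reallocated */
    uint32_t    nitems;
} Table;

static void
table_init(Table *t)
{
    for (uint32_t i = 0; i < NBUCKETS; i++)
        t->buckets[i] = UNUSED_INDEX;
    for (uint32_t i = 0; i < MAX_SEGMENTS; i++)
        t->segments[i] = NULL;
    t->nitems = 0;
}

/* Translate an item index into a pointer using a shift and a mask. */
static Item *
table_item(Table *t, uint32_t index)
{
    return &t->segments[index >> SEGMENT_SHIFT][index & SEGMENT_MASK];
}

/* Placeholder hash function, chosen only for illustration. */
static uint32_t
hash_key(uint32_t key)
{
    return key * 2654435761u;
}

/* Return the item for key, or NULL if it is not present. */
static Item *
table_lookup(Table *t, uint32_t key)
{
    uint32_t    b = hash_key(key) & (NBUCKETS - 1);

    while (t->buckets[b] != UNUSED_INDEX)
    {
        Item       *item = table_item(t, t->buckets[b]);

        if (item->key == key)
            return item;
        b = (b + 1) & (NBUCKETS - 1);       /* linear probing */
    }
    return NULL;
}

/* Insert key, or return the existing item. Returns NULL when out of space. */
static Item *
table_insert(Table *t, uint32_t key, int value)
{
    uint32_t    b = hash_key(key) & (NBUCKETS - 1);
    uint32_t    seg = t->nitems >> SEGMENT_SHIFT;
    Item       *item;

    while (t->buckets[b] != UNUSED_INDEX)
    {
        item = table_item(t, t->buckets[b]);
        if (item->key == key)
            return item;
        b = (b + 1) & (NBUCKETS - 1);
    }
    if (seg >= MAX_SEGMENTS)
        return NULL;
    if (t->segments[seg] == NULL)
    {
        /* Grow by adding a new segment; existing segments stay where they are. */
        t->segments[seg] = malloc(sizeof(Item) * SEGMENT_SIZE);
        if (t->segments[seg] == NULL)
            return NULL;
    }
    t->buckets[b] = t->nitems++;
    item = table_item(t, t->buckets[b]);
    item->key = key;
    item->value = value;
    return item;
}
```

Because growth only ever adds a segment, a pointer returned by table_insert() stays valid after later inserts. That is the stable pointer property the bucket-embedded simplehash layout cannot offer.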

I've also proposed to use this hash table implementation over in [1]
to speed up LockReleaseAll(). The 0001 patch here is just the same as
the patch from [1].

The 0002 patch includes using a generichash hash table for SMgr.

The performance using generichash.h is about the same as the
simplehash.h version of the patch, although the two tests were not run
on the same version of master.

Master (97b713418)
drowley(at)amd3990x:~$ tail -f pg.log | grep "redo done"
CPU: user: 124.85 s, system: 6.83 s, elapsed: 131.74 s
CPU: user: 115.01 s, system: 4.76 s, elapsed: 119.83 s
CPU: user: 122.13 s, system: 6.41 s, elapsed: 128.60 s
CPU: user: 113.85 s, system: 6.11 s, elapsed: 120.02 s
CPU: user: 121.40 s, system: 6.28 s, elapsed: 127.74 s
CPU: user: 113.71 s, system: 5.80 s, elapsed: 119.57 s
CPU: user: 113.96 s, system: 5.90 s, elapsed: 119.92 s
CPU: user: 122.74 s, system: 6.21 s, elapsed: 129.01 s
CPU: user: 122.00 s, system: 6.38 s, elapsed: 128.44 s
CPU: user: 113.06 s, system: 6.14 s, elapsed: 119.25 s
CPU: user: 114.42 s, system: 4.35 s, elapsed: 118.82 s

Median: 120.02 s

master + v1-0001 + v1-0002

drowley(at)amd3990x:~$ tail -n 0 -f pg.log | grep "redo done"
CPU: user: 107.75 s, system: 4.61 s, elapsed: 112.41 s
CPU: user: 108.07 s, system: 4.49 s, elapsed: 112.61 s
CPU: user: 106.89 s, system: 5.55 s, elapsed: 112.49 s
CPU: user: 107.42 s, system: 5.64 s, elapsed: 113.12 s
CPU: user: 106.85 s, system: 4.42 s, elapsed: 111.31 s
CPU: user: 107.36 s, system: 4.76 s, elapsed: 112.16 s
CPU: user: 107.20 s, system: 4.47 s, elapsed: 111.72 s
CPU: user: 106.94 s, system: 5.89 s, elapsed: 112.88 s
CPU: user: 115.32 s, system: 6.12 s, elapsed: 121.49 s
CPU: user: 108.02 s, system: 4.48 s, elapsed: 112.54 s
CPU: user: 106.93 s, system: 4.54 s, elapsed: 111.51 s

Median: 112.49 s

So that's about a 6.69% speedup (median elapsed time, 120.02 / 112.49).
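For anyone wanting to recheck the arithmetic, here is a small sketch that recomputes both medians and the speedup from the elapsed times listed above. The function and array names are made up for the example, and the speedup is taken as the ratio of the medians, which is the reading that matches the 6.69% figure.

```c
#include <assert.h>
#include <stdlib.h>

/* Elapsed times in seconds, copied from the "redo done" lines above. */
static double master_elapsed[] = {
    131.74, 119.83, 128.60, 120.02, 127.74, 119.57,
    119.92, 129.01, 128.44, 119.25, 118.82
};
static double patched_elapsed[] = {
    112.41, 112.61, 112.49, 113.12, 111.31, 112.16,
    111.72, 112.88, 121.49, 112.54, 111.51
};

/* qsort() comparator for doubles. */
static int
cmp_double(const void *a, const void *b)
{
    double      x = *(const double *) a;
    double      y = *(const double *) b;

    return (x > y) - (x < y);
}

/* Median of n values. Sorts the array in place. */
static double
median(double *v, size_t n)
{
    qsort(v, n, sizeof(double), cmp_double);
    return (n % 2) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

/* Speedup as a percentage, from the ratio of the two timings. */
static double
speedup_percent(double before, double after)
{
    return (before / after - 1.0) * 100.0;
}
```

Calling median() on each array and passing the two results to speedup_percent() reproduces the numbers quoted in this message.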

David

[1] https://www.postgresql.org/message-id/CAApHDvoKqWRxw5nnUPZ8+mAJKHPOPxYGoY1gQdh0WeS4+biVhg@mail.gmail.com

Attachment                                                        Content-Type              Size
v1-0001-Add-a-new-hash-table-type-which-has-stable-pointe.patch   application/octet-stream  51.9 KB
v1-0002-Use-generichash.h-hashtables-in-SMgr.patch                application/octet-stream  5.4 KB
