Re: Use simplehash.h instead of dynahash in SMgr

From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use simplehash.h instead of dynahash in SMgr
Date: 2021-04-24 22:27:24
Message-ID: 4b7616c612b9c445a401025f74ed57bf@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley писал 2021-04-24 18:58:
> Hackers,
>
> Last year, when working on making compactify_tuples() go faster for
> 19c60ad69, I did quite a bit of benchmarking of the recovery process.
> The next thing that was slow after compactify_tuples() was the hash
> lookups done in smgropen().
>
> Currently, we use dynahash hash tables to store the SMgrRelation so we
> can perform fast lookups by RelFileNodeBackend. However, I had in mind
> that a simplehash table might perform better. So I tried it...
>
> The attached converts the hash table lookups done in smgr.c to use
> simplehash instead of dynahash.
>
> This does require a few changes in simplehash.h to make it work. The
> reason being is that RelationData.rd_smgr points directly into the
> hash table entries. This works ok for dynahash as that hash table
> implementation does not do any reallocations of existing items or move
> any items around in the table, however, simplehash moves entries
> around all the time, so we can't point any pointers directly at the
> hash entries and expect them to be valid after adding or removing
> anything else from the table.
>
> To work around that, I've just made an additional type that serves as
> the hash entry type that has a pointer to the SMgrRelationData along
> with the hash status and hash value. It's just 16 bytes (or 12 on
> 32-bit machines). I opted to keep the hash key in the
> SMgrRelationData rather than duplicating it as it keeps the SMgrEntry
> struct nice and small. We only need to dereference the SMgrRelation
> pointer when we find an entry with the same hash value. The chances
> are quite good that an entry with the same hash value is the one that
> we want, so any additional dereferences to compare the key are not
> going to happen very often.
>
> I did experiment with putting the hash key in SMgrEntry and found it
> to be quite a bit slower. I also did try to use hash_bytes() but
> found building a hash function that uses murmurhash32 to be quite a
> bit faster.
>
> Benchmarking
> ===========
>
> I did some of that. It made my test case about 10% faster.
>
> The test case was basically inserting 100 million rows one at a time
> into a hash partitioned table with 1000 partitions and 2 int columns
> and a primary key on one of those columns. It was about 12GB of WAL. I
> used a hash partitioned table in the hope to create a fairly
> random-looking SMgr hash table access pattern. Hopefully something
> similar to what might happen in the real world.
>
> Over 10 runs of recovery, master took an average of 124.89 seconds.
> The patched version took 113.59 seconds. About 10% faster.
>
> I bumped shared_buffers up to 10GB, max_wal_size to 20GB and
> checkpoint_timeout to 60 mins.
>
> To make the benchmark more easily to repeat I patched with the
> attached recovery_panic.patch.txt. This just PANICs at the end of
> recovery so that the database shuts down before performing the end of
> recovery checkpoint. Just start the database up again to do another
> run.
>
> I did 10 runs. The end of recovery log message reported:
>
> master (aa271209f)
> CPU: user: 117.89 s, system: 5.70 s, elapsed: 123.65 s
> CPU: user: 117.81 s, system: 5.74 s, elapsed: 123.62 s
> CPU: user: 119.39 s, system: 5.75 s, elapsed: 125.20 s
> CPU: user: 117.98 s, system: 4.39 s, elapsed: 122.41 s
> CPU: user: 117.92 s, system: 4.79 s, elapsed: 122.76 s
> CPU: user: 119.84 s, system: 4.75 s, elapsed: 124.64 s
> CPU: user: 120.60 s, system: 5.82 s, elapsed: 126.49 s
> CPU: user: 118.74 s, system: 5.71 s, elapsed: 124.51 s
> CPU: user: 124.29 s, system: 6.79 s, elapsed: 131.14 s
> CPU: user: 118.73 s, system: 5.67 s, elapsed: 124.47 s
>
> master + v1 patch
> CPU: user: 106.90 s, system: 4.45 s, elapsed: 111.39 s
> CPU: user: 107.31 s, system: 5.98 s, elapsed: 113.35 s
> CPU: user: 107.14 s, system: 5.58 s, elapsed: 112.77 s
> CPU: user: 105.79 s, system: 5.64 s, elapsed: 111.48 s
> CPU: user: 105.78 s, system: 5.80 s, elapsed: 111.63 s
> CPU: user: 113.18 s, system: 6.21 s, elapsed: 119.45 s
> CPU: user: 107.74 s, system: 4.57 s, elapsed: 112.36 s
> CPU: user: 107.42 s, system: 4.62 s, elapsed: 112.09 s
> CPU: user: 106.54 s, system: 4.65 s, elapsed: 111.24 s
> CPU: user: 113.24 s, system: 6.86 s, elapsed: 120.16 s
>
> I wrote this patch a few days ago. I'm only posting it now as I know a
> couple of other people have expressed an interest in working on this.
> I didn't really want any duplicate efforts, so thought I'd better post
> it now before someone else goes and writes a similar patch.
>
> I'll park this here and have another look at it when the PG15 branch
> opens.
>
> David

Hi, David

It is quite interesting result. Simplehash being open-addressing with
linear probing is friendly for cpu cache. I'd recommend to define
SH_FILLFACTOR with value lower than default (0.9). I believe 0.75 is
suitable most for such kind of hash table.

> + /* rotate hashkey left 1 bit at each step */
> + hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
> + hashkey ^= murmurhash32((uint32) rnode->node.dbNode);

Why do you use so strange rotation expression? I know compillers are
able
to translage `h = (h << 1) | (h >> 31)` to single rotate instruction.
Do they recognize construction in your code as well?

Your construction looks more like "multiplate-modulo" operation in 32bit
Galois field . It is widely used operation in cryptographic, but it is
used modulo some primitive polynomial, and 0x100000001 is not such
polynomial. 0x1000000c5 is, therefore it should be:

hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 0xc5 : 0);
or
hashkey = (hashkey << 1) | ((uint32)((int32)hashkey >> 31) & 0xc5);

But why don't just use hash_combine(uint32 a, uint32 b) instead (defined
in hashfn.h)? Yep, it could be a bit slower, but is it critical?

> - * smgrclose() -- Close and delete an SMgrRelation object.
> + * smgrclose() -- Close and delete an SMgrRelation object but don't
> + * remove from the SMgrRelationHash table.

I believe `smgrclose_internal()` should be in this comment.

Still I don't believe it worth to separate smgrclose_internal from
smgrclose. Is there measurable performance improvement from this
change? Even if there is, it will be lesser with SH_FILLFACTOR 0.75 .

As well I don't support modification simplehash.h for
SH_ENTRY_INITIALIZER,
SH_ENTRY_CLEANUP and SH_TRUNCATE. The initialization could comfortably
live in smgropen and the cleanup in smgrclose. And then SH_TRUNCATE
doesn't mean much.

Summary:

regards,
Yura Sokolov

Attachment Content-Type Size
recovery_panic.patch.txt text/plain 412 bytes
v1-0001-Use-simplehash.h-hashtables-in-SMgr.patch application/octet-stream 13.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2021-04-25 02:23:26 Re: Use simplehash.h instead of dynahash in SMgr
Previous Message Peter Geoghegan 2021-04-24 20:39:03 Re: decoupling table and index vacuum