|From:||Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>|
|To:||David Rowley <dgrowleyml(at)gmail(dot)com>|
|Cc:||PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>|
|Subject:||Re: Use simplehash.h instead of dynahash in SMgr|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
David Rowley писал 2021-04-24 18:58:
> Last year, when working on making compactify_tuples() go faster for
> 19c60ad69, I did quite a bit of benchmarking of the recovery process.
> The next thing that was slow after compactify_tuples() was the hash
> lookups done in smgropen().
> Currently, we use dynahash hash tables to store the SMgrRelation so we
> can perform fast lookups by RelFileNodeBackend. However, I had in mind
> that a simplehash table might perform better. So I tried it...
> The attached converts the hash table lookups done in smgr.c to use
> simplehash instead of dynahash.
> This does require a few changes in simplehash.h to make it work. The
> reason being is that RelationData.rd_smgr points directly into the
> hash table entries. This works ok for dynahash as that hash table
> implementation does not do any reallocations of existing items or move
> any items around in the table, however, simplehash moves entries
> around all the time, so we can't point any pointers directly at the
> hash entries and expect them to be valid after adding or removing
> anything else from the table.
> To work around that, I've just made an additional type that serves as
> the hash entry type that has a pointer to the SMgrRelationData along
> with the hash status and hash value. It's just 16 bytes (or 12 on
> 32-bit machines). I opted to keep the hash key in the
> SMgrRelationData rather than duplicating it as it keeps the SMgrEntry
> struct nice and small. We only need to dereference the SMgrRelation
> pointer when we find an entry with the same hash value. The chances
> are quite good that an entry with the same hash value is the one that
> we want, so any additional dereferences to compare the key are not
> going to happen very often.
> I did experiment with putting the hash key in SMgrEntry and found it
> to be quite a bit slower. I also did try to use hash_bytes() but
> found building a hash function that uses murmurhash32 to be quite a
> bit faster.
> I did some of that. It made my test case about 10% faster.
> The test case was basically inserting 100 million rows one at a time
> into a hash partitioned table with 1000 partitions and 2 int columns
> and a primary key on one of those columns. It was about 12GB of WAL. I
> used a hash partitioned table in the hope to create a fairly
> random-looking SMgr hash table access pattern. Hopefully something
> similar to what might happen in the real world.
> Over 10 runs of recovery, master took an average of 124.89 seconds.
> The patched version took 113.59 seconds. About 10% faster.
> I bumped shared_buffers up to 10GB, max_wal_size to 20GB and
> checkpoint_timeout to 60 mins.
> To make the benchmark more easily to repeat I patched with the
> attached recovery_panic.patch.txt. This just PANICs at the end of
> recovery so that the database shuts down before performing the end of
> recovery checkpoint. Just start the database up again to do another
> I did 10 runs. The end of recovery log message reported:
> master (aa271209f)
> CPU: user: 117.89 s, system: 5.70 s, elapsed: 123.65 s
> CPU: user: 117.81 s, system: 5.74 s, elapsed: 123.62 s
> CPU: user: 119.39 s, system: 5.75 s, elapsed: 125.20 s
> CPU: user: 117.98 s, system: 4.39 s, elapsed: 122.41 s
> CPU: user: 117.92 s, system: 4.79 s, elapsed: 122.76 s
> CPU: user: 119.84 s, system: 4.75 s, elapsed: 124.64 s
> CPU: user: 120.60 s, system: 5.82 s, elapsed: 126.49 s
> CPU: user: 118.74 s, system: 5.71 s, elapsed: 124.51 s
> CPU: user: 124.29 s, system: 6.79 s, elapsed: 131.14 s
> CPU: user: 118.73 s, system: 5.67 s, elapsed: 124.47 s
> master + v1 patch
> CPU: user: 106.90 s, system: 4.45 s, elapsed: 111.39 s
> CPU: user: 107.31 s, system: 5.98 s, elapsed: 113.35 s
> CPU: user: 107.14 s, system: 5.58 s, elapsed: 112.77 s
> CPU: user: 105.79 s, system: 5.64 s, elapsed: 111.48 s
> CPU: user: 105.78 s, system: 5.80 s, elapsed: 111.63 s
> CPU: user: 113.18 s, system: 6.21 s, elapsed: 119.45 s
> CPU: user: 107.74 s, system: 4.57 s, elapsed: 112.36 s
> CPU: user: 107.42 s, system: 4.62 s, elapsed: 112.09 s
> CPU: user: 106.54 s, system: 4.65 s, elapsed: 111.24 s
> CPU: user: 113.24 s, system: 6.86 s, elapsed: 120.16 s
> I wrote this patch a few days ago. I'm only posting it now as I know a
> couple of other people have expressed an interest in working on this.
> I didn't really want any duplicate efforts, so thought I'd better post
> it now before someone else goes and writes a similar patch.
> I'll park this here and have another look at it when the PG15 branch
It is quite interesting result. Simplehash being open-addressing with
linear probing is friendly for cpu cache. I'd recommend to define
SH_FILLFACTOR with value lower than default (0.9). I believe 0.75 is
suitable most for such kind of hash table.
> + /* rotate hashkey left 1 bit at each step */
> + hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
> + hashkey ^= murmurhash32((uint32) rnode->node.dbNode);
Why do you use so strange rotation expression? I know compillers are
to translage `h = (h << 1) | (h >> 31)` to single rotate instruction.
Do they recognize construction in your code as well?
Your construction looks more like "multiplate-modulo" operation in 32bit
Galois field . It is widely used operation in cryptographic, but it is
used modulo some primitive polynomial, and 0x100000001 is not such
polynomial. 0x1000000c5 is, therefore it should be:
hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 0xc5 : 0);
hashkey = (hashkey << 1) | ((uint32)((int32)hashkey >> 31) & 0xc5);
But why don't just use hash_combine(uint32 a, uint32 b) instead (defined
in hashfn.h)? Yep, it could be a bit slower, but is it critical?
> - * smgrclose() -- Close and delete an SMgrRelation object.
> + * smgrclose() -- Close and delete an SMgrRelation object but don't
> + * remove from the SMgrRelationHash table.
I believe `smgrclose_internal()` should be in this comment.
Still I don't believe it worth to separate smgrclose_internal from
smgrclose. Is there measurable performance improvement from this
change? Even if there is, it will be lesser with SH_FILLFACTOR 0.75 .
As well I don't support modification simplehash.h for
SH_ENTRY_CLEANUP and SH_TRUNCATE. The initialization could comfortably
live in smgropen and the cleanup in smgrclose. And then SH_TRUNCATE
doesn't mean much.
|Next Message||David Rowley||2021-04-25 02:23:26||Re: Use simplehash.h instead of dynahash in SMgr|
|Previous Message||Peter Geoghegan||2021-04-24 20:39:03||Re: decoupling table and index vacuum|