Re: Use simplehash.h instead of dynahash in SMgr

From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use simplehash.h instead of dynahash in SMgr
Date: 2021-04-25 17:03:11
Message-ID: 83ee87f0ba44c117bafc456844abc79e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley писал 2021-04-25 16:36:
> On Sun, 25 Apr 2021 at 18:48, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
> wrote:
>> If you read comments in SH_START_ITERATE, you'll see:
>>
>> * Search for the first empty element. As deletions during
>> iterations
>> are
>> * supported, we want to start/end at an element that cannot be
>> affected
>> * by elements being shifted.
>>
>> * Iterate backwards, that allows the current element to be deleted,
>> even
>> * if there are backward shifts
>>
>> Therefore, it is safe to delete during iteration, and it doesn't lead
>> nor
>> require loop restart.
>
> I had only skimmed that with a pre-loaded assumption that it wouldn't
> be safe. I didn't do a very good job of reading it as I failed to
> notice the lack of guarantees were about deleting items other than the
> current one. I didn't consider the option of finding a free bucket
> then looping backwards to avoid missing entries that are moved up
> during a delete.
>
> With that, I changed the patch to get rid of the SH_TRUNCATE and got
> rid of the smgrclose_internal which skips the hash delete. The code
> is much more similar to how it was now.
>
> In regards to the hashing stuff. I added a new function to
> pg_bitutils.h to rotate left and I'm using that instead of the other
> expression that was taken from nodeHash.c
>
> For the hash function, I've done some further benchmarking with:
>
> 1) The attached v2 patch
> 2) The attached + plus use_hash_combine.patch.txt which uses
> hash_combine() instead of pg_rotate_left32()ing the hashkey each time.
> 3) The attached v2 with use_hash_bytes.patch.txt applied.
> 4) Master
>
> I've also included the hash stats from each version of the hash
> function.
>
> I hope the numbers help indicate the reason I picked the hash function
> that I did.
>
> 1) v2 patch.
> Median = 113.375 s
>
> LOG: size: 4096, members: 2032, filled: 0.496094, total chain: 997,
> max chain: 5, avg chain: 0.490650, total_collisions: 422,
> max_collisions: 3, avg_collisions: 0.207677
>
> 2) v2 patch + use_hash_combine.patch.txt
> Median = 119.355 s
>
> LOG: size: 4096, members: 2032, filled: 0.496094, total chain: 971,
> max chain: 6, avg chain: 0.477854, total_collisions: 433,
> max_collisions: 4, avg_collisions: 0.213091
>
> 3) v2 patch + use_hash_bytes.patch.txt
> Median = 117.685 s
>
> LOG: size: 4096, members: 2032, filled: 0.496094, total chain: 900,
> max chain: 4, avg chain: 0.442913, total_collisions: 415,
> max_collisions: 4, avg_collisions: 0.204232
>
> 4) master
> Median = 124.49 s
>
> You can see that the bare v2 patch is quite a bit faster than any of
> the alternatives. We'd be better of with hash_bytes than using
> hash_combine() both for performance and for the seemingly better job
> the hash function does at reducing the hash collisions.
>
> David

Looks much better! Simpler is almost always better.

Minor remarks:

Comment for SH_ENTRY_INITIALIZER e. May be like:
- SH_ENTRY_INITIALIZER(a) - if defined, this macro is called for new
entries
before key or hash is stored in. For example, it can be used to make
necessary memory allocations.

`pg_rotate_left32(x, 1) == pg_rotate_right(x, 31)`, therefore there's
no need to add `pg_rotate_left32` at all. More over, for hash combining
there's no much difference between `pg_rotate_left32(x, 1)` and
`pg_rotate_right(x, 1)`. (To be honestly, there is a bit of difference
due to murmur construction, but it should not be very big.)

If your test so sensitive to hash function speed, then I'd suggest
to try something even simpler:

static inline uint32
relfilenodebackend_hash(RelFileNodeBackend *rnode)
{
uint32 h = 0;
#define step(x) h ^= (uint32)(x) * 0x85ebca6b; h = pg_rotate_right(h,
11); h *= 9;
step(rnode->node.relNode);
step(rnode->node.spcNode); // spcNode could be different for same
relNode only
// during table movement. Does it pay
to hash it?
step(rnode->node.dbNode);
step(rnode->backend); // does it matter to hash backend?
// It equals to InvalidBackendId for
non-temporary relations
// and temporary relations in same
database never have same
// relNode (have they?).
return murmurhash32(hashkey);
}

I'd like to see benchmark code. It quite interesting this place became
measurable at all.

regards,
Yura Sokolov.

Attachment Content-Type Size
use_hash_bytes.patch.txt text/plain 2.1 KB
v2-0001-Use-simplehash.h-hashtables-in-SMgr.patch application/octet-stream 10.6 KB
use_hash_combine.patch.txt text/plain 1.0 KB
recovery_panic.patch.txt text/plain 1.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-04-25 17:17:03 Re: compute_query_id and pg_stat_statements
Previous Message Julien Rouhaud 2021-04-25 16:17:42 Re: compute_query_id and pg_stat_statements