Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash

From: CharSyam <charsyam(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash
Date: 2026-04-11 01:17:07
Message-ID: CAMrLSE5DHxxBxhSMCnimZoMdsUjKagDF4PzgMoFMsZcp_=24cg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for the thoughtful review. I've attached an updated patch (v2)
addressing your feedback.

1. Comment added

I've added a comment in find_in_bucket() explaining the hash pre-check
assumption:

/*
* If the hash values don't match, the keys certainly don't match
* either, so we can skip the expensive key comparison. Matching
* hashes don't guarantee matching keys, so equal_keys() is still
* needed for confirmation.
*/

A short "See comment in find_in_bucket()" reference is also added in
delete_key_from_bucket().

2. No regressions with short keys / cheap comparisons

I ran an additional benchmark with short keys ("1" ~ "10000", 1-5 chars)
where strcmp cost is minimal:

Test | Before | After | Change
-------------------------+-----------+-----------+------------
INSERT 10000 entries | 13.26 ms | 6.44 ms | ~51% faster
LOOKUP 10000 hits | 7.94 ms | 5.03 ms | ~37% faster
LOOKUP 10000 misses | 8.08 ms | 4.80 ms | ~41% faster
LOOKUP 50000 hits (x5) | 33.05 ms | 24.06 ms | ~27% faster

For reference, here are the original string key results ("key_1" ~
"key_10000"):

Test | Before | After | Change
-------------------------+-----------+-----------+------------
INSERT 10000 entries | 14.99 ms | 9.46 ms | ~37% faster
LOOKUP 10000 hits | 10.40 ms | 5.52 ms | ~47% faster
LOOKUP 10000 misses | 8.41 ms | 4.95 ms | ~41% faster
LOOKUP 50000 hits (x5) | 33.48 ms | 26.44 ms | ~21% faster

No regressions in either scenario. Even with short keys, the integer
hash pre-check is always cheaper than the DSM address translation plus
compare function call it avoids.

Both tests ran on macOS (arm64) using the test_dsm_registry extension
with 10,000 entries.

3. Keeping the pre-check outside equal_keys()

I considered moving the hash comparison into equal_keys(), but I think
keeping it at the call site is a better fit for the following reasons:

- equal_keys() is a pure key comparison function. Mixing in hash
comparison would change its semantics and make the name misleading.

- To pass hashes into equal_keys(), we would need to add two hash
parameters (one for the search key, one for the item). The caller
still needs to extract item->hash before the call, so the call site
doesn't actually get simpler.

- There are only two call sites, both already modified in this patch,
so there is no real maintenance burden from having the check at each
call site.

- This pattern is consistent with other hash table implementations in
PostgreSQL (e.g., dynahash.c), which also compare hashes outside
the key equality function.

I'm happy to reconsider if you feel differently.

Regards,
charsyam

2026년 4월 11일 (토) 오전 3:10, Nathan Bossart <nathandbossart(at)gmail(dot)com>님이 작성:

> On Sat, Apr 11, 2026 at 01:09:33AM +0900, CharSyam wrote:
> > I believe this patch is ready for review. I look forward to any feedback
> > and am happy to make revisions.
>
> Sorry, I forgot to ask whether we could move the "pre-check" to within the
> equal_keys() function so that it needn't be added to every one of its
> callers.
>
> --
> nathan
>

Attachment Content-Type Size
0001-Use-cached-hash-to-skip-unnecessary-key-comparisons-v2.patch application/octet-stream 4.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tender Wang 2026-04-11 01:31:59 Re: pg17: XX000: no relation entry for relid 0
Previous Message Haibo Yan 2026-04-11 00:55:29 Re: [PATCH] Fix: Partitioned parent index remains invalid after child indexes are repaired