Re: Change GUC hashtable to use simplehash?

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gurjeet Singh <gurjeet(at)singh(dot)im>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Change GUC hashtable to use simplehash?
Date: 2024-01-01 23:55:00
Message-ID: CACJufxG6zR-t3dKoXfb_gqnHLH_CL2Pt3MPOOH2Hwx4t3e-eog@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 26, 2023 at 4:01 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> 0001-0003 are same as earlier
> 0004 takes Jeff's idea and adds in an optimization from NetBSD's
> strlen (I said OpenBSD earlier, but it goes back further). I added
> stub code to simulate big-endian when requested at compile time, but a
> later patch removes it. Since it benched well, I made the extra effort
> to generalize it for other callers. After adding to the hash state, it
> returns the length so the caller can pass it to the finalizer.
> 0005 is the benchmark (not for commit) -- I took the parser keyword
> list and added enough padding to make every string aligned when the
> whole thing is copied to an alloc'd area.
>
> Each of the bench_*.sql files named below are just running the
> similarly-named function, all with the same argument, e.g. "select *
> from bench_pgstat_hash_fh(100000);", so not attached.
>
> Strings:
>
> -- strlen + hash_bytes
> pgbench -n -T 20 -f bench_hash_bytes.sql -M prepared | grep latency
> latency average = 1036.732 ms
>
> -- word-at-a-time hashing, with bytewise lookahead
> pgbench -n -T 20 -f bench_cstr_unaligned.sql -M prepared | grep latency
> latency average = 664.632 ms
>
> -- word-at-a-time for both hashing and lookahead (Jeff's aligned
> coding plus a technique from NetBSD strlen)
> pgbench -n -T 20 -f bench_cstr_aligned.sql -M prepared | grep latency
> latency average = 436.701 ms
>
> So, the fully optimized aligned case is worth it if it's convenient.
>
> 0006 adds a byteswap for big-endian so we can reuse little endian
> coding for the lookahead.
>
> 0007 - I also wanted to put numbers to 0003 (pgstat hash). While the
> motivation for that was cleanup, I had a hunch it would shave cycles
> and take up less binary space. It does on both accounts:
>
> -- 3x murmur + hash_combine
> pgbench -n -T 20 -f bench_pgstat_orig.sql -M prepared | grep latency
> latency average = 333.540 ms
>
> -- fasthash32 (simple call, no state setup and final needed for a single value)
> pgbench -n -T 20 -f bench_pgstat_fh.sql -M prepared | grep latency
> latency average = 277.591 ms
>
> 0008 - We can optimize the tail load when it's 4 bytes -- to save
> loads, shifts, and OR's. My compiler can't figure this out for the
> pgstat hash, with its fixed 4-byte tail. It's pretty simple and should
> help other cases:
>
> pgbench -n -T 20 -f bench_pgstat_fh.sql -M prepared | grep latency
> latency average = 226.113 ms

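For readers following the 0004 description quoted above, here is a minimal sketch of a word-at-a-time NUL lookahead. It is not the code from the patch: the names has_zero_byte and aligned_strlen_sketch are hypothetical, and it assumes the string's buffer is padded to a multiple of 8 bytes (as the benchmark description says it does for the keyword list), so whole-word loads stay inside the allocation. Once the word containing the terminator is found, the sketch finishes bytewise, which sidesteps the endianness handling that 0006 deals with.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff at least one byte of w is zero (classic bit trick). */
static inline uint64_t
has_zero_byte(uint64_t w)
{
	return (w - UINT64_C(0x0101010101010101)) & ~w &
		UINT64_C(0x8080808080808080);
}

/*
 * Length of a NUL-terminated string, looking ahead 8 bytes at a time.
 *
 * Assumption (illustrative only): the buffer holding s is padded to a
 * multiple of 8 bytes, so each 8-byte load is within the allocation.
 */
static size_t
aligned_strlen_sketch(const char *s)
{
	const char *p = s;
	uint64_t	w;

	for (;;)
	{
		/* memcpy avoids strict-aliasing problems; compilers emit one load */
		memcpy(&w, p, sizeof(w));
		if (has_zero_byte(w))
			break;
		p += sizeof(w);
	}

	/* The terminator is somewhere in this word; locate it bytewise. */
	while (*p != '\0')
		p++;

	return (size_t) (p - s);
}
```

A caller that hashes each word as it goes would mix w into the hash state inside the loop and pass the returned length to the finalizer, as the quoted text describes.
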
--- /dev/null
+++ b/contrib/bench_hash/bench_hash.c
@@ -0,0 +1,103 @@
+/*-------------------------------------------------------------------------
+ *
+ * bench_hash.c
+ *
+ * Copyright (c) 2023, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/bench_hash/bench_hash.c
+ *
+ *-------------------------------------------------------------------------
+ */
You added this module under contrib (root/contrib), but I guess your
intention was to add it under root/src/test/modules.
Later I saw "0005 is the benchmark (not for commit)".

--- /dev/null
+++ b/src/include/common/hashfn_unstable.h
@@ -0,0 +1,213 @@
+/*
+Building blocks for creating fast inlineable hash functions. The
+unstable designation is in contrast to hashfn.h, which cannot break
+compatibility because hashes can be writen to disk and so must produce
+the same hashes between versions.
+
+ *
+ * Portions Copyright (c) 2018-2023, PostgreSQL Global Development Group
+ *
+ * src/include/common/hashfn_unstable.c
+ */
+
This should be "src/include/common/hashfn_unstable.h". Also a typo: `writen` should be `written`.

In pgbench, I used --no-vacuum --time=20 -M prepared.
My local computer is slow, but here are the test results:

select * from bench_cstring_hash_aligned(100000); 7318.893 ms
select * from bench_cstring_hash_unaligned(100000); 10383.173 ms
select * from bench_pgstat_hash(100000); 4474.989 ms
select * from bench_pgstat_hash_fh(100000); 9192.245 ms
select * from bench_string_hash(100000); 2048.008 ms
