Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

From: Joel Jacobson <joel(at)trustly(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Joerg Sonnenberger <joerg(at)bec(dot)de>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date: 2019-03-21 19:46:21
Message-ID: CAASwCXff9pO=bqw2xZha-u5WZ9bRoZ-vSt2_AydQ73zm5LnDtg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 20, 2019 at 9:24 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Joel Jacobson <joel(at)trustly(dot)com> writes:
> > I've seen a performance trick in other hash functions [1]
> > to instead read multiple bytes in each iteration,
> > and then handle the remaining bytes after the loop.
> > [1] https://github.com/wangyi-fudan/wyhash/blob/master/wyhash.h#L29
>
> I can't get very excited about this, seeing that we're only going to
> be hashing short strings. I don't really believe your 30% number
> for short strings; and even if I did, there's no evidence that the
> hash functions are worth any further optimization in terms of our
> overall performance.
>

I went ahead and tested this approach anyway, since I need this algorithm
in a completely different project.
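
For reference, this is the general shape of the two loops being
compared. It is only a sketch with placeholder mixing constants, not
the exact code generated by the attached patch:

#include <stdint.h>
#include <string.h>

/* Baseline: one byte per iteration. */
static uint32_t
hash_char_at_a_time(const unsigned char *k, size_t keylen)
{
	uint32_t	h = 0;

	while (keylen--)
		h = h * 257 + *k++;
	return h;
}

/*
 * Four bytes per iteration, with the remaining tail bytes handled
 * after the loop.  memcpy() is the portable way to express the
 * unaligned 32-bit load; compilers reduce it to a single load
 * instruction.
 */
static uint32_t
hash_word_at_a_time(const unsigned char *k, size_t keylen)
{
	uint32_t	h = 0;

	while (keylen >= 4)
	{
		uint32_t	c4;

		memcpy(&c4, k, 4);
		h = h * 2654435761U + c4;
		k += 4;
		keylen -= 4;
	}
	while (keylen--)
		h = h * 257 + *k++;
	return h;
}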

The benchmark below shows per-call timings for three different keywords
of each length, compiled with -O2:

$ c++ -O2 -std=c++14 -o bench_perfect_hash bench_perfect_hash.cc
$ ./bench_perfect_hash

keyword            length  char-a-time (ns)  word-a-time (ns)    diff
as                      2              3.30              2.62   -0.21
at                      2              3.54              2.66   -0.25
by                      2              3.30              2.59   -0.22
add                     3              4.01              3.15   -0.21
all                     3              4.04              3.11   -0.23
and                     3              3.84              3.11   -0.19
also                    4              4.50              3.17   -0.30
both                    4              4.49              3.06   -0.32
call                    4              4.95              3.42   -0.31
abort                   5              6.09              4.02   -0.34
admin                   5              5.26              3.65   -0.31
after                   5              5.18              3.76   -0.27
access                  6              5.97              3.91   -0.34
action                  6              5.86              3.89   -0.34
always                  6              6.10              3.77   -0.38
analyse                 7              6.67              4.64   -0.30
analyze                 7              7.09              4.87   -0.31
between                 7              7.02              4.66   -0.34
absolute                8              7.49              3.82   -0.49
backward                8              7.13              3.88   -0.46
cascaded                8              7.23              4.17   -0.42
aggregate               9              8.04              4.49   -0.44
assertion               9              7.98              4.52   -0.43
attribute               9              8.03              4.44   -0.45
assignment             10              8.58              4.67   -0.46
asymmetric             10              9.07              4.57   -0.50
checkpoint             10              9.15              4.53   -0.51
constraints            11              9.58              5.14   -0.46
insensitive            11              9.62              5.30   -0.45
publication            11             10.30              5.60   -0.46
concurrently           12             10.36              4.81   -0.54
current_date           12             11.17              5.48   -0.51
current_role           12             11.15              5.10   -0.54
authorization          13             11.87              5.50   -0.54
configuration          13             11.50              5.51   -0.52
xmlattributes          13             11.72              5.66   -0.52
current_schema         14             12.17              5.58   -0.54
localtimestamp         14             11.78              5.46   -0.54
characteristics        15             12.77              5.97   -0.53
current_catalog        15             12.65              5.87   -0.54
current_timestamp      17             14.19              6.12   -0.57

(diff = (word-a-time - char-a-time) / char-a-time, so -0.21 means the
word-a-time version is about 21% faster)
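
The per-keyword numbers come from a timing loop of this general shape
(again a sketch, not the attached bench_perfect_hash.cc; time_hash and
the iteration count are made up for illustration):

#include <stdint.h>
#include <string.h>
#include <time.h>

/* Average ns per call of `hash` on keyword `kw` over `iters` calls. */
static double
time_hash(uint32_t (*hash) (const unsigned char *, size_t),
		  const char *kw, long iters)
{
	struct timespec start, stop;
	volatile uint32_t sink = 0;	/* defeat dead-code elimination */
	size_t		keylen = strlen(kw);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < iters; i++)
		sink ^= hash((const unsigned char *) kw, keylen);
	clock_gettime(CLOCK_MONOTONIC, &stop);
	(void) sink;

	return ((stop.tv_sec - start.tv_sec) * 1e9 +
			(stop.tv_nsec - start.tv_nsec)) / iters;
}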

> Also, as best I can tell, the approach you propose would result
> in an endianness dependence, meaning we'd have to have separate
> lookup tables for BE and LE machines. That's not a dealbreaker
> perhaps, but it is certainly another point on the "it's not worth it"
> side of the argument.
>

I can see how the same problem has been worked around in e.g. pg_crc32.h:

#ifdef WORDS_BIGENDIAN
#define FIN_CRC32C(crc) ((crc) = pg_bswap32(crc) ^ 0xFFFFFFFF)
#else
#define FIN_CRC32C(crc) ((crc) ^= 0xFFFFFFFF)
#endif

So I used the same trick in PerfectHash.pm:

$f .= sprintf "#ifdef WORDS_BIGENDIAN\n";
$f .= sprintf "\t\tc4 = pg_bswap32(c4);\n";
$f .= sprintf "#endif\n";
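
With that in place, the emitted inner loop ends up looking roughly like
this. This is a sketch in the style of the two-accumulator loop
PerfectHash.pm already generates; the multipliers are placeholders, not
the ones the generator actually picks:

	while (keylen >= 4)
	{
		uint32		c4;

		memcpy(&c4, k, 4);
#ifdef WORDS_BIGENDIAN
		c4 = pg_bswap32(c4);	/* match the byte order the table was built for */
#endif
		a = a * 257 + c4;
		b = b * 127 + c4;
		k += 4;
		keylen -= 4;
	}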

I've also tried to measure the overall effect by hacking postgres.c:

+	struct timespec start, stop;
+
+	clock_gettime(CLOCK_REALTIME, &start);
+	for (int i = 0; i < 100000; i++)
+	{
+		List	   *parsetree_list2;
+		MemoryContext oldcontext2;
+
+		oldcontext2 = MemoryContextSwitchTo(MessageContext);
+		parsetree_list2 = pg_parse_query(query_string);
+		MemoryContextSwitchTo(oldcontext2);
+//		MemoryContextReset(MessageContext);
+		CHECK_FOR_INTERRUPTS();
+	}
+	clock_gettime(CLOCK_REALTIME, &stop);
+	printf("Bench: %f\n",
+		   (stop.tv_sec - start.tv_sec) +
+		   (double) (stop.tv_nsec - start.tv_nsec) / 1000000000L);

I measured the time for a big query found here:
https://wiki.postgresql.org/wiki/Index_Maintenance

I might be doing something wrong, but it looks like the overall effect is
a ~3% improvement.

Attachment                              Content-Type              Size
perfect-hash-read-word-at-a-time.patch  application/octet-stream  4.1 KB
bench_perfect_hash.cc                   application/octet-stream  2.1 KB
pgkeywords.h                            application/octet-stream  26.4 KB
