Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joel Jacobson <joel(at)trustly(dot)com>
Cc: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Joerg Sonnenberger <joerg(at)bec(dot)de>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date: 2019-03-20 20:24:25
Message-ID: 18021.1553113465@sss.pgh.pa.us
Lists: pgsql-hackers

Joel Jacobson <joel(at)trustly(dot)com> writes:
> I've seen a performance trick in other hash functions [1]
> to instead read multiple bytes in each iteration,
> and then handle the remaining bytes after the loop.
> [1] https://github.com/wangyi-fudan/wyhash/blob/master/wyhash.h#L29

I can't get very excited about this, seeing that we're only going to
be hashing short strings. I don't really believe your 30% number
for short strings; and even if I did, there's no evidence that the
hash functions are worth any further optimization in terms of our
overall performance.

Also, as best I can tell, the approach you propose would result
in an endianness dependence, meaning we'd have to have separate
lookup tables for BE and LE machines. That's not a dealbreaker
perhaps, but it is certainly another point on the "it's not worth it"
side of the argument.
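
To make the endianness point concrete, here is a minimal sketch. It assumes a hypothetical multiply-by-31 hash, which is not the generated hash function discussed in this thread, and all function names are made up for illustration. It shows that reading four bytes at a time through memcpy yields a host-dependent word, so any precomputed lookup table built from such a hash would only match on machines with the same byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read 4 bytes the fast way; the resulting value depends on host byte order. */
static uint32_t
load32_native(const unsigned char *p)
{
    uint32_t    v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* Read 4 bytes with explicit shifts; same value on every host, but more work. */
static uint32_t
load32_le(const unsigned char *p)
{
    return (uint32_t) p[0] |
        ((uint32_t) p[1] << 8) |
        ((uint32_t) p[2] << 16) |
        ((uint32_t) p[3] << 24);
}

/* Hypothetical hash, one byte per iteration: endianness-independent. */
static uint32_t
hash_bytewise(const unsigned char *s, size_t len)
{
    uint32_t    h = 0;

    while (len-- > 0)
        h = h * 31 + *s++;
    return h;
}

/*
 * Hypothetical hash, 4 bytes per iteration with the remaining bytes
 * handled after the loop.  Because it uses load32_native(), the result
 * for inputs of 4 bytes or more differs between BE and LE machines.
 */
static uint32_t
hash_wordwise(const unsigned char *s, size_t len)
{
    uint32_t    h = 0;

    while (len >= 4)
    {
        h = h * 31 + load32_native(s);
        s += 4;
        len -= 4;
    }
    while (len-- > 0)
        h = h * 31 + *s++;
    return h;
}
```

For the bytes 01 02 03 04, load32_native() gives 0x04030201 on a little-endian host and 0x01020304 on a big-endian one, while load32_le() gives 0x04030201 everywhere. Swapping load32_le() into the loop would remove the table dependence, at the cost of some of the speed the multi-byte read was meant to buy.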

regards, tom lane
