Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joerg Sonnenberger <joerg(at)bec(dot)de>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, John Naylor <jcnaylor(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date: 2019-01-06 19:29:05
Message-ID: 23854.1546802945@sss.pgh.pa.us
Lists: pgsql-hackers

Joerg Sonnenberger <joerg(at)bec(dot)de> writes:
> On Mon, Jan 07, 2019 at 03:11:55AM +1300, David Rowley wrote:
>> What I'm most interested in is how long it took to generate the hash
>> function in hash2.c?

> It's within the noise floor of time(1) on my laptop, e.g. ~1ms.

I decided to do some simple performance measurements to see if we're
actually getting any useful results here. I set up a test case that
just runs raw_parser in a tight loop:

while (count-- > 0)
{
    List       *parsetree_list;
    MemoryContext oldcontext;

    oldcontext = MemoryContextSwitchTo(mycontext);
    parsetree_list = pg_parse_query(query_string);
    MemoryContextSwitchTo(oldcontext);
    MemoryContextReset(mycontext);
    CHECK_FOR_INTERRUPTS();
}

and exercised it on the contents of information_schema.sql.
I think that's a reasonably representative test case considering
that we're only examining the speed of the flex+bison stages.
(Since it's mostly DDL, including parse analysis might be a bit
unlike normal workloads, but for raw parsing it should be fine.)

On my workstation, in a non-cassert build, HEAD requires about 4700 ms
for 1000 iterations (with maybe 1% cross-run variation).

Applying the v8 patch I posted yesterday, the time drops to ~4520 ms
or about a 4% savings. So not a lot, but it's pretty repeatable,
and it shows that reducing the cache footprint of keyword lookup
is worth something.

I then tried plastering in Joerg's example hash function, as in the
attached delta patch on top of v8. This is *not* a usable patch;
it breaks plpgsql and ecpg, because ScanKeywordLookup no longer
works for non-core keyword sets. But that doesn't matter for the
information_schema test case, and on that I find the runtime drops
to ~3820 ms, or 19% better than HEAD. So this is definitely an
idea worth pursuing.

Some additional thoughts:

* It's too bad that the hash function doesn't have a return convention
that allows distinguishing "couldn't possibly match any keyword" from
"might match keyword 0". I imagine a lot of the zero entries in its
hashtable could be interpreted as the former, so at the cost of perhaps
a couple more if-tests we could skip work at the caller. As this patch
stands, we could only skip the strcmp() so it might not be worth the
trouble --- but if we use Joerg's |0x20 hack then we could hash before
downcasing, allowing us to skip the downcasing step if the word couldn't
possibly be a keyword. Likely that would be worth the trouble.
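
For concreteness, the caller side might end up looking about like this
(a sketch only; the helper names and the -1-means-no-match return
convention are invented here, and the details depend on what the
generator emits):

    /* hash the raw input; the |0x20 trick downcases ASCII on the fly */
    h = keyword_hash(str, len);
    if (h < 0)
        return -1;              /* no keyword hashes here: skip downcasing */
    if (!downcase_and_compare(str, GetScanKeyword(h, keywords)))
        return -1;              /* collided with a non-keyword */
    return h;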

* We should extend the ScanKeywordList representation to include a
field holding the longest keyword length in the table, which
gen_keywordlist.pl would have no trouble providing. Then we could
skip downcasing and/or hashing for any word longer than that, replacing
the current NAMEDATALEN test, and thereby putting a tight bound on
the cost of downcasing and/or hashing.
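
Roughly (a sketch; field names invented here, and the real layout
depends on the v8 patch):

    typedef struct ScanKeywordList
    {
        const char *kw_string;      /* keyword strings, NUL-separated */
        const uint16 *kw_offsets;   /* offsets into kw_string */
        int         num_keywords;
        int         max_kw_len;     /* proposed: length of longest keyword */
    } ScanKeywordList;

and then ScanKeywordLookup could start with

    if (strlen(str) > keywords->max_kw_len)
        return -1;              /* too long to match any keyword */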

* If we do hash first, then we could replace the downcasing loop and
strcmp with an integrated loop that downcases and compares one
character at a time, removing the need for the NAMEDATALEN-sized
buffer variable.
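
That loop might look about like this (again just a sketch; it assumes
all-ASCII keywords, as we have today):

    static bool
    downcase_and_compare(const char *str, const char *kw)
    {
        while (*kw)
        {
            char        ch = *str++;

            if (ch >= 'A' && ch <= 'Z')
                ch += 'a' - 'A';
            if (ch != *kw++)
                return false;
        }
        return *str == '\0';    /* match only if str ended too */
    }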

* I think it matters to the speed of the hashing loop that the
magic multipliers are hard-coded. (Examining the assembly code
shows that, at least on my x86_64 hardware, gcc prefers to use
shift-and-add sequences here instead of multiply instructions.)
So we probably can't have inlined hashing code --- I imagine the
hash generator needs the flexibility to pick different values of
those multipliers. I envision making this work by having
gen_keywordlist.pl emit a function definition for the hash step and
include a function pointer to it in ScanKeywordList. That extra
function call will make things fractionally slower than what I have
here, but I don't think it will change the conclusions any. This
approach would also greatly alleviate the concern I had yesterday
about ecpg's c_keywords.c having a second copy of the hashing code;
what it would have is its own generated function, which isn't much
of a problem.
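
That is, gen_keywordlist.pl would emit something on the order of the
following (the multipliers, table size, and table contents shown are
placeholders; the real ones get chosen by the generator):

    #define HASHTABLE_SIZE 1024     /* generator-chosen */
    #define NUM_KEYWORDS 440        /* placeholder */

    static int
    core_keyword_hash(const char *key, size_t keylen)
    {
        /* contents computed by the generator; zeroes are placeholders */
        static const int16 h[HASHTABLE_SIZE] = {0};
        uint32      a = 0,
                    b = 0;

        while (keylen--)
        {
            unsigned char c = (unsigned char) *key++ | 0x20;

            a = a * 31 + c;     /* hard-coded so gcc can shift-and-add */
            b = b * 37 + c;
        }
        return (h[a % HASHTABLE_SIZE] + h[b % HASHTABLE_SIZE]) % NUM_KEYWORDS;
    }

with a pointer to core_keyword_hash stored in the generated
ScanKeywordList.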

* Given that the generator's runtime is negligible when coded in C,
I suspect that we might be able to tolerate the speed hit from translating
it to Perl, and frankly I'd much rather do that than cope with the
consequences of including C code in our build process.

I'm eagerly awaiting seeing Joerg's code, but I think in the
meantime I'm going to go look up NetBSD's nbperf to get an idea
of how painful it might be to do in Perl. (Now, bearing in mind
that I'm not exactly fluent in Perl, there are probably other
people around here who could produce a better translation ...)

regards, tom lane

Attachment Content-Type Size
crude-hack-for-perfect-hash-performance-test.patch text/x-diff 9.6 KB
