Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: John Naylor <jcnaylor(at)gmail(dot)com>
Cc: Joerg Sonnenberger <joerg(at)bec(dot)de>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date: 2019-01-04 20:31:11
Message-ID: 23258.1546633871@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

John Naylor <jcnaylor(at)gmail(dot)com> writes:
> On 1/3/19, Joerg Sonnenberger <joerg(at)bec(dot)de> wrote:
>> I was pointed at your patch on IRC and decided to look into adding my
>> own pieces. What I can provide you is a fast perfect hash function
>> generator. I've attached a sample hash function based on the current
>> main keyword list. hash() essentially gives you the number of the only
>> possible match, a final strcmp/memcmp is still necessary to verify that
>> it is an actual keyword though. The |0x20 can be dropped if all cases
>> have pre-lower-cased the input already. This would replace the binary
>> search in the lookup functions. Returning offsets directly would be easy
>> as well. That allows writing a single string where each entry is prefixed
>> with a type mask, the token id, the length of the keyword and the actual
>> keyword text. Does that sound useful to you?

> Judging by previous responses, there is still interest in using
> perfect hash functions, so thanks for this. I'm not knowledgeable
> enough to judge its implementation, so I'll leave that for others.

We haven't actually seen the implementation, so it's hard to judge ;-).

The sample hash function certainly looks great. I'm not terribly on board
with using |0x20 as a substitute for lower-casing, but that's a minor
detail.

The foremost questions in my mind are:

* What's the generator written in? (if the answer's not "Perl", wedging
it into our build processes might be painful)

* What license is it under?

* Does it always suceed in producing a single-level lookup table?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-01-04 20:36:21 Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Previous Message Andres Freund 2019-01-04 20:29:40 Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)