Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

From: John Naylor <jcnaylor(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date: 2018-12-27 16:19:51
Message-ID: CAJVSVGUUOTd2_Rv46FTcy9Yp6+xMwo0BwdSvPWCaLia_tesQjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/26/18, John Naylor <jcnaylor(at)gmail(dot)com> wrote:
> On 12/26/18, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I wonder if we could do something really simple like a lookup based on
>> the first character of the scan keyword. It looks to me like there are
>> 440 keywords right now, and the most common starting letter is 'c',
>> which is the first letter of 51 keywords. So dispatching based on the
>> first letter clips at least 3 steps off the binary search. I don't
>> know whether that's enough to be worthwhile, but it's probably pretty
>> simple to implement.

> I agree it'd be fairly simple to do, and might raise the bar for doing
> anything more complex.

I went ahead and did this for v4, but split out into a separate patch.
In addition, I used a heuristic to bypass binary search for the most
common keywords. Normally, the middle value is computed
mathematically, but I found that in each range of keywords beginning
with the same letter, there is often 1 or 2 common keywords that are
good first guesses, such as select, from, join, limit, where. I taught
the lookup to try those first, and then compute subsequent steps the
usual way.

Barring additional bikeshedding on 0001, I'll plan on implementing
offset-based lookup for the other keyword types and retire the old
ScanKeyword. Once that's done, we can benchmark and compare with the
optimizations in 0002.

-John Naylor

Attachment Content-Type Size
v4-0001-WIP-Use-offset-based-keyword-lookup.patch text/x-patch 22.8 KB
v4-0002-Dispatch-keyword-lookup-on-the-first-character.patch text/x-patch 6.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hugh Ranalli 2018-12-27 16:21:52 Re: BUG #15548: Unaccent does not remove combining diacritical characters
Previous Message Magnus Hagander 2018-12-27 14:55:58 Re: Offline enabling/disabling of data checksums